New multi-line regexp and new regexp syntax.

2002-06-13 11:15:46 +00:00
commit 6861f0e327
parent 292c80bc08
3 changed files with 68 additions and 23 deletions
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -569,6 +569,23 @@ comparison.

 ** Etags changes.

+*** New syntax for regular expressions, multi-line regular expressions.
+The syntax --ignore-case-regex=/REGEX/NAME/ is now undocumented and
+retained only for backward compatibility.  The new equivalent syntax is
+--regex=/REGEX/NAME/i.  More generally, it is --regex=/REGEX/NAME/MODS,
+where `/NAME' is optional, as usual, and MODS is a string of 0 or more
+characters among `i' (ignore case), `m' (multi-line) and `s'
+(single-line).  The `m' and `s' modifiers behave as in Perl regular
+expressions: `m' allows regexps to match more than one line, while `s'
+(which implies `m') means that `.' matches newlines.  The ability to
+span newlines allows writing of much more powerful regular expressions
+and rapid prototyping for tagging new languages.
+
+*** Regular expressions can use char escape sequences as in Gcc
+The escaped character sequences \a, \b, \d, \e, \f, \n, \r, \t, \v
+stand, respectively, for the ASCII characters BEL, BS, DEL, ESC, FF, NL,
+CR, TAB, VT.
+
 *** In Prolog, etags creates tags for rules in addition to predicates.

 *** In Perl, packages are tags.
@@ -596,9 +613,6 @@ be used (only once) in place of a file name on the command line.  Etags
 will read from standard input and mark the produced tags as belonging to
 the file FILE.

-*** Regular expressions can use char escape sequences as in Gcc
-These are the escapes \a, \b, \d, \e, \f, \n, \r, \t, \v.
-
 +++
 ** The command line option --no-windows has been changed to
 --no-window-system.  The old one still works, but is deprecated.
--- a/etc/etags.1
+++ b/etc/etags.1
@@ -22,7 +22,6 @@ etags, ctags \- generate tag file for Emacs, vi
 [\|\-\-ignore\-indentation\|] [\|\-\-language=\fIlanguage\fP\|]
 [\|\-\-members\|] [\|\-\-output=\fItagfile\fP\|]
 [\|\-\-regex=\fIregexp\fP\|] [\|\-\-no\-regex\|]
-[\|\-\-ignore\-case\-regex=\fIregexp\fP\|]
 [\|\-\-help\|] [\|\-\-version\|]
 \fIfile\fP .\|.\|.

@@ -36,7 +35,6 @@ etags, ctags \- generate tag file for Emacs, vi
 [\|\-\-globals\|] [\|\-\-ignore\-indentation\|]
 [\|\-\-language=\fIlanguage\fP\|] [\|\-\-members\|]
 [\|\-\-output=\fItagfile\fP\|] [\|\-\-regex=\fIregexp\fP\|]
-[\|\-\-ignore\-case\-regex=\fIregexp\fP\|]
 [\|\-\-typedefs\|] [\|\-\-typedefs\-and\-c++\|]
 [\|\-\-update\|] [\|\-\-no\-warn\|]
 [\|\-\-help\|] [\|\-\-version\|]
@@ -149,27 +147,32 @@ Explicit name of file for tag table; overrides default \fBTAGS\fP or
 \fBtags\fP.   (But ignored with \fB\-v\fP or \fB\-x\fP.)
 .TP
 \fB\-r\fP \fIregexp\fP, \fB\-\-regex=\fIregexp\fP
-.TP
-\fB\-\-ignore\-case\-regex=\fIregexp\fP
-Make tags based on regexp matching for each line of the files
-following this option, in addition to the tags made with the standard
-parsing based on language.  When using \fB\-\-regex\fP, case is
-significant, while it is not with \fB\-\-ignore\-case\-regex\fP. May
-be freely intermixed with filenames and the \fB\-R\fP option.  The
-regexps are cumulative, i.e. each option will add to the previous
-ones.  The regexps are of the form:
+
+Make tags based on regexp matching for the files following this option,
+in addition to the tags made with the standard parsing based on
+language. May be freely intermixed with filenames and the \fB\-R\fP
+option.  The regexps are cumulative, i.e. each such option will add to
+the previous ones.  The regexps are of the form:
 .br
-	\fB/\fP\fItagregexp\fP[\fB/\fP\fInameregexp\fP]\fB/\fP
+	\fB/\fP\fItagregexp/\fP[\fInameregexp\fP\fB/\fP]\fImodifiers\fP
 .br

-where \fItagregexp\fP is used to match the lines that must be tagged.
-It should not match useless characters.  If the match is
-such that more characters than needed are unavoidably matched by
-\fItagregexp\fP, it may be useful to add a \fInameregexp\fP, to
-narrow down the tag scope.  \fBctags\fP ignores regexps without a
-\fInameregexp\fP.  The syntax of regexps is the same as in emacs.
-The following character escape sequences are supported:
-\\a, \\b, \\d, \\e, \\f, \\n, \\r, \\t, \\v.
+where \fItagregexp\fP is used to match the tag.  It should not match
+useless characters.  If the match is such that more characters than
+needed are unavoidably matched by \fItagregexp\fP, it may be useful to
+add a \fInameregexp\fP, to narrow down the tag scope.  \fBctags\fP
+ignores regexps without a \fInameregexp\fP.  The syntax of regexps is
+the same as in emacs.  The following character escape sequences are
+supported: \\a, \\b, \\d, \\e, \\f, \\n, \\r, \\t, \\v, which
+respectively stand for the ASCII characters BEL, BS, DEL, ESC, FF, NL,
+CR, TAB, VT.
+.br
+The \fImodifiers\fP are a sequence of 0 or more characters among
+\fIi\fP, which means to ignore case when matching; \fIm\fP, which means
+that the \fItagregexp\fP will be matched against the whole file contents
+at once, rather than line by line, and the matching sequence can match
+multiple lines; and \fIs\fP, which implies \fIm\fP and means that the
+dot character in \fItagregexp\fP matches the newline char as well.

 .br
 Here are some examples.  All the regexps are quoted to protect them
--- a/lib-src/ChangeLog
+++ b/lib-src/ChangeLog
@@ -1,3 +1,31 @@
+2002-06-12  Francesco Potorti`  <pot@gnu.org>
+
+	* etags.c: New multi-line regexp and new regexp syntax.
+	(arg_type): at_icregexp label removed (obsolete).
+	(pattern): New member multi_line for multi-line regexps.
+	(filebuf): A global buffer containing the whole file as a string
+	for multi-line regexp matching.
+	(need_filebuf): Global flag raised if multi-line regexps used.
+	(print_help): Document new regexp modifiers, remove references to
+	obsolete option --ignore-case-regex.
+	(main): Do not set regexp syntax and translation table here.
+	(main): Treat -c option as a backward compatibility hack.
+	(main, find_entries): Init and free filebuf.
+	(find_entries): Call regex_tag_multiline after the regular parser.
+	(scan_separators): Check for unterminated regexp and return NULL.
+	(analyse_regex, add_regex): Remove the ignore_case argument, which
+	is now a modifier to the regexp.  All callers changed.
+	(add_regex): Manage the regexp modifiers.
+	(regex_tag_multiline): New function.  Reads from filebuf.
+	(readline_internal): If necessary, copy the whole file into filebuf.
+	(readline): Skip multi-line regexps, leave them to regex_tag_multiline.
+
+2002-06-11  Francesco Potorti`  <pot@gnu.org>
+
+	* etags.c (add_regex): Better check for null regexps.
+	(readline): Check for regex matching null string.
+	(find_entries): Reorganisation.
+
 2002-06-07  Francesco Potorti`  <pot@gnu.org>

 	* etags.c (scan_separators): Support all character escape