New multi-line regexp and new regexp syntax.

Francesco Potortì 2002-06-13 11:15:46 +00:00
parent 292c80bc08
commit 6861f0e327
3 changed files with 68 additions and 23 deletions

@@ -569,6 +569,23 @@ comparison.
** Etags changes.
*** New syntax for regular expressions, multi-line regular expressions.
The syntax --ignore-case-regex=/REGEX/NAME/ is now undocumented and
retained only for backward compatibility. The new equivalent syntax is
--regex=/REGEX/NAME/i. More generally, it is --regex=/REGEX/NAME/MODS,
where `/NAME' is optional, as usual, and MODS is a string of 0 or more
characters among `i' (ignore case), `m' (multi-line) and `s'
(single-line). The `m' and `s' modifiers behave as in Perl regular
expressions: `m' allows regexps to match more than one line, while `s'
(which implies `m') means that `.' matches newlines. The ability to
span newlines allows writing of much more powerful regular expressions
and rapid prototyping for tagging new languages.
*** Regular expressions can use char escape sequences as in Gcc
The escape sequences \a, \b, \d, \e, \f, \n, \r, \t, \v stand,
respectively, for the ASCII characters BEL, BS, DEL, ESC, FF, NL, CR,
TAB, VT.
*** In Prolog, etags creates tags for rules in addition to predicates.
*** In Perl, packages are tags.
@@ -596,9 +613,6 @@ be used (only once) in place of a file name on the command line. Etags
will read from standard input and mark the produced tags as belonging to
the file FILE.
*** Regular expressions can use char escape sequences as in Gcc
These are the escapes \a, \b, \d, \e, \f, \n, \r, \t, \v.
+++
** The command line option --no-windows has been changed to
--no-window-system. The old one still works, but is deprecated.
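
The --regex=/REGEX/NAME/MODS syntax described in the NEWS entry above can
be sketched in a few lines of Python. This is only an illustration of the
documented grammar, not the code in etags.c: the names RegexSpec and
parse_spec are hypothetical, the separator is assumed to be `/', and
escaped separators inside the regexp are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RegexSpec:
    """Hypothetical container for one parsed --regex argument."""
    tagregexp: str
    nameregexp: Optional[str] = None
    ignore_case: bool = False
    multi_line: bool = False
    single_line: bool = False


def parse_spec(spec: str) -> RegexSpec:
    """Split '/TAGREGEXP/[NAME/]MODS' and decode the modifier letters."""
    if not spec.startswith("/"):
        raise ValueError("specification must start with '/'")
    # Simplification: a plain split, so '/' may not appear inside a regexp.
    parts = spec[1:].split("/")
    if len(parts) == 2:
        tagregexp, mods = parts
        name = None
    elif len(parts) == 3:
        tagregexp, name, mods = parts
    else:
        raise ValueError("expected /TAGREGEXP/[NAME/]MODS")
    result = RegexSpec(tagregexp, name)
    for ch in mods:
        if ch == "i":
            result.ignore_case = True
        elif ch == "m":
            result.multi_line = True
        elif ch == "s":
            # Per the NEWS text, `s' implies `m'.
            result.single_line = True
            result.multi_line = True
        else:
            raise ValueError("unknown modifier: " + ch)
    return result
```

With this sketch, parse_spec("/foo/bar/i") yields a case-insensitive,
single-line pattern named by `bar', and parse_spec("/foo/s") sets both
the multi_line and single_line flags.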

@@ -22,7 +22,6 @@ etags, ctags \- generate tag file for Emacs, vi
[\|\-\-ignore\-indentation\|] [\|\-\-language=\fIlanguage\fP\|]
[\|\-\-members\|] [\|\-\-output=\fItagfile\fP\|]
[\|\-\-regex=\fIregexp\fP\|] [\|\-\-no\-regex\|]
[\|\-\-ignore\-case\-regex=\fIregexp\fP\|]
[\|\-\-help\|] [\|\-\-version\|]
\fIfile\fP .\|.\|.
@@ -36,7 +35,6 @@ etags, ctags \- generate tag file for Emacs, vi
[\|\-\-globals\|] [\|\-\-ignore\-indentation\|]
[\|\-\-language=\fIlanguage\fP\|] [\|\-\-members\|]
[\|\-\-output=\fItagfile\fP\|] [\|\-\-regex=\fIregexp\fP\|]
[\|\-\-ignore\-case\-regex=\fIregexp\fP\|]
[\|\-\-typedefs\|] [\|\-\-typedefs\-and\-c++\|]
[\|\-\-update\|] [\|\-\-no\-warn\|]
[\|\-\-help\|] [\|\-\-version\|]
@@ -149,27 +147,32 @@ Explicit name of file for tag table; overrides default \fBTAGS\fP or
\fBtags\fP. (But ignored with \fB\-v\fP or \fB\-x\fP.)
.TP
\fB\-r\fP \fIregexp\fP, \fB\-\-regex=\fIregexp\fP
.TP
\fB\-\-ignore\-case\-regex=\fIregexp\fP
Make tags based on regexp matching for each line of the files
following this option, in addition to the tags made with the standard
parsing based on language. When using \fB\-\-regex\fP, case is
significant, while it is not with \fB\-\-ignore\-case\-regex\fP. May
be freely intermixed with filenames and the \fB\-R\fP option. The
regexps are cumulative, i.e. each option will add to the previous
ones. The regexps are of the form:
Make tags based on regexp matching for the files following this option,
in addition to the tags made with the standard parsing based on
language. May be freely intermixed with filenames and the \fB\-R\fP
option. The regexps are cumulative, i.e. each such option will add to
the previous ones. The regexps are of the form:
.br
\fB/\fP\fItagregexp\fP[\fB/\fP\fInameregexp\fP]\fB/\fP
\fB/\fP\fItagregexp/\fP[\fInameregexp\fP\fB/\fP]\fImodifiers\fP
.br
where \fItagregexp\fP is used to match the lines that must be tagged.
It should not match useless characters. If the match is
such that more characters than needed are unavoidably matched by
\fItagregexp\fP, it may be useful to add a \fInameregexp\fP, to
narrow down the tag scope. \fBctags\fP ignores regexps without a
\fInameregexp\fP. The syntax of regexps is the same as in emacs.
The following character escape sequences are supported:
\\a, \\b, \\d, \\e, \\f, \\n, \\r, \\t, \\v.
where \fItagregexp\fP is used to match the tag. It should not match
useless characters. If the match is such that more characters than
needed are unavoidably matched by \fItagregexp\fP, it may be useful to
add a \fInameregexp\fP, to narrow down the tag scope. \fBctags\fP
ignores regexps without a \fInameregexp\fP. The syntax of regexps is
the same as in emacs. The following character escape sequences are
supported: \\a, \\b, \\d, \\e, \\f, \\n, \\r, \\t, \\v, which
respectively stand for the ASCII characters BEL, BS, DEL, ESC, FF, NL,
CR, TAB, VT.
.br
The \fImodifiers\fP are a sequence of 0 or more characters among
\fIi\fP, which means to ignore case when matching; \fIm\fP, which means
that the \fItagregexp\fP will be matched against the whole file contents
at once, rather than line by line, and the matching sequence can match
multiple lines; and \fIs\fP, which implies \fIm\fP and means that the
dot character in \fItagregexp\fP matches the newline char as well.
.br
Here are some examples. All the regexps are quoted to protect them
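
The difference between line-by-line matching and the m and s modifiers
described in this hunk can be illustrated with a small Python sketch.
Python's re module stands in for the Emacs regexp syntax that etags
actually uses, so the pattern syntax differs; the function name
match_tags is hypothetical, and the anchoring behaviour of the real
program is not modelled.

```python
import re


def match_tags(text, pattern, mods=""):
    """Return the matched strings for PATTERN under the given modifiers."""
    flags = re.IGNORECASE if "i" in mods else 0
    if "s" in mods:
        # `s' lets the dot match a newline as well.
        flags |= re.DOTALL
    rx = re.compile(pattern, flags)
    if "m" in mods or "s" in mods:
        # Multi-line: match against the whole file contents at once.
        return [m.group(0) for m in rx.finditer(text)]
    # Default: match each line separately, so no match can span a newline.
    found = []
    for line in text.splitlines():
        m = rx.search(line)
        if m:
            found.append(m.group(0))
    return found


source = "int\nmain (void)\n{\n}\n"
print(match_tags(source, r"int\s+main"))        # no line contains both words
print(match_tags(source, r"int\s+main", "m"))   # the match spans two lines
print(match_tags(source, r"int.main", "m"))     # dot does not cross the newline
print(match_tags(source, r"int.main", "s"))     # dot matches the newline
```

Only the second and fourth calls find a match, because only they let the
pattern cross the line break between `int' and `main'.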

@@ -1,3 +1,31 @@
2002-06-12 Francesco Potorti` <pot@gnu.org>
* etags.c: New multi-line regexp and new regexp syntax.
(arg_type): at_icregexp label removed (obsolete).
(pattern): New member multi_line for multi-line regexps.
(filebuf): A global buffer containing the whole file as a string
for multi-line regexp matching.
(need_filebuf): Global flag raised if multi-line regexps used.
(print_help): Document new regexp modifiers, remove references to
obsolete option --ignore-case-regex.
(main): Do not set regexp syntax and translation table here.
(main): Treat -c option as a backward compatibility hack.
(main, find_entries): Init and free filebuf.
(find_entries): Call regex_tag_multiline after the regular parser.
(scan_separators): Check for unterminated regexp and return NULL.
(analyse_regex, add_regex): Remove the ignore_case argument, which
is now a modifier to the regexp. All callers changed.
(add_regex): Manage the regexp modifiers.
(regex_tag_multiline): New function. Reads from filebuf.
(readline_internal): If necessary, copy the whole file into filebuf.
(readline): Skip multi-line regexps, leave them to regex_tag_multiline.
2002-06-11 Francesco Potorti` <pot@gnu.org>
* etags.c (add_regex): Better check for null regexps.
(readline): Check for regex matching null string.
(find_entries): Reorganisation.
2002-06-07 Francesco Potorti` <pot@gnu.org>
* etags.c (scan_separators): Support all character escape
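
The two scan_separators entries above (character escape sequences, and
returning NULL for an unterminated regexp) can be pictured with the
following Python sketch. It is not a translation of the C function: the
name scan_to_separator, the `/' default and the handling of an escaped
separator are assumptions made for illustration. The escape table follows
the list given in the NEWS and man page text of this commit.

```python
# Escape letters and the ASCII characters they stand for, per the NEWS text.
ESCAPES = {
    "a": "\x07",  # BEL
    "b": "\x08",  # BS
    "d": "\x7f",  # DEL
    "e": "\x1b",  # ESC
    "f": "\x0c",  # FF
    "n": "\n",    # NL
    "r": "\r",    # CR
    "t": "\t",    # TAB
    "v": "\x0b",  # VT
}


def scan_to_separator(text, sep="/"):
    """Return (decoded_field, rest) up to the first unescaped SEP.

    Returns None when SEP is never found, mirroring the NULL return the
    ChangeLog describes for an unterminated regexp.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == sep:
            return "".join(out), text[i + 1:]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in ESCAPES:
                out.append(ESCAPES[nxt])
            elif nxt == sep:
                # Assumption: a backslash protects the separator itself.
                out.append(sep)
            else:
                # Leave other backslash sequences for the regexp engine.
                out.append("\\" + nxt)
            i += 2
            continue
        out.append(ch)
        i += 1
    return None
```

For example, scan_to_separator(r"a\tb/rest") returns the field with a real
TAB between `a' and `b' plus the remainder "rest", while
scan_to_separator("no-end") returns None.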