<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>4/taa</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="../inweb.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body>
<nav role="navigation">
<h1><a href="../webs.html">Sources</a></h1>
<ul>
<li><a href="../inweb/index.html">inweb</a></li>
</ul>
<h2>Foundation</h2>
<ul>
<li><a href="../foundation-module/index.html">foundation-module</a></li>
<li><a href="../foundation-test/index.html">foundation-test</a></li>
</ul>
</nav>
<main role="main">
<!--Weave of '4/pm' generated by 7-->
<ul class="crumbs"><li><a href="../webs.html">Source</a></li><li><a href="index.html">foundation</a></li><li><a href="index.html#4">Chapter 4: Text Handling</a></li><li><b>Pattern Matching</b></li></ul><p class="purpose">To provide a limited regular-expression parser.</p>
<ul class="toc"><li><a href="#SP1">§1. Character types</a></li><li><a href="#SP3">§3. Simple parsing</a></li><li><a href="#SP6">§6. A Worse PCRE</a></li><li><a href="#SP14">§14. Replacement</a></li></ul><hr class="tocbar">
<p class="inwebparagraph"><a id="SP1"></a><b>§1. Character types. </b>We will define white space as spaces and tabs only, since the various kinds
of line terminator will always be stripped out before this is applied.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::white_space</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">' '</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'\t'</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::white_space is used in <a href="#SP5">§5</a>.</p>
<p class="inwebparagraph"><a id="SP2"></a><b>§2. </b>The presence of <code class="display"><span class="extract">:</span></code> here is perhaps a bit surprising, since it's illegal in
C and has other meanings in other languages, but it's legal in C-for-Inform
identifiers.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::identifier_char</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'_'</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">':'</span><span class="plain">) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="character">'A'</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="character">'Z'</span><span class="plain">)) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="character">'a'</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="character">'z'</span><span class="plain">)) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="character">'0'</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="character">'9'</span><span class="plain">))) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::identifier_char is used in <a href="#SP13">§13</a>.</p>
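<p class="inwebparagraph">The two character tests of §1 and §2 can be tried outside Foundation with a minimal plain-C sketch. The names <code class="display"><span class="extract">is_white_space</span></code>, <code class="display"><span class="extract">is_identifier_char</span></code> and <code class="display"><span class="extract">identifier_length</span></code> are hypothetical stand-ins chosen for this illustration, since the <code class="display"><span class="extract">::</span></code> namespacing is not legal in ordinary C.</p>

```c
/* Plain-C stand-ins for Regexp::white_space and Regexp::identifier_char.
   These names are illustrative only and are not part of Foundation. */
static int is_white_space(int c) {
    return (c == ' ') || (c == '\t');
}

static int is_identifier_char(int c) {
    return (c == '_') || (c == ':') ||
        ((c >= 'A') && (c <= 'Z')) ||
        ((c >= 'a') && (c <= 'z')) ||
        ((c >= '0') && (c <= '9'));
}

/* Length of the run of identifier characters at the start of s
   (a hypothetical helper, added only to show the test in use). */
static int identifier_length(const char *s) {
    int n = 0;
    while (is_identifier_char((unsigned char) s[n])) n++;
    return n;
}
```

<p class="inwebparagraph">Because the colon counts as an identifier character, a namespaced name such as <code class="display"><span class="extract">Regexp::match</span></code> is scanned as a single identifier, while a newline is deliberately not treated as white space.</p>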
<p class="inwebparagraph"><a id="SP3"></a><b>§3. Simple parsing. </b>The following finds the earliest minimal-length substring of a string
which is delimited by two pairs of characters: for example, <code class="display"><span class="extract">&lt;&lt;</span></code> and <code class="display"><span class="extract">&gt;&gt;</span></code>. It returns the
position at which that substring begins and sets <code class="display"><span class="extract">*len</span></code> to its length, delimiters included,
or returns -1 if there is no such substring. This could
easily be done as a regular expression using <code class="display"><span class="extract">Regexp::match</span></code>, but the routine
here is much quicker.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::find_expansion</span><span class="plain">(</span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">on1</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">on2</span><span class="plain">,</span>
<span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">off1</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">off2</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">len</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">i</span><span class="plain"> < </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">) == </span><span class="identifier">on1</span><span class="plain">) && (</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">+1) == </span><span class="identifier">on2</span><span class="plain">)) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain">=</span><span class="identifier">i</span><span class="plain">+2; </span><span class="identifier">j</span><span class="plain"> < </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">j</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">) == </span><span class="identifier">off1</span><span class="plain">) && (</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">+1) == </span><span class="identifier">off2</span><span class="plain">)) {</span>
<span class="plain">*</span><span class="identifier">len</span><span class="plain"> = </span><span class="identifier">j</span><span class="plain">+2-</span><span class="identifier">i</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="reserved">return</span><span class="plain"> -1;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::find_expansion appears nowhere else.</p>
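<p class="inwebparagraph">To see what the return value and <code class="display"><span class="extract">*len</span></code> mean in practice, here is a rough plain-C analogue which works on an ordinary null-terminated wide string rather than a <code class="display"><span class="extract">text_stream</span></code>. The function name and the use of <code class="display"><span class="extract">wcslen</span></code> are assumptions made for this sketch, and the loops stop one character early rather than relying on out-of-range reads being harmless.</p>

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical analogue of Regexp::find_expansion for plain wide strings.
   Returns the index of the opening pair and sets *len to the length of the
   delimited substring, both pairs included; returns -1 if nothing is found. */
static int find_expansion(const wchar_t *text, wchar_t on1, wchar_t on2,
    wchar_t off1, wchar_t off2, int *len) {
    int n = (int) wcslen(text);
    for (int i = 0; i + 1 < n; i++)
        if ((text[i] == on1) && (text[i+1] == on2)) {
            /* the closing pair must begin after the opening pair ends */
            for (int j = i + 2; j + 1 < n; j++)
                if ((text[j] == off1) && (text[j+1] == off2)) {
                    *len = j + 2 - i;
                    return i;
                }
        }
    return -1;
}
```

<p class="inwebparagraph">Applied to the text <code class="display"><span class="extract">ab&lt;&lt;cd&gt;&gt;ef</span></code> with the delimiters above, this sketch returns 2 and sets the length to 6, so that the caller can cut out the whole of the delimited substring, markers and all.</p>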
<p class="inwebparagraph"><a id="SP4"></a><b>§4. </b>Still more simply:
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::find_open_brace</span><span class="plain">(</span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"> < </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">) == </span><span class="character">'{'</span><span class="plain">)</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> -1;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::find_open_brace appears nowhere else.</p>
<p class="inwebparagraph"><a id="SP5"></a><b>§5. </b>Note that we count the empty string as being white space. Again, this is
equivalent to <code class="display"><span class="extract">Regexp::match(NULL, text, L" *")</span></code>, but much faster.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::string_is_white_space</span><span class="plain">(</span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">) {</span>
<span class="identifier">LOOP_THROUGH_TEXT</span><span class="plain">(</span><span class="identifier">P</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Regexp::white_space</span><span class="plain">(</span><span class="functiontext">Str::get</span><span class="plain">(</span><span class="identifier">P</span><span class="plain">)) == </span><span class="constant">FALSE</span><span class="plain">)</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::string_is_white_space appears nowhere else.</p>
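<p class="inwebparagraph">The empty-string convention falls out naturally from the loop: with nothing to inspect, no character can fail the test. A small plain-C sketch over a null-terminated wide string, with a hypothetical function name, makes this visible.</p>

```c
#include <wchar.h>

/* Hypothetical analogue of Regexp::string_is_white_space for plain wide
   strings: returns 1 if every character is a space or tab, and so returns
   1 for the empty string, since the loop body then never runs. */
static int string_is_white_space(const wchar_t *text) {
    for (const wchar_t *p = text; *p; p++)
        if ((*p != L' ') && (*p != L'\t')) return 0;
    return 1;
}
```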
<p class="inwebparagraph"><a id="SP6"></a><b>§6. A Worse PCRE. </b>I originally wanted to call the function in this section <code class="display"><span class="extract">a_better_sscanf</span></code>, then
thought perhaps <code class="display"><span class="extract">a_worse_PCRE</span></code> would be more true. (PCRE is Philip Hazel's superb
C implementation of regular-expression parsing, but I didn't need its full strength,
and I didn't want to complicate the build process by linking to it.)
</p>
<p class="inwebparagraph">This is a very minimal regular expression parser, simply for convenience of parsing
short texts against particularly simple patterns. Here is an example of use:
</p>
<pre class="display">
<span class="reserved">match_results</span><span class="plain"> </span><span class="identifier">mr</span><span class="plain"> = </span><span class="functiontext">Regexp::create_mr</span><span class="plain">();</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Regexp::match</span><span class="plain">(&</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">L</span><span class="string">"fish (%d+) ([a-zA-Z_][a-zA-Z0-9_]*) *"</span><span class="plain">)) {</span>
<span class="identifier">PRINT</span><span class="plain">(</span><span class="string">"Fish number: %S\n"</span><span class="plain">, </span><span class="identifier">mr</span><span class="plain">.</span><span class="element">exp</span><span class="plain">[0]);</span>
<span class="identifier">PRINT</span><span class="plain">(</span><span class="string">"Fish name: %S\n"</span><span class="plain">, </span><span class="identifier">mr</span><span class="plain">.</span><span class="element">exp</span><span class="plain">[1]);</span>
<span class="plain">}</span>
<span class="functiontext">Regexp::dispose_of</span><span class="plain">(&</span><span class="identifier">mr</span><span class="plain">);</span>
</pre>
<p class="inwebparagraph">Note the <code class="display"><span class="extract">L</span></code> at the front of the regex itself: this is a wide string.
</p>
<p class="inwebparagraph">This tries to match the given <code class="display"><span class="extract">text</span></code> to see if it consists of the word fish,
then any amount of whitespace, then a string of digits which are copied into
<code class="display"><span class="extract">mr.exp[0]</span></code>, then whitespace again, and then an alphanumeric identifier to be
copied into <code class="display"><span class="extract">mr.exp[1]</span></code>, and finally optional whitespace. (If no match is
made, the contents of the found strings are undefined.)
</p>
<p class="inwebparagraph">Note that this differs from, for example, Perl's regular expression matcher
in several ways. The regular expression syntax is slightly different and in
general simpler. A match has to be made from start to end, so it's as if there
were an implicit <code class="display"><span class="extract">^</span></code> at the front and <code class="display"><span class="extract">$</span></code> at the back (in Perl terms). The
full match text is therefore always the entire text put in, so there's no
need to record this. In Perl, matching against <code class="display"><span class="extract">m/(.*) plus (.*)/</span></code> would
set three subexpressions: number 0 would be the whole text matched, number
1 would be the first bracketed part, number 2 the second. Here, though, the
corresponding regex would be written <code class="display"><span class="extract">L"(%c*) plus (%c*)"</span></code>, and the bracketed
terms would be subexpressions 0 and 1.
</p>
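<p class="inwebparagraph">The difference in numbering can be illustrated with a conventional matcher. The sketch below does not use the matcher of this section at all: it assumes a POSIX system providing <code class="display"><span class="extract">regex.h</span></code>, and the function <code class="display"><span class="extract">match_whole</span></code>, its fixed 64-character buffers and the constant <code class="display"><span class="extract">MAX_GROUPS</span></code> are all hypothetical. The caller writes the anchors explicitly, and the wrapper discards POSIX group 0 (the whole match), so that bracketed terms come out numbered from 0 as they do here.</p>

```c
#include <regex.h>
#include <string.h>

#define MAX_GROUPS 5

/* Match text against an extended POSIX pattern, which the caller is expected
   to anchor with ^ and $. On success, copy each bracketed group into out[0],
   out[1], ... (truncated to 63 characters) and return the number of groups;
   return -1 on a failed match or an invalid pattern. */
static int match_whole(const char *text, const char *pattern,
    char out[MAX_GROUPS][64]) {
    regex_t re;
    regmatch_t pm[MAX_GROUPS + 1];
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -1;
    int rc = regexec(&re, text, MAX_GROUPS + 1, pm, 0);
    int groups = (int) re.re_nsub;
    regfree(&re);
    if (rc != 0) return -1;
    if (groups > MAX_GROUPS) groups = MAX_GROUPS;
    for (int i = 0; i < groups; i++) {
        regmatch_t g = pm[i + 1];  /* pm[0] is the whole match: skip it */
        int n = (g.rm_so < 0) ? 0 : (int) (g.rm_eo - g.rm_so);
        if (n > 63) n = 63;
        if (n > 0) memcpy(out[i], text + g.rm_so, (size_t) n);
        out[i][n] = '\0';
    }
    return groups;
}
```

<p class="inwebparagraph">With the pattern <code class="display"><span class="extract">^(.*) plus (.*)$</span></code> and the text <code class="display"><span class="extract">two plus three</span></code>, this sketch places <code class="display"><span class="extract">two</span></code> in <code class="display"><span class="extract">out[0]</span></code> and <code class="display"><span class="extract">three</span></code> in <code class="display"><span class="extract">out[1]</span></code>.</p>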
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain"> </span><span class="constant">5</span><span class="plain"> </span><span class="comment">this many bracketed subexpressions can be extracted</span>
</pre>
<p class="inwebparagraph"><a id="SP7"></a><b>§7. </b>The internal state of the matcher is stored as follows:
</p>
<pre class="display">
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_position</span><span class="plain"> {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">tpos</span><span class="plain">; </span><span class="comment">position within text being matched</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">ppos</span><span class="plain">; </span><span class="comment">position within pattern</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bc</span><span class="plain">; </span><span class="comment">count of bracketed subexpressions so far begun</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bl</span><span class="plain">; </span><span class="comment">bracket indentation level</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bracket_nesting</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="comment">which subexpression numbers (0, 1, 2, 3, 4) correspond to which nesting</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">brackets_start</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">], </span><span class="identifier">brackets_end</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="comment">positions in text being matched, inclusive</span>
<span class="plain">} </span><span class="reserved">match_position</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The structure match_position is private to this section.</p>
<p class="inwebparagraph"><a id="SP8"></a><b>§8. </b>It may appear that match texts are limited to 64 characters here, but they
are not. They are simply a little faster to access if short.
</p>
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">MATCH_TEXT_INITIAL_ALLOCATION</span><span class="plain"> </span><span class="constant">64</span>
</pre>
<pre class="display">
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_result</span><span class="plain"> {</span>
<span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">match_text_storage</span><span class="plain">[</span><span class="constant">MATCH_TEXT_INITIAL_ALLOCATION</span><span class="plain">];</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">text_stream</span><span class="plain"> </span><span class="identifier">match_text_struct</span><span class="plain">;</span>
<span class="plain">} </span><span class="reserved">match_result</span><span class="plain">;</span>
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_results</span><span class="plain"> {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">no_matched_texts</span><span class="plain">;</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_result</span><span class="plain"> </span><span class="identifier">exp_storage</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">exp</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">exp_at</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="plain">} </span><span class="reserved">match_results</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The structure match_result is private to this section.</p>
<p class="endnote">The structure match_results is accessed in 3/cla, 8/ws, 8/bf and here.</p>
<p class="inwebparagraph"><a id="SP9"></a><b>§9. </b>Match result objects are inherently ephemeral, and we can expect to be
creating them and throwing them away frequently. This must be done
explicitly. Note that the storage required is on the C stack (unless some
result strings grow very large), so that it's very quick to allocate and
deallocate.
</p>
<pre class="display">
<span class="reserved">match_results</span><span class="plain"> </span><span class="functiontext">Regexp::create_mr</span><span class="plain">(</span><span class="reserved">void</span><span class="plain">) {</span>
<span class="reserved">match_results</span><span class="plain"> </span><span class="identifier">mr</span><span class="plain">;</span>
<span class="identifier">mr</span><span class="plain">.</span><span class="element">no_matched_texts</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"><</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
<span class="identifier">mr</span><span class="plain">.</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">mr</span><span class="plain">.</span><span class="element">exp_at</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = -1;</span>
<span class="plain">}</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">mr</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Regexp::dispose_of</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"><</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">-></span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]) {</span>
<span class="identifier">STREAM_CLOSE</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">-></span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]);</span>
<span class="identifier">mr</span><span class="plain">-></span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="identifier">mr</span><span class="plain">-></span><span class="element">no_matched_texts</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::create_mr is used in <a href="#SP14">§14</a>, 3/cla (<a href="3-cla.html#SP11">§11</a>, <a href="3-cla.html#SP12">§12</a>), 8/ws (<a href="8-ws.html#SP7_3_2">§7.3.2</a>, <a href="8-ws.html#SP7_3_3_2">§7.3.3.2</a>, <a href="8-ws.html#SP7_3_3_2_1">§7.3.3.2.1</a>, <a href="8-ws.html#SP7_2_2_1">§7.2.2.1</a>, <a href="8-ws.html#SP7_2_2_3">§7.2.2.3</a>), 8/bf (<a href="8-bf.html#SP3">§3</a>).</p>
<p class="endnote">The function Regexp::dispose_of is used in <a href="#SP10">§10</a>, <a href="#SP14">§14</a>, 3/cla (<a href="3-cla.html#SP11">§11</a>), 8/ws (<a href="8-ws.html#SP7_3_2">§7.3.2</a>, <a href="8-ws.html#SP7_3_3_2">§7.3.3.2</a>, <a href="8-ws.html#SP7_3_3_2_1">§7.3.3.2.1</a>, <a href="8-ws.html#SP7_2_2_1">§7.2.2.1</a>, <a href="8-ws.html#SP7_2_2_3">§7.2.2.3</a>), 8/bf (<a href="8-bf.html#SP3">§3</a>).</p>
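<p class="inwebparagraph">The storage strategy described in §8 and §9, in which short texts stay inside the structure and only long ones need further memory, can be sketched in plain C as follows. This is not how <code class="display"><span class="extract">text_stream</span></code> is implemented: the type <code class="display"><span class="extract">small_text</span></code>, its functions and the doubling growth policy are all assumptions made for illustration. It does show why disposal has to be explicit, since a text which has outgrown its initial allocation owns memory that the C stack will not reclaim.</p>

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#define INITIAL_ALLOCATION 64

/* Hypothetical text buffer: short texts live inside the struct itself,
   and only longer ones are moved to heap memory. */
typedef struct small_text {
    wchar_t inline_storage[INITIAL_ALLOCATION];
    wchar_t *heap;    /* NULL while the text still fits inline */
    size_t length;    /* characters stored, excluding the terminator */
    size_t capacity;  /* characters available, including the terminator */
} small_text;

/* Cheap to create: nothing is allocated, so the result can be returned
   by value and kept on the caller's stack. */
static small_text small_text_create(void) {
    small_text t;
    memset(&t, 0, sizeof t);
    t.heap = NULL;
    t.capacity = INITIAL_ALLOCATION;
    return t;
}

static const wchar_t *small_text_chars(const small_text *t) {
    return t->heap ? t->heap : t->inline_storage;
}

/* Append one character, moving to (or growing) heap storage when full.
   Returns 1 on success, 0 if memory could not be obtained. */
static int small_text_put(small_text *t, wchar_t c) {
    if (t->length + 1 >= t->capacity) {
        size_t new_capacity = 2 * t->capacity;
        wchar_t *grown = malloc(new_capacity * sizeof(wchar_t));
        if (grown == NULL) return 0;
        memcpy(grown, small_text_chars(t), (t->length + 1) * sizeof(wchar_t));
        free(t->heap);
        t->heap = grown;
        t->capacity = new_capacity;
    }
    wchar_t *chars = t->heap ? t->heap : t->inline_storage;
    chars[t->length++] = c;
    chars[t->length] = 0;
    return 1;
}

/* Must be called explicitly: releases heap storage, if there is any. */
static void small_text_dispose(small_text *t) {
    free(t->heap);
    t->heap = NULL;
    t->length = 0;
    t->capacity = INITIAL_ALLOCATION;
    t->inline_storage[0] = 0;
}
```

<p class="inwebparagraph">In this sketch the heap pointer is kept separate from the inline array, rather than pointing into it, so that the structure remains valid when it is copied or returned by value.</p>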
<p class="inwebparagraph"><a id="SP10"></a><b>§10. </b>So, then: the matcher itself.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::match</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) </span><span class="functiontext">Regexp::prepare</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">rv</span><span class="plain"> = (</span><span class="functiontext">Regexp::match_r</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, </span><span class="identifier">NULL</span><span class="plain">, </span><span class="constant">FALSE</span><span class="plain">) >= </span><span class="constant">0</span><span class="plain">)?</span><span class="identifier">TRUE:FALSE</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">mr</span><span class="plain">) && (</span><span class="identifier">rv</span><span class="plain"> == </span><span class="constant">FALSE</span><span class="plain">)) </span><span class="functiontext">Regexp::dispose_of</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">rv</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::match_from</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">,</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">x</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">allow_partial</span><span class="plain">) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">match_to</span><span class="plain"> = </span><span class="identifier">x</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">x</span><span class="plain"> < </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) </span><span class="functiontext">Regexp::prepare</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">match_position</span><span class="plain"> </span><span class="identifier">at</span><span class="plain">;</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain"> = </span><span class="identifier">x</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="identifier">match_to</span><span class="plain"> = </span><span class="functiontext">Regexp::match_r</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, &</span><span class="identifier">at</span><span class="plain">, </span><span class="identifier">allow_partial</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">match_to</span><span class="plain"> == -1) {</span>
<span class="identifier">match_to</span><span class="plain"> = </span><span class="identifier">x</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) </span><span class="functiontext">Regexp::dispose_of</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">match_to</span><span class="plain"> - </span><span class="identifier">x</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
|
|
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Regexp::prepare</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) {</span>
|
|
<span class="identifier">mr</span><span class="plain">-></span><span class="element">no_matched_texts</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"><</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
|
|
<span class="identifier">mr</span><span class="plain">-></span><span class="element">exp_at</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = -1;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">-></span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]) </span><span class="identifier">STREAM_CLOSE</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">-></span><span class="identifier">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]);</span>
|
|
<span class="identifier">mr</span><span class="plain">-></span><span class="element">exp_storage</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">].</span><span class="element">match_text_struct</span><span class="plain"> =</span>
|
|
<span class="functiontext">Streams::new_buffer</span><span class="plain">(</span>
|
|
<span class="constant">MATCH_TEXT_INITIAL_ALLOCATION</span><span class="plain">, </span><span class="identifier">mr</span><span class="plain">-></span><span class="element">exp_storage</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">].</span><span class="element">match_text_storage</span><span class="plain">);</span>
|
|
<span class="identifier">mr</span><span class="plain">-></span><span class="element">exp_storage</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">].</span><span class="element">match_text_struct</span><span class="plain">.</span><span class="element">stream_flags</span><span class="plain"> |= </span><span class="constant">FOR_RE_STRF</span><span class="plain">;</span>
|
|
<span class="identifier">mr</span><span class="plain">-></span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = &(</span><span class="identifier">mr</span><span class="plain">-></span><span class="element">exp_storage</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">].</span><span class="element">match_text_struct</span><span class="plain">);</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function Regexp::match is used in 3/cla (<a href="3-cla.html#SP11">§11</a>, <a href="3-cla.html#SP12">§12</a>), 8/ws (<a href="8-ws.html#SP7_3_2">§7.3.2</a>, <a href="8-ws.html#SP7_3_3_2">§7.3.3.2</a>, <a href="8-ws.html#SP7_3_3_2_1">§7.3.3.2.1</a>, <a href="8-ws.html#SP7_2_2_1">§7.2.2.1</a>, <a href="8-ws.html#SP7_2_2_3">§7.2.2.3</a>), 8/bf (<a href="8-bf.html#SP3">§3</a>).</p>
|
|
|
|
<p class="endnote">The function Regexp::match_from appears nowhere else.</p>
|
|
|
|
<p class="endnote">The function Regexp::prepare is used in <a href="#SP14">§14</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP11"></a><b>§11. </b></p>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::match_r</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">,</span>
|
|
<span class="reserved">match_position</span><span class="plain"> *</span><span class="identifier">scan_from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">allow_partial</span><span class="plain">) {</span>
|
|
<span class="reserved">match_position</span><span class="plain"> </span><span class="identifier">at</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">scan_from</span><span class="plain">) </span><span class="identifier">at</span><span class="plain"> = *</span><span class="identifier">scan_from</span><span class="plain">;</span>
|
|
<span class="reserved">else</span><span class="plain"> { </span><span class="identifier">at</span><span class="plain">.</span><span class="identifier">tpos</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; }</span>
|
|
|
|
<span class="reserved">while</span><span class="plain"> ((</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">)) || (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">])) {</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">allow_partial</span><span class="plain">) && (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="constant">0</span><span class="plain">)) </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<<span class="cwebmacro">Parentheses in the match pattern set up substrings to extract</span> <span class="cwebmacronumber">11.1</span>><span class="plain">;</span>
|
|
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">chcl</span><span class="plain">, </span><span class="comment">what class of characters to match: a <code class="display"><span class="extract">*_CLASS</span></code> value</span>
|
|
<span class="identifier">range_from</span><span class="plain">, </span><span class="identifier">range_to</span><span class="plain">, </span><span class="comment">for <code class="display"><span class="extract">LITERAL_CLASS</span></code> only</span>
|
|
<span class="identifier">reverse</span><span class="plain"> = </span><span class="constant">FALSE</span><span class="plain">; </span><span class="comment">require a non-match rather than a match</span>
|
|
<<span class="cwebmacro">Extract the character class to match from the pattern</span> <span class="cwebmacronumber">11.2</span>><span class="plain">;</span>
|
|
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">rep_from</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">, </span><span class="identifier">rep_to</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="comment">minimum and maximum number of repetitions</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">greedy</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="comment">go for a maximal-length match if possible</span>
|
|
<<span class="cwebmacro">Extract repetition markers from the pattern</span> <span class="cwebmacronumber">11.3</span>><span class="plain">;</span>
|
|
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">reps</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<<span class="cwebmacro">Count how many repetitions can be made here</span> <span class="cwebmacronumber">11.4</span>><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">reps</span><span class="plain"> < </span><span class="identifier">rep_from</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> -1;</span>
|
|
|
|
<span class="comment">we can now accept anything from <code class="display"><span class="extract">rep_from</span></code> to <code class="display"><span class="extract">reps</span></code> repetitions</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">rep_from</span><span class="plain"> == </span><span class="identifier">reps</span><span class="plain">) { </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain"> += </span><span class="identifier">reps</span><span class="plain">; </span><span class="reserved">continue</span><span class="plain">; }</span>
|
|
<<span class="cwebmacro">Try all possible match lengths until we find a match</span> <span class="cwebmacronumber">11.5</span>><span class="plain">;</span>
|
|
|
|
<span class="comment">no match length worked, so no match</span>
|
|
<span class="reserved">return</span><span class="plain"> -1;</span>
|
|
<span class="plain">}</span>
|
|
<<span class="cwebmacro">Copy the bracketed texts found into the global strings</span> <span class="cwebmacronumber">11.6</span>><span class="plain">;</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">at</span><span class="plain">.</span><span class="identifier">tpos</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function Regexp::match_r is used in <a href="#SP10">§10</a>, <a href="#SP11_5">§11.5</a>, <a href="#SP14">§14</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP11_1"></a><b>§11.1. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Parentheses in the match pattern set up substrings to extract</span> <span class="cwebmacronumber">11.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">'('</span><span class="plain">) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> < </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">) </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bracket_nesting</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">] = -1;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> < </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">) {</span>
|
|
<span class="identifier">at</span><span class="plain">.</span><span class="element">bracket_nesting</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">] = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">;</span>
|
|
<span class="identifier">at</span><span class="plain">.</span><span class="element">brackets_start</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">] = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="identifier">brackets_end</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">] = -1;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">++; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">++; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++;</span>
|
|
<span class="reserved">continue</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">')'</span><span class="plain">) {</span>
|
|
<span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">--;</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> >= </span><span class="constant">0</span><span class="plain">) && (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> < </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">) && (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bracket_nesting</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">] >= </span><span class="constant">0</span><span class="plain">))</span>
|
|
<span class="identifier">at</span><span class="plain">.</span><span class="identifier">brackets_end</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bracket_nesting</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">]] = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">-1;</span>
|
|
<span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++;</span>
|
|
<span class="reserved">continue</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP11">§11</a>.</p>
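<p class="inwebparagraph">The bookkeeping above can be sketched in isolation. This is a minimal illustration, not the Foundation code itself: the names <code class="display"><span class="extract">bracket_tracker</span></code>, <code class="display"><span class="extract">tracker_open</span></code> and <code class="display"><span class="extract">tracker_close</span></code> are hypothetical, and <code class="display"><span class="extract">MAX_BRACKETS</span></code> is an assumed stand-in for <code class="display"><span class="extract">MAX_BRACKETED_SUBEXPRESSIONS</span></code>, whose value is not shown here.</p>

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BRACKETS 5  /* assumed stand-in for MAX_BRACKETED_SUBEXPRESSIONS */

typedef struct {
    int bl;                     /* current nesting depth */
    int bc;                     /* number of opening brackets seen so far */
    int nesting[MAX_BRACKETS];  /* which capture is open at each depth, or -1 */
    int start[MAX_BRACKETS];    /* text position where each capture begins */
    int end[MAX_BRACKETS];      /* last text position of each capture, or -1 */
} bracket_tracker;

void tracker_init(bracket_tracker *t) {
    assert(t != NULL);
    t->bl = 0; t->bc = 0;
}

/* Called when the pattern reaches '(' while the text is at position tpos. */
void tracker_open(bracket_tracker *t, int tpos) {
    assert(t != NULL);
    if (t->bl < MAX_BRACKETS) t->nesting[t->bl] = -1;
    if (t->bc < MAX_BRACKETS) {
        t->nesting[t->bl] = t->bc;
        t->start[t->bc] = tpos; t->end[t->bc] = -1;
    }
    t->bl++; t->bc++;
}

/* Called when the pattern reaches ')': the capture ends just before tpos. */
void tracker_close(bracket_tracker *t, int tpos) {
    assert(t != NULL);
    t->bl--;
    if ((t->bl >= 0) && (t->bl < MAX_BRACKETS) && (t->nesting[t->bl] >= 0))
        t->end[t->nesting[t->bl]] = tpos - 1;
}
```

<p class="inwebparagraph">For a pattern shaped like <code class="display"><span class="extract">(a(b))</span></code> matched against the text "ab", the opening brackets are met at text positions 0 and 1 and both closing brackets at position 2, so capture 0 spans positions 0 to 1 and capture 1 spans position 1 only. Captures are numbered by the order of their opening brackets.</p>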
|
|
|
|
<p class="inwebparagraph"><a id="SP11_2"></a><b>§11.2. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Extract the character class to match from the pattern</span> <span class="cwebmacronumber">11.2</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">len</span><span class="plain">;</span>
|
|
<span class="identifier">chcl</span><span class="plain"> = </span><span class="functiontext">Regexp::get_cclass</span><span class="plain">(</span><span class="identifier">pattern</span><span class="plain">, </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">, &</span><span class="identifier">len</span><span class="plain">, &</span><span class="identifier">range_from</span><span class="plain">, &</span><span class="identifier">range_to</span><span class="plain">, &</span><span class="identifier">reverse</span><span class="plain">);</span>
|
|
<span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain"> += </span><span class="identifier">len</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP11">§11</a>.</p>
|
|
|
|
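<p class="inwebparagraph"><a id="SP11_3"></a><b>§11.3. </b>This is standard regular-expression notation, with two exceptions. I haven't
implemented numeric repetition counts, which we won't need; and a <code class="display"><span class="extract">?</span></code> only makes
the match non-greedy, and never means "zero or one":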
|
|
</p>
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Extract repetition markers from the pattern</span> <span class="cwebmacronumber">11.3</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">chcl</span><span class="plain"> == </span><span class="constant">WHITESPACE_CLASS</span><span class="plain">) {</span>
|
|
<span class="identifier">rep_from</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="identifier">rep_to</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)-</span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">'+'</span><span class="plain">) {</span>
|
|
<span class="identifier">rep_from</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="identifier">rep_to</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)-</span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++;</span>
|
|
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">'*'</span><span class="plain">) {</span>
|
|
<span class="identifier">rep_from</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">rep_to</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)-</span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">'?'</span><span class="plain">) { </span><span class="identifier">greedy</span><span class="plain"> = </span><span class="constant">FALSE</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++; }</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP11">§11</a>.</p>
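<p class="inwebparagraph">The marker rules above can be restated as a standalone function. This is a sketch under stated assumptions: the <code class="display"><span class="extract">repetition</span></code> struct and <code class="display"><span class="extract">parse_repetition</span></code> are hypothetical names, and <code class="display"><span class="extract">remaining</span></code> stands for the number of text characters left from the current position to the end of the text.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

typedef struct {
    int rep_from;   /* minimum number of repetitions */
    int rep_to;     /* maximum number of repetitions */
    int greedy;     /* prefer the longest match? */
    int consumed;   /* pattern characters used up by the markers */
} repetition;

/* Hypothetical helper: read any repetition markers found at pattern[ppos]. */
repetition parse_repetition(const wchar_t *pattern, int ppos,
                            int remaining, int is_whitespace_class) {
    assert(pattern != NULL);
    repetition r = { 1, 1, 1, 0 };   /* default: exactly once, greedy */
    /* a space in the pattern always means "one or more" */
    if (is_whitespace_class) r.rep_to = remaining;
    if (pattern[ppos] == L'+') {
        r.rep_from = 1; r.rep_to = remaining; r.consumed++;
    } else if (pattern[ppos] == L'*') {
        r.rep_from = 0; r.rep_to = remaining; r.consumed++;
    }
    /* a following '?' changes the preference, not the permitted counts */
    if (pattern[ppos + r.consumed] == L'?') { r.greedy = 0; r.consumed++; }
    return r;
}
```

<p class="inwebparagraph">Under these rules a bare <code class="display"><span class="extract">?</span></code> leaves the count at exactly one and only clears the greedy flag, which is why it cannot be used to mark something as optional.</p>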
|
|
|
|
<p class="inwebparagraph"><a id="SP11_4"></a><b>§11.4. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Count how many repetitions can be made here</span> <span class="cwebmacronumber">11.4</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">reps</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; ((</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">+</span><span class="identifier">reps</span><span class="plain">)) && (</span><span class="identifier">reps</span><span class="plain"> < </span><span class="identifier">rep_to</span><span class="plain">)); </span><span class="identifier">reps</span><span class="plain">++)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Regexp::test_cclass</span><span class="plain">(</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">+</span><span class="identifier">reps</span><span class="plain">), </span><span class="identifier">chcl</span><span class="plain">,</span>
|
|
<span class="identifier">range_from</span><span class="plain">, </span><span class="identifier">range_to</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, </span><span class="identifier">reverse</span><span class="plain">) == </span><span class="constant">FALSE</span><span class="plain">)</span>
|
|
<span class="reserved">break</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP11">§11</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP11_5"></a><b>§11.5. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Try all possible match lengths until we find a match</span> <span class="cwebmacronumber">11.5</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">rep_from</span><span class="plain">, </span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">reps</span><span class="plain">, </span><span class="identifier">dj</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">, </span><span class="identifier">from_tpos</span><span class="plain"> = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">greedy</span><span class="plain">) { </span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">reps</span><span class="plain">; </span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">rep_from</span><span class="plain">; </span><span class="identifier">dj</span><span class="plain"> = -1; }</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain"> = </span><span class="identifier">from</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"> != </span><span class="identifier">to</span><span class="plain">+</span><span class="identifier">dj</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"> += </span><span class="identifier">dj</span><span class="plain">) {</span>
|
|
<span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain"> = </span><span class="identifier">from_tpos</span><span class="plain"> + </span><span class="identifier">j</span><span class="plain">;</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">try</span><span class="plain"> = </span><span class="functiontext">Regexp::match_r</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, &</span><span class="identifier">at</span><span class="plain">, </span><span class="identifier">allow_partial</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">try</span><span class="plain"> >= </span><span class="constant">0</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">try</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP11">§11</a>.</p>
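<p class="inwebparagraph">The only difference between greedy and non-greedy matching is the order in which candidate lengths are tried. The sketch below isolates that loop; <code class="display"><span class="extract">candidate_lengths</span></code> is a hypothetical helper that records the order rather than recursing into the matcher.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write into out[] the repetition counts in the order
   they would be tried, and return how many were written. */
int candidate_lengths(int rep_from, int reps, int greedy, int *out, int max_out) {
    assert(out != NULL);
    assert(rep_from <= reps);
    int from = rep_from, to = reps, dj = 1;               /* non-greedy: shortest first */
    if (greedy) { from = reps; to = rep_from; dj = -1; }  /* greedy: longest first */
    int n = 0;
    for (int j = from; j != to + dj; j += dj) {
        if (n >= max_out) break;
        out[n++] = j;
    }
    return n;
}
```

<p class="inwebparagraph">With <code class="display"><span class="extract">rep_from</span></code> equal to 1 and three repetitions available, a greedy match tries lengths 3, 2, 1 and a non-greedy match tries 1, 2, 3. In the real code each candidate is tested by a recursive call on the rest of the pattern, and the first success wins.</p>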
|
|
|
|
<p class="inwebparagraph"><a id="SP11_6"></a><b>§11.6. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Copy the bracketed texts found into the global strings</span> <span class="cwebmacronumber">11.6</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) {</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"><</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
|
|
<span class="functiontext">Str::clear</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">-></span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]);</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain"> = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">brackets_start</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]; </span><span class="identifier">j</span><span class="plain"> <= </span><span class="identifier">at</span><span class="plain">.</span><span class="identifier">brackets_end</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]; </span><span class="identifier">j</span><span class="plain">++)</span>
|
|
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">-></span><span class="identifier">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">], </span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">));</span>
|
|
<span class="identifier">mr</span><span class="plain">-></span><span class="element">exp_at</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">brackets_start</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">mr</span><span class="plain">-></span><span class="element">no_matched_texts</span><span class="plain"> = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP11">§11</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP12"></a><b>§12. </b>So then: most characters in the pattern are taken literally (if the pattern
|
|
says <code class="display"><span class="extract">q</span></code>, the only match is with a lower-case letter "q"), except that:
|
|
</p>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<ul class="items"><li>(a) a space means "one or more characters of white space";
|
|
</li><li>(b) <code class="display"><span class="extract">%d</span></code> means any decimal digit;
|
|
</li><li>(c) <code class="display"><span class="extract">%c</span></code> means any character at all;
|
|
</li><li>(d) <code class="display"><span class="extract">%C</span></code> means any character which isn't white space;
|
|
</li><li>(e) <code class="display"><span class="extract">%i</span></code> means any character from the identifier class (see above);
|
|
</li><li>(f) <code class="display"><span class="extract">%p</span></code> means any character which can be used in the name of a Preform
|
|
nonterminal, which is to say, an identifier character or a hyphen;
|
|
</li><li>(g) <code class="display"><span class="extract">%P</span></code> means the same or else a colon;
|
|
</li><li>(h) <code class="display"><span class="extract">%t</span></code> means a tab;
|
|
</li><li>(i) <code class="display"><span class="extract">%q</span></code> means a double-quote.
|
|
</li></ul>
|
|
<p class="inwebparagraph"><code class="display"><span class="extract">%</span></code> otherwise makes a literal escape, and a space, as in (a), matches a run of
white space. Square brackets enclose literal alternatives: as usual with grep
engines, <code class="display"><span class="extract">[]xyz]</span></code> is legal and makes a set of four possibilities, the
first of which is a literal close square. Within a set, a hyphen makes a
character range, and an initial <code class="display"><span class="extract">^</span></code> negates the result. Everything else
is literal.
|
|
</p>
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">ANY_CLASS</span><span class="plain"> </span><span class="constant">1</span>
<span class="definitionkeyword">define</span> <span class="constant">DIGIT_CLASS</span><span class="plain"> </span><span class="constant">2</span>
<span class="definitionkeyword">define</span> <span class="constant">WHITESPACE_CLASS</span><span class="plain"> </span><span class="constant">3</span>
<span class="definitionkeyword">define</span> <span class="constant">NONWHITESPACE_CLASS</span><span class="plain"> </span><span class="constant">4</span>
<span class="definitionkeyword">define</span> <span class="constant">IDENTIFIER_CLASS</span><span class="plain"> </span><span class="constant">5</span>
<span class="definitionkeyword">define</span> <span class="constant">PREFORM_CLASS</span><span class="plain"> </span><span class="constant">6</span>
<span class="definitionkeyword">define</span> <span class="constant">PREFORMC_CLASS</span><span class="plain"> </span><span class="constant">7</span>
<span class="definitionkeyword">define</span> <span class="constant">LITERAL_CLASS</span><span class="plain"> </span><span class="constant">8</span>
<span class="definitionkeyword">define</span> <span class="constant">TAB_CLASS</span><span class="plain"> </span><span class="constant">9</span>
<span class="definitionkeyword">define</span> <span class="constant">QUOTE_CLASS</span><span class="plain"> </span><span class="constant">10</span>
</pre>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::get_cclass</span><span class="plain">(</span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">ppos</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">len</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">to</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">reverse</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">] == </span><span class="character">'^'</span><span class="plain">) { </span><span class="identifier">ppos</span><span class="plain">++; *</span><span class="identifier">reverse</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; } </span><span class="reserved">else</span><span class="plain"> { *</span><span class="identifier">reverse</span><span class="plain"> = </span><span class="constant">FALSE</span><span class="plain">; }</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">]) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'%'</span><span class="plain">:</span>
<span class="identifier">ppos</span><span class="plain">++;</span>
<span class="plain">*</span><span class="identifier">len</span><span class="plain"> = </span><span class="constant">2</span><span class="plain">;</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">]) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'d'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">DIGIT_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'c'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">ANY_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'C'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">NONWHITESPACE_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'i'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">IDENTIFIER_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'p'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">PREFORM_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'P'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">PREFORMC_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'q'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">QUOTE_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'t'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TAB_CLASS</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">*</span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">; *</span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">; </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">LITERAL_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'['</span><span class="plain">:</span>
<span class="plain">*</span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">+1;</span>
<span class="identifier">ppos</span><span class="plain"> += </span><span class="constant">2</span><span class="plain">;</span>
<span class="reserved">while</span><span class="plain"> ((</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">]) && (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">] != </span><span class="character">']'</span><span class="plain">)) </span><span class="identifier">ppos</span><span class="plain">++;</span>
<span class="plain">*</span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain"> - </span><span class="constant">1</span><span class="plain">; *</span><span class="identifier">len</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain"> - *</span><span class="identifier">from</span><span class="plain"> + </span><span class="constant">2</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">LITERAL_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">' '</span><span class="plain">:</span>
<span class="plain">*</span><span class="identifier">len</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">WHITESPACE_CLASS</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">*</span><span class="identifier">len</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; *</span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">; *</span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">; </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">LITERAL_CLASS</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::get_cclass is used in <a href="#SP11_2">§11.2</a>.</p>
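<p class="inwebparagraph">The way the extent of a bracketed set is measured can be tried in isolation. What follows is a minimal sketch in plain C of the <code class="display"><span class="extract">[</span></code> case only; the name <code class="display"><span class="extract">sketch_set_extent</span></code> and its signature are hypothetical and not part of Foundation, and the assertions are an added safeguard which the function above does not have.</p>

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical stand-alone version of the '[' case of Regexp::get_cclass.
   Given the index of a '[' in pattern, report where the set's contents lie
   (from..to, inclusive) and how many pattern characters "[...]" occupies.
   The first content character is skipped unexamined, so a leading ']' is
   treated as an ordinary member of the set. */
void sketch_set_extent(const wchar_t *pattern, int open, int *from, int *to, int *len) {
    assert(pattern[open] == L'[');
    assert(pattern[open + 1] != L'\0'); /* safeguard added for this sketch */
    int p = open + 2;
    *from = open + 1;
    while ((pattern[p] != L'\0') && (pattern[p] != L']')) p++;
    *to = p - 1;
    *len = p - *from + 2;
}
```

<p class="inwebparagraph">Applied to <code class="display"><span class="extract">[]xyz]</span></code> with the bracket at index 0, this sketch reports contents at indices 1 to 4 and a length of 6, which is the whole pattern.</p>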
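<p class="inwebparagraph"><a id="SP13"></a><b>§13. </b>Once a class has been extracted, this tests whether the character <code class="display"><span class="extract">c</span></code> belongs to it. For <code class="display"><span class="extract">LITERAL_CLASS</span></code>, the candidates are the characters of <code class="display"><span class="extract">drawn_from</span></code> at positions <code class="display"><span class="extract">range_from</span></code> to <code class="display"><span class="extract">range_to</span></code> inclusive, and <code class="display"><span class="extract">reverse</span></code> inverts the verdict.</p>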
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::test_cclass</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">chcl</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">range_from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">range_to</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">drawn_from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">reverse</span><span class="plain">) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">chcl</span><span class="plain">) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">ANY_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain">) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">DIGIT_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">isdigit</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">)) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">WHITESPACE_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Characters::is_space_or_tab</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">)) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TAB_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'\t'</span><span class="plain">) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">NONWHITESPACE_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (!(</span><span class="functiontext">Characters::is_space_or_tab</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">))) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">QUOTE_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> != </span><span class="character">'\"'</span><span class="plain">) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">IDENTIFIER_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Regexp::identifier_char</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">)) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">PREFORM_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'-'</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'_'</span><span class="plain">) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="character">'a'</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="character">'z'</span><span class="plain">)) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="character">'0'</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="character">'9'</span><span class="plain">))) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">PREFORMC_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'-'</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'_'</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">':'</span><span class="plain">) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="character">'a'</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="character">'z'</span><span class="plain">)) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="character">'0'</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="character">'9'</span><span class="plain">))) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">LITERAL_CLASS:</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">range_to</span><span class="plain"> > </span><span class="identifier">range_from</span><span class="plain">) && (</span><span class="identifier">drawn_from</span><span class="plain">[</span><span class="identifier">range_from</span><span class="plain">] == </span><span class="character">'^'</span><span class="plain">)) {</span>
<span class="identifier">range_from</span><span class="plain">++; </span><span class="identifier">reverse</span><span class="plain"> = </span><span class="identifier">reverse</span><span class="plain">?</span><span class="identifier">FALSE:TRUE</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain"> = </span><span class="identifier">range_from</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"> <= </span><span class="identifier">range_to</span><span class="plain">; </span><span class="identifier">j</span><span class="plain">++) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">c1</span><span class="plain"> = </span><span class="identifier">drawn_from</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">], </span><span class="identifier">c2</span><span class="plain"> = </span><span class="identifier">c1</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">j</span><span class="plain">+1 < </span><span class="identifier">range_to</span><span class="plain">) && (</span><span class="identifier">drawn_from</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">+1] == </span><span class="character">'-'</span><span class="plain">)) { </span><span class="identifier">c2</span><span class="plain"> = </span><span class="identifier">drawn_from</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">+2]; </span><span class="identifier">j</span><span class="plain"> += </span><span class="constant">2</span><span class="plain">; }</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="identifier">c1</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="identifier">c2</span><span class="plain">)) {</span>
<span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">reverse</span><span class="plain">) </span><span class="identifier">match</span><span class="plain"> = (</span><span class="identifier">match</span><span class="plain">)?</span><span class="identifier">FALSE:TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">match</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::test_cclass is used in <a href="#SP11_4">§11.4</a>.</p>
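<p class="inwebparagraph">The <code class="display"><span class="extract">LITERAL_CLASS</span></code> case is the only one with any real logic in it, and it can be exercised on its own. A minimal sketch in plain C, assuming the set has already been located in the pattern as positions <code class="display"><span class="extract">from</span></code> to <code class="display"><span class="extract">to</span></code>; the name <code class="display"><span class="extract">sketch_set_contains</span></code> is hypothetical, and the external <code class="display"><span class="extract">reverse</span></code> flag of the function above is left out.</p>

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical stand-alone version of the LITERAL_CLASS test: does c belong
   to the set spelled out in pattern[from..to] (inclusive)? A leading '^'
   negates the set, and "x-y" inside it denotes the range x to y. */
int sketch_set_contains(const wchar_t *pattern, int from, int to, int c) {
    assert(from <= to);
    int negate = 0, match = 0;
    if ((to > from) && (pattern[from] == L'^')) { from++; negate = 1; }
    for (int j = from; j <= to; j++) {
        int lo = pattern[j], hi = lo;
        /* a hyphen only makes a range if a character follows it inside the set */
        if ((j + 1 < to) && (pattern[j + 1] == L'-')) { hi = pattern[j + 2]; j += 2; }
        if ((c >= lo) && (c <= hi)) { match = 1; break; }
    }
    return negate ? !match : match;
}
```

<p class="inwebparagraph">With the pattern <code class="display"><span class="extract">[a-f0-9]</span></code> the set occupies positions 1 to 6, so that <code class="display"><span class="extract">sketch_set_contains(L"[a-f0-9]", 1, 6, L'c')</span></code> is true. Because of the <code class="display"><span class="extract">j + 1 &lt; to</span></code> test, a hyphen in the final position of a set, as in <code class="display"><span class="extract">[a-]</span></code>, is an ordinary member rather than the start of a range.</p>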
<p class="inwebparagraph"><a id="SP14"></a><b>§14. Replacement. </b>And this routine conveniently handles searching and replacing. This time we
can match at substrings of the <code class="display"><span class="extract">text</span></code> (i.e., we are not forced to match
from the start right to the end), and multiple replacements can be made.
For example,
</p>
<pre class="display">
<span class="functiontext">Regexp::replace</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">L</span><span class="string">"[aeiou]"</span><span class="plain">, </span><span class="identifier">L</span><span class="string">"!"</span><span class="plain">, </span><span class="constant">REP_REPEATING</span><span class="plain">);</span>
</pre>
<p class="inwebparagraph">will turn the <code class="display"><span class="extract">text</span></code> "goose eggs" into "g!!s! !ggs".
</p>
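<p class="inwebparagraph">The difference that <code class="display"><span class="extract">REP_REPEATING</span></code> makes to that example can be imitated without the pattern matcher. A minimal sketch in plain C, assuming a fixed set of characters in place of a pattern and a single replacement character; <code class="display"><span class="extract">sketch_replace</span></code> and <code class="display"><span class="extract">SKETCH_REPEATING</span></code> are hypothetical names, not part of Foundation.</p>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag standing in for REP_REPEATING in this sketch only. */
#define SKETCH_REPEATING 1

/* Overwrite each character of text which occurs in set with the character
   with. Unless SKETCH_REPEATING is set, stop after the first replacement.
   Returns the number of replacements made. */
int sketch_replace(char *text, const char *set, char with, int options) {
    assert((text != NULL) && (set != NULL));
    int changes = 0;
    for (size_t i = 0; text[i] != '\0'; i++) {
        if (strchr(set, text[i]) != NULL) {
            text[i] = with;
            changes++;
            if ((options & SKETCH_REPEATING) == 0) break;
        }
    }
    return changes;
}
```

<p class="inwebparagraph">With <code class="display"><span class="extract">SKETCH_REPEATING</span></code> this turns "goose eggs" into "g!!s! !ggs" in four replacements; without it, only the first vowel changes, giving "g!ose eggs".</p>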
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">REP_REPEATING</span><span class="plain"> </span><span class="constant">1</span>
<span class="definitionkeyword">define</span> <span class="constant">REP_ATSTART</span><span class="plain"> </span><span class="constant">2</span>
</pre>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::replace</span><span class="plain">(</span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">replacement</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">options</span><span class="plain">) {</span>
<span class="identifier">TEMPORARY_TEXT</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">);</span>
<span class="reserved">match_results</span><span class="plain"> </span><span class="identifier">mr</span><span class="plain"> = </span><span class="functiontext">Regexp::create_mr</span><span class="plain">();</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">changes</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0, </span><span class="identifier">L</span><span class="plain">=</span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">i</span><span class="plain"><</span><span class="identifier">L</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
<span class="reserved">match_position</span><span class="plain"> </span><span class="identifier">mp</span><span class="plain">; </span><span class="identifier">mp</span><span class="plain">.</span><span class="element">tpos</span><span class="plain"> = </span><span class="identifier">i</span><span class="plain">; </span><span class="identifier">mp</span><span class="plain">.</span><span class="element">ppos</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">mp</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">mp</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="functiontext">Regexp::prepare</span><span class="plain">(&</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">try</span><span class="plain"> = </span><span class="functiontext">Regexp::match_r</span><span class="plain">(&</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, &</span><span class="identifier">mp</span><span class="plain">, </span><span class="constant">TRUE</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">try</span><span class="plain"> >= </span><span class="constant">0</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">replacement</span><span class="plain">)</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain">=0; </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">]; </span><span class="identifier">j</span><span class="plain">++) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain"> = </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">];</span>
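                    <span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'%'</span><span class="plain">) && (</span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">+1])) {</span>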
<span class="identifier">j</span><span class="plain">++;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">ind</span><span class="plain"> = </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">] - </span><span class="character">'0'</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">ind</span><span class="plain"> >= </span><span class="constant">0</span><span class="plain">) && (</span><span class="identifier">ind</span><span class="plain"> < </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">))</span>
<span class="identifier">WRITE_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="string">"%S"</span><span class="plain">, </span><span class="identifier">mr</span><span class="plain">.</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">ind</span><span class="plain">]);</span>
<span class="reserved">else</span>
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">]);</span>
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">]);</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">left</span><span class="plain"> = </span><span class="identifier">L</span><span class="plain"> - </span><span class="identifier">try</span><span class="plain">;</span>
<span class="identifier">changes</span><span class="plain">++;</span>
<span class="functiontext">Regexp::dispose_of</span><span class="plain">(&</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="identifier">L</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">i</span><span class="plain"> = </span><span class="identifier">L</span><span class="plain">-</span><span class="identifier">left</span><span class="plain">-1;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">options</span><span class="plain"> & </span><span class="constant">REP_REPEATING</span><span class="plain">) == </span><span class="constant">0</span><span class="plain">) { </span><<span class="cwebmacro">Add the rest</span> <span class="cwebmacronumber">14.1</span>><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
<span class="reserved">continue</span><span class="plain">;</span>
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> </span><span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">));</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">options</span><span class="plain"> & </span><span class="constant">REP_ATSTART</span><span class="plain">) { </span><<span class="cwebmacro">Add the rest</span> <span class="cwebmacronumber">14.1</span>><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
<span class="plain">}</span>
<span class="functiontext">Regexp::dispose_of</span><span class="plain">(&</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">changes</span><span class="plain"> > </span><span class="constant">0</span><span class="plain">) </span><span class="functiontext">Str::copy</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">altered</span><span class="plain">);</span>
<span class="identifier">DISCARD_TEXT</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">changes</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::replace appears nowhere else.</p>
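<p class="inwebparagraph">The inner loop over <code class="display"><span class="extract">replacement</span></code> expands <code class="display"><span class="extract">%0</span></code>, <code class="display"><span class="extract">%1</span></code>, and so on into the matched subexpressions. A minimal sketch of that expansion in plain C, assuming the captures are ordinary C strings held in an array and that a reference to a capture which does not exist expands to nothing; <code class="display"><span class="extract">sketch_expand</span></code> and its parameters are hypothetical names, and the digit limit of 0 to 9 stands in for <code class="display"><span class="extract">MAX_BRACKETED_SUBEXPRESSIONS</span></code>.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Expand tmpl into out, which has room for out_size characters and is always
   NUL-terminated. "%0" to "%9" become captures[0] to captures[9] when such a
   capture exists; "%" before any other character yields that character; a
   lone "%" at the very end is copied as it stands. Returns the length written. */
size_t sketch_expand(const char *tmpl, const char *captures[], int n_captures,
                     char *out, size_t out_size) {
    assert(out_size > 0);
    size_t len = 0;
    for (size_t j = 0; tmpl[j] != '\0'; j++) {
        char c = tmpl[j];
        if ((c == '%') && (tmpl[j + 1] != '\0')) {
            c = tmpl[++j];
            int ind = c - '0';
            if ((ind >= 0) && (ind <= 9)) {
                if (ind < n_captures)
                    for (const char *p = captures[ind]; (*p != '\0') && (len + 1 < out_size); p++)
                        out[len++] = *p;
                continue;
            }
        }
        if (len + 1 < out_size) out[len++] = c;
    }
    out[len] = '\0';
    return len;
}
```

<p class="inwebparagraph">Given the captures "alpha" and "beta", the template <code class="display"><span class="extract">&lt;%1:%0&gt; 100%%</span></code> expands to "&lt;beta:alpha&gt; 100%".</p>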
<p class="inwebparagraph"><a id="SP14_1"></a><b>§14.1. </b><code class="display">
<<span class="cwebmacrodefn">Add the rest</span> <span class="cwebmacronumber">14.1</span>> =
</code></p>
<pre class="displaydefn">
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">++; </span><span class="identifier">i</span><span class="plain"><</span><span class="identifier">L</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">));</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP14">§14</a> (twice).</p>
<hr class="tocbar">
<ul class="toc"><li><a href="4-taa.html">Back to 'Tries and Avinues'</a></li><li><i>(This section ends Chapter 4: Text Handling.)</i></li></ul><hr class="tocbar">
<!--End of weave-->
</main>
</body>
</html>