<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>4/pm</title>
<meta name="viewport" content="width=device-width initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="../inweb.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body>
<nav role="navigation">
<h1><a href="../webs.html">Sources</a></h1>
<ul>
<li><a href="../inweb/index.html">inweb</a></li>
</ul>
<h2>Foundation</h2>
<ul>
<li><a href="../foundation-module/index.html">foundation-module</a></li>
<li><a href="../foundation-test/index.html">foundation-test</a></li>
</ul>
</nav>
<main role="main">
<!--Weave of '4/pm' generated by 7-->
<ul class="crumbs"><li><a href="../webs.html">Source</a></li><li><a href="index.html">foundation</a></li><li><a href="index.html#4">Chapter 4: Text Handling</a></li><li><b>Pattern Matching</b></li></ul><p class="purpose">To provide a limited regular-expression parser.</p>
<ul class="toc"><li><a href="#SP1">&#167;1. Character types</a></li><li><a href="#SP3">&#167;3. Simple parsing</a></li><li><a href="#SP6">&#167;6. A Worse PCRE</a></li><li><a href="#SP14">&#167;14. Replacement</a></li></ul><hr class="tocbar">
<p class="inwebparagraph"><a id="SP1"></a><b>&#167;1. Character types. </b>We will define white space as spaces and tabs only, since the various kinds
of line terminator will always be stripped out before this is applied.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::white_space</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">' '</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'\t'</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::white_space is used in <a href="#SP5">&#167;5</a>.</p>
<p class="inwebparagraph"><a id="SP2"></a><b>&#167;2. </b>The presence of <code class="display"><span class="extract">:</span></code> here is perhaps a bit surprising, since it's illegal in
C and has other meanings in other languages, but it's legal in C-for-Inform
identifiers.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::identifier_char</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'_'</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">':'</span><span class="plain">) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> &gt;= </span><span class="character">'A'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">c</span><span class="plain"> &lt;= </span><span class="character">'Z'</span><span class="plain">)) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> &gt;= </span><span class="character">'a'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">c</span><span class="plain"> &lt;= </span><span class="character">'z'</span><span class="plain">)) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> &gt;= </span><span class="character">'0'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">c</span><span class="plain"> &lt;= </span><span class="character">'9'</span><span class="plain">))) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::identifier_char is used in <a href="#SP13">&#167;13</a>.</p>
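<p class="inwebparagraph">To show the two predicates working together, here is a minimal standalone sketch, assuming plain null-terminated <code class="display"><span class="extract">char</span></code> strings rather than text streams. It carries its own copies of the two tests (with 1 and 0 for <code class="display"><span class="extract">TRUE</span></code> and <code class="display"><span class="extract">FALSE</span></code>), and <code class="display"><span class="extract">scan_identifier</span></code> is a hypothetical helper, not part of foundation.</p>

```c
#include <stddef.h>

/* Standalone copies of Regexp::white_space and Regexp::identifier_char,
   so that this sketch compiles on its own. */
static int white_space(int c) {
    return (c == ' ') || (c == '\t');
}

static int identifier_char(int c) {
    return (c == '_') || (c == ':') ||
        ((c >= 'A') && (c <= 'Z')) ||
        ((c >= 'a') && (c <= 'z')) ||
        ((c >= '0') && (c <= '9'));
}

/* Hypothetical helper: skip leading white space in s, then measure the
   identifier which follows. Sets *start to where the identifier begins and
   returns its length, which is 0 if no identifier character is found. */
static size_t scan_identifier(const char *s, size_t *start) {
    size_t i = 0;
    while (white_space((unsigned char) s[i])) i++;
    *start = i;
    size_t j = i;
    /* the terminating null is not an identifier character, so this stops */
    while (identifier_char((unsigned char) s[j])) j++;
    return j - i;
}
```

<p class="inwebparagraph">Because the colon counts, <code class="display"><span class="extract">Regexp::match</span></code> scans as a single 13-character identifier, rather than stopping after <code class="display"><span class="extract">Regexp</span></code>.</p>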
<p class="inwebparagraph"><a id="SP3"></a><b>&#167;3. Simple parsing. </b>The following finds the earliest minimal-length substring of a string,
delimited by two pairs of characters: for example, <code class="display"><span class="extract">&lt;&lt;</span></code> and <code class="display"><span class="extract">&gt;&gt;</span></code>. This could
easily be done as a regular expression using <code class="display"><span class="extract">Regexp::match</span></code>, but the routine
here is much quicker.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::find_expansion</span><span class="plain">(</span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">on1</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">on2</span><span class="plain">,</span>
<span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">off1</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">off2</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">len</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">i</span><span class="plain"> &lt; </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">) == </span><span class="identifier">on1</span><span class="plain">) &amp;&amp; (</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">+1) == </span><span class="identifier">on2</span><span class="plain">)) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain">=</span><span class="identifier">i</span><span class="plain">+2; </span><span class="identifier">j</span><span class="plain"> &lt; </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">j</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">) == </span><span class="identifier">off1</span><span class="plain">) &amp;&amp; (</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">+1) == </span><span class="identifier">off2</span><span class="plain">)) {</span>
<span class="plain">*</span><span class="identifier">len</span><span class="plain"> = </span><span class="identifier">j</span><span class="plain">+2-</span><span class="identifier">i</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="reserved">return</span><span class="plain"> -1;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::find_expansion appears nowhere else.</p>
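<p class="inwebparagraph">The same search can be sketched on a plain null-terminated wide string, assuming <code class="display"><span class="extract">wchar_t</span></code> arrays in place of <code class="display"><span class="extract">text_stream</span></code>; the name <code class="display"><span class="extract">find_expansion_w</span></code> is hypothetical. The loop bounds stop one short of the end so that the second character of each pair is always inside the string.</p>

```c
#include <wchar.h>

/* Sketch of Regexp::find_expansion for a plain wide string. Returns the
   position of the earliest on1-on2 pair which is later closed by an
   off1-off2 pair, setting *len to the length of the whole delimited run
   including both delimiters; returns -1 if there is no such run. */
static int find_expansion_w(const wchar_t *text, wchar_t on1, wchar_t on2,
    wchar_t off1, wchar_t off2, int *len) {
    int n = (int) wcslen(text);
    for (int i = 0; i + 1 < n; i++)
        if ((text[i] == on1) && (text[i+1] == on2))
            /* the closing pair may begin immediately after the opening one */
            for (int j = i + 2; j + 1 < n; j++)
                if ((text[j] == off1) && (text[j+1] == off2)) {
                    *len = j + 2 - i;
                    return i;
                }
    return -1;
}
```

<p class="inwebparagraph">For example, on <code class="display"><span class="extract">L"Hello &lt;&lt;name&gt;&gt;!"</span></code> with angle-bracket delimiters this returns 6 and sets the length to 8, covering <code class="display"><span class="extract">&lt;&lt;name&gt;&gt;</span></code>.</p>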
<p class="inwebparagraph"><a id="SP4"></a><b>&#167;4. </b>Still more simply:
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::find_open_brace</span><span class="plain">(</span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain"> &lt; </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">) == </span><span class="character">'{'</span><span class="plain">)</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> -1;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::find_open_brace appears nowhere else.</p>
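<p class="inwebparagraph">For comparison, on a plain null-terminated wide string the same search is a single call to the standard library's <code class="display"><span class="extract">wcschr</span></code>. This sketch (the name <code class="display"><span class="extract">find_open_brace_w</span></code> is hypothetical) converts the pointer result into the same convention of a position or -1.</p>

```c
#include <wchar.h>

/* Sketch: position of the first '{' in a wide string, or -1 if none. */
static int find_open_brace_w(const wchar_t *text) {
    const wchar_t *p = wcschr(text, L'{');
    return (p) ? (int) (p - text) : -1;
}
```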
<p class="inwebparagraph"><a id="SP5"></a><b>&#167;5. </b>Note that we count the empty string as being white space. Again, this is
equivalent to <code class="display"><span class="extract">Regexp::match(p, " *")</span></code>, but much faster.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::string_is_white_space</span><span class="plain">(</span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">) {</span>
<span class="identifier">LOOP_THROUGH_TEXT</span><span class="plain">(</span><span class="identifier">P</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Regexp::white_space</span><span class="plain">(</span><span class="functiontext">Str::get</span><span class="plain">(</span><span class="identifier">P</span><span class="plain">)) == </span><span class="constant">FALSE</span><span class="plain">)</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::string_is_white_space appears nowhere else.</p>
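<p class="inwebparagraph">A standalone sketch on a plain wide string, with the hypothetical name <code class="display"><span class="extract">string_is_white_space_w</span></code>, makes the treatment of the empty string easy to see: the loop body never runs, so the function falls through to the "yes" answer.</p>

```c
#include <wchar.h>

/* Sketch of Regexp::string_is_white_space: 1 if every character is a
   space or tab (vacuously so for the empty string), otherwise 0. */
static int string_is_white_space_w(const wchar_t *text) {
    for (const wchar_t *p = text; *p; p++)
        if ((*p != L' ') && (*p != L'\t'))
            return 0;
    return 1;
}
```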
<p class="inwebparagraph"><a id="SP6"></a><b>&#167;6. A Worse PCRE. </b>I originally wanted to call the function in this section <code class="display"><span class="extract">a_better_sscanf</span></code>, then
thought perhaps <code class="display"><span class="extract">a_worse_PCRE</span></code> would be more true. (PCRE is Philip Hazel's superb
C implementation of regular-expression parsing, but I didn't need its full strength,
and I didn't want to complicate the build process by linking to it.)
</p>
<p class="inwebparagraph">This is a very minimal regular expression parser, simply for convenience of parsing
short texts against particularly simple patterns. Here is an example of use:
</p>
<pre class="display">
<span class="reserved">match_results</span><span class="plain"> </span><span class="identifier">mr</span><span class="plain"> = </span><span class="functiontext">Regexp::create_mr</span><span class="plain">();</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Regexp::match</span><span class="plain">(&amp;</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">L</span><span class="string">"fish (%d+) ([a-zA-Z_][a-zA-Z0-9_]*) *"</span><span class="plain">)) {</span>
<span class="identifier">PRINT</span><span class="plain">(</span><span class="string">"Fish number: %S\n"</span><span class="plain">, </span><span class="identifier">mr</span><span class="plain">.</span><span class="element">exp</span><span class="plain">[0]);</span>
<span class="identifier">PRINT</span><span class="plain">(</span><span class="string">"Fish name: %S\n"</span><span class="plain">, </span><span class="identifier">mr</span><span class="plain">.</span><span class="element">exp</span><span class="plain">[1]);</span>
<span class="plain">}</span>
<span class="functiontext">Regexp::dispose_of</span><span class="plain">(&amp;</span><span class="identifier">mr</span><span class="plain">);</span>
</pre>
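<p class="inwebparagraph">Note the <code class="display"><span class="extract">L</span></code> at the front of the regex itself: this makes it a wide-character string literal, as the <code class="display"><span class="extract">wchar_t *pattern</span></code> argument requires.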
</p>
<p class="inwebparagraph">This tries to match the given <code class="display"><span class="extract">text</span></code> to see if it consists of the word fish,
then any amount of white space, then a string of digits which are copied into
<code class="display"><span class="extract">mr.exp[0]</span></code>, then white space again, and then an alphanumeric identifier to be
copied into <code class="display"><span class="extract">mr.exp[1]</span></code>, and finally optional white space. (If no match is
made, the contents of the found strings are undefined.)
</p>
<p class="inwebparagraph">Note that this differs from, for example, Perl's regular expression matcher
in several ways. The regular expression syntax is slightly different and in
general simpler. A match has to be made from start to end, so it's as if there
were an implicit <code class="display"><span class="extract">^</span></code> at the front and <code class="display"><span class="extract">$</span></code> at the back (in Perl terms). The
full match text is therefore always the entire input text, so there's no
need to record it. In Perl, matching against <code class="display"><span class="extract">m/(.*) plus (.*)/</span></code> would
set three subexpressions: number 0 would be the whole text matched, number
1 the first bracketed part, and number 2 the second. Here, though, the
corresponding regex would be written <code class="display"><span class="extract">L"(%c*) plus (%c*)"</span></code>, and the bracketed
terms would be subexpressions 0 and 1, found in <code class="display"><span class="extract">mr.exp[0]</span></code> and <code class="display"><span class="extract">mr.exp[1]</span></code>.
</p>
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain"> </span><span class="constant">5</span><span class="plain"> </span><span class="comment">this many bracketed subexpressions can be extracted</span>
</pre>
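<p class="inwebparagraph">For contrast, here is the same split written with POSIX extended regular expressions from <code class="display"><span class="extract">&lt;regex.h&gt;</span></code>, as found on Unix-like systems; this is the standard POSIX API, not anything in foundation, and <code class="display"><span class="extract">split_on_plus</span></code> is a hypothetical name. Two differences from <code class="display"><span class="extract">Regexp::match</span></code> show up: the anchors <code class="display"><span class="extract">^</span></code> and <code class="display"><span class="extract">$</span></code> must be written explicitly, and slot 0 of the match array holds the whole match, so the bracketed terms arrive in slots 1 and 2.</p>

```c
#include <regex.h>
#include <string.h>

/* Returns 1 if text has the form "LEFT plus RIGHT", copying LEFT and RIGHT
   into left and right (each assumed to have room for size bytes);
   returns 0 if there is no match or a part would not fit. */
static int split_on_plus(const char *text, char *left, char *right, size_t size) {
    regex_t re;
    regmatch_t pm[3];   /* pm[0]: whole match; pm[1], pm[2]: bracketed terms */
    if (regcomp(&re, "^(.*) plus (.*)$", REG_EXTENDED) != 0) return 0;
    int ok = (regexec(&re, text, 3, pm, 0) == 0);
    if (ok) {
        size_t n1 = (size_t) (pm[1].rm_eo - pm[1].rm_so);
        size_t n2 = (size_t) (pm[2].rm_eo - pm[2].rm_so);
        if ((n1 >= size) || (n2 >= size)) {
            ok = 0;
        } else {
            memcpy(left, text + pm[1].rm_so, n1);  left[n1] = '\0';
            memcpy(right, text + pm[2].rm_so, n2); right[n2] = '\0';
        }
    }
    regfree(&re);
    return ok;
}
```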
<p class="inwebparagraph"><a id="SP7"></a><b>&#167;7. </b>The internal state of the matcher is stored as follows:
</p>
<pre class="display">
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_position</span><span class="plain"> {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">tpos</span><span class="plain">; </span><span class="comment">position within text being matched</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">ppos</span><span class="plain">; </span><span class="comment">position within pattern</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bc</span><span class="plain">; </span><span class="comment">count of bracketed subexpressions so far begun</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bl</span><span class="plain">; </span><span class="comment">bracket indentation level</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">bracket_nesting</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="comment">which subexpression number (0 up to MAX_BRACKETED_SUBEXPRESSIONS-1) corresponds to each nesting level</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">brackets_start</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">], </span><span class="identifier">brackets_end</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="comment">positions in text being matched, inclusive</span>
<span class="plain">} </span><span class="reserved">match_position</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The structure match_position is private to this section.</p>
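<p class="inwebparagraph">The matcher's own code for these fields is not shown here, but the field comments suggest a bookkeeping scheme along the following lines. This is a hypothetical sketch under that assumption, not the matcher itself: it walks a pattern, numbers each open bracket in order of appearance (the role of <code class="display"><span class="extract">bc</span></code>), tracks depth (<code class="display"><span class="extract">bl</span></code>), and uses a <code class="display"><span class="extract">bracket_nesting</span></code> array to find which subexpression each close bracket ends. It ignores escapes and character classes.</p>

```c
#define SKETCH_MAX_BRACKETS 5   /* stands in for MAX_BRACKETED_SUBEXPRESSIONS */

/* Writes into closes[] the subexpression number ended by each ')' in turn.
   Returns how many were closed, or -1 if the brackets are unbalanced or
   there are more than SKETCH_MAX_BRACKETS of them. */
static int closing_order(const char *pattern, int closes[SKETCH_MAX_BRACKETS]) {
    int bc = 0;   /* subexpressions begun so far */
    int bl = 0;   /* current bracket depth */
    int closed = 0;
    int bracket_nesting[SKETCH_MAX_BRACKETS];   /* depth -> subexpression number */
    for (const char *p = pattern; *p; p++) {
        if (*p == '(') {
            if (bc >= SKETCH_MAX_BRACKETS) return -1;
            bracket_nesting[bl++] = bc++;
        } else if (*p == ')') {
            if (bl == 0) return -1;
            closes[closed++] = bracket_nesting[--bl];
        }
    }
    return (bl == 0) ? closed : -1;
}
```

<p class="inwebparagraph">For <code class="display"><span class="extract">(a(b)c)(d)</span></code> the close brackets end subexpressions 1, 0 and 2 in that order: numbering follows the open brackets, but inner brackets close first.</p>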
<p class="inwebparagraph"><a id="SP8"></a><b>&#167;8. </b>It may appear that match texts are limited to 64 characters here, but they
are not. They are simply a little faster to access if short.
</p>
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">MATCH_TEXT_INITIAL_ALLOCATION</span><span class="plain"> </span><span class="constant">64</span>
</pre>
<pre class="display">
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_result</span><span class="plain"> {</span>
<span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">match_text_storage</span><span class="plain">[</span><span class="constant">MATCH_TEXT_INITIAL_ALLOCATION</span><span class="plain">];</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">text_stream</span><span class="plain"> </span><span class="identifier">match_text_struct</span><span class="plain">;</span>
<span class="plain">} </span><span class="reserved">match_result</span><span class="plain">;</span>
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_results</span><span class="plain"> {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">no_matched_texts</span><span class="plain">;</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_result</span><span class="plain"> </span><span class="identifier">exp_storage</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">exp</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">exp_at</span><span class="plain">[</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">];</span>
<span class="plain">} </span><span class="reserved">match_results</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The structure match_result is private to this section.</p>
<p class="endnote">The structure match_results is accessed in 3/cla, 8/ws, 8/bf and here.</p>
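<p class="inwebparagraph">The arrangement above, a fixed array of <code class="display"><span class="extract">MATCH_TEXT_INITIAL_ALLOCATION</span></code> characters inside each <code class="display"><span class="extract">match_result</span></code>, is an instance of the general small-buffer technique. Here is a minimal sketch of that technique with hypothetical names; it does not reproduce how inweb's <code class="display"><span class="extract">text_stream</span></code> actually grows past its initial buffer. Short texts live inside the struct, so a struct on the C stack needs no allocation, and only texts which outgrow it move to the heap.</p>

```c
#include <stdlib.h>
#include <wchar.h>

#define SKETCH_INLINE 64

typedef struct small_text {
    wchar_t inline_storage[SKETCH_INLINE];
    wchar_t *heap;   /* NULL until the text outgrows inline_storage */
    size_t len;
} small_text;

static void st_init(small_text *t) {
    t->heap = NULL; t->len = 0; t->inline_storage[0] = 0;
}

/* Wherever the characters currently live. */
static const wchar_t *st_chars(const small_text *t) {
    return (t->heap) ? t->heap : t->inline_storage;
}

/* Copy s into t; returns 0 (leaving t empty) if allocation fails. */
static int st_set(small_text *t, const wchar_t *s) {
    size_t n = wcslen(s);
    free(t->heap);
    t->heap = NULL;
    if (n + 1 > SKETCH_INLINE) {   /* too long for the inline buffer */
        t->heap = malloc((n + 1) * sizeof(wchar_t));
        if (t->heap == NULL) { t->len = 0; t->inline_storage[0] = 0; return 0; }
        wmemcpy(t->heap, s, n + 1);
    } else {
        wmemcpy(t->inline_storage, s, n + 1);
    }
    t->len = n;
    return 1;
}

static void st_dispose(small_text *t) {
    free(t->heap);
    t->heap = NULL; t->len = 0; t->inline_storage[0] = 0;
}
```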
<p class="inwebparagraph"><a id="SP9"></a><b>&#167;9. </b>Match result objects are inherently ephemeral, and we can expect to be
creating them and throwing them away frequently. This must be done
explicitly. Note that the storage required is on the C stack (unless some
result strings grow very large), so that it's very quick to allocate and
deallocate.
</p>
<pre class="display">
<span class="reserved">match_results</span><span class="plain"> </span><span class="functiontext">Regexp::create_mr</span><span class="plain">(</span><span class="reserved">void</span><span class="plain">) {</span>
<span class="reserved">match_results</span><span class="plain"> </span><span class="identifier">mr</span><span class="plain">;</span>
<span class="identifier">mr</span><span class="plain">.</span><span class="element">no_matched_texts</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain">&lt;</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
<span class="identifier">mr</span><span class="plain">.</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">mr</span><span class="plain">.</span><span class="element">exp_at</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = -1;</span>
<span class="plain">}</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">mr</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Regexp::dispose_of</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain">&lt;</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]) {</span>
<span class="identifier">STREAM_CLOSE</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]);</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">no_matched_texts</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::create_mr is used in <a href="#SP14">&#167;14</a>, 3/cla (<a href="3-cla.html#SP11">&#167;11</a>, <a href="3-cla.html#SP12">&#167;12</a>), 8/ws (<a href="8-ws.html#SP7_3_2">&#167;7.3.2</a>, <a href="8-ws.html#SP7_3_3_2">&#167;7.3.3.2</a>, <a href="8-ws.html#SP7_3_3_2_1">&#167;7.3.3.2.1</a>, <a href="8-ws.html#SP7_2_2_1">&#167;7.2.2.1</a>, <a href="8-ws.html#SP7_2_2_3">&#167;7.2.2.3</a>), 8/bf (<a href="8-bf.html#SP3">&#167;3</a>).</p>
<p class="endnote">The function Regexp::dispose_of is used in <a href="#SP10">&#167;10</a>, <a href="#SP14">&#167;14</a>, 3/cla (<a href="3-cla.html#SP11">&#167;11</a>), 8/ws (<a href="8-ws.html#SP7_3_2">&#167;7.3.2</a>, <a href="8-ws.html#SP7_3_3_2">&#167;7.3.3.2</a>, <a href="8-ws.html#SP7_3_3_2_1">&#167;7.3.3.2.1</a>, <a href="8-ws.html#SP7_2_2_1">&#167;7.2.2.1</a>, <a href="8-ws.html#SP7_2_2_3">&#167;7.2.2.3</a>), 8/bf (<a href="8-bf.html#SP3">&#167;3</a>).</p>
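<p class="inwebparagraph">The create-and-dispose discipline can be sketched in miniature with ordinary C strings; the names <code class="display"><span class="extract">mini_results</span></code>, <code class="display"><span class="extract">mini_create</span></code>, <code class="display"><span class="extract">mini_capture</span></code> and <code class="display"><span class="extract">mini_dispose</span></code> are hypothetical. As with <code class="display"><span class="extract">Regexp::create_mr</span></code>, the results struct is returned by value and so lives on the caller's stack; and as with <code class="display"><span class="extract">Regexp::dispose_of</span></code>, each pointer is reset to <code class="display"><span class="extract">NULL</span></code> after being released, so disposing twice, or disposing <code class="display"><span class="extract">NULL</span></code>, is harmless.</p>

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_SUBEXPS 5

typedef struct mini_results {
    int no_matched_texts;
    char *exp[SKETCH_MAX_SUBEXPS];   /* heap copies of captured texts, or NULL */
} mini_results;

static mini_results mini_create(void) {
    mini_results mr;
    mr.no_matched_texts = 0;
    for (int i = 0; i < SKETCH_MAX_SUBEXPS; i++) mr.exp[i] = NULL;
    return mr;   /* returned by value: no allocation for the struct itself */
}

/* Record a copy of s as the next subexpression; 0 if full or out of memory. */
static int mini_capture(mini_results *mr, const char *s) {
    if (mr->no_matched_texts >= SKETCH_MAX_SUBEXPS) return 0;
    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    if (copy == NULL) return 0;
    memcpy(copy, s, n + 1);
    mr->exp[mr->no_matched_texts++] = copy;
    return 1;
}

static void mini_dispose(mini_results *mr) {
    if (mr == NULL) return;
    for (int i = 0; i < SKETCH_MAX_SUBEXPS; i++)
        if (mr->exp[i]) { free(mr->exp[i]); mr->exp[i] = NULL; }
    mr->no_matched_texts = 0;
}
```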
<p class="inwebparagraph"><a id="SP10"></a><b>&#167;10. </b>So, then: the matcher itself.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::match</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) </span><span class="functiontext">Regexp::prepare</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">rv</span><span class="plain"> = (</span><span class="functiontext">Regexp::match_r</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, </span><span class="identifier">NULL</span><span class="plain">, </span><span class="constant">FALSE</span><span class="plain">) &gt;= </span><span class="constant">0</span><span class="plain">)?</span><span class="constant">TRUE</span><span class="plain">:</span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">mr</span><span class="plain">) &amp;&amp; (</span><span class="identifier">rv</span><span class="plain"> == </span><span class="constant">FALSE</span><span class="plain">)) </span><span class="functiontext">Regexp::dispose_of</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">rv</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::match_from</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">,</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">x</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">allow_partial</span><span class="plain">) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">match_to</span><span class="plain"> = </span><span class="identifier">x</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">x</span><span class="plain"> &lt; </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) </span><span class="functiontext">Regexp::prepare</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">match_position</span><span class="plain"> </span><span class="identifier">at</span><span class="plain">;</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain"> = </span><span class="identifier">x</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="identifier">match_to</span><span class="plain"> = </span><span class="functiontext">Regexp::match_r</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, &amp;</span><span class="identifier">at</span><span class="plain">, </span><span class="identifier">allow_partial</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">match_to</span><span class="plain"> == -1) {</span>
<span class="identifier">match_to</span><span class="plain"> = </span><span class="identifier">x</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) </span><span class="functiontext">Regexp::dispose_of</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">match_to</span><span class="plain"> - </span><span class="identifier">x</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Regexp::prepare</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) {</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">no_matched_texts</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain">&lt;</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp_at</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = -1;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]) </span><span class="identifier">STREAM_CLOSE</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">-&gt;</span><span class="identifier">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]);</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp_storage</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">].</span><span class="element">match_text_struct</span><span class="plain"> =</span>
<span class="functiontext">Streams::new_buffer</span><span class="plain">(</span>
<span class="constant">MATCH_TEXT_INITIAL_ALLOCATION</span><span class="plain">, </span><span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp_storage</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">].</span><span class="element">match_text_storage</span><span class="plain">);</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp_storage</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">].</span><span class="element">match_text_struct</span><span class="plain">.</span><span class="element">stream_flags</span><span class="plain"> |= </span><span class="constant">FOR_RE_STRF</span><span class="plain">;</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = &amp;(</span><span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp_storage</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">].</span><span class="element">match_text_struct</span><span class="plain">);</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::match is used in 3/cla (<a href="3-cla.html#SP11">&#167;11</a>, <a href="3-cla.html#SP12">&#167;12</a>), 8/ws (<a href="8-ws.html#SP7_3_2">&#167;7.3.2</a>, <a href="8-ws.html#SP7_3_3_2">&#167;7.3.3.2</a>, <a href="8-ws.html#SP7_3_3_2_1">&#167;7.3.3.2.1</a>, <a href="8-ws.html#SP7_2_2_1">&#167;7.2.2.1</a>, <a href="8-ws.html#SP7_2_2_3">&#167;7.2.2.3</a>), 8/bf (<a href="8-bf.html#SP3">&#167;3</a>).</p>
<p class="endnote">The function Regexp::match_from appears nowhere else.</p>
<p class="endnote">The function Regexp::prepare is used in <a href="#SP14">&#167;14</a>.</p>
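<p class="inwebparagraph"><a id="SP11"></a><b>&#167;11. </b>The matcher itself is recursive. It returns the position in the text just after the match, or -1 if there is no match. If <code class="display"><span class="extract">allow_partial</span></code> is set, it succeeds as soon as the pattern runs out, even if some text remains.</p>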
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::match_r</span><span class="plain">(</span><span class="reserved">match_results</span><span class="plain"> *</span><span class="identifier">mr</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">,</span>
<span class="reserved">match_position</span><span class="plain"> *</span><span class="identifier">scan_from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">allow_partial</span><span class="plain">) {</span>
<span class="reserved">match_position</span><span class="plain"> </span><span class="identifier">at</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">scan_from</span><span class="plain">) </span><span class="identifier">at</span><span class="plain"> = *</span><span class="identifier">scan_from</span><span class="plain">;</span>
<span class="reserved">else</span><span class="plain"> { </span><span class="identifier">at</span><span class="plain">.</span><span class="identifier">tpos</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; }</span>
<span class="reserved">while</span><span class="plain"> ((</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">)) || (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">])) {</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">allow_partial</span><span class="plain">) &amp;&amp; (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="constant">0</span><span class="plain">)) </span><span class="reserved">break</span><span class="plain">;</span>
&lt;<span class="cwebmacro">Parentheses in the match pattern set up substrings to extract</span> <span class="cwebmacronumber">11.1</span>&gt;<span class="plain">;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">chcl</span><span class="plain">, </span><span class="comment">what class of characters to match: a <code class="display"><span class="extract">*_CLASS</span></code> value</span>
<span class="identifier">range_from</span><span class="plain">, </span><span class="identifier">range_to</span><span class="plain">, </span><span class="comment">for <code class="display"><span class="extract">LITERAL_CLASS</span></code> only</span>
<span class="identifier">reverse</span><span class="plain"> = </span><span class="constant">FALSE</span><span class="plain">; </span><span class="comment">require a non-match rather than a match</span>
&lt;<span class="cwebmacro">Extract the character class to match from the pattern</span> <span class="cwebmacronumber">11.2</span>&gt;<span class="plain">;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">rep_from</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">, </span><span class="identifier">rep_to</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="comment">minimum and maximum number of repetitions</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">greedy</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="comment">go for a maximal-length match if possible</span>
&lt;<span class="cwebmacro">Extract repetition markers from the pattern</span> <span class="cwebmacronumber">11.3</span>&gt;<span class="plain">;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">reps</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
&lt;<span class="cwebmacro">Count how many repetitions can be made here</span> <span class="cwebmacronumber">11.4</span>&gt;<span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">reps</span><span class="plain"> &lt; </span><span class="identifier">rep_from</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> -1;</span>
<span class="comment">we can now accept anything from <code class="display"><span class="extract">rep_from</span></code> to <code class="display"><span class="extract">reps</span></code> repetitions</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">rep_from</span><span class="plain"> == </span><span class="identifier">reps</span><span class="plain">) { </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain"> += </span><span class="identifier">reps</span><span class="plain">; </span><span class="reserved">continue</span><span class="plain">; }</span>
&lt;<span class="cwebmacro">Try all possible match lengths until we find a match</span> <span class="cwebmacronumber">11.5</span>&gt;<span class="plain">;</span>
<span class="comment">no match length worked, so no match</span>
<span class="reserved">return</span><span class="plain"> -1;</span>
<span class="plain">}</span>
&lt;<span class="cwebmacro">Copy the bracketed texts found into the global strings</span> <span class="cwebmacronumber">11.6</span>&gt;<span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">at</span><span class="plain">.</span><span class="identifier">tpos</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::match_r is used in <a href="#SP10">&#167;10</a>, <a href="#SP11_5">&#167;11.5</a>, <a href="#SP14">&#167;14</a>.</p>
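<p class="inwebparagraph">The overall strategy can be illustrated by a minimal sketch over plain C strings rather than <code class="display"><span class="extract">text_stream</span></code>s. The function <code class="display"><span class="extract">mini_match</span></code> and its helpers are hypothetical and not part of foundation. They support only literal characters, <code class="display"><span class="extract">%d</span></code>, <code class="display"><span class="extract">%c</span></code>, other <code class="display"><span class="extract">%</span></code>-escapes taken literally, a space for white space, and the <code class="display"><span class="extract">+</span></code>, <code class="display"><span class="extract">*</span></code> and <code class="display"><span class="extract">?</span></code> markers, with no brackets or captures. The shape follows <code class="display"><span class="extract">Regexp::match_r</span></code>: read one class, read its repetition markers, count how many characters could match, then recurse on the rest of the pattern for each candidate run length.</p>

```c
#include <ctype.h>
#include <string.h>

/* White space means spaces and tabs only, as in this section. */
static int is_ws(char ch) { return ch == ' ' || ch == '\t'; }

/* Reads one character class at pattern[*pp] and advances *pp past it.
   Returns 'd' (digit), 'c' (any), 'w' (white space) or 'L' (the literal
   character stored in *lit). */
static int read_class(const char *pattern, int *pp, char *lit) {
    char ch = pattern[*pp];
    if (ch == '%' && pattern[*pp + 1]) {
        char e = pattern[*pp + 1];
        *pp += 2;
        if (e == 'd') return 'd';
        if (e == 'c') return 'c';
        *lit = e;                 /* any other escape is literal */
        return 'L';
    }
    *pp += 1;
    if (ch == ' ') return 'w';
    *lit = ch;
    return 'L';
}

static int class_ok(int cl, char lit, char ch) {
    switch (cl) {
        case 'd': return isdigit((unsigned char) ch) != 0;
        case 'c': return 1;
        case 'w': return is_ws(ch);
        default:  return ch == lit;
    }
}

/* Returns the text position just after a complete match of pattern[pp..]
   against text[tpos..], or -1 if there is none. */
int mini_match(const char *text, int tpos, const char *pattern, int pp) {
    while (text[tpos] || pattern[pp]) {
        if (pattern[pp] == 0) return -1;      /* text left over */
        char lit = 0;
        int cl = read_class(pattern, &pp, &lit);
        int remaining = (int) strlen(text + tpos);
        int rep_from = 1, rep_to = 1, greedy = 1;
        if (cl == 'w') rep_to = remaining;    /* a space means one or more */
        if (pattern[pp] == '+') { rep_from = 1; rep_to = remaining; pp++; }
        else if (pattern[pp] == '*') { rep_from = 0; rep_to = remaining; pp++; }
        if (pattern[pp] == '?') { greedy = 0; pp++; }
        int reps = 0;                         /* longest run that could match */
        while (text[tpos + reps] && reps < rep_to
            && class_ok(cl, lit, text[tpos + reps])) reps++;
        if (reps < rep_from) return -1;
        if (reps == rep_from) { tpos += reps; continue; }
        /* otherwise backtrack: longest run first if greedy, shortest if lazy */
        int from = greedy ? reps : rep_from;
        int to = greedy ? rep_from : reps;
        int dj = greedy ? -1 : 1;
        for (int j = from; j != to + dj; j += dj) {
            int r = mini_match(text, tpos + j, pattern, pp);
            if (r >= 0) return r;
        }
        return -1;
    }
    return tpos;
}
```

<p class="inwebparagraph">For instance, <code class="display"><span class="extract">mini_match("x = 12", 0, "x = %d+", 0)</span></code> should return 6, the length of the whole text. Since the sketch only reports where the match ends, greedy and lazy repetition give the same answer; the difference becomes visible only once bracketed texts are captured.</p>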
<p class="inwebparagraph"><a id="SP11_1"></a><b>&#167;11.1. </b><code class="display">
&lt;<span class="cwebmacrodefn">Parentheses in the match pattern set up substrings to extract</span> <span class="cwebmacronumber">11.1</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">'('</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> &lt; </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">) </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bracket_nesting</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">] = -1;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> &lt; </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">) {</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">bracket_nesting</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">] = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">;</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">brackets_start</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">] = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="identifier">brackets_end</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">] = -1;</span>
<span class="plain">}</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">++; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">++; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++;</span>
<span class="reserved">continue</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">')'</span><span class="plain">) {</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">--;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> &gt;= </span><span class="constant">0</span><span class="plain">) &amp;&amp; (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> &lt; </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">) &amp;&amp; (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bracket_nesting</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">] &gt;= </span><span class="constant">0</span><span class="plain">))</span>
<span class="identifier">at</span><span class="plain">.</span><span class="identifier">brackets_end</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bracket_nesting</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bl</span><span class="plain">]] = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">-1;</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++;</span>
<span class="reserved">continue</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP11">&#167;11</a>.</p>
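<p class="inwebparagraph">The bookkeeping here is easier to see on a pattern alone. In the sketch below, whose names <code class="display"><span class="extract">number_groups</span></code> and <code class="display"><span class="extract">MAX_GROUPS</span></code> are hypothetical (the latter standing in for <code class="display"><span class="extract">MAX_BRACKETED_SUBEXPRESSIONS</span></code>), groups are numbered in the order their open brackets appear, as <code class="display"><span class="extract">at.bc</span></code> counts them, and a small stack indexed by nesting depth, playing the role of <code class="display"><span class="extract">at.bl</span></code> and <code class="display"><span class="extract">at.bracket_nesting</span></code>, lets each close bracket find the group it ends. It records pattern positions, whereas the real code records text positions, and it ignores an unmatched close bracket rather than letting the depth go negative.</p>

```c
/* Hypothetical cap, standing in for MAX_BRACKETED_SUBEXPRESSIONS. */
#define MAX_GROUPS 5

/* Numbers the bracketed groups in a pattern in order of their '(' and
   records the pattern positions of each group's '(' and ')'. Returns the
   number of '(' seen, which can exceed MAX_GROUPS; only the first
   MAX_GROUPS groups are recorded. close_at[g] stays -1 if never closed. */
int number_groups(const char *pattern, int open_at[], int close_at[]) {
    int nesting[MAX_GROUPS]; /* nesting[depth]: group opened at that depth, or -1 */
    int bc = 0;              /* brackets counted so far, like at.bc */
    int bl = 0;              /* current nesting depth, like at.bl */
    for (int p = 0; pattern[p]; p++) {
        if (pattern[p] == '(') {
            if (bl < MAX_GROUPS) nesting[bl] = -1;
            if (bc < MAX_GROUPS) {        /* bl <= bc, so nesting[bl] is in range */
                nesting[bl] = bc;
                open_at[bc] = p;
                close_at[bc] = -1;
            }
            bl++; bc++;
        } else if (pattern[p] == ')' && bl > 0) {  /* an unmatched ')' is ignored */
            bl--;
            if (bl < MAX_GROUPS && nesting[bl] >= 0)
                close_at[nesting[bl]] = p;
        }
    }
    return bc;
}
```

<p class="inwebparagraph">With the pattern <code class="display"><span class="extract">(a(b)c)(d)</span></code>, for example, group 1 is the inner <code class="display"><span class="extract">(b)</span></code>, numbered ahead of <code class="display"><span class="extract">(d)</span></code> because its open bracket comes first.</p>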
<p class="inwebparagraph"><a id="SP11_2"></a><b>&#167;11.2. </b><code class="display">
&lt;<span class="cwebmacrodefn">Extract the character class to match from the pattern</span> <span class="cwebmacronumber">11.2</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">len</span><span class="plain">;</span>
<span class="identifier">chcl</span><span class="plain"> = </span><span class="functiontext">Regexp::get_cclass</span><span class="plain">(</span><span class="identifier">pattern</span><span class="plain">, </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">, &amp;</span><span class="identifier">len</span><span class="plain">, &amp;</span><span class="identifier">range_from</span><span class="plain">, &amp;</span><span class="identifier">range_to</span><span class="plain">, &amp;</span><span class="identifier">reverse</span><span class="plain">);</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain"> += </span><span class="identifier">len</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP11">&#167;11</a>.</p>
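<p class="inwebparagraph"><a id="SP11_3"></a><b>&#167;11.3. </b>This is standard regular-expression notation, with three exceptions: I haven't bothered
to implement numeric repetition counts, which we won't need; a space (the white-space class) repeats one or more times even without a <code class="display"><span class="extract">+</span></code>; and <code class="display"><span class="extract">?</span></code> only makes the preceding repetition lazy rather than greedy, so it never means "optional" as it does in Perl: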
</p>
<p class="macrodefinition"><code class="display">
&lt;<span class="cwebmacrodefn">Extract repetition markers from the pattern</span> <span class="cwebmacronumber">11.3</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">chcl</span><span class="plain"> == </span><span class="constant">WHITESPACE_CLASS</span><span class="plain">) {</span>
<span class="identifier">rep_from</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="identifier">rep_to</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)-</span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">'+'</span><span class="plain">) {</span>
<span class="identifier">rep_from</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="identifier">rep_to</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)-</span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++;</span>
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">'*'</span><span class="plain">) {</span>
<span class="identifier">rep_from</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">rep_to</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">)-</span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">] == </span><span class="character">'?'</span><span class="plain">) { </span><span class="identifier">greedy</span><span class="plain"> = </span><span class="constant">FALSE</span><span class="plain">; </span><span class="identifier">at</span><span class="plain">.</span><span class="element">ppos</span><span class="plain">++; }</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP11">&#167;11</a>.</p>
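<p class="inwebparagraph">The marker rules can be restated as a small stand-alone function. This is a sketch only: <code class="display"><span class="extract">repetition</span></code> and <code class="display"><span class="extract">parse_repetition</span></code> are hypothetical names, and the caller is assumed to pass in how many text characters remain and whether the class just read was white space. Note that a lone <code class="display"><span class="extract">?</span></code> leaves the count at exactly one and merely clears the greedy flag.</p>

```c
/* Minimum and maximum repetitions, and whether to try the longest run first. */
typedef struct {
    int rep_from;
    int rep_to;
    int greedy;
} repetition;

/* Reads the repetition markers at pattern[pos], following the rules above.
   `remaining` is the number of text characters left, and `whitespace`
   says whether the class just read was the white-space class.
   Returns the number of pattern characters consumed. */
int parse_repetition(const char *pattern, int pos, int remaining,
    int whitespace, repetition *r) {
    int start = pos;
    r->rep_from = 1; r->rep_to = 1; r->greedy = 1;   /* default: exactly once */
    if (whitespace) { r->rep_from = 1; r->rep_to = remaining; }
    if (pattern[pos] == '+') { r->rep_from = 1; r->rep_to = remaining; pos++; }
    else if (pattern[pos] == '*') { r->rep_from = 0; r->rep_to = remaining; pos++; }
    if (pattern[pos] == '?') { r->greedy = 0; pos++; }   /* lazy, not "optional" */
    return pos - start;
}
```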
<p class="inwebparagraph"><a id="SP11_4"></a><b>&#167;11.4. </b><code class="display">
&lt;<span class="cwebmacrodefn">Count how many repetitions can be made here</span> <span class="cwebmacronumber">11.4</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">reps</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; ((</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">+</span><span class="identifier">reps</span><span class="plain">)) &amp;&amp; (</span><span class="identifier">reps</span><span class="plain"> &lt; </span><span class="identifier">rep_to</span><span class="plain">)); </span><span class="identifier">reps</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Regexp::test_cclass</span><span class="plain">(</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">+</span><span class="identifier">reps</span><span class="plain">), </span><span class="identifier">chcl</span><span class="plain">,</span>
<span class="identifier">range_from</span><span class="plain">, </span><span class="identifier">range_to</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, </span><span class="identifier">reverse</span><span class="plain">) == </span><span class="constant">FALSE</span><span class="plain">)</span>
<span class="reserved">break</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP11">&#167;11</a>.</p>
<p class="inwebparagraph"><a id="SP11_5"></a><b>&#167;11.5. </b><code class="display">
&lt;<span class="cwebmacrodefn">Try all possible match lengths until we find a match</span> <span class="cwebmacronumber">11.5</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">rep_from</span><span class="plain">, </span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">reps</span><span class="plain">, </span><span class="identifier">dj</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">, </span><span class="identifier">from_tpos</span><span class="plain"> = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">greedy</span><span class="plain">) { </span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">reps</span><span class="plain">; </span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">rep_from</span><span class="plain">; </span><span class="identifier">dj</span><span class="plain"> = -1; }</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain"> = </span><span class="identifier">from</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"> != </span><span class="identifier">to</span><span class="plain">+</span><span class="identifier">dj</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"> += </span><span class="identifier">dj</span><span class="plain">) {</span>
<span class="identifier">at</span><span class="plain">.</span><span class="element">tpos</span><span class="plain"> = </span><span class="identifier">from_tpos</span><span class="plain"> + </span><span class="identifier">j</span><span class="plain">;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">try</span><span class="plain"> = </span><span class="functiontext">Regexp::match_r</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, &amp;</span><span class="identifier">at</span><span class="plain">, </span><span class="identifier">allow_partial</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">try</span><span class="plain"> &gt;= </span><span class="constant">0</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">try</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP11">&#167;11</a>.</p>
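<p class="inwebparagraph">The loop steps <code class="display"><span class="extract">j</span></code> from <code class="display"><span class="extract">from</span></code> to <code class="display"><span class="extract">to</span></code> inclusive in either direction, which is why it tests <code class="display"><span class="extract">j != to+dj</span></code>. The hypothetical helper <code class="display"><span class="extract">trial_order</span></code> below simply lists the run lengths in the order they would be tried, assuming, as holds by this point in &#167;11, that <code class="display"><span class="extract">reps</span></code> is greater than <code class="display"><span class="extract">rep_from</span></code>.</p>

```c
/* Fills out[] with the run lengths tried, in order, and returns how many.
   Greedy: longest first, reps down to rep_from. Lazy: rep_from up to reps.
   Requires reps >= rep_from, or the loop would never reach its end value. */
int trial_order(int rep_from, int reps, int greedy, int out[]) {
    int from = rep_from, to = reps, dj = 1;
    if (greedy) { from = reps; to = rep_from; dj = -1; }
    int n = 0;
    for (int j = from; j != to + dj; j += dj) out[n++] = j;
    return n;
}
```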
<p class="inwebparagraph"><a id="SP11_6"></a><b>&#167;11.6. </b><code class="display">
&lt;<span class="cwebmacrodefn">Copy the bracketed texts found into the global strings</span> <span class="cwebmacronumber">11.6</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">mr</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; (</span><span class="identifier">i</span><span class="plain">&lt;</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">) &amp;&amp; (</span><span class="identifier">i</span><span class="plain">&lt;</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">); </span><span class="identifier">i</span><span class="plain">++) {</span>
<span class="functiontext">Str::clear</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]);</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain"> = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">brackets_start</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]; </span><span class="identifier">j</span><span class="plain"> &lt;= </span><span class="identifier">at</span><span class="plain">.</span><span class="identifier">brackets_end</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]; </span><span class="identifier">j</span><span class="plain">++)</span>
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">mr</span><span class="plain">-&gt;</span><span class="identifier">exp</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">], </span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">));</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">exp_at</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">at</span><span class="plain">.</span><span class="element">brackets_start</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
<span class="plain">}</span>
<span class="identifier">mr</span><span class="plain">-&gt;</span><span class="element">no_matched_texts</span><span class="plain"> = (</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> &lt; </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">)?</span><span class="identifier">at</span><span class="plain">.</span><span class="element">bc</span><span class="plain">:</span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP11">&#167;11</a>.</p>
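<p class="inwebparagraph">Each bracketed text is the inclusive range of text positions from <code class="display"><span class="extract">brackets_start</span></code> to <code class="display"><span class="extract">brackets_end</span></code>. A sketch of the copy for one group, using a hypothetical <code class="display"><span class="extract">copy_group</span></code> that writes into a fixed-size C buffer instead of a stream, shows why an end of -1 (a bracket whose close was never reached) or an end one before the start (an empty group such as <code class="display"><span class="extract">()</span></code>) both produce the empty string.</p>

```c
#include <stddef.h>

/* Copies the inclusive range text[start..end] into buf, truncating to fit,
   and always NUL-terminates when size > 0. If end < start, as with an end
   of -1 or an empty group, the loop never runs and buf is left empty. */
void copy_group(const char *text, int start, int end, char *buf, size_t size) {
    size_t n = 0;
    if (size == 0) return;
    for (int j = start; j <= end && n + 1 < size; j++)
        buf[n++] = text[j];
    buf[n] = '\0';
}
```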
<p class="inwebparagraph"><a id="SP12"></a><b>&#167;12. </b>So then: most characters in the pattern are taken literally (if the pattern
says <code class="display"><span class="extract">q</span></code>, the only match is with a lower-case letter "q"), except that:
</p>
<p class="inwebparagraph"></p>
<ul class="items"><li>(a) a space means "one or more characters of white space";
</li><li>(b) <code class="display"><span class="extract">%d</span></code> means any decimal digit;
</li><li>(c) <code class="display"><span class="extract">%c</span></code> means any character at all;
</li><li>(d) <code class="display"><span class="extract">%C</span></code> means any character which isn't white space;
</li><li>(e) <code class="display"><span class="extract">%i</span></code> means any character from the identifier class (see above);
</li><li>(f) <code class="display"><span class="extract">%p</span></code> means any character which can be used in the name of a Preform
nonterminal, which is to say, an identifier character or a hyphen;
</li><li>(g) <code class="display"><span class="extract">%P</span></code> means the same or else a colon;
</li><li>(h) <code class="display"><span class="extract">%t</span></code> means a tab;
</li><li>(i) <code class="display"><span class="extract">%q</span></code> means a double-quote.
</li></ul>
<p class="inwebparagraph">Any other character after <code class="display"><span class="extract">%</span></code> is taken literally, so <code class="display"><span class="extract">%</span></code> also serves as an escape. Square brackets enclose a set of literal alternatives; as usual with grep
engines, <code class="display"><span class="extract">[]xyz]</span></code> is legal and makes a set of four possibilities, the
first of which is a literal close square bracket. Within a set, a hyphen makes a
character range, an initial <code class="display"><span class="extract">^</span></code> negates the result, and everything else
is literal.
</p>
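<p class="inwebparagraph">The set rules can be sketched as a membership test. This follows the prose above rather than the foundation code, and assumes that the negating <code class="display"><span class="extract">^</span></code> comes straight after the opening bracket; the name <code class="display"><span class="extract">in_set</span></code> is hypothetical. A hyphen that is followed by the closing bracket is treated as a literal, as grep engines usually do.</p>

```c
/* Tests whether character c belongs to the bracketed set that starts with
   '[' at pattern[pos]. Sets *hit and returns the position just after the
   closing ']', or returns -1 if the set is never closed. */
int in_set(const char *pattern, int pos, int c, int *hit) {
    int p = pos + 1, negate = 0, found = 0, first = 1;
    if (pattern[p] == '^') { negate = 1; p++; }   /* assumed: ^ straight after [ */
    /* a ']' in first position is a literal member, not the end of the set */
    while (pattern[p] && (pattern[p] != ']' || first)) {
        int lo = (unsigned char) pattern[p], hi = lo;
        if (pattern[p+1] == '-' && pattern[p+2] && pattern[p+2] != ']') {
            hi = (unsigned char) pattern[p+2];   /* a range such as a-z */
            p += 3;
        } else {
            p++;                                 /* a single literal */
        }
        if (c >= lo && c <= hi) found = 1;
        first = 0;
    }
    if (pattern[p] != ']') return -1;
    *hit = negate ? !found : found;
    return p + 1;
}
```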
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">ANY_CLASS</span><span class="plain"> </span><span class="constant">1</span>
<span class="definitionkeyword">define</span> <span class="constant">DIGIT_CLASS</span><span class="plain"> </span><span class="constant">2</span>
<span class="definitionkeyword">define</span> <span class="constant">WHITESPACE_CLASS</span><span class="plain"> </span><span class="constant">3</span>
<span class="definitionkeyword">define</span> <span class="constant">NONWHITESPACE_CLASS</span><span class="plain"> </span><span class="constant">4</span>
<span class="definitionkeyword">define</span> <span class="constant">IDENTIFIER_CLASS</span><span class="plain"> </span><span class="constant">5</span>
<span class="definitionkeyword">define</span> <span class="constant">PREFORM_CLASS</span><span class="plain"> </span><span class="constant">6</span>
<span class="definitionkeyword">define</span> <span class="constant">PREFORMC_CLASS</span><span class="plain"> </span><span class="constant">7</span>
<span class="definitionkeyword">define</span> <span class="constant">LITERAL_CLASS</span><span class="plain"> </span><span class="constant">8</span>
<span class="definitionkeyword">define</span> <span class="constant">TAB_CLASS</span><span class="plain"> </span><span class="constant">9</span>
<span class="definitionkeyword">define</span> <span class="constant">QUOTE_CLASS</span><span class="plain"> </span><span class="constant">10</span>
</pre>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::get_cclass</span><span class="plain">(</span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">ppos</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">len</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">to</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> *</span><span class="identifier">reverse</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">] == </span><span class="character">'^'</span><span class="plain">) { </span><span class="identifier">ppos</span><span class="plain">++; *</span><span class="identifier">reverse</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; } </span><span class="reserved">else</span><span class="plain"> { *</span><span class="identifier">reverse</span><span class="plain"> = </span><span class="constant">FALSE</span><span class="plain">; }</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">]) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'%'</span><span class="plain">:</span>
<span class="identifier">ppos</span><span class="plain">++;</span>
<span class="plain">*</span><span class="identifier">len</span><span class="plain"> = </span><span class="constant">2</span><span class="plain">;</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">]) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'d'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">DIGIT_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'c'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">ANY_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'C'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">NONWHITESPACE_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'i'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">IDENTIFIER_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'p'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">PREFORM_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'P'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">PREFORMC_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'q'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">QUOTE_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'t'</span><span class="plain">: </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TAB_CLASS</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">*</span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">; *</span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">; </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">LITERAL_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">'['</span><span class="plain">:</span>
<span class="plain">*</span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">+1;</span>
<span class="identifier">ppos</span><span class="plain"> += </span><span class="constant">2</span><span class="plain">;</span>
<span class="reserved">while</span><span class="plain"> ((</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">]) &amp;&amp; (</span><span class="identifier">pattern</span><span class="plain">[</span><span class="identifier">ppos</span><span class="plain">] != </span><span class="character">']'</span><span class="plain">)) </span><span class="identifier">ppos</span><span class="plain">++;</span>
<span class="plain">*</span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain"> - </span><span class="constant">1</span><span class="plain">; *</span><span class="identifier">len</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain"> - *</span><span class="identifier">from</span><span class="plain"> + </span><span class="constant">2</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">LITERAL_CLASS</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="character">' '</span><span class="plain">:</span>
<span class="plain">*</span><span class="identifier">len</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">WHITESPACE_CLASS</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">*</span><span class="identifier">len</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">; *</span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">; *</span><span class="identifier">to</span><span class="plain"> = </span><span class="identifier">ppos</span><span class="plain">; </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">LITERAL_CLASS</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::get_cclass is used in <a href="#SP11_2">&#167;11.2</a>.</p>
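<p class="inwebparagraph">As a worked example of the arithmetic in the <code class="display"><span class="extract">'['</span></code> case above, the hypothetical function <code class="display"><span class="extract">scan_bracket</span></code> repeats it on its own. For the pattern <code class="display"><span class="extract">[]xyz]</span></code> starting at position 0, the body runs from index 1 to index 4, and the whole token, brackets included, is 6 characters long. The handling of a set with no closing bracket is a choice made in this sketch: the scan never reads past the terminator, and the length then counts no close bracket.</p>

```c
#include <wchar.h>

/* Hypothetical sketch of the '[' branch of Regexp::get_cclass: given the
   index p of an opening '[', report the inclusive index range
   [*from, *to] of the set body and the total token length *len. */
void scan_bracket(const wchar_t *pattern, int p, int *from, int *to, int *len) {
    int q = p + 1;
    *from = p + 1;
    /* the first body character is always literal, even if it is ']' */
    if (pattern[q]) q++;
    while ((pattern[q]) && (pattern[q] != L']')) q++;
    *to = q - 1;                                /* last body character */
    *len = q - p + ((pattern[q]) ? 1 : 0);      /* '[' + body + ']' if present */
}
```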
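<p class="inwebparagraph"><a id="SP13"></a><b>&#167;13. </b>This tests whether the character <code class="display"><span class="extract">c</span></code> belongs to the class <code class="display"><span class="extract">chcl</span></code>, as returned by <code class="display"><span class="extract">Regexp::get_cclass</span></code>. For <code class="display"><span class="extract">LITERAL_CLASS</span></code>, the set is the characters from <code class="display"><span class="extract">drawn_from[range_from]</span></code> to <code class="display"><span class="extract">drawn_from[range_to]</span></code> inclusive. In every case, <code class="display"><span class="extract">reverse</span></code> negates the result.</p>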
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::test_cclass</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">chcl</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">range_from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">range_to</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">drawn_from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">reverse</span><span class="plain">) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">chcl</span><span class="plain">) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">ANY_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain">) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">DIGIT_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">isdigit</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">)) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">WHITESPACE_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Characters::is_space_or_tab</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">)) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TAB_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'\t'</span><span class="plain">) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">NONWHITESPACE_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (!(</span><span class="functiontext">Characters::is_space_or_tab</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">))) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">QUOTE_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> != </span><span class="character">'\"'</span><span class="plain">) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">IDENTIFIER_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Regexp::identifier_char</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">)) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">PREFORM_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'-'</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'_'</span><span class="plain">) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> &gt;= </span><span class="character">'a'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">c</span><span class="plain"> &lt;= </span><span class="character">'z'</span><span class="plain">)) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> &gt;= </span><span class="character">'0'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">c</span><span class="plain"> &lt;= </span><span class="character">'9'</span><span class="plain">))) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">PREFORMC_CLASS:</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'-'</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'_'</span><span class="plain">) || (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">':'</span><span class="plain">) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> &gt;= </span><span class="character">'a'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">c</span><span class="plain"> &lt;= </span><span class="character">'z'</span><span class="plain">)) ||</span>
<span class="plain">((</span><span class="identifier">c</span><span class="plain"> &gt;= </span><span class="character">'0'</span><span class="plain">) &amp;&amp; (</span><span class="identifier">c</span><span class="plain"> &lt;= </span><span class="character">'9'</span><span class="plain">))) </span><span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">LITERAL_CLASS:</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">range_to</span><span class="plain"> &gt; </span><span class="identifier">range_from</span><span class="plain">) &amp;&amp; (</span><span class="identifier">drawn_from</span><span class="plain">[</span><span class="identifier">range_from</span><span class="plain">] == </span><span class="character">'^'</span><span class="plain">)) {</span>
<span class="identifier">range_from</span><span class="plain">++; </span><span class="identifier">reverse</span><span class="plain"> = </span><span class="identifier">reverse</span><span class="plain">?</span><span class="identifier">FALSE:TRUE</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain"> = </span><span class="identifier">range_from</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"> &lt;= </span><span class="identifier">range_to</span><span class="plain">; </span><span class="identifier">j</span><span class="plain">++) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">c1</span><span class="plain"> = </span><span class="identifier">drawn_from</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">], </span><span class="identifier">c2</span><span class="plain"> = </span><span class="identifier">c1</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">j</span><span class="plain">+1 &lt; </span><span class="identifier">range_to</span><span class="plain">) &amp;&amp; (</span><span class="identifier">drawn_from</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">+1] == </span><span class="character">'-'</span><span class="plain">)) { </span><span class="identifier">c2</span><span class="plain"> = </span><span class="identifier">drawn_from</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">+2]; </span><span class="identifier">j</span><span class="plain"> += </span><span class="constant">2</span><span class="plain">; }</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> &gt;= </span><span class="identifier">c1</span><span class="plain">) &amp;&amp; (</span><span class="identifier">c</span><span class="plain"> &lt;= </span><span class="identifier">c2</span><span class="plain">)) {</span>
<span class="identifier">match</span><span class="plain"> = </span><span class="constant">TRUE</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">reverse</span><span class="plain">) </span><span class="identifier">match</span><span class="plain"> = (</span><span class="identifier">match</span><span class="plain">)?</span><span class="identifier">FALSE:TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">match</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::test_cclass is used in <a href="#SP11_4">&#167;11.4</a>.</p>
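<p class="inwebparagraph">The two Preform cases of the switch above amount to simple predicates, restated here as a hypothetical standalone sketch. As written in <code class="display"><span class="extract">Regexp::test_cclass</span></code>, <code class="display"><span class="extract">%p</span></code> accepts only lower-case letters (along with digits, underscores and hyphens), so a name containing capitals does not match. The names <code class="display"><span class="extract">is_preform_char</span></code>, <code class="display"><span class="extract">is_preformc_char</span></code> and <code class="display"><span class="extract">all_match</span></code> are illustrative only.</p>

```c
/* Hypothetical predicate for %p (PREFORM_CLASS): a hyphen, an underscore,
   a lower-case ASCII letter or a digit. */
int is_preform_char(int c) {
    return (c == '-') || (c == '_') ||
        ((c >= 'a') && (c <= 'z')) || ((c >= '0') && (c <= '9'));
}

/* Hypothetical predicate for %P (PREFORMC_CLASS): the same, or a colon. */
int is_preformc_char(int c) {
    return is_preform_char(c) || (c == ':');
}

/* Hypothetical helper: is s non-empty and made up only of characters
   accepted by pred? */
int all_match(const char *s, int (*pred)(int)) {
    if (*s == 0) return 0;
    for (; *s; s++)
        if (!pred((unsigned char) *s)) return 0;
    return 1;
}
```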
<p class="inwebparagraph"><a id="SP14"></a><b>&#167;14. Replacement. </b>Finally, this function handles searching and replacing. This time we
can match at substrings of the <code class="display"><span class="extract">text</span></code> (i.e., we are not forced to match
from the start right to the end), and with the <code class="display"><span class="extract">REP_REPEATING</span></code> option every match is replaced, not just the first.
The return value is the number of replacements made. For example,
</p>
<pre class="display">
<span class="functiontext">Regexp::replace</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">L</span><span class="string">"[aeiou]"</span><span class="plain">, </span><span class="identifier">L</span><span class="string">"!"</span><span class="plain">, </span><span class="constant">REP_REPEATING</span><span class="plain">);</span>
</pre>
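<p class="inwebparagraph">will turn the <code class="display"><span class="extract">text</span></code> "goose eggs" into "g!!s! !ggs". Without <code class="display"><span class="extract">REP_REPEATING</span></code>, only the first match is replaced. With <code class="display"><span class="extract">REP_ATSTART</span></code>, the search stops at the first position where the pattern fails to match, so only a match at the very start of the text is replaced (or, together with <code class="display"><span class="extract">REP_REPEATING</span></code>, an unbroken run of matches from the start). In the replacement text, <code class="display"><span class="extract">%0</span></code>, <code class="display"><span class="extract">%1</span></code>, and so on stand for the bracketed subexpressions of the match, and <code class="display"><span class="extract">%</span></code> followed by any other character stands for that character.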
</p>
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">REP_REPEATING</span><span class="plain"> </span><span class="constant">1</span>
<span class="definitionkeyword">define</span> <span class="constant">REP_ATSTART</span><span class="plain"> </span><span class="constant">2</span>
</pre>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Regexp::replace</span><span class="plain">(</span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">pattern</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">replacement</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">options</span><span class="plain">) {</span>
<span class="identifier">TEMPORARY_TEXT</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">);</span>
<span class="reserved">match_results</span><span class="plain"> </span><span class="identifier">mr</span><span class="plain"> = </span><span class="functiontext">Regexp::create_mr</span><span class="plain">();</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">changes</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0, </span><span class="identifier">L</span><span class="plain">=</span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">i</span><span class="plain">&lt;</span><span class="identifier">L</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++) {</span>
<span class="reserved">match_position</span><span class="plain"> </span><span class="identifier">mp</span><span class="plain">; </span><span class="identifier">mp</span><span class="plain">.</span><span class="element">tpos</span><span class="plain"> = </span><span class="identifier">i</span><span class="plain">; </span><span class="identifier">mp</span><span class="plain">.</span><span class="element">ppos</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">mp</span><span class="plain">.</span><span class="element">bc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">mp</span><span class="plain">.</span><span class="element">bl</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
<span class="functiontext">Regexp::prepare</span><span class="plain">(&amp;</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">try</span><span class="plain"> = </span><span class="functiontext">Regexp::match_r</span><span class="plain">(&amp;</span><span class="identifier">mr</span><span class="plain">, </span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">pattern</span><span class="plain">, &amp;</span><span class="identifier">mp</span><span class="plain">, </span><span class="constant">TRUE</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">try</span><span class="plain"> &gt;= </span><span class="constant">0</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">replacement</span><span class="plain">)</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain">=0; </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">]; </span><span class="identifier">j</span><span class="plain">++) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain"> = </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">];</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'%'</span><span class="plain">) {</span>
<span class="identifier">j</span><span class="plain">++;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">ind</span><span class="plain"> = </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">] - </span><span class="character">'0'</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">ind</span><span class="plain"> &gt;= </span><span class="constant">0</span><span class="plain">) &amp;&amp; (</span><span class="identifier">ind</span><span class="plain"> &lt; </span><span class="constant">MAX_BRACKETED_SUBEXPRESSIONS</span><span class="plain">))</span>
<span class="identifier">WRITE_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="string">"%S"</span><span class="plain">, </span><span class="identifier">mr</span><span class="plain">.</span><span class="element">exp</span><span class="plain">[</span><span class="identifier">ind</span><span class="plain">]);</span>
<span class="reserved">else</span>
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">]);</span>
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="identifier">replacement</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">]);</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">left</span><span class="plain"> = </span><span class="identifier">L</span><span class="plain"> - </span><span class="identifier">try</span><span class="plain">;</span>
<span class="identifier">changes</span><span class="plain">++;</span>
<span class="functiontext">Regexp::dispose_of</span><span class="plain">(&amp;</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="identifier">L</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">); </span><span class="identifier">i</span><span class="plain"> = </span><span class="identifier">L</span><span class="plain">-</span><span class="identifier">left</span><span class="plain">-1;</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">options</span><span class="plain"> &amp; </span><span class="constant">REP_REPEATING</span><span class="plain">) == </span><span class="constant">0</span><span class="plain">) { </span>&lt;<span class="cwebmacro">Add the rest</span> <span class="cwebmacronumber">14.1</span>&gt;<span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
<span class="reserved">continue</span><span class="plain">;</span>
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> </span><span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">));</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">options</span><span class="plain"> &amp; </span><span class="constant">REP_ATSTART</span><span class="plain">) { </span>&lt;<span class="cwebmacro">Add the rest</span> <span class="cwebmacronumber">14.1</span>&gt;<span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
<span class="plain">}</span>
<span class="functiontext">Regexp::dispose_of</span><span class="plain">(&amp;</span><span class="identifier">mr</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">changes</span><span class="plain"> &gt; </span><span class="constant">0</span><span class="plain">) </span><span class="functiontext">Str::copy</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">altered</span><span class="plain">);</span>
<span class="identifier">DISCARD_TEXT</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">changes</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Regexp::replace appears nowhere else.</p>
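<p class="inwebparagraph">The effect of the two options can be seen in a deliberately simplified, hypothetical analogue of <code class="display"><span class="extract">Regexp::replace</span></code>. Instead of a full pattern, <code class="display"><span class="extract">replace_chars</span></code> matches any single character from a given set. It also writes into a caller-supplied wide-character buffer rather than a <code class="display"><span class="extract">text_stream</span></code>. It is a sketch of the option logic under those assumptions, not the module's implementation. The option values are the ones defined above.</p>

```c
#include <stddef.h>
#include <wchar.h>

#define REP_REPEATING 1
#define REP_ATSTART 2

/* Hypothetical, simplified analogue of Regexp::replace: the "pattern" is a
   set of characters, any one of which counts as a match. The result goes
   into out (cap must be at least 1); the number of replacements is returned. */
int replace_chars(const wchar_t *text, const wchar_t *set,
    const wchar_t *replacement, int options, wchar_t *out, size_t cap) {
    size_t o = 0;
    int changes = 0, matching = 1;
    for (size_t i = 0; text[i]; i++) {
        wchar_t literal[2] = { text[i], 0 };
        const wchar_t *piece = literal;
        if ((matching) && (wcschr(set, text[i]))) {
            /* a NULL replacement deletes the match */
            piece = (replacement) ? replacement : L"";
            changes++;
            /* without REP_REPEATING, only the first match is replaced */
            if ((options & REP_REPEATING) == 0) matching = 0;
        } else if (options & REP_ATSTART) {
            /* with REP_ATSTART, the first failure to match ends the search */
            matching = 0;
        }
        /* copy the chosen piece, silently truncating if out is full */
        for (; *piece; piece++)
            if (o + 1 < cap) out[o++] = *piece;
    }
    out[o] = 0;
    return changes;
}
```

With <code class="display"><span class="extract">REP_REPEATING</span></code> this reproduces the "goose eggs" example. With no options, only the first vowel is replaced, giving "g!ose eggs".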
<p class="inwebparagraph"><a id="SP14_1"></a><b>&#167;14.1. </b><code class="display">
&lt;<span class="cwebmacrodefn">Add the rest</span> <span class="cwebmacronumber">14.1</span>&gt; =
</code></p>
<pre class="displaydefn">
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">++; </span><span class="identifier">i</span><span class="plain">&lt;</span><span class="identifier">L</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++)</span>
<span class="identifier">PUT_TO</span><span class="plain">(</span><span class="identifier">altered</span><span class="plain">, </span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">text</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">));</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP14">&#167;14</a> (twice).</p>
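<p class="inwebparagraph">The expansion of the replacement text in &#167;14 can be sketched on its own. The hypothetical function <code class="display"><span class="extract">expand_template</span></code> takes the matched subexpressions as an array of wide strings. <code class="display"><span class="extract">MAX_SUBEXPS</span></code> is an assumed stand-in for <code class="display"><span class="extract">MAX_BRACKETED_SUBEXPRESSIONS</span></code>, whose actual value is defined elsewhere. Dropping a trailing <code class="display"><span class="extract">%</span></code> is a choice of this sketch.</p>

```c
#include <stddef.h>
#include <wchar.h>

#define MAX_SUBEXPS 10   /* assumed stand-in for MAX_BRACKETED_SUBEXPRESSIONS */

/* Append the wide string s to out, never writing beyond cap - 1 characters. */
static void append_ws(wchar_t *out, size_t cap, size_t *o, const wchar_t *s) {
    for (; *s; s++)
        if (*o + 1 < cap) out[(*o)++] = *s;
}

/* Hypothetical sketch: expand tmpl into out (cap must be at least 1).
   "%n", for n a valid index, inserts subexp[n] (NULL counts as empty);
   "%" followed by any other character inserts that character.
   Returns the number of characters written. */
size_t expand_template(const wchar_t *tmpl, const wchar_t *subexp[],
    wchar_t *out, size_t cap) {
    size_t o = 0;
    for (size_t j = 0; tmpl[j]; j++) {
        wchar_t lit[2] = { tmpl[j], 0 };
        if (tmpl[j] == L'%') {
            if (tmpl[j+1] == 0) break;          /* trailing '%': dropped here */
            j++;
            int ind = (int) tmpl[j] - '0';
            if ((ind >= 0) && (ind < MAX_SUBEXPS)) {
                append_ws(out, cap, &o, (subexp[ind]) ? subexp[ind] : L"");
                continue;
            }
            lit[0] = tmpl[j];                   /* "%%" gives "%" */
        }
        append_ws(out, cap, &o, lit);
    }
    out[o] = 0;
    return o;
}
```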
<hr class="tocbar">
<ul class="toc"><li><a href="4-taa.html">Back to 'Tries and Avinues'</a></li><li><i>(This section ends Chapter 4: Text Handling.)</i></li></ul><hr class="tocbar">
<!--End of weave-->
</main>
</body>
</html>