<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
|
|
<html>
|
|
<head>
|
|
<title>Tries and Avinues</title>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
<meta http-equiv="Content-Language" content="en-gb">
|
|
<link href="../inweb.css" rel="stylesheet" rev="stylesheet" type="text/css">
|
|
</head>
|
|
<body>
|
|
<nav role="navigation">
|
|
<h1><a href="../webs.html">Sources</a></h1>
|
|
<ul>
|
|
<li><a href="../inweb/index.html">inweb</a></li>
|
|
</ul>
|
|
<h2>Foundation</h2>
|
|
<ul>
|
|
<li><a href="../foundation-module/index.html">foundation-module</a></li>
|
|
<li><a href="../foundation-test/index.html">foundation-test</a></li>
|
|
</ul>
|
|
|
|
|
|
</nav>
|
|
<main role="main">
|
|
|
|
<!--Weave of 'Tries and Avinues' generated by 7-->
|
|
<ul class="crumbs"><li><a href="../webs.html">Source</a></li><li><a href="index.html">foundation</a></li><li><a href="index.html#4">Chapter 4: Text Handling</a></li><li><b>Tries and Avinues</b></li></ul><p class="purpose">To examine heads and tails of text, to see how it may inflect.</p>
|
|
|
|
<ul class="toc"><li><a href="#SP1">§1. Tries</a></li><li><a href="#SP5">§5. Avinues</a></li><li><a href="#SP9">§9. Logging</a></li></ul><hr class="tocbar">
|
|
|
|
<p class="inwebparagraph"><a id="SP1"></a><b>§1. Tries. </b>The standard data structure for searches through possible prefixes or
|
|
suffixes is a "trie". The term goes back to Edward Fredkin in 1961;
|
|
some pronounce it "try" and some "tree", and either would be a fair
|
|
description. Like hash tables, tries are a means of minimising string
|
|
comparisons when sorting through possible outcomes based on a text.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">The trie is a tree with three kinds of node:
|
|
</p>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<ul class="items"><li>(a) "Heads". Every trie has exactly one such node, and it's always the root.
|
|
There are two versions of this: a start head represents matching from the
|
|
front of a text, whereas an end head represents matching from the back.
|
|
</li></ul>
|
|
<ul class="items"><li>(b) "Choices". A choice node has a given match character, say an "f", and
|
|
represents which node to go to next if this is the current character in the
|
|
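text. It must be a valid Unicode character, <code class="display"><span class="extract">TRIE_ANYTHING</span></code> (a wildcard
representing "any text of any length here"), or one of the group values
<code class="display"><span class="extract">TRIE_ANY_GROUP</span></code> and <code class="display"><span class="extract">TRIE_NOT_GROUP</span></code> defined below. Since a choice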
|
|
must always lead somewhere, <code class="display"><span class="extract">on_success</span></code> must point to another node.
|
|
There can be any number of choices at a given position, so choice nodes
|
|
are always organised in linked lists joined by <code class="display"><span class="extract">next</span></code>.
|
|
</li></ul>
|
|
<ul class="items"><li>(c) "Terminals", always leaves, which have match character set to the
|
|
impossible value <code class="display"><span class="extract">TRIE_STOP</span></code>, and for which <code class="display"><span class="extract">match_outcome</span></code> is non-null; thus,
|
|
different terminal nodes can result in different outcomes if they are ever
|
|
reached at the end of a successful scan. A terminal node is always the only item
|
|
in a list.
|
|
</li></ul>
|
|
|
|
<pre class="definitions">
|
|
<span class="definitionkeyword">define</span> <span class="constant">TRIE_START</span><span class="plain"> -1 </span><span class="comment"> head: the root of a trie parsing forwards from the start</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">TRIE_END</span><span class="plain"> -2 </span><span class="comment"> head: the root of a trie parsing backwards from the end</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">TRIE_ANYTHING</span><span class="plain"> </span><span class="constant">10003</span><span class="plain"> </span><span class="comment"> choice: match any text here</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">TRIE_ANY_GROUP</span><span class="plain"> </span><span class="constant">10001</span><span class="plain"> </span><span class="comment"> choice: match any character from this group</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">TRIE_NOT_GROUP</span><span class="plain"> </span><span class="constant">10002</span><span class="plain"> </span><span class="comment"> choice: match any character not in this group</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">TRIE_STOP</span><span class="plain"> -3 </span><span class="comment"> terminal: here's the outcome</span>
|
|
<span class="definitionkeyword">define</span> <span class="constant">MAX_TRIE_GROUP_SIZE</span><span class="plain"> </span><span class="constant">26</span><span class="plain"> </span><span class="comment"> size of the allowable groups of characters</span>
|
|
</pre>
|
|
|
|
<pre class="display">
|
|
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_trie</span><span class="plain"> {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">match_character</span><span class="plain">; </span><span class="comment"> or one of the special cases above</span>
|
|
<span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">group_characters</span><span class="plain">[</span><span class="constant">MAX_TRIE_GROUP_SIZE</span><span class="plain">+1];</span>
|
|
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">match_outcome</span><span class="plain">;</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">on_success</span><span class="plain">;</span>
|
|
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">next</span><span class="plain">;</span>
|
|
<span class="plain">} </span><span class="reserved">match_trie</span><span class="plain">;</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The structure match_trie is accessed in 2/mmr and here.</p>
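<p class="inwebparagraph">As an illustration of the three kinds of node, here is a minimal sketch that
builds a tiny forward-scanning trie by hand. The names <code class="display"><span class="extract">demo_trie</span></code>, <code class="display"><span class="extract">demo_new</span></code>,
<code class="display"><span class="extract">demo_chain</span></code> and <code class="display"><span class="extract">demo_build</span></code> are hypothetical and not part of Foundation; the
struct is a cut-down copy of <code class="display"><span class="extract">match_trie</span></code> without character groups, and the
outcome texts are invented for the example.
</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

#define DEMO_TRIE_START -1  /* head: root of a trie scanning forwards */
#define DEMO_TRIE_STOP  -3  /* terminal: carries the outcome */

/* Hypothetical cut-down version of match_trie, for illustration only. */
typedef struct demo_trie {
    int match_character;             /* a character, or a special value */
    const wchar_t *match_outcome;    /* non-NULL only on terminals */
    struct demo_trie *on_success;    /* where to go if this node matches */
    struct demo_trie *next;          /* next alternative in the same list */
} demo_trie;

demo_trie *demo_new(int mc) {
    demo_trie *t = calloc(1, sizeof(demo_trie));
    assert(t != NULL);
    t->match_character = mc;
    return t;
}

/* A chain of choice nodes spelling out word, ending in a terminal. */
demo_trie *demo_chain(const wchar_t *word, const wchar_t *outcome) {
    demo_trie *first = NULL, *last = NULL;
    for (size_t i = 0; word[i]; i++) {
        demo_trie *n = demo_new((int) word[i]);
        if (last) last->on_success = n; else first = n;
        last = n;
    }
    demo_trie *stop = demo_new(DEMO_TRIE_STOP);
    stop->match_outcome = outcome;
    if (last) last->on_success = stop; else first = stop;
    return first;
}

/* Head node, then two alternative choices 'u' and 'r' joined by next. */
demo_trie *demo_build(void) {
    demo_trie *head = demo_new(DEMO_TRIE_START);
    head->on_success = demo_chain(L"un", L"negation");
    head->on_success->next = demo_chain(L"re", L"repetition");
    return head;
}
```

<p class="inwebparagraph">Calling <code class="display"><span class="extract">demo_build()</span></code> gives a head whose <code class="display"><span class="extract">on_success</span></code> is a two-item list of
choices; following <code class="display"><span class="extract">on_success</span></code> twice more from either choice reaches a terminal,
which is the only item in its list.
</p>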
|
|
|
|
<p class="inwebparagraph"><a id="SP2"></a><b>§2. </b>We have just one routine for extending and scanning the trie: it either
|
|
tries to find whether a text <code class="display"><span class="extract">p</span></code> leads to any outcome in the existing trie,
|
|
or else forcibly extends the existing trie to ensure that it does.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">It might look as if calling <code class="display"><span class="extract">Tries::search</span></code> always returns <code class="display"><span class="extract">add_outcome</span></code> when
|
|
this is set, but this isn't true: if the trie already contains a node
|
|
representing how to deal with <code class="display"><span class="extract">p</span></code>, we get whatever outcome is already
|
|
established.
|
|
</p>
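<p class="inwebparagraph">The "find, or else extend" behaviour can be sketched on a much simpler trie.
This is not the code of <code class="display"><span class="extract">Tries::search</span></code>: it assumes forward scanning only, literal
characters only, no wildcards or rewinding, and (unlike the real structure) it lets
a terminal share a list with choices. All <code class="display"><span class="extract">demo_</span></code> names are hypothetical.
</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

#define DEMO_TRIE_START -1
#define DEMO_TRIE_STOP  -3

/* Hypothetical cut-down trie node, for illustration only. */
typedef struct demo_trie {
    int match_character;
    const wchar_t *match_outcome;
    struct demo_trie *on_success;
    struct demo_trie *next;
} demo_trie;

demo_trie *demo_new(int mc) {
    demo_trie *t = calloc(1, sizeof(demo_trie));
    assert(t != NULL);
    t->match_character = mc;
    return t;
}

/* If add_outcome is NULL, only search. Otherwise extend the trie where
   needed, but return any outcome that is already established for p. */
const wchar_t *demo_search(demo_trie *T, const wchar_t *p,
                           const wchar_t *add_outcome) {
    assert(T != NULL && p != NULL);
    demo_trie *prev = T;                 /* last node successfully left */
    for (size_t i = 0; p[i]; i++) {
        demo_trie *pos = prev->on_success;
        while (pos && pos->match_character != (int) p[i]) pos = pos->next;
        if (pos == NULL) {               /* ran out of trie */
            if (add_outcome == NULL) return NULL;
            pos = demo_new((int) p[i]);  /* create a new exit */
            pos->next = prev->on_success;
            prev->on_success = pos;
        }
        prev = pos;
    }
    demo_trie *pos = prev->on_success;   /* look for a terminal here */
    while (pos && pos->match_character != DEMO_TRIE_STOP) pos = pos->next;
    if (pos) return pos->match_outcome;  /* existing outcome wins */
    if (add_outcome == NULL) return NULL;
    pos = demo_new(DEMO_TRIE_STOP);
    pos->match_outcome = add_outcome;
    pos->next = prev->on_success;
    prev->on_success = pos;
    return add_outcome;
}
```

<p class="inwebparagraph">With an empty head <code class="display"><span class="extract">T</span></code>, searching for <code class="display"><span class="extract">L"un"</span></code> with a null <code class="display"><span class="extract">add_outcome</span></code> fails;
adding it with one outcome succeeds; and a later attempt to add the same text with
a different outcome returns the first outcome, not the new one.
</p>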
|
|
|
|
<p class="inwebparagraph">There are two motions to keep track of: our progress through the text <code class="display"><span class="extract">p</span></code>
|
|
being scanned, and our progress through the trie which tells us how to scan it.
|
|
</p>
|
|
|
|
<p class="inwebparagraph">We scan the text either forwards or backwards, starting with the first or
|
|
last character and then working through, finishing with a 0 terminator.
|
|
(This is true even if working backwards: we pretend the character stored
|
|
before the text began is 0.) <code class="display"><span class="extract">i</span></code> represents the index of our current position
|
|
in <code class="display"><span class="extract">p</span></code>, and runs either from 0 up to <code class="display"><span class="extract">N</span></code> or from <code class="display"><span class="extract">N-1</span></code> down to <code class="display"><span class="extract">-1</span></code>,
|
|
where <code class="display"><span class="extract">N</span></code> is the number of characters in <code class="display"><span class="extract">p</span></code>.
|
|
</p>
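<p class="inwebparagraph">The index arithmetic just described can be checked in isolation. The sketch
below uses the same <code class="display"><span class="extract">start</span></code>, <code class="display"><span class="extract">endpoint</span></code> and <code class="display"><span class="extract">delta</span></code> scheme on a plain wide string;
<code class="display"><span class="extract">demo_scan_indices</span></code> and <code class="display"><span class="extract">demo_char_at</span></code> are hypothetical helpers, and the explicit
upper bound test in <code class="display"><span class="extract">demo_char_at</span></code> is an assumption standing in for whatever
<code class="display"><span class="extract">Str::get_at</span></code> does at the end of a text.
</p>

```c
#include <assert.h>
#include <wchar.h>

/* Records each index visited, in order. The array visited needs room
   for wcslen(p) + 1 entries. Returns the number of indices visited. */
int demo_scan_indices(const wchar_t *p, int backwards, int *visited) {
    assert(p != NULL && visited != NULL);
    int N = (int) wcslen(p);
    int start = 0, endpoint = N, delta = 1;
    if (backwards) { start = N - 1; endpoint = -1; delta = -1; }
    int n = 0;
    for (int i = start; i != endpoint + delta; i += delta)
        visited[n++] = i;
    return n;
}

/* The character seen at index i: zero at either end of the text. */
int demo_char_at(const wchar_t *p, int i) {
    assert(p != NULL);
    int N = (int) wcslen(p);
    return (i < 0 || i >= N) ? 0 : (int) p[i];
}
```

<p class="inwebparagraph">For a three-character text, a forward scan visits 0, 1, 2, 3 and a backward scan
visits 2, 1, 0, -1; in both cases the last index visited yields the pretend 0
terminator.
</p>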
|
|
|
|
<p class="inwebparagraph">We scan the trie using a pair of pointers. <code class="display"><span class="extract">prev</span></code> is the last node we
|
|
successfully left, and <code class="display"><span class="extract">pos</span></code> is the node we are currently at, which can be
|
|
either a terminal node or a choice node (in which case it's the head of
|
|
a linked list of such nodes).
|
|
</p>
|
|
|
|
|
|
<pre class="definitions">
|
|
<span class="definitionkeyword">define</span> <span class="constant">MAX_TRIE_REWIND</span><span class="plain"> </span><span class="constant">10</span><span class="plain"> </span><span class="comment"> that should be far, far more rewinding than necessary</span>
|
|
</pre>
|
|
|
|
<pre class="display">
|
|
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="functiontext">Tries::search</span><span class="plain">(</span><span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">T</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">add_outcome</span><span class="plain">) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">T</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"no trie to search"</span><span class="plain">);</span>
|
|
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">start</span><span class="plain">, </span><span class="identifier">endpoint</span><span class="plain">, </span><span class="identifier">delta</span><span class="plain">;</span>
|
|
<<span class="cwebmacro">Look at the root node of the trie, setting up the scan accordingly</span> <span class="cwebmacronumber">2.1</span>><span class="plain">;</span>
|
|
|
|
<span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">prev</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">, *</span><span class="identifier">pos</span><span class="plain"> = </span><span class="identifier">T</span><span class="plain">;</span>
|
|
<<span class="cwebmacro">Accept the current node of the trie</span> <span class="cwebmacronumber">2.4</span>><span class="plain">;</span>
|
|
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">rewind_sp</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">rewind_points</span><span class="plain">[</span><span class="constant">MAX_TRIE_REWIND</span><span class="plain">];</span>
|
|
<span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">rewind_positions</span><span class="plain">[</span><span class="constant">MAX_TRIE_REWIND</span><span class="plain">];</span>
|
|
<span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">rewind_prev_positions</span><span class="plain">[</span><span class="constant">MAX_TRIE_REWIND</span><span class="plain">];</span>
|
|
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain"> = </span><span class="identifier">start</span><span class="plain">; </span><span class="identifier">i</span><span class="plain"> != </span><span class="identifier">endpoint</span><span class="plain">+</span><span class="identifier">delta</span><span class="plain">; </span><span class="identifier">i</span><span class="plain"> += </span><span class="identifier">delta</span><span class="plain">) {</span>
|
|
<span class="identifier">wchar_t</span><span class="plain"> </span><span class="identifier">group</span><span class="plain">[</span><span class="constant">MAX_TRIE_GROUP_SIZE</span><span class="plain">+1];</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">g</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="comment"> size of group</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain"> = (</span><span class="identifier">i</span><span class="plain"><0)?0:(</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">, </span><span class="identifier">i</span><span class="plain">)); </span><span class="comment"> i.e., zero at the two ends of the text</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">c</span><span class="plain"> >= </span><span class="constant">0x20</span><span class="plain">) && (</span><span class="identifier">c</span><span class="plain"> <= </span><span class="constant">0x7f</span><span class="plain">)) </span><span class="identifier">c</span><span class="plain"> = </span><span class="functiontext">Characters::tolower</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">); </span><span class="comment"> normalise it within ASCII</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="constant">0x20</span><span class="plain">) { </span><span class="identifier">c</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">i</span><span class="plain"> = </span><span class="identifier">endpoint</span><span class="plain"> - </span><span class="identifier">delta</span><span class="plain">; } </span><span class="comment"> force any space to be equivalent to the final 0</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">add_outcome</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">pairc</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'<'</span><span class="plain">) </span><span class="identifier">pairc</span><span class="plain"> = </span><span class="character">'>'</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'>'</span><span class="plain">) </span><span class="identifier">pairc</span><span class="plain"> = </span><span class="character">'<'</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pairc</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">j</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">j</span><span class="plain"> = </span><span class="identifier">i</span><span class="plain">+</span><span class="identifier">delta</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"> != </span><span class="identifier">endpoint</span><span class="plain">; </span><span class="identifier">j</span><span class="plain"> += </span><span class="identifier">delta</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">ch</span><span class="plain"> = (</span><span class="identifier">j</span><span class="plain"><0)?0:(</span><span class="functiontext">Str::get_at</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">));</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ch</span><span class="plain"> == </span><span class="identifier">pairc</span><span class="plain">) </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">g</span><span class="plain"> >= </span><span class="constant">MAX_TRIE_GROUP_SIZE</span><span class="plain">) { </span><span class="identifier">g</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="reserved">break</span><span class="plain">; }</span>
|
|
<span class="identifier">group</span><span class="plain">[</span><span class="identifier">g</span><span class="plain">++] = </span><span class="identifier">ch</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">group</span><span class="plain">[</span><span class="identifier">g</span><span class="plain">] = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">g</span><span class="plain"> > </span><span class="constant">0</span><span class="plain">) </span><span class="identifier">i</span><span class="plain"> = </span><span class="identifier">j</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'*'</span><span class="plain">) </span><span class="identifier">endpoint</span><span class="plain"> -= </span><span class="identifier">delta</span><span class="plain">;</span>
|
|
|
|
<span class="identifier">RewindHere:</span>
|
|
<<span class="cwebmacro">Look through the possible exits from this position and move on if any match</span> <span class="cwebmacronumber">2.2</span>><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">add_outcome</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">rewind_sp</span><span class="plain"> > </span><span class="constant">0</span><span class="plain">) {</span>
|
|
<span class="identifier">i</span><span class="plain"> = </span><span class="identifier">rewind_points</span><span class="plain">[</span><span class="identifier">rewind_sp</span><span class="plain">-1];</span>
|
|
<span class="identifier">pos</span><span class="plain"> = </span><span class="identifier">rewind_positions</span><span class="plain">[</span><span class="identifier">rewind_sp</span><span class="plain">-1];</span>
|
|
<span class="identifier">prev</span><span class="plain"> = </span><span class="identifier">rewind_prev_positions</span><span class="plain">[</span><span class="identifier">rewind_sp</span><span class="plain">-1];</span>
|
|
<span class="identifier">rewind_sp</span><span class="plain">--;</span>
|
|
<span class="reserved">goto</span><span class="plain"> </span><span class="identifier">RewindHere</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">NULL</span><span class="plain">; </span><span class="comment"> failure!</span>
|
|
<span class="plain">}</span>
|
|
<<span class="cwebmacro">We have run out of trie and must create a new exit to continue</span> <span class="cwebmacronumber">2.3</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">pos</span><span class="plain">) && (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_ANYTHING</span><span class="plain">)) </span><<span class="cwebmacro">Accept the current node of the trie</span> <span class="cwebmacronumber">2.4</span>><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">pos</span><span class="plain">) && (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_outcome</span><span class="plain">)) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_outcome</span><span class="plain">; </span><span class="comment"> success!</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">add_outcome</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">NULL</span><span class="plain">; </span><span class="comment"> failure!</span>
|
|
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">)</span>
|
|
<<span class="cwebmacro">We failed by running out of trie, so we must add a terminal node to make this string acceptable</span> <span class="cwebmacronumber">2.5</span>>
|
|
<span class="reserved">else</span>
|
|
<<span class="cwebmacro">We failed by finishing at a non-terminal node, so we must add an outcome</span> <span class="cwebmacronumber">2.6</span>><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">The function Tries::search is used in <a href="#SP6">§6</a>, <a href="#SP8">§8</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP2_1"></a><b>§2.1. </b><code class="display">
|
|
<<span class="cwebmacrodefn">Look at the root node of the trie, setting up the scan accordingly</span> <span class="cwebmacronumber">2.1</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="identifier">start</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">endpoint</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">); </span><span class="identifier">delta</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">T</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_END</span><span class="plain">) { </span><span class="identifier">start</span><span class="plain"> = </span><span class="functiontext">Str::len</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">)-1; </span><span class="identifier">endpoint</span><span class="plain"> = -1; </span><span class="identifier">delta</span><span class="plain"> = -1; }</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP2">§2</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP2_2"></a><b>§2.2. </b>In general trie searches can be made more efficient if the trie is shuffled
|
|
so that the most recently matched exit in the list is moved to the top, as
|
|
this tends to make commonly used exits migrate upwards and rarities downwards.
|
|
But we aren't going to search these tries anything like intensively enough
|
|
to make it worth the trouble.
|
|
</p>
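<p class="inwebparagraph">For reference, the move-to-front shuffle that we are declining to do might look
something like this on a singly linked list of exits. It is a sketch only: the
<code class="display"><span class="extract">demo_exit</span></code> type and <code class="display"><span class="extract">demo_find_move_to_front</span></code> are hypothetical, and nothing like
this is applied to <code class="display"><span class="extract">match_trie</span></code> lists in the code below.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list of exits, joined by next as choice nodes are. */
typedef struct demo_exit {
    int match_character;
    struct demo_exit *next;
} demo_exit;

/* Finds the exit matching c. If it is found and is not already first,
   it is unlinked and reinserted at the head of the list, so that
   frequently used exits drift upwards. Returns the exit, or NULL. */
demo_exit *demo_find_move_to_front(demo_exit **head, int c) {
    assert(head != NULL);
    demo_exit *prev = NULL;
    for (demo_exit *e = *head; e; prev = e, e = e->next) {
        if (e->match_character == c) {
            if (prev) {
                prev->next = e->next;  /* unlink from current place */
                e->next = *head;       /* and put at the front */
                *head = e;
            }
            return e;
        }
    }
    return NULL;
}
```

<p class="inwebparagraph">The cost is one extra pointer update per successful match, which only pays for
itself when the same lists are searched many times over.
</p>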
|
|
|
|
<p class="inwebparagraph">(The following cannot be a <code class="display"><span class="extract">while</span></code> loop since C does not allow us to <code class="display"><span class="extract">break</span></code>
|
|
or <code class="display"><span class="extract">continue</span></code> out of an outer loop from an inner one.)
|
|
</p>
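<p class="inwebparagraph">The point about <code class="display"><span class="extract">break</span></code> and <code class="display"><span class="extract">continue</span></code> can be seen in a toy example. In the sketch
below (hypothetical names, plain <code class="display"><span class="extract">char</span></code> strings standing in for the text and the list
of exits), the inner scan is written with a label and <code class="display"><span class="extract">goto</span></code>, so that <code class="display"><span class="extract">continue</span></code>
still applies to the enclosing <code class="display"><span class="extract">for</span></code> loop.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Counts the characters of text which match none of the exits. */
int demo_count_unmatched(const char *text, const char *exits) {
    assert(text != NULL && exits != NULL);
    int unmatched = 0;
    for (size_t i = 0; text[i]; i++) {
        const char *pos = exits;
        FauxWhileLoop:
        if (*pos) {
            /* Inside a real while loop this continue would only restart
               the inner scan; here it moves the outer loop on to the
               next character of the text. */
            if (*pos == text[i]) continue;
            pos++;
            goto FauxWhileLoop;
        }
        unmatched++;  /* reached only if every exit was tried and failed */
    }
    return unmatched;
}
```

<p class="inwebparagraph">A <code class="display"><span class="extract">break</span></code> placed where the <code class="display"><span class="extract">continue</span></code> is would likewise leave the outer loop
altogether, which is how the real code below treats a <code class="display"><span class="extract">TRIE_ANYTHING</span></code> match.
</p>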
|
|
|
|
|
|
<p class="macrodefinition"><code class="display">
|
|
<<span class="cwebmacrodefn">Look through the possible exits from this position and move on if any match</span> <span class="cwebmacronumber">2.2</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">ambig</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">, </span><span class="identifier">unambig</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">;</span>
|
|
<span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">point</span><span class="plain">;</span>
|
|
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">point</span><span class="plain"> = </span><span class="identifier">pos</span><span class="plain">; </span><span class="identifier">point</span><span class="plain">; </span><span class="identifier">point</span><span class="plain"> = </span><span class="identifier">point</span><span class="plain">-></span><span class="element">next</span><span class="plain">)</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Tries::is_ambiguous</span><span class="plain">(</span><span class="identifier">point</span><span class="plain">)) </span><span class="identifier">ambig</span><span class="plain">++;</span>
|
|
<span class="reserved">else</span><span class="plain"> </span><span class="identifier">unambig</span><span class="plain">++;</span>
|
|
|
|
<span class="identifier">FauxWhileLoop:</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">) {</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">add_outcome</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) || (</span><span class="functiontext">Tries::is_ambiguous</span><span class="plain">(</span><span class="identifier">pos</span><span class="plain">) == </span><span class="constant">FALSE</span><span class="plain">))</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="functiontext">Tries::matches</span><span class="plain">(</span><span class="identifier">pos</span><span class="plain">, </span><span class="identifier">c</span><span class="plain">)) {</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_ANYTHING</span><span class="plain">) </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">add_outcome</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) && (</span><span class="identifier">ambig</span><span class="plain"> > </span><span class="constant">0</span><span class="plain">) && (</span><span class="identifier">ambig</span><span class="plain">+</span><span class="identifier">unambig</span><span class="plain"> > </span><span class="constant">1</span><span class="plain">)</span>
|
|
<span class="plain">&& (</span><span class="identifier">rewind_sp</span><span class="plain"> < </span><span class="constant">MAX_TRIE_REWIND</span><span class="plain">)) {</span>
|
|
<span class="identifier">rewind_points</span><span class="plain">[</span><span class="identifier">rewind_sp</span><span class="plain">] = </span><span class="identifier">i</span><span class="plain">;</span>
|
|
<span class="identifier">rewind_positions</span><span class="plain">[</span><span class="identifier">rewind_sp</span><span class="plain">] = </span><span class="identifier">pos</span><span class="plain">-></span><span class="element">next</span><span class="plain">;</span>
|
|
<span class="identifier">rewind_prev_positions</span><span class="plain">[</span><span class="identifier">rewind_sp</span><span class="plain">] = </span><span class="identifier">prev</span><span class="plain">;</span>
|
|
<span class="identifier">rewind_sp</span><span class="plain">++;</span>
|
|
<span class="plain">}</span>
|
|
<<span class="cwebmacro">Accept the current node of the trie</span> <span class="cwebmacronumber">2.4</span>><span class="plain">;</span>
|
|
<span class="reserved">continue</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="identifier">pos</span><span class="plain"> = </span><span class="identifier">pos</span><span class="plain">-></span><span class="identifier">next</span><span class="plain">;</span>
|
|
<span class="reserved">goto</span><span class="plain"> </span><span class="identifier">FauxWhileLoop</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
</pre>
|
|
|
|
<p class="inwebparagraph"></p>
|
|
|
|
<p class="endnote">This code is used in <a href="#SP2">§2</a>.</p>
|
|
|
|
<p class="inwebparagraph"><a id="SP2_3"></a><b>§2.3. </b><code class="display">
|
|
<<span class="cwebmacrodefn">We have run out of trie and must create a new exit to continue</span> <span class="cwebmacronumber">2.3</span>> =
|
|
</code></p>
|
|
|
|
|
|
<pre class="displaydefn">
|
|
<span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">new_pos</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">g</span><span class="plain"> > </span><span class="constant">0</span><span class="plain">) {</span>
|
|
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">nt</span><span class="plain"> = </span><span class="constant">TRIE_ANY_GROUP</span><span class="plain">;</span>
|
|
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">from</span><span class="plain"> = </span><span class="identifier">group</span><span class="plain">;</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">group</span><span class="plain">[0] == </span><span class="character">'!'</span><span class="plain">) { </span><span class="identifier">from</span><span class="plain">++; </span><span class="identifier">nt</span><span class="plain"> = </span><span class="constant">TRIE_NOT_GROUP</span><span class="plain">; }</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">group</span><span class="plain">[(</span><span class="reserved">int</span><span class="plain">) </span><span class="functiontext">Wide::len</span><span class="plain">(</span><span class="identifier">group</span><span class="plain">)-1] == </span><span class="character">'!'</span><span class="plain">) {</span>
|
|
<span class="identifier">group</span><span class="plain">[(</span><span class="reserved">int</span><span class="plain">) </span><span class="functiontext">Wide::len</span><span class="plain">(</span><span class="identifier">group</span><span class="plain">)-1] = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">nt</span><span class="plain"> = </span><span class="constant">TRIE_NOT_GROUP</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="identifier">new_pos</span><span class="plain"> = </span><span class="functiontext">Tries::new</span><span class="plain">(</span><span class="identifier">nt</span><span class="plain">);</span>
<span class="identifier">wcscpy</span><span class="plain">(</span><span class="identifier">new_pos</span><span class="plain">-></span><span class="element">group_characters</span><span class="plain">, </span><span class="identifier">from</span><span class="plain">);</span>
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> </span><span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="character">'*'</span><span class="plain">) </span><span class="identifier">new_pos</span><span class="plain"> = </span><span class="functiontext">Tries::new</span><span class="plain">(</span><span class="constant">TRIE_ANYTHING</span><span class="plain">);</span>
<span class="reserved">else</span><span class="plain"> </span><span class="identifier">new_pos</span><span class="plain"> = </span><span class="functiontext">Tries::new</span><span class="plain">(</span><span class="identifier">c</span><span class="plain">);</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">prev</span><span class="plain">-></span><span class="identifier">on_success</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain"> = </span><span class="identifier">new_pos</span><span class="plain">;</span>
<span class="reserved">else</span><span class="plain"> {</span>
<span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">ppoint</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">, *</span><span class="identifier">point</span><span class="plain">;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">point</span><span class="plain"> = </span><span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain">; </span><span class="identifier">point</span><span class="plain">; </span><span class="identifier">ppoint</span><span class="plain"> = </span><span class="identifier">point</span><span class="plain">, </span><span class="identifier">point</span><span class="plain"> = </span><span class="identifier">point</span><span class="plain">-></span><span class="element">next</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">new_pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> < </span><span class="identifier">point</span><span class="plain">-></span><span class="element">match_character</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">ppoint</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) {</span>
<span class="identifier">new_pos</span><span class="plain">-></span><span class="element">next</span><span class="plain"> = </span><span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain">;</span>
<span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain"> = </span><span class="identifier">new_pos</span><span class="plain">;</span>
<span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
<span class="identifier">ppoint</span><span class="plain">-></span><span class="element">next</span><span class="plain"> = </span><span class="identifier">new_pos</span><span class="plain">;</span>
<span class="identifier">new_pos</span><span class="plain">-></span><span class="element">next</span><span class="plain"> = </span><span class="identifier">point</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">point</span><span class="plain">-></span><span class="identifier">next</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) {</span>
<span class="identifier">point</span><span class="plain">-></span><span class="element">next</span><span class="plain"> = </span><span class="identifier">new_pos</span><span class="plain">;</span>
<span class="reserved">break</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="plain">}</span>
<span class="identifier">pos</span><span class="plain"> = </span><span class="identifier">new_pos</span><span class="plain">;</span>
<<span class="cwebmacro">Accept the current node of the trie</span> <span class="cwebmacronumber">2.4</span>><span class="plain">; </span><span class="reserved">continue</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP2">§2</a>.</p>
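<p class="inwebparagraph">The ordered insertion in §2.3 can be tried in isolation. The following is a minimal sketch rather than the foundation code: <code class="display"><span class="extract">demo_node</span></code> and <code class="display"><span class="extract">demo_insert_sorted</span></code> are hypothetical names, the node carries only the two fields the insertion touches, and a pointer-to-pointer walk is swapped in for the <code class="display"><span class="extract">ppoint</span></code> and <code class="display"><span class="extract">point</span></code> pair. It keeps siblings in ascending order of <code class="display"><span class="extract">match_character</span></code>, placing a new node after any equal ones and appending it when no larger sibling exists.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down node: only the fields the insertion touches. */
typedef struct demo_node {
    int match_character;
    struct demo_node *next;
} demo_node;

/* Insert new_pos into the list at *head, before the first node whose
   match_character is strictly larger, or at the end if there is none. */
void demo_insert_sorted(demo_node **head, demo_node *new_pos) {
    assert(head != NULL && new_pos != NULL);
    demo_node **link = head;
    while (*link && (*link)->match_character <= new_pos->match_character)
        link = &(*link)->next;
    new_pos->next = *link;  /* the larger sibling, or NULL at the end */
    *link = new_pos;        /* rewires either the head or a predecessor */
}
```

<p class="inwebparagraph">Walking a pointer to the link, rather than a pointer to the node, removes the separate case for insertion at the head of the list; the resulting order should be the same as in §2.3.
</p>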
<p class="inwebparagraph"><a id="SP2_4"></a><b>§2.4. </b><code class="display">
<<span class="cwebmacrodefn">Accept the current node of the trie</span> <span class="cwebmacronumber">2.4</span>> =
</code></p>
<pre class="displaydefn">
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"trie invariant broken"</span><span class="plain">);</span>
<span class="identifier">prev</span><span class="plain"> = </span><span class="identifier">pos</span><span class="plain">; </span><span class="identifier">pos</span><span class="plain"> = </span><span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP2">§2</a> (twice), <a href="#SP2_2">§2.2</a>, <a href="#SP2_3">§2.3</a>.</p>
<p class="inwebparagraph"><a id="SP2_5"></a><b>§2.5. </b>If <code class="display"><span class="extract">pos</span></code> is <code class="display"><span class="extract">NULL</span></code> then it follows that <code class="display"><span class="extract">prev->on_success</span></code> is <code class="display"><span class="extract">NULL</span></code>, since
this is how <code class="display"><span class="extract">pos</span></code> was calculated; so to add a new terminal node we simply add
it there.
</p>
<p class="macrodefinition"><code class="display">
<<span class="cwebmacrodefn">We failed by running out of trie, so we must add a terminal node to make this string acceptable</span> <span class="cwebmacronumber">2.5</span>> =
</code></p>
<pre class="displaydefn">
<span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain"> = </span><span class="functiontext">Tries::new</span><span class="plain">(</span><span class="constant">TRIE_STOP</span><span class="plain">);</span>
<span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain">-></span><span class="element">match_outcome</span><span class="plain"> = </span><span class="identifier">add_outcome</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">add_outcome</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP2">§2</a>.</p>
<p class="inwebparagraph"><a id="SP2_6"></a><b>§2.6. </b><code class="display">
<<span class="cwebmacrodefn">We failed by finishing at a non-terminal node, so we must add an outcome</span> <span class="cwebmacronumber">2.6</span>> =
</code></p>
<pre class="displaydefn">
<span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain"> = </span><span class="functiontext">Tries::new</span><span class="plain">(</span><span class="constant">TRIE_STOP</span><span class="plain">);</span>
<span class="identifier">prev</span><span class="plain">-></span><span class="element">on_success</span><span class="plain">-></span><span class="element">match_outcome</span><span class="plain"> = </span><span class="identifier">add_outcome</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">add_outcome</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">This code is used in <a href="#SP2">§2</a>.</p>
<p class="inwebparagraph"><a id="SP3"></a><b>§3. </b>Single nodes are matched thus:
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Tries::matches</span><span class="plain">(</span><span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">pos</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">c</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_ANYTHING</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_ANY_GROUP</span><span class="plain">) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">k</span><span class="plain">;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">k</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">pos</span><span class="plain">-></span><span class="element">group_characters</span><span class="plain">[</span><span class="identifier">k</span><span class="plain">]; </span><span class="identifier">k</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="identifier">pos</span><span class="plain">-></span><span class="element">group_characters</span><span class="plain">[</span><span class="identifier">k</span><span class="plain">])</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_NOT_GROUP</span><span class="plain">) {</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">k</span><span class="plain">;</span>
<span class="reserved">for</span><span class="plain"> (</span><span class="identifier">k</span><span class="plain"> = </span><span class="constant">0</span><span class="plain">; </span><span class="identifier">pos</span><span class="plain">-></span><span class="element">group_characters</span><span class="plain">[</span><span class="identifier">k</span><span class="plain">]; </span><span class="identifier">k</span><span class="plain">++)</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">c</span><span class="plain"> == </span><span class="identifier">pos</span><span class="plain">-></span><span class="element">group_characters</span><span class="plain">[</span><span class="identifier">k</span><span class="plain">])</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="identifier">c</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">Tries::is_ambiguous</span><span class="plain">(</span><span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">pos</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_ANYTHING</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_ANY_GROUP</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">pos</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> == </span><span class="constant">TRIE_NOT_GROUP</span><span class="plain">) </span><span class="reserved">return</span><span class="plain"> </span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="constant">FALSE</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Tries::matches is used in <a href="#SP2_2">§2.2</a>.</p>
<p class="endnote">The function Tries::is_ambiguous is used in <a href="#SP2_2">§2.2</a>.</p>
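<p class="inwebparagraph">The group and negated-group cases of §3 can be exercised on their own. This is a minimal sketch under stated assumptions: the <code class="display"><span class="extract">DEMO_*</span></code> codes are hypothetical stand-ins whose values need not equal the real <code class="display"><span class="extract">TRIE_*</span></code> constants, and <code class="display"><span class="extract">demo_trie_node</span></code> keeps only the two fields that matching reads.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for the TRIE_* codes; the values are assumptions. */
#define DEMO_ANYTHING  (-1)
#define DEMO_ANY_GROUP (-2)
#define DEMO_NOT_GROUP (-3)
#define DEMO_MAX_GROUP 26

/* Hypothetical cut-down node: only what matching reads. */
typedef struct demo_trie_node {
    int match_character;
    wchar_t group_characters[DEMO_MAX_GROUP + 1];
} demo_trie_node;

/* Does the null-terminated group contain the character c? */
static int demo_in_group(const wchar_t *group, int c) {
    for (int k = 0; group[k]; k++)
        if (c == (int) group[k]) return 1;
    return 0;
}

/* Returns 1 if the single node pos accepts the character c, else 0. */
int demo_matches(const demo_trie_node *pos, int c) {
    assert(pos != NULL);
    switch (pos->match_character) {
        case DEMO_ANYTHING:  return 1;
        case DEMO_ANY_GROUP: return demo_in_group(pos->group_characters, c);
        case DEMO_NOT_GROUP: return !demo_in_group(pos->group_characters, c);
        default:             return pos->match_character == c;
    }
}
```

<p class="inwebparagraph">A node holding the group <code class="display"><span class="extract">aeiou</span></code> accepts any vowel, while its negated counterpart accepts every other character, which is why both count as ambiguous in the sense of <code class="display"><span class="extract">Tries::is_ambiguous</span></code>.
</p>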
<p class="inwebparagraph"><a id="SP4"></a><b>§4. </b>Where:
</p>
<pre class="display">
<span class="reserved">match_trie</span><span class="plain"> *</span><span class="functiontext">Tries::new</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">mc</span><span class="plain">) {</span>
<span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">T</span><span class="plain"> = </span><span class="identifier">CREATE</span><span class="plain">(</span><span class="reserved">match_trie</span><span class="plain">);</span>
<span class="identifier">T</span><span class="plain">-></span><span class="element">match_character</span><span class="plain"> = </span><span class="identifier">mc</span><span class="plain">;</span>
<span class="identifier">T</span><span class="plain">-></span><span class="element">match_outcome</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">T</span><span class="plain">-></span><span class="element">on_success</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">T</span><span class="plain">-></span><span class="element">next</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">T</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Tries::new is used in <a href="#SP2_3">§2.3</a>, <a href="#SP2_5">§2.5</a>, <a href="#SP2_6">§2.6</a>, <a href="#SP6">§6</a>.</p>
<p class="inwebparagraph"><a id="SP5"></a><b>§5. Avinues. </b>A trie is only a limited form of finite state machine. We're not going to need
the whole power of these, but we do find it useful to chain a series of tries
together. The idea is to scan against one trie, then, if there's no result,
start again with the next, and so on. Inform therefore often matches text
against a linked list of tries: we'll call that an "avinue".
</p>
<pre class="display">
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_avinue</span><span class="plain"> {</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">the_trie</span><span class="plain">;</span>
<span class="reserved">struct</span><span class="plain"> </span><span class="reserved">match_avinue</span><span class="plain"> *</span><span class="identifier">next</span><span class="plain">;</span>
<span class="plain">} </span><span class="reserved">match_avinue</span><span class="plain">;</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The structure match_avinue is accessed in 2/mmr and here.</p>
<p class="inwebparagraph"><a id="SP6"></a><b>§6. </b>An avinue starts out with a single trie, which itself has just a single
head node (of either sort).
</p>
<pre class="display">
<span class="reserved">match_avinue</span><span class="plain"> *</span><span class="functiontext">Tries::new_avinue</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">from_start</span><span class="plain">) {</span>
<span class="reserved">match_avinue</span><span class="plain"> *</span><span class="identifier">A</span><span class="plain"> = </span><span class="identifier">CREATE</span><span class="plain">(</span><span class="reserved">match_avinue</span><span class="plain">);</span>
<span class="identifier">A</span><span class="plain">-></span><span class="element">next</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">A</span><span class="plain">-></span><span class="element">the_trie</span><span class="plain"> = </span><span class="functiontext">Tries::new</span><span class="plain">(</span><span class="identifier">from_start</span><span class="plain">);</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">A</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Tries::add_to_avinue</span><span class="plain">(</span><span class="reserved">match_avinue</span><span class="plain"> *</span><span class="identifier">mt</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">from</span><span class="plain">, </span><span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">to</span><span class="plain">) {</span>
<span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">mt</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) || (</span><span class="identifier">mt</span><span class="plain">-></span><span class="element">the_trie</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">)) </span><span class="identifier">internal_error</span><span class="plain">(</span><span class="string">"null trie"</span><span class="plain">);</span>
<span class="functiontext">Tries::search</span><span class="plain">(</span><span class="identifier">mt</span><span class="plain">-></span><span class="element">the_trie</span><span class="plain">, </span><span class="identifier">from</span><span class="plain">, </span><span class="identifier">to</span><span class="plain">);</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Tries::new_avinue appears nowhere else.</p>
<p class="endnote">The function Tries::add_to_avinue appears nowhere else.</p>
<p class="inwebparagraph"><a id="SP7"></a><b>§7. </b>The following duplicates an avinue, pointing to the same sequence of
tries.
</p>
<pre class="display">
<span class="reserved">match_avinue</span><span class="plain"> *</span><span class="functiontext">Tries::duplicate_avinue</span><span class="plain">(</span><span class="reserved">match_avinue</span><span class="plain"> *</span><span class="identifier">A</span><span class="plain">) {</span>
<span class="reserved">match_avinue</span><span class="plain"> *</span><span class="identifier">F</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">, *</span><span class="identifier">FL</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">while</span><span class="plain"> (</span><span class="identifier">A</span><span class="plain">) {</span>
<span class="reserved">match_avinue</span><span class="plain"> *</span><span class="identifier">FN</span><span class="plain"> = </span><span class="identifier">CREATE</span><span class="plain">(</span><span class="reserved">match_avinue</span><span class="plain">);</span>
<span class="identifier">FN</span><span class="plain">-></span><span class="element">next</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="identifier">FN</span><span class="plain">-></span><span class="element">the_trie</span><span class="plain"> = </span><span class="identifier">A</span><span class="plain">-></span><span class="element">the_trie</span><span class="plain">;</span>
<span class="identifier">A</span><span class="plain"> = </span><span class="identifier">A</span><span class="plain">-></span><span class="element">next</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">FL</span><span class="plain">) </span><span class="identifier">FL</span><span class="plain">-></span><span class="element">next</span><span class="plain"> = </span><span class="identifier">FN</span><span class="plain">;</span>
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">F</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) </span><span class="identifier">F</span><span class="plain"> = </span><span class="identifier">FN</span><span class="plain">;</span>
<span class="identifier">FL</span><span class="plain"> = </span><span class="identifier">FN</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">F</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Tries::duplicate_avinue appears nowhere else.</p>
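<p class="inwebparagraph">Because the copy in §7 points to the same tries, it is a shallow duplicate: the links are new, but the tries are shared, so an outcome added through one chain is visible through the other. A minimal sketch of that sharing, assuming a hypothetical <code class="display"><span class="extract">demo_link</span></code> type and plain <code class="display"><span class="extract">malloc</span></code> in place of the <code class="display"><span class="extract">CREATE</span></code> macro:
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for match_avinue; the trie is an opaque pointer. */
typedef struct demo_link {
    void *the_trie;           /* shared with the original, never copied */
    struct demo_link *next;
} demo_link;

/* Returns a fresh chain of links whose the_trie pointers alias those of A.
   On allocation failure the partial copy is freed and NULL is returned. */
demo_link *demo_duplicate(const demo_link *A) {
    demo_link *first = NULL, **tail = &first;
    for (; A; A = A->next) {
        demo_link *copy = malloc(sizeof *copy);
        if (copy == NULL) {
            while (first) {
                demo_link *n = first->next;
                free(first);
                first = n;
            }
            return NULL;
        }
        copy->the_trie = A->the_trie;  /* alias, not a deep copy */
        copy->next = NULL;
        *tail = copy;                  /* append, preserving the order */
        tail = &copy->next;
    }
    return first;
}
```

<p class="inwebparagraph">Duplicating a two-link chain yields two new links in the same order, each holding the same trie pointer as its original; duplicating an empty chain yields <code class="display"><span class="extract">NULL</span></code>.
</p>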
<p class="inwebparagraph"><a id="SP8"></a><b>§8. </b>As noted above, searching an avinue is a matter of searching with each
trie in turn until one matches (if any does).
</p>
<pre class="display">
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="functiontext">Tries::search_avinue</span><span class="plain">(</span><span class="reserved">match_avinue</span><span class="plain"> *</span><span class="identifier">T</span><span class="plain">, </span><span class="reserved">text_stream</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain">) {</span>
<span class="identifier">wchar_t</span><span class="plain"> *</span><span class="identifier">result</span><span class="plain"> = </span><span class="identifier">NULL</span><span class="plain">;</span>
<span class="reserved">while</span><span class="plain"> ((</span><span class="identifier">T</span><span class="plain">) && (</span><span class="identifier">result</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">)) {</span>
<span class="identifier">result</span><span class="plain"> = </span><span class="functiontext">Tries::search</span><span class="plain">(</span><span class="identifier">T</span><span class="plain">-></span><span class="element">the_trie</span><span class="plain">, </span><span class="identifier">p</span><span class="plain">, </span><span class="identifier">NULL</span><span class="plain">);</span>
<span class="identifier">T</span><span class="plain"> = </span><span class="identifier">T</span><span class="plain">-></span><span class="element">next</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">return</span><span class="plain"> </span><span class="identifier">result</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Tries::search_avinue appears nowhere else.</p>
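<p class="inwebparagraph">The first-result-wins behaviour of an avinue can be illustrated without the trie machinery. In this minimal sketch each stage is a plain function standing in for a trie; the names <code class="display"><span class="extract">demo_avinue</span></code>, <code class="display"><span class="extract">demo_irregular</span></code> and <code class="display"><span class="extract">demo_sibilant</span></code>, and the outcomes they return, are invented for illustration and are not taken from Inform.
</p>

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for a trie: a function returning an outcome for
   the text, or NULL if it has nothing to say. */
typedef const wchar_t *(*demo_stage_fn)(const wchar_t *text);

typedef struct demo_avinue {
    demo_stage_fn the_stage;
    struct demo_avinue *next;
} demo_avinue;

/* Stage 1: an exception list holding a single irregular word. */
const wchar_t *demo_irregular(const wchar_t *text) {
    return (wcscmp(text, L"ox") == 0) ? L"oxen" : NULL;
}

/* Stage 2: a general rule keyed on the final letter. */
const wchar_t *demo_sibilant(const wchar_t *text) {
    size_t n = wcslen(text);
    return (n > 0 && text[n - 1] == L's') ? L"+es" : NULL;
}

/* Try each stage in turn, stopping at the first non-NULL outcome. */
const wchar_t *demo_search_avinue(const demo_avinue *A, const wchar_t *text) {
    const wchar_t *result = NULL;
    assert(text != NULL);
    while (A && result == NULL) {
        result = A->the_stage(text);
        A = A->next;
    }
    return result;
}
```

<p class="inwebparagraph">With the exception stage chained ahead of the general one, specific cases take priority simply by coming earlier in the list, and a text that no stage recognises produces <code class="display"><span class="extract">NULL</span></code>.
</p>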
<p class="inwebparagraph"><a id="SP9"></a><b>§9. Logging. </b></p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Tries::log_avinue</span><span class="plain">(</span><span class="constant">OUTPUT_STREAM</span><span class="plain">, </span><span class="reserved">void</span><span class="plain"> *</span><span class="identifier">vA</span><span class="plain">) {</span>
<span class="reserved">match_avinue</span><span class="plain"> *</span><span class="identifier">A</span><span class="plain"> = (</span><span class="reserved">match_avinue</span><span class="plain"> *) </span><span class="identifier">vA</span><span class="plain">;</span>
<span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"Avinue:\n"</span><span class="plain">); </span><span class="constant">INDENT</span><span class="plain">;</span>
<span class="reserved">int</span><span class="plain"> </span><span class="identifier">n</span><span class="plain"> = </span><span class="constant">1</span><span class="plain">;</span>
<span class="reserved">while</span><span class="plain"> (</span><span class="identifier">A</span><span class="plain">) {</span>
<span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"Trie %d:\n"</span><span class="plain">, </span><span class="identifier">n</span><span class="plain">++); </span><span class="constant">INDENT</span><span class="plain">;</span>
<span class="functiontext">Tries::log</span><span class="plain">(</span><span class="identifier">OUT</span><span class="plain">, </span><span class="identifier">A</span><span class="plain">-></span><span class="element">the_trie</span><span class="plain">);</span>
<span class="constant">OUTDENT</span><span class="plain">;</span>
<span class="identifier">A</span><span class="plain"> = </span><span class="identifier">A</span><span class="plain">-></span><span class="element">next</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="constant">OUTDENT</span><span class="plain">;</span>
<span class="plain">}</span>
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">Tries::log</span><span class="plain">(</span><span class="constant">OUTPUT_STREAM</span><span class="plain">, </span><span class="reserved">match_trie</span><span class="plain"> *</span><span class="identifier">T</span><span class="plain">) {</span>
<span class="reserved">for</span><span class="plain"> (; </span><span class="identifier">T</span><span class="plain">; </span><span class="identifier">T</span><span class="plain"> = </span><span class="identifier">T</span><span class="plain">-></span><span class="element">next</span><span class="plain">) {</span>
<span class="reserved">switch</span><span class="plain"> (</span><span class="identifier">T</span><span class="plain">-></span><span class="identifier">match_character</span><span class="plain">) {</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TRIE_START:</span><span class="plain"> </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"Start"</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TRIE_END:</span><span class="plain"> </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"End"</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TRIE_ANYTHING:</span><span class="plain"> </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"Anything"</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TRIE_ANY_GROUP:</span><span class="plain"> </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"Group <%w>"</span><span class="plain">, </span><span class="identifier">T</span><span class="plain">-></span><span class="element">group_characters</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TRIE_NOT_GROUP:</span><span class="plain"> </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"Negated group <%w>"</span><span class="plain">, </span><span class="identifier">T</span><span class="plain">-></span><span class="element">group_characters</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="reserved">case</span><span class="plain"> </span><span class="identifier">TRIE_STOP:</span><span class="plain"> </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"Stop"</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="reserved">case</span><span class="plain"> </span><span class="constant">0</span><span class="plain">: </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"00"</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="identifier">default:</span><span class="plain"> </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"%c"</span><span class="plain">, </span><span class="identifier">T</span><span class="plain">-></span><span class="element">match_character</span><span class="plain">); </span><span class="reserved">break</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">T</span><span class="plain">-></span><span class="element">match_outcome</span><span class="plain">) </span><span class="identifier">WRITE</span><span class="plain">(</span><span class="string">" --> %s"</span><span class="plain">, </span><span class="identifier">T</span><span class="plain">-></span><span class="element">match_outcome</span><span class="plain">);</span>
|
|
<span class="identifier">WRITE</span><span class="plain">(</span><span class="string">"\n"</span><span class="plain">);</span>
|
|
<span class="reserved">if</span><span class="plain"> (</span><span class="identifier">T</span><span class="plain">-></span><span class="element">on_success</span><span class="plain">) {</span>
|
|
<span class="constant">INDENT</span><span class="plain">; </span><span class="functiontext">Tries::log</span><span class="plain">(</span><span class="identifier">OUT</span><span class="plain">, </span><span class="identifier">T</span><span class="plain">-></span><span class="element">on_success</span><span class="plain">); </span><span class="constant">OUTDENT</span><span class="plain">;</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
<span class="plain">}</span>
|
|
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function Tries::log_avinue is used in 1/fnd (<a href="1-fnd.html#SP8_3">§8.3</a>).</p>
<p class="endnote">The function Tries::log appears nowhere else.</p>
<hr class="tocbar">
<ul class="toc"><li><a href="4-tf.html">Back to 'Text Files'</a></li><li><a href="4-pm.html">Continue with 'Pattern Matching'</a></li></ul><hr class="tocbar">
<!--End of weave-->
</main>
</body>
</html>