<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>4/cst</title>
<meta name="viewport" content="width=device-width initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="../inweb.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body>
<nav role="navigation">
<h1><a href="../webs.html">Sources</a></h1>
<ul>
<li><a href="../inweb/index.html">inweb</a></li>
</ul>
<h2>Foundation</h2>
<ul>
<li><a href="../foundation-module/index.html">foundation-module</a></li>
<li><a href="../foundation-test/index.html">foundation-test</a></li>
</ul>
</nav>
<main role="main">
<!--Weave of '4/cst' generated by 7-->
<ul class="crumbs"><li><a href="../webs.html">Source</a></li><li><a href="index.html">foundation</a></li><li><a href="index.html#4">Chapter 4: Text Handling</a></li><li><b>C Strings</b></li></ul><p class="purpose">A minimal library for handling C-style strings.</p>
<p class="inwebparagraph"><a id="SP1"></a><b>&#167;1. </b>Programs using Foundation store text in <code class="display"><span class="extract">text_stream</span></code> structures almost all
of the time, but old-style, null-terminated <code class="display"><span class="extract">char *</span></code> array strings are
still occasionally needed.
</p>
<p class="inwebparagraph">We need to handle C strings long enough to contain any plausible filename, and
any run of a dozen or so lines of code; but we have no real need to handle
strings of unlimited length, nor to be parsimonious with memory.
</p>
<p class="inwebparagraph">The following defines a type for a string long enough for our purposes.
It should be at least as long as the constant sometimes called <code class="display"><span class="extract">PATH_MAX</span></code>,
the maximum length of a pathname, which is 1024 on Mac OS X.
</p>
<pre class="definitions">
<span class="definitionkeyword">define</span> <span class="constant">MAX_STRING_LENGTH</span><span class="plain"> 8*1024</span>
</pre>
<pre class="display">
<span class="reserved">typedef</span><span class="plain"> </span><span class="reserved">char</span><span class="plain"> </span><span class="identifier">string</span><span class="plain">[</span><span class="constant">MAX_STRING_LENGTH</span><span class="plain">+1];</span>
</pre>
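<p class="inwebparagraph">As a rough sketch of what this type costs, the following checks how many bytes one such buffer occupies. The function name <code class="display"><span class="extract">string_storage_size</span></code> is a stand-in invented for this note, not part of Foundation, and the macro is parenthesised here only as a defensive habit.
</p>

```c
#include <stddef.h>

/* Stand-ins mirroring the definitions above (not Foundation's own code). */
#define MAX_STRING_LENGTH (8*1024)
typedef char string[MAX_STRING_LENGTH+1];

/* Hypothetical helper: the number of bytes one `string` buffer occupies,
   i.e. MAX_STRING_LENGTH characters plus one for the terminator. */
size_t string_storage_size(void) {
    string buffer;
    buffer[0] = 0;
    return sizeof(buffer);
}
```

<p class="inwebparagraph">Each such buffer therefore takes 8193 bytes wherever it is declared, which is the trade against parsimony mentioned above.
</p>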
<p class="inwebparagraph"></p>
<p class="inwebparagraph"><a id="SP2"></a><b>&#167;2. </b>Occasionally we need access to the real, unbounded strlen:
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">CStrings::strlen_unbounded</span><span class="plain">(</span><span class="reserved">const</span><span class="plain"> </span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain">) {</span>
    <span class="reserved">return</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain">) </span><span class="identifier">strlen</span><span class="plain">(</span><span class="identifier">p</span><span class="plain">);</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function CStrings::strlen_unbounded appears nowhere else.</p>
<p class="inwebparagraph"><a id="SP3"></a><b>&#167;3. </b>Any out-of-range access immediately halts the program; this is drastic, but
an attempt to continue execution after a string overflow might conceivably
result in a malformatted shell command being passed to the operating system,
which we cannot risk.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">CStrings::check_len</span><span class="plain">(</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">n</span><span class="plain">) {</span>
    <span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">n</span><span class="plain"> &gt; </span><span class="constant">MAX_STRING_LENGTH</span><span class="plain">) || (</span><span class="identifier">n</span><span class="plain"> &lt; 0)) </span><span class="functiontext">Errors::fatal</span><span class="plain">(</span><span class="string">"String overflow\n"</span><span class="plain">);</span>
    <span class="reserved">return</span><span class="plain"> </span><span class="identifier">n</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function CStrings::check_len is used in <a href="#SP5">&#167;5</a>.</p>
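<p class="inwebparagraph">The range test can be sketched on its own as follows. The names <code class="display"><span class="extract">length_in_range</span></code> and <code class="display"><span class="extract">check_len_sketch</span></code> are hypothetical, and the call to <code class="display"><span class="extract">exit</span></code> is only an assumption standing in for <code class="display"><span class="extract">Errors::fatal</span></code>, whose behaviour is not shown in this section.
</p>

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_STRING_LENGTH (8*1024)

/* Hypothetical helper: 1 if n is a legal string length, 0 otherwise.
   Both 0 and MAX_STRING_LENGTH itself count as legal. */
int length_in_range(int n) {
    return (n >= 0) && (n <= MAX_STRING_LENGTH);
}

/* Stand-in for the checker above: halts on an illegal length, otherwise
   hands n back so that the call can be used inside an expression.
   exit(1) is an assumption; the original calls Errors::fatal. */
int check_len_sketch(int n) {
    if (!length_in_range(n)) {
        fprintf(stderr, "String overflow\n");
        exit(1);
    }
    return n;
}
```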
<p class="inwebparagraph"><a id="SP4"></a><b>&#167;4. </b>The following is then protected from reading out of range if given a
non-terminated string, though this should never actually happen.
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">CStrings::len</span><span class="plain">(</span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">str</span><span class="plain">) {</span>
    <span class="reserved">for</span><span class="plain"> (</span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">=0; </span><span class="identifier">i</span><span class="plain">&lt;=</span><span class="constant">MAX_STRING_LENGTH</span><span class="plain">; </span><span class="identifier">i</span><span class="plain">++)</span>
        <span class="reserved">if</span><span class="plain"> (</span><span class="identifier">str</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] == 0) </span><span class="reserved">return</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
    <span class="identifier">str</span><span class="plain">[</span><span class="constant">MAX_STRING_LENGTH</span><span class="plain">] = 0;</span>
    <span class="reserved">return</span><span class="plain"> </span><span class="constant">MAX_STRING_LENGTH</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function CStrings::len is used in <a href="#SP5">&#167;5</a>.</p>
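<p class="inwebparagraph">To see the fallback path in action, here is a sketch of the same loop with a deliberately tiny bound. <code class="display"><span class="extract">DEMO_MAX</span></code> and <code class="display"><span class="extract">bounded_len</span></code> are names made up for this illustration; the buffer passed in is assumed to hold <code class="display"><span class="extract">DEMO_MAX+1</span></code> bytes.
</p>

```c
#define DEMO_MAX 8  /* deliberately tiny stand-in for MAX_STRING_LENGTH */

/* Bounded length: never reads past str[DEMO_MAX], and forces a terminator
   into the last slot if none was found. str must hold DEMO_MAX+1 bytes. */
int bounded_len(char *str) {
    for (int i = 0; i <= DEMO_MAX; i++)
        if (str[i] == 0) return i;
    str[DEMO_MAX] = 0;
    return DEMO_MAX;
}
```

<p class="inwebparagraph">Given a buffer filled entirely with non-zero bytes, the sketch returns <code class="display"><span class="extract">DEMO_MAX</span></code> and leaves the buffer terminated, so the result can never exceed the bound.
</p>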
<p class="inwebparagraph"><a id="SP5"></a><b>&#167;5. </b>We then have a replacement for <code class="display"><span class="extract">strcpy</span></code>, identical except that it's
bounds-checked:
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">CStrings::copy</span><span class="plain">(</span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">to</span><span class="plain">, </span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">from</span><span class="plain">) {</span>
    <span class="functiontext">CStrings::check_len</span><span class="plain">(</span><span class="functiontext">CStrings::len</span><span class="plain">(</span><span class="identifier">from</span><span class="plain">));</span>
    <span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
    <span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=0; ((</span><span class="identifier">from</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]) &amp;&amp; (</span><span class="identifier">i</span><span class="plain"> &lt; </span><span class="constant">MAX_STRING_LENGTH</span><span class="plain">)); </span><span class="identifier">i</span><span class="plain">++) </span><span class="identifier">to</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">from</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
    <span class="identifier">to</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = 0;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function CStrings::copy appears nowhere else.</p>
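<p class="inwebparagraph">The copying loop bounds only the number of characters taken from the source; the destination is trusted to have room. A minimal sketch, again using an invented tiny bound <code class="display"><span class="extract">DEMO_MAX</span></code> and a hypothetical name <code class="display"><span class="extract">bounded_copy</span></code>:
</p>

```c
#define DEMO_MAX 8  /* deliberately tiny stand-in for MAX_STRING_LENGTH */

/* Copies at most DEMO_MAX characters and always terminates the result.
   to must have room for DEMO_MAX+1 bytes; that is not checked here. */
void bounded_copy(char *to, const char *from) {
    int i;
    for (i = 0; (from[i]) && (i < DEMO_MAX); i++) to[i] = from[i];
    to[i] = 0;
}
```

<p class="inwebparagraph">An over-long source is silently cut to the first <code class="display"><span class="extract">DEMO_MAX</span></code> characters in this sketch, so the caller should declare the destination with the full buffer size.
</p>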
<p class="inwebparagraph"><a id="SP6"></a><b>&#167;6. </b>String comparisons will be done with the following, not <code class="display"><span class="extract">strcmp</span></code> directly:
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">CStrings::ne</span><span class="plain">(</span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">A</span><span class="plain">, </span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">B</span><span class="plain">) {</span>
    <span class="reserved">return</span><span class="plain"> (</span><span class="functiontext">CStrings::cmp</span><span class="plain">(</span><span class="identifier">A</span><span class="plain">, </span><span class="identifier">B</span><span class="plain">) == 0)?</span><span class="constant">FALSE</span><span class="plain">:</span><span class="constant">TRUE</span><span class="plain">;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function CStrings::ne appears nowhere else.</p>
<p class="inwebparagraph"><a id="SP7"></a><b>&#167;7. </b>On the rare occasions when we need to sort alphabetically we'll also call:
</p>
<pre class="display">
<span class="reserved">int</span><span class="plain"> </span><span class="functiontext">CStrings::cmp</span><span class="plain">(</span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">A</span><span class="plain">, </span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">B</span><span class="plain">) {</span>
    <span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">A</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) || (</span><span class="identifier">A</span><span class="plain">[0] == 0)) {</span>
        <span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">B</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) || (</span><span class="identifier">B</span><span class="plain">[0] == 0)) </span><span class="reserved">return</span><span class="plain"> 0;</span>
        <span class="reserved">return</span><span class="plain"> -1;</span>
    <span class="plain">}</span>
    <span class="reserved">if</span><span class="plain"> ((</span><span class="identifier">B</span><span class="plain"> == </span><span class="identifier">NULL</span><span class="plain">) || (</span><span class="identifier">B</span><span class="plain">[0] == 0)) </span><span class="reserved">return</span><span class="plain"> 1;</span>
    <span class="reserved">return</span><span class="plain"> </span><span class="identifier">strcmp</span><span class="plain">(</span><span class="identifier">A</span><span class="plain">, </span><span class="identifier">B</span><span class="plain">);</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function CStrings::cmp is used in <a href="#SP6">&#167;6</a>.</p>
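<p class="inwebparagraph">The point of wrapping <code class="display"><span class="extract">strcmp</span></code> is that a null pointer and an empty string are treated alike, and both sort before any non-empty string. The sketch below restates that rule in plain C; <code class="display"><span class="extract">cstr_cmp</span></code> and <code class="display"><span class="extract">cstr_ne</span></code> are hypothetical names, and <code class="display"><span class="extract">const</span></code> has been added to the parameters as an assumption so that string literals can be passed.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Stand-in comparison: NULL and "" are treated alike and sort before
   any non-empty string; otherwise the result is whatever strcmp gives. */
int cstr_cmp(const char *A, const char *B) {
    int a_empty = (A == NULL) || (A[0] == 0);
    int b_empty = (B == NULL) || (B[0] == 0);
    if (a_empty) return b_empty ? 0 : -1;
    if (b_empty) return 1;
    return strcmp(A, B);
}

/* Stand-in inequality test: 1 if the strings differ, 0 if they match. */
int cstr_ne(const char *A, const char *B) {
    return (cstr_cmp(A, B) == 0) ? 0 : 1;
}
```

<p class="inwebparagraph">Passing <code class="display"><span class="extract">NULL</span></code> straight to <code class="display"><span class="extract">strcmp</span></code> would be undefined behaviour, which is what the two guards avoid.
</p>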
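<p class="inwebparagraph"><a id="SP8"></a><b>&#167;8. </b>And the following is needed to deal with extension filenames on platforms
whose locale is encoded as UTF-8. Each source byte is read as an ISO Latin-1
character, and any code of 128 or more is written as two bytes, so
<code class="display"><span class="extract">dest</span></code> must have room for up to twice as many bytes as
<code class="display"><span class="extract">p</span></code>, plus the terminator.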
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">CStrings::transcode_ISO_string_to_UTF8</span><span class="plain">(</span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">p</span><span class="plain">, </span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">dest</span><span class="plain">) {</span>
    <span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">, </span><span class="identifier">j</span><span class="plain">;</span>
    <span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=0, </span><span class="identifier">j</span><span class="plain">=0; </span><span class="identifier">p</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]; </span><span class="identifier">i</span><span class="plain">++) {</span>
        <span class="reserved">int</span><span class="plain"> </span><span class="identifier">charcode</span><span class="plain"> = (</span><span class="reserved">int</span><span class="plain">) (((</span><span class="reserved">unsigned</span><span class="plain"> </span><span class="reserved">char</span><span class="plain"> *)</span><span class="identifier">p</span><span class="plain">)[</span><span class="identifier">i</span><span class="plain">]);</span>
        <span class="reserved">if</span><span class="plain"> (</span><span class="identifier">charcode</span><span class="plain"> &gt;= 128) {</span>
            <span class="identifier">dest</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">++] = (</span><span class="reserved">char</span><span class="plain">) (0xC0 + (</span><span class="identifier">charcode</span><span class="plain"> &gt;&gt; 6));</span>
            <span class="identifier">dest</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">++] = (</span><span class="reserved">char</span><span class="plain">) (0x80 + (</span><span class="identifier">charcode</span><span class="plain"> &amp; 0x3f));</span>
        <span class="plain">} </span><span class="reserved">else</span><span class="plain"> {</span>
            <span class="identifier">dest</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">++] = </span><span class="identifier">p</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
        <span class="plain">}</span>
    <span class="plain">}</span>
    <span class="identifier">dest</span><span class="plain">[</span><span class="identifier">j</span><span class="plain">] = 0;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function CStrings::transcode_ISO_string_to_UTF8 appears nowhere else.</p>
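<p class="inwebparagraph">As a worked case, the Latin-1 byte 0xE9 (e-acute) has top two bits 3 and low six bits 0x29, so it becomes the pair 0xC3, 0xA9. The sketch below restates the loop in plain C under the hypothetical name <code class="display"><span class="extract">latin1_to_utf8</span></code>; the caller is assumed to supply a destination of at least twice the source length plus one.
</p>

```c
/* Stand-in transcoder: each source byte is read as an ISO Latin-1 code.
   Codes 128..255 become two UTF-8 bytes, so dest needs up to
   2*strlen(p)+1 bytes; providing that is the caller's job here. */
void latin1_to_utf8(const char *p, char *dest) {
    int i, j;
    for (i = 0, j = 0; p[i]; i++) {
        int charcode = (int) (((const unsigned char *) p)[i]);
        if (charcode >= 128) {
            dest[j++] = (char) (0xC0 + (charcode >> 6));   /* lead byte: top 2 bits */
            dest[j++] = (char) (0x80 + (charcode & 0x3f)); /* continuation: low 6 bits */
        } else {
            dest[j++] = p[i];                              /* ASCII passes through */
        }
    }
    dest[j] = 0;
}
```

<p class="inwebparagraph">Because Latin-1 codes never exceed 255, two bytes always suffice and no longer UTF-8 sequences are needed.
</p>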
<p class="inwebparagraph"><a id="SP9"></a><b>&#167;9. </b>I dislike using <code class="display"><span class="extract">strncpy</span></code> because (and for some reason this surprises
me every time) it truncates but fails to write a null terminator
if the string to be copied does not fit in the buffer being written to: the
result is therefore not a well-formed string, and we have to fix matters by
hand. This, I think, makes for opaque code. So:
</p>
<pre class="display">
<span class="reserved">void</span><span class="plain"> </span><span class="functiontext">CStrings::truncated_strcpy</span><span class="plain">(</span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">to</span><span class="plain">, </span><span class="reserved">char</span><span class="plain"> *</span><span class="identifier">from</span><span class="plain">, </span><span class="reserved">int</span><span class="plain"> </span><span class="identifier">max</span><span class="plain">) {</span>
    <span class="reserved">int</span><span class="plain"> </span><span class="identifier">i</span><span class="plain">;</span>
    <span class="reserved">for</span><span class="plain"> (</span><span class="identifier">i</span><span class="plain">=0; ((</span><span class="identifier">from</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">]) &amp;&amp; (</span><span class="identifier">i</span><span class="plain">&lt;</span><span class="identifier">max</span><span class="plain">-1)); </span><span class="identifier">i</span><span class="plain">++) </span><span class="identifier">to</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = </span><span class="identifier">from</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">];</span>
    <span class="identifier">to</span><span class="plain">[</span><span class="identifier">i</span><span class="plain">] = 0;</span>
<span class="plain">}</span>
</pre>
<p class="inwebparagraph"></p>
<p class="endnote">The function CStrings::truncated_strcpy is used in 2/dl (<a href="2-dl.html#SP6">&#167;6</a>).</p>
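<p class="inwebparagraph">To make the contrast concrete: with a four-byte buffer, <code class="display"><span class="extract">strncpy(buf, "abcdef", 4)</span></code> leaves the four characters a, b, c, d and no terminator, whereas a truncating copy of this kind leaves the well-formed string "abc". The sketch below uses the hypothetical name <code class="display"><span class="extract">truncated_copy</span></code> and assumes that <code class="display"><span class="extract">max</span></code> is at least 1 and is the full size of the destination.
</p>

```c
/* Stand-in truncating copy: copies at most max-1 characters and always
   writes a terminator. Assumes max >= 1 and that to holds max bytes;
   with max <= 0 it would still write to[0], so callers must not do that. */
void truncated_copy(char *to, const char *from, int max) {
    int i;
    for (i = 0; (from[i]) && (i < max-1); i++) to[i] = from[i];
    to[i] = 0;
}
```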
<hr class="tocbar">
<ul class="toc"><li><a href="4-chr.html">Back to 'Characters'</a></li><li><a href="4-ws.html">Continue with 'Wide Strings'</a></li></ul><hr class="tocbar">
<!--End of weave-->
</main>
</body>
</html>