<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Text Files</title>
<link href="../docs-assets/Breadcrumbs.css" rel="stylesheet" rev="stylesheet" type="text/css">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-gb">
<link href="../docs-assets/Contents.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Progress.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Navigation.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Fonts.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Base.css" rel="stylesheet" rev="stylesheet" type="text/css">
<script>
function togglePopup(material_id) {
var popup = document.getElementById(material_id);
popup.classList.toggle("show");
}
</script>
<link href="../docs-assets/Popups.css" rel="stylesheet" rev="stylesheet" type="text/css">
<link href="../docs-assets/Colours.css" rel="stylesheet" rev="stylesheet" type="text/css">
</head>
<body class="commentary-font">
<nav role="navigation">
<h1><a href="../index.html">
<img src="../docs-assets/Octagram.png" width=72 height=72>
</a></h1>
<ul><li><a href="../inweb/index.html">inweb</a></li>
</ul><h2>Foundation Module</h2><ul>
<li><a href="index.html"><span class="selectedlink">foundation</span></a></li>
<li><a href="../foundation-test/index.html">foundation-test</a></li>
</ul><h2>Example Webs</h2><ul>
<li><a href="../goldbach/index.html">goldbach</a></li>
<li><a href="../twinprimes/twinprimes.html">twinprimes</a></li>
<li><a href="../eastertide/index.html">eastertide</a></li>
</ul><h2>Repository</h2><ul>
<li><a href="https://github.com/ganelson/inweb"><img src="../docs-assets/github.png" height=18> github</a></li>
</ul><h2>Related Projects</h2><ul>
<li><a href="../../../inform/docs/index.html">inform</a></li>
<li><a href="../../../intest/docs/index.html">intest</a></li>
</ul>
</nav>
<main role="main">
<!--Weave of 'Text Files' generated by Inweb-->
<div class="breadcrumbs">
<ul class="crumbs"><li><a href="../index.html">Home</a></li><li><a href="index.html">foundation</a></li><li><a href="index.html#4">Chapter 4: Text Handling</a></li><li><b>Text Files</b></li></ul></div>
<p class="purpose">To read text files of whatever flavour, one line at a time.</p>
<ul class="toc"><li><a href="4-tf.html#SP1">&#167;1. Text files</a></li><li><a href="4-tf.html#SP2">&#167;2. Text file positions</a></li><li><a href="4-tf.html#SP5">&#167;5. Text file scanner</a></li><li><a href="4-tf.html#SP8">&#167;8. Reading UTF-8 files</a></li></ul><hr class="tocbar">
<p class="commentary firstcommentary"><a id="SP1"></a><b>&#167;1. Text files. </b>Foundation was written mainly to support command-line tools which, of their
nature, deal with a lot of text files: source code of programs, configuration
files, HTML, XML and so on. The main aim of this section is to provide a
standard way to read in and iterate through lines of a text file.
</p>
<p class="commentary">First, though, here is a perhaps clumsy but effective way to test whether a
file actually exists on disc at a given filename. We simply try to open it
for reading, and close it again if that works:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::exists</span><button class="popup" onclick="togglePopup('usagePopup1')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup1">Usage of <span class="code-font"><span class="function-syntax">TextFiles::exists</span></span>:<br/>Web Structure - <a href="8-ws.html#SP7_2_2_4">&#167;7.2.2.4</a>, <a href="8-ws.html#SP8">&#167;8</a><br/>Build Files - <a href="8-bf.html#SP1">&#167;1</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">filename</span><span class="plain-syntax"> *</span><span class="identifier-syntax">F</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">FILE</span><span class="plain-syntax"> *</span><span class="identifier-syntax">HANDLE</span><span class="plain-syntax"> = </span><a href="3-fln.html#SP10" class="function-link"><span class="function-syntax">Filenames::fopen</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">F</span><span class="plain-syntax">, </span><span class="string-syntax">"rb"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">HANDLE</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">fclose</span><span class="plain-syntax">(</span><span class="identifier-syntax">HANDLE</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">TRUE</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
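<p class="commentary">As a minimal standalone sketch of the same technique, assuming a plain C
string in place of Foundation's <span class="extract"><span class="extract-syntax">filename</span></span> type (the name
<span class="extract"><span class="extract-syntax">file_exists</span></span> is hypothetical and not part of Foundation):
</p>

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of Foundation: returns 1 if the file at
   `path` can be opened for reading, and 0 otherwise. */
int file_exists(const char *path) {
    assert(path != NULL);
    FILE *handle = fopen(path, "rb");
    if (handle == NULL) return 0;   /* missing, or not readable by us */
    fclose(handle);                 /* we only wanted to know it opens */
    return 1;
}
```

<p class="commentary">Note that a test of this kind really asks whether the file can be opened
for reading, so a file which is present but unreadable would be reported as absent.
</p>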
<p class="commentary firstcommentary"><a id="SP2"></a><b>&#167;2. Text file positions. </b>Here's how we record a position in a text file:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">typedef</span><span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">filename</span><span class="plain-syntax"> *</span><span class="identifier-syntax">text_file_filename</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">FILE</span><span class="plain-syntax"> *</span><span class="identifier-syntax">handle_when_open</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">unicode_file_buffer</span><span class="plain-syntax"> </span><span class="identifier-syntax">ufb</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">line_count</span><span class="plain-syntax">; </span><span class="comment-syntax"> counting from 1</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">line_position</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">skip_terminator</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">actively_scanning</span><span class="plain-syntax">; </span><span class="comment-syntax"> whether we are still interested in the rest of the file</span>
<span class="plain-syntax">} </span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>The structure text_file_position is accessed in 3/em, 3/cla, 8/ws and here.</li></ul>
<p class="commentary firstcommentary"><a id="SP3"></a><b>&#167;3. </b>For access:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::get_line_count</span><span class="plain-syntax">(</span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> *</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">tfp</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">line_count</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP4"></a><b>&#167;4. </b>And this is for a real nowhere man:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::nowhere</span><span class="plain-syntax">(</span><span class="reserved-syntax">void</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">text_file_filename</span><span class="plain-syntax"> = </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_count</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_position</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax"> = </span><span class="constant-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">actively_scanning</span><span class="plain-syntax"> = </span><span class="constant-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::at</span><span class="plain-syntax">(</span><span class="reserved-syntax">filename</span><span class="plain-syntax"> *</span><span class="identifier-syntax">F</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">line</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax"> = </span><a href="4-tf.html#SP4" class="function-link"><span class="function-syntax">TextFiles::nowhere</span></a><span class="plain-syntax">();</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">text_file_filename</span><span class="plain-syntax"> = </span><span class="identifier-syntax">F</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_count</span><span class="plain-syntax"> = </span><span class="identifier-syntax">line</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
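<p class="commentary">One plausible use of such positions is in error reporting. The following
is a simplified sketch only: the <span class="extract"><span class="extract-syntax">position</span></span> type and all three function names
are hypothetical stand-ins, holding just a file name and a line number rather than the
full <span class="extract"><span class="extract-syntax">text_file_position</span></span> structure.
</p>

```c
#include <assert.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for text_file_position. */
typedef struct position {
    const char *file_name;  /* NULL means "nowhere" */
    int line_count;         /* counting from 1; 0 for nowhere */
} position;

/* A position referring to no file at all. */
position position_nowhere(void) {
    position p;
    p.file_name = NULL;
    p.line_count = 0;
    return p;
}

/* A position at a given line of a given file. */
position position_at(const char *file_name, int line) {
    position p = position_nowhere();
    p.file_name = file_name;
    p.line_count = line;
    return p;
}

/* Writes a description such as "notes.txt, line 12" into `buffer`. */
void position_describe(char *buffer, size_t size, const position *p) {
    assert(buffer != NULL && size > 0);
    if (p == NULL || p->file_name == NULL) snprintf(buffer, size, "nowhere");
    else snprintf(buffer, size, "%s, line %d", p->file_name, p->line_count);
}
```

<p class="commentary">Building the second constructor on top of the first, as the code above
does with <span class="extract"><span class="extract-syntax">TextFiles::at</span></span> and <span class="extract"><span class="extract-syntax">TextFiles::nowhere</span></span>, keeps the default values in one place.
</p>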
<p class="commentary firstcommentary"><a id="SP5"></a><b>&#167;5. Text file scanner. </b>We read lines in, delimited by any of the standard line-ending characters,
and send them one at a time to a function called <span class="extract"><span class="extract-syntax">iterator</span></span>. Throughout,
we preserve a pointer called <span class="extract"><span class="extract-syntax">state</span></span> to some object being used by the
client.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::read</span><button class="popup" onclick="togglePopup('usagePopup2')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup2">Usage of <span class="code-font"><span class="function-syntax">TextFiles::read</span></span>:<br/>Command Line Arguments - <a href="3-cla.html#SP11">&#167;11</a><br/>HTML - <a href="5-htm.html#SP11">&#167;11</a><br/>Web Structure - <a href="8-ws.html#SP6">&#167;6</a><br/>Build Files - <a href="8-bf.html#SP3">&#167;3</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">filename</span><span class="plain-syntax"> *</span><span class="identifier-syntax">F</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">escape_oddities</span><span class="plain-syntax">, </span><span class="reserved-syntax">char</span><span class="plain-syntax"> *</span><span class="identifier-syntax">message</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">serious</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">void</span><span class="plain-syntax"> (</span><span class="identifier-syntax">iterator</span><span class="plain-syntax">)(</span><span class="reserved-syntax">text_stream</span><span class="plain-syntax"> *, </span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> *, </span><span class="reserved-syntax">void</span><span class="plain-syntax"> *),</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> *</span><span class="identifier-syntax">start_at</span><span class="plain-syntax">, </span><span class="reserved-syntax">void</span><span class="plain-syntax"> *</span><span class="identifier-syntax">state</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">ufb</span><span class="plain-syntax"> = </span><a href="4-tf.html#SP8" class="function-link"><span class="function-syntax">TextFiles::create_ufb</span></a><span class="plain-syntax">();</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP5_1" class="named-paragraph-link"><span class="named-paragraph">Open the text file</span><span class="named-paragraph-number">5.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP5_2" class="named-paragraph-link"><span class="named-paragraph">Set the initial position, seeking it in the file if need be</span><span class="named-paragraph-number">5.2</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP5_3" class="named-paragraph-link"><span class="named-paragraph">Read in lines and send them one by one to the iterator</span><span class="named-paragraph-number">5.3</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">fclose</span><span class="plain-syntax">(</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">handle_when_open</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_count</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
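<p class="commentary">The callback-with-state pattern used here can be sketched in plain C as
follows. This is an illustration under simplifying assumptions, not Foundation's code:
<span class="extract"><span class="extract-syntax">read_lines</span></span>, <span class="extract"><span class="extract-syntax">line_iterator</span></span> and <span class="extract"><span class="extract-syntax">count_nonempty</span></span> are hypothetical names, lines
are split on <span class="extract"><span class="extract-syntax">0A</span></span> only, and overlong lines are truncated to a fixed buffer.
</p>

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LINE 1024

/* Hypothetical callback type: receives the line, its 1-based number, and
   the client's state pointer, which is passed through untouched. */
typedef void (*line_iterator)(const char *line, int line_number, void *state);

/* Reads `in` one line at a time and returns the number of lines delivered. */
int read_lines(FILE *in, line_iterator iterator, void *state) {
    assert(in != NULL && iterator != NULL);
    char line[MAX_LINE];
    int length = 0, count = 0, c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\n') {
            line[length] = 0;
            iterator(line, ++count, state);
            length = 0;
        } else if (length < MAX_LINE - 1) {
            line[length++] = (char) c;
        }
    }
    if (length > 0) {               /* final line with no terminator */
        line[length] = 0;
        iterator(line, ++count, state);
    }
    return count;
}

/* Example client: counts the non-empty lines via its state pointer. */
void count_nonempty(const char *line, int line_number, void *state) {
    (void) line_number;
    if (line[0] != 0) (*(int *) state)++;
}
```

<p class="commentary">A client would declare <span class="extract"><span class="extract-syntax">int n = 0;</span></span> and call
<span class="extract"><span class="extract-syntax">read_lines(handle, count_nonempty, &amp;n)</span></span>. Because the reader never looks inside
<span class="extract"><span class="extract-syntax">state</span></span>, each client is free to choose whatever type suits it.
</p>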
<p class="commentary firstcommentary"><a id="SP5_1"></a><b>&#167;5.1. </b><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Open the text file</span><span class="named-paragraph-number">5.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">handle_when_open</span><span class="plain-syntax"> = </span><a href="3-fln.html#SP10" class="function-link"><span class="function-syntax">Filenames::fopen</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">F</span><span class="plain-syntax">, </span><span class="string-syntax">"rb"</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">handle_when_open</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">message</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">serious</span><span class="plain-syntax">) </span><a href="3-em.html#SP2" class="function-link"><span class="function-syntax">Errors::fatal_with_file</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">message</span><span class="plain-syntax">, </span><span class="identifier-syntax">F</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> { </span><a href="3-em.html#SP7" class="function-link"><span class="function-syntax">Errors::with_file</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">message</span><span class="plain-syntax">, </span><span class="identifier-syntax">F</span><span class="plain-syntax">); </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">; }</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="4-tf.html#SP5">&#167;5</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP5_2"></a><b>&#167;5.2. </b>The ANSI definition of <span class="extract"><span class="extract-syntax">ftell</span></span> and <span class="extract"><span class="extract-syntax">fseek</span></span> says that, with text files, the
only definite position value is 0 &mdash; meaning the beginning of the file &mdash; and
this is what we initialise <span class="extract"><span class="extract-syntax">line_position</span></span> to. We must otherwise only write
values returned by <span class="extract"><span class="extract-syntax">ftell</span></span> into this field.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Set the initial position, seeking it in the file if need be</span><span class="named-paragraph-number">5.2</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">start_at</span><span class="plain-syntax"> == </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_count</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_position</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax"> = </span><span class="character-syntax">'X'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax"> = *</span><span class="identifier-syntax">start_at</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">fseek</span><span class="plain-syntax">(</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">handle_when_open</span><span class="plain-syntax">, (</span><span class="reserved-syntax">long</span><span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax">) (</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_position</span><span class="plain-syntax">), </span><span class="identifier-syntax">SEEK_SET</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">serious</span><span class="plain-syntax">) </span><a href="3-em.html#SP2" class="function-link"><span class="function-syntax">Errors::fatal_with_file</span></a><span class="plain-syntax">(</span><span class="string-syntax">"unable to seek position in file"</span><span class="plain-syntax">, </span><span class="identifier-syntax">F</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><a href="3-em.html#SP7" class="function-link"><span class="function-syntax">Errors::with_file</span></a><span class="plain-syntax">(</span><span class="string-syntax">"unable to seek position in file"</span><span class="plain-syntax">, </span><span class="identifier-syntax">F</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">actively_scanning</span><span class="plain-syntax"> = </span><span class="constant-syntax">TRUE</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">text_file_filename</span><span class="plain-syntax"> = </span><span class="identifier-syntax">F</span><span class="plain-syntax">;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="4-tf.html#SP5">&#167;5</a>.</li></ul>
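<p class="commentary">The discipline of storing only values obtained from <span class="extract"><span class="extract-syntax">ftell</span></span>, and later
handing them back to <span class="extract"><span class="extract-syntax">fseek</span></span>, might be sketched like this. Both function names are
hypothetical, and the sketch works on an already-open standard C stream.
</p>

```c
#include <assert.h>
#include <stdio.h>

/* Skips one line of `in` and returns the position, as reported by ftell,
   at which the next line begins (or -1L if ftell fails). */
long position_after_first_line(FILE *in) {
    assert(in != NULL);
    int c;
    while ((c = fgetc(in)) != EOF && c != '\n') ;
    return ftell(in);
}

/* Returns to a position previously obtained from ftell on the same file;
   the result is 0 on success and nonzero on failure, as for fseek. */
int resume_at(FILE *in, long saved_position) {
    assert(in != NULL);
    return fseek(in, saved_position, SEEK_SET);
}
```

<p class="commentary">The saved value is treated as opaque: it is never computed by arithmetic,
only copied from <span class="extract"><span class="extract-syntax">ftell</span></span> and passed back with <span class="extract"><span class="extract-syntax">SEEK_SET</span></span>.
</p>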
<p class="commentary firstcommentary"><a id="SP5_3"></a><b>&#167;5.3. </b>We aim to get this right whether the lines are terminated by <span class="extract"><span class="extract-syntax">0A</span></span>, <span class="extract"><span class="extract-syntax">0D</span></span>,
<span class="extract"><span class="extract-syntax">0A 0D</span></span> or <span class="extract"><span class="extract-syntax">0D 0A</span></span>. The final line is not required to be terminated.
</p>
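<p class="commentary">The idea of remembering which terminator to skip can be tried out on an
in-memory buffer. The sketch below is a hypothetical illustration (<span class="extract"><span class="extract-syntax">count_lines</span></span> is not a
Foundation function): it counts lines rather than delivering them, and it counts a final
line only if that line is non-empty.
</p>

```c
#include <assert.h>
#include <stddef.h>

/* Counts the lines in the first `length` bytes of `text`, treating 0A, 0D,
   0A 0D and 0D 0A each as a single line ending. */
int count_lines(const char *text, size_t length) {
    assert(text != NULL || length == 0);
    int lines = 0;
    int pending = 0;   /* characters seen so far on the current line */
    int skip = 'X';    /* terminator to ignore if it comes next; 'X' for none */
    for (size_t k = 0; k < length; k++) {
        int c = (unsigned char) text[k];
        if (c == 0x0a || c == 0x0d) {
            if (pending > 0 || c != skip) {
                lines++;                        /* a line has been completed */
                skip = (c == 0x0a) ? 0x0d : 0x0a;
            } else {
                skip = 'X';                     /* second half of a two-byte ending */
            }
            pending = 0;
        } else {
            pending++;
        }
    }
    if (pending > 0) lines++;                   /* unterminated final line */
    return lines;
}
```

<p class="commentary">Resetting <span class="extract"><span class="extract-syntax">skip</span></span> to a harmless value once the second byte has been
swallowed is what allows a genuinely blank line, such as the one in <span class="extract"><span class="extract-syntax">0D 0A 0D 0A</span></span>, to
be counted.
</p>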
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Read in lines and send them one by one to the iterator</span><span class="named-paragraph-number">5.3</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">TEMPORARY_TEXT</span><span class="plain-syntax">(</span><span class="identifier-syntax">line</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="character-syntax">' '</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">actively_scanning</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><a href="4-tf.html#SP8" class="function-link"><span class="function-syntax">TextFiles::utf8_fgetc</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">handle_when_open</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">, </span><span class="identifier-syntax">escape_oddities</span><span class="plain-syntax">, &amp;</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">ufb</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\x0a'</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\x0d'</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><a href="4-sm.html#SP14" class="function-link"><span class="function-syntax">Str::put_at</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">line</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP5_3_1" class="named-paragraph-link"><span class="named-paragraph">Feed the completed line to the iterator routine</span><span class="named-paragraph-number">5.3.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\x0a'</span><span class="plain-syntax">) </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax"> = </span><span class="character-syntax">'\x0d'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\x0d'</span><span class="plain-syntax">) </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax"> = </span><span class="character-syntax">'\x0a'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax"> = </span><span class="character-syntax">'X'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP5_3_2" class="named-paragraph-link"><span class="named-paragraph">Update the text file position</span><span class="named-paragraph-number">5.3.2</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><a href="4-sm.html#SP14" class="function-link"><span class="function-syntax">Str::put_at</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">line</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">++, (</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax">) </span><span class="identifier-syntax">c</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">actively_scanning</span><span class="plain-syntax">))</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP5_3_1" class="named-paragraph-link"><span class="named-paragraph">Feed the completed line to the iterator routine</span><span class="named-paragraph-number">5.3.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">DISCARD_TEXT</span><span class="plain-syntax">(</span><span class="identifier-syntax">line</span><span class="plain-syntax">);</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="4-tf.html#SP5">&#167;5</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP5_3_1"></a><b>&#167;5.3.1. </b>We update the line counter only when a line is actually sent:
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Feed the completed line to the iterator routine</span><span class="named-paragraph-number">5.3.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">iterator</span><span class="plain-syntax">(</span><span class="identifier-syntax">line</span><span class="plain-syntax">, &amp;</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">, </span><span class="identifier-syntax">state</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_count</span><span class="plain-syntax">++;</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="4-tf.html#SP5_3">&#167;5.3</a> (twice).</li></ul>
<p class="commentary firstcommentary"><a id="SP5_3_2"></a><b>&#167;5.3.2. </b>But we update the text file position after every apparent line terminator,
including one which is skipped as the second half of a pair. Otherwise, on a
Windows text file, we might record an
<span class="extract"><span class="extract-syntax">ftell</span></span> position in between the <span class="extract"><span class="extract-syntax">CR</span></span> and the <span class="extract"><span class="extract-syntax">LF</span></span>; if we later resumed at that point,
the line numbering in the resumption would be off by one compared with the
numbering during the original pass.
</p>
<p class="commentary">Properly speaking, <span class="extract"><span class="extract-syntax">ftell</span></span> returns a long <span class="extract"><span class="extract-syntax">int</span></span>, not an <span class="extract"><span class="extract-syntax">int</span></span>, but provided
that an <span class="extract"><span class="extract-syntax">int</span></span> is at least 32 bits wide, the cast still leaves room for files of up to 2GB.
Text files seldom come that large.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Update the text file position</span><span class="named-paragraph-number">5.3.2</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_position</span><span class="plain-syntax"> = (</span><span class="reserved-syntax">int</span><span class="plain-syntax">) (</span><span class="identifier-syntax">ftell</span><span class="plain-syntax">(</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">handle_when_open</span><span class="plain-syntax">));</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">.</span><span class="element-syntax">line_position</span><span class="plain-syntax"> == -1) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">serious</span><span class="plain-syntax">)</span>
<span class="plain-syntax"> </span><a href="3-em.html#SP2" class="function-link"><span class="function-syntax">Errors::fatal_with_file</span></a><span class="plain-syntax">(</span><span class="string-syntax">"unable to determine position in file"</span><span class="plain-syntax">, </span><span class="identifier-syntax">F</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span>
<span class="plain-syntax"> </span><a href="3-em.html#SP7" class="function-link"><span class="function-syntax">Errors::with_file</span></a><span class="plain-syntax">(</span><span class="string-syntax">"unable to determine position in file"</span><span class="plain-syntax">, </span><span class="identifier-syntax">F</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="4-tf.html#SP5_3">&#167;5.3</a>.</li></ul>
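<p class="commentary">As a minimal sketch of the narrowing discussed in &#167;5.3.2, the cast from long to int could be guarded explicitly. The helper name narrow_position is hypothetical and is not part of the foundation module; it only shows how a failed or oversized ftell result might be detected before the cast.
</p>

```c
#include <limits.h>

/* Hypothetical helper (not part of the foundation module): narrows a
   position as returned by ftell to an int. Returns 0 and stores the
   position in *out if it is non-negative and fits in an int; returns -1
   otherwise, which also covers ftell's own failure value of -1L. */
int narrow_position(long pos, int *out) {
    if (pos < 0 || pos > INT_MAX) return -1;
    *out = (int) pos;
    return 0;
}
```

<p class="commentary">A caller might write narrow_position(ftell(f), &amp;position) and report an error whenever the result is -1, much as the code above does when it finds a position of -1.
</p>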
<p class="commentary firstcommentary"><a id="SP6"></a><b>&#167;6. </b></p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::read_line</span><span class="plain-syntax">(</span><span class="constant-syntax">OUTPUT_STREAM</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">escape_oddities</span><span class="plain-syntax">, </span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> *</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><a href="4-sm.html#SP15" class="function-link"><span class="function-syntax">Str::clear</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">OUT</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">, </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="character-syntax">' '</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">actively_scanning</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><a href="4-tf.html#SP8" class="function-link"><span class="function-syntax">TextFiles::utf8_fgetc</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">handle_when_open</span><span class="plain-syntax">, </span><span class="identifier-syntax">NULL</span><span class="plain-syntax">, </span><span class="identifier-syntax">escape_oddities</span><span class="plain-syntax">, &amp;</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">ufb</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\x0a'</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\x0d'</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><a href="4-sm.html#SP14" class="function-link"><span class="function-syntax">Str::put_at</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">OUT</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">, </span><span class="constant-syntax">0</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\x0a'</span><span class="plain-syntax">) </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax"> = </span><span class="character-syntax">'\x0d'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="character-syntax">'\x0d'</span><span class="plain-syntax">) </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax"> = </span><span class="character-syntax">'\x0a'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> } </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">skip_terminator</span><span class="plain-syntax"> = </span><span class="character-syntax">'X'</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">line_position</span><span class="plain-syntax"> = (</span><span class="reserved-syntax">int</span><span class="plain-syntax">) (</span><span class="identifier-syntax">ftell</span><span class="plain-syntax">(</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">handle_when_open</span><span class="plain-syntax">));</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">i</span><span class="plain-syntax"> = </span><span class="constant-syntax">0</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">line_count</span><span class="plain-syntax">++; </span><span class="reserved-syntax">return</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><a href="4-sm.html#SP14" class="function-link"><span class="function-syntax">Str::put_at</span></a><span class="plain-syntax">(</span><span class="identifier-syntax">OUT</span><span class="plain-syntax">, </span><span class="identifier-syntax">i</span><span class="plain-syntax">++, (</span><span class="identifier-syntax">wchar_t</span><span class="plain-syntax">) </span><span class="identifier-syntax">c</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">i</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">actively_scanning</span><span class="plain-syntax">)) </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">line_count</span><span class="plain-syntax">++;</span>
<span class="plain-syntax">}</span>
</pre>
<p class="commentary firstcommentary"><a id="SP7"></a><b>&#167;7. </b>The routine being iterated can indicate that it has had enough by
calling the following:
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">void</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::lose_interest</span><span class="plain-syntax">(</span><span class="reserved-syntax">text_file_position</span><span class="plain-syntax"> *</span><span class="identifier-syntax">tfp</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">tfp</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">actively_scanning</span><span class="plain-syntax"> = </span><span class="constant-syntax">FALSE</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
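<p class="commentary">To illustrate how an iterator might use this, here is a self-contained miniature of the pattern. It does not call the foundation module: mini_position, mini_scan, mini_lose_interest and stop_at_marker are all hypothetical stand-ins, and the lines come from an array rather than a file.
</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the scanning state; only the flag and the
   line counter matter for this sketch. */
typedef struct mini_position {
    int actively_scanning;
    int line_count;
} mini_position;

typedef void (*mini_iterator)(const char *line, mini_position *pos, void *state);

/* Clears the flag, so that the scanning loop stops after the current line. */
void mini_lose_interest(mini_position *pos) {
    pos->actively_scanning = 0;
}

/* Feeds each line to the iterator until the lines run out or the
   iterator loses interest. The line on which interest is lost is
   still counted, since the counter is bumped after the call. */
void mini_scan(const char **lines, size_t n, mini_iterator iterator,
    void *state, mini_position *pos) {
    pos->actively_scanning = 1;
    pos->line_count = 0;
    for (size_t i = 0; (i < n) && (pos->actively_scanning); i++) {
        iterator(lines[i], pos, state);
        pos->line_count++;
    }
}

/* Example iterator: stops at the first line equal to the marker text
   passed in as the state. */
void stop_at_marker(const char *line, mini_position *pos, void *state) {
    if (strcmp(line, (const char *) state) == 0) mini_lose_interest(pos);
}
```

<p class="commentary">The design choice worth noting is that the iterator never breaks out of the loop itself: it only clears a flag, and the scanning loop checks that flag before reading further.
</p>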
<p class="commentary firstcommentary"><a id="SP8"></a><b>&#167;8. Reading UTF-8 files. </b>The following routine reads a sequence of Unicode characters from a UTF-8
encoded file, but returns them as a sequence of ISO Latin-1 characters, a
trick it can only pull off by escaping non-ISO characters. This is done by
taking character number <span class="extract"><span class="extract-syntax">N</span></span> and feeding it out, one character per call, as
the text <span class="extract"><span class="extract-syntax">[unicode N]</span></span>, with the number written in decimal. Only one UTF-8
file like this is read at a time, and the routine is called
repeatedly until <span class="extract"><span class="extract-syntax">EOF</span></span> or a line division.
</p>
<p class="commentary">Strictly speaking, we transmit not the whole of ISO Latin-1 but only that subset of
it whose characters have corresponding (different) codes in the ZSCII character set. This
excludes some typewriter symbols and a handful of letterforms, as we shall
see.
</p>
<p class="commentary">There are two exceptions: <span class="extract"><span class="extract-syntax">TextFiles::utf8_fgetc</span></span> can also return the usual C
end-of-file pseudo-character <span class="extract"><span class="extract-syntax">EOF</span></span>, and it can also return the Unicode BOM
(byte order mark) pseudo-character, which is legal at the start of a
file and which some text editors and word-processors automatically prepend
when they save a UTF-8 file (though the UTF-8 specification does not
require it). Anyone calling <span class="extract"><span class="extract-syntax">TextFiles::utf8_fgetc</span></span> must
check the return value for <span class="extract"><span class="extract-syntax">EOF</span></span> every time, and for <span class="extract"><span class="extract-syntax">0xFEFF</span></span> whenever we
might be at the start of the file being read.
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="reserved-syntax">typedef</span><span class="plain-syntax"> </span><span class="reserved-syntax">struct</span><span class="plain-syntax"> </span><span class="reserved-syntax">unicode_file_buffer</span><span class="plain-syntax"> {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">char</span><span class="plain-syntax"> </span><span class="identifier-syntax">unicode_feed_buffer</span><span class="plain-syntax">[32]; </span><span class="comment-syntax"> holds a single escape such as "[unicode 3106]"</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">ufb_counter</span><span class="plain-syntax">; </span><span class="comment-syntax"> position in the unicode feed buffer</span>
<span class="plain-syntax">} </span><span class="reserved-syntax">unicode_file_buffer</span><span class="plain-syntax">;</span>
<span class="reserved-syntax">unicode_file_buffer</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::create_ufb</span><button class="popup" onclick="togglePopup('usagePopup3')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup3">Usage of <span class="code-font"><span class="function-syntax">TextFiles::create_ufb</span></span>:<br/><a href="4-tf.html#SP5">&#167;5</a><br/>Streams - <a href="2-str.html#SP28_2">&#167;28.2</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">void</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">unicode_file_buffer</span><span class="plain-syntax"> </span><span class="identifier-syntax">ufb</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ufb</span><span class="plain-syntax">.</span><span class="element-syntax">ufb_counter</span><span class="plain-syntax"> = -1;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">ufb</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
<span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="function-syntax">TextFiles::utf8_fgetc</span><button class="popup" onclick="togglePopup('usagePopup4')"><span class="comment-syntax">?</span><span class="popuptext" id="usagePopup4">Usage of <span class="code-font"><span class="function-syntax">TextFiles::utf8_fgetc</span></span>:<br/><a href="4-tf.html#SP5_3">&#167;5.3</a>, <a href="4-tf.html#SP6">&#167;6</a><br/>Streams - <a href="2-str.html#SP28_2">&#167;28.2</a></span></button><span class="plain-syntax">(</span><span class="reserved-syntax">FILE</span><span class="plain-syntax"> *</span><span class="identifier-syntax">from</span><span class="plain-syntax">, </span><span class="reserved-syntax">char</span><span class="plain-syntax"> **</span><span class="identifier-syntax">or_from</span><span class="plain-syntax">, </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">escape_oddities</span><span class="plain-syntax">,</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">unicode_file_buffer</span><span class="plain-syntax"> *</span><span class="identifier-syntax">ufb</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">, </span><span class="identifier-syntax">conts</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">ufb</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">ufb</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">ufb_counter</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0</span><span class="plain-syntax">)) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ufb</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">unicode_feed_buffer</span><span class="plain-syntax">[</span><span class="identifier-syntax">ufb</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">ufb_counter</span><span class="plain-syntax">] == </span><span class="constant-syntax">0</span><span class="plain-syntax">) </span><span class="identifier-syntax">ufb</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">ufb_counter</span><span class="plain-syntax"> = -1;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">ufb</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">unicode_feed_buffer</span><span class="plain-syntax">[</span><span class="identifier-syntax">ufb</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">ufb_counter</span><span class="plain-syntax">++];</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">from</span><span class="plain-syntax">) </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">fgetc</span><span class="plain-syntax">(</span><span class="identifier-syntax">from</span><span class="plain-syntax">); </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">or_from</span><span class="plain-syntax">) </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = ((</span><span class="reserved-syntax">unsigned</span><span class="plain-syntax"> </span><span class="reserved-syntax">char</span><span class="plain-syntax">) *((*</span><span class="identifier-syntax">or_from</span><span class="plain-syntax">)++));</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax">; </span><span class="comment-syntax"> ruling out EOF leaves a genuine byte from the file</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax">&lt;</span><span class="constant-syntax">0x80</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax">; </span><span class="comment-syntax"> in all other cases, a UTF-8 continuation sequence begins</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP8_1" class="named-paragraph-link"><span class="named-paragraph">Unpack one to five continuation bytes to obtain the Unicode character code</span><span class="named-paragraph-number">8.1</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP8_2" class="named-paragraph-link"><span class="named-paragraph">Return non-ASCII codes in the intersection of ISO Latin-1 and ZSCII as literals</span><span class="named-paragraph-number">8.2</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">escape_oddities</span><span class="plain-syntax">) </span><span class="named-paragraph-container code-font"><a href="4-tf.html#SP8_3" class="named-paragraph-link"><span class="named-paragraph">Return Unicode fancy equivalents as simpler literals</span><span class="named-paragraph-number">8.3</span></a></span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">0xFEFF</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax">; </span><span class="comment-syntax"> the Unicode BOM non-character</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">escape_oddities</span><span class="plain-syntax"> == </span><span class="constant-syntax">FALSE</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">ufb</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">sprintf</span><span class="plain-syntax">(</span><span class="identifier-syntax">ufb</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">unicode_feed_buffer</span><span class="plain-syntax">, </span><span class="string-syntax">"[unicode %d]"</span><span class="plain-syntax">, </span><span class="identifier-syntax">c</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">ufb</span><span class="plain-syntax">-&gt;</span><span class="element-syntax">ufb_counter</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'['</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span><span class="plain-syntax">;</span>
<span class="plain-syntax">}</span>
</pre>
<ul class="endnotetexts"><li>The structure unicode_file_buffer is private to this section.</li></ul>
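<p class="commentary">The feed-buffer mechanism can be tried out in miniature as follows. This is a simplified sketch under stated assumptions, not the routine above: the names feed and feed_next are hypothetical, the input is an array of code points ending in -1 rather than a UTF-8 file, and every code of 0x80 or more is escaped, with no exemption for the ISO Latin-1 and ZSCII intersection.
</p>

```c
#include <stdio.h>

/* Hypothetical miniature of the feed buffer: holds one pending escape. */
typedef struct feed {
    char buf[32];   /* a single escape such as "[unicode 8212]" */
    int counter;    /* next position to send from buf, or -1 if none pending */
} feed;

/* Returns the next output character for the code points at *codes (an
   array terminated by -1), or -1 once they are exhausted. Codes below
   0x80 pass through; any other code is fed out, one character per call,
   as the text "[unicode N]" with N in decimal. */
int feed_next(feed *fb, const int **codes) {
    if (fb->counter >= 0) {
        if (fb->buf[fb->counter] == 0) fb->counter = -1;  /* escape finished */
        else return fb->buf[fb->counter++];
    }
    int c = **codes;
    if (c < 0) return -1;
    (*codes)++;
    if (c < 0x80) return c;
    snprintf(fb->buf, sizeof fb->buf, "[unicode %d]", c);
    fb->counter = 1;   /* the '[' at position 0 is returned right now */
    return '[';
}
```

<p class="commentary">A caller sets counter to -1 before the first call and then calls feed_next repeatedly, so that the escape arrives spread over successive calls, just as a line reader would receive it.
</p>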
<p class="commentary firstcommentary"><a id="SP8_1"></a><b>&#167;8.1. </b>Not every byte sequence is legal in a UTF-8 file: if we find a malformed
continuation, we process it as a question mark rather than throwing a
fatal error (which is pretty well the only alternative here). The user
is likely to see problem messages later on which arise from the question
marks, and that will have to do.
</p>
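<p class="commentary">For comparison, the policy of turning malformed input into a question mark can be sketched on an in-memory byte string. The function decode_utf8 is hypothetical, and it assumes the modern limit of three continuation bytes, which is stricter than the range the paragraph below allows for.
</p>

```c
/* Hypothetical helper: decodes one UTF-8 sequence starting at *s and
   advances *s past the bytes consumed. Returns the code point, or '?'
   if the sequence is malformed. Assumes at most three continuation
   bytes, and that the bytes at *s end with a zero byte. */
int decode_utf8(const unsigned char **s) {
    int c = *(*s)++;
    int conts;
    if (c < 0x80) return c;       /* plain ASCII */
    if (c < 0xC0) return '?';     /* a continuation byte with no lead byte */
    if (c < 0xE0) { c = c & 0x1f; conts = 1; }
    else if (c < 0xF0) { c = c & 0x0f; conts = 2; }
    else if (c < 0xF8) { c = c & 0x07; conts = 3; }
    else return '?';              /* lead bytes this sketch does not accept */
    while (conts-- > 0) {
        int d = **s;
        if ((d & 0xC0) != 0x80) return '?';  /* not a continuation: leave it unread */
        (*s)++;
        c = (c << 6) | (d & 0x3f);
    }
    return c;
}
```

<p class="commentary">A byte which fails the continuation test is left unread, so that decoding can pick up again from it on the next call.
</p>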
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Unpack one to five continuation bytes to obtain the Unicode character code</span><span class="named-paragraph-number">8.1</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax">&lt;</span><span class="constant-syntax">0xC0</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span><span class="plain-syntax">; </span><span class="comment-syntax"> malformed UTF-8</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax">&lt;</span><span class="constant-syntax">0xE0</span><span class="plain-syntax">) { </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">c</span><span class="plain-syntax"> &amp; </span><span class="constant-syntax">0x1f</span><span class="plain-syntax">; </span><span class="identifier-syntax">conts</span><span class="plain-syntax"> = </span><span class="constant-syntax">1</span><span class="plain-syntax">; }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax">&lt;</span><span class="constant-syntax">0xF0</span><span class="plain-syntax">) { </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">c</span><span class="plain-syntax"> &amp; </span><span class="constant-syntax">0xf</span><span class="plain-syntax">; </span><span class="identifier-syntax">conts</span><span class="plain-syntax"> = </span><span class="constant-syntax">2</span><span class="plain-syntax">; }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax">&lt;</span><span class="constant-syntax">0xF8</span><span class="plain-syntax">) { </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">c</span><span class="plain-syntax"> &amp; </span><span class="constant-syntax">0x7</span><span class="plain-syntax">; </span><span class="identifier-syntax">conts</span><span class="plain-syntax"> = </span><span class="constant-syntax">3</span><span class="plain-syntax">; }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax">&lt;</span><span class="constant-syntax">0xFC</span><span class="plain-syntax">) { </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">c</span><span class="plain-syntax"> &amp; </span><span class="constant-syntax">0x3</span><span class="plain-syntax">; </span><span class="identifier-syntax">conts</span><span class="plain-syntax"> = </span><span class="constant-syntax">4</span><span class="plain-syntax">; }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">else</span><span class="plain-syntax"> { </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">c</span><span class="plain-syntax"> &amp; </span><span class="constant-syntax">0x1</span><span class="plain-syntax">; </span><span class="identifier-syntax">conts</span><span class="plain-syntax"> = </span><span class="constant-syntax">5</span><span class="plain-syntax">; }</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">while</span><span class="plain-syntax"> (</span><span class="identifier-syntax">conts</span><span class="plain-syntax"> &gt; </span><span class="constant-syntax">0</span><span class="plain-syntax">) {</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">int</span><span class="plain-syntax"> </span><span class="identifier-syntax">d</span><span class="plain-syntax"> = </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">from</span><span class="plain-syntax">) </span><span class="identifier-syntax">d</span><span class="plain-syntax"> = </span><span class="identifier-syntax">fgetc</span><span class="plain-syntax">(</span><span class="identifier-syntax">from</span><span class="plain-syntax">); </span><span class="reserved-syntax">else</span><span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">or_from</span><span class="plain-syntax">) </span><span class="identifier-syntax">d</span><span class="plain-syntax"> = ((</span><span class="reserved-syntax">unsigned</span><span class="plain-syntax"> </span><span class="reserved-syntax">char</span><span class="plain-syntax">) *((*</span><span class="identifier-syntax">or_from</span><span class="plain-syntax">)++));</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">d</span><span class="plain-syntax"> == </span><span class="identifier-syntax">EOF</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'?'</span><span class="plain-syntax">; </span><span class="comment-syntax"> malformed UTF-8</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">c</span><span class="plain-syntax"> &lt;&lt; </span><span class="constant-syntax">6</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax"> = </span><span class="identifier-syntax">c</span><span class="plain-syntax"> + (</span><span class="identifier-syntax">d</span><span class="plain-syntax"> &amp; </span><span class="constant-syntax">0x3F</span><span class="plain-syntax">);</span>
<span class="plain-syntax"> </span><span class="identifier-syntax">conts</span><span class="plain-syntax">--;</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="4-tf.html#SP8">&#167;8</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP8_2"></a><b>&#167;8.2. </b>For the ZSCII character set, see "The Inform 6 Designer's Manual", or
"The Z-Machine Standards Document". It offers a range of west European
accented letters which almost, but not quite, matches those on offer in
ISO Latin-1 &mdash; it omits for example Icelandic lower case eth. (ZSCII was
developed in the 1980s by Infocom, Inc., to encode their interactive
fiction offerings. Had they been collaborating with J. R. R. Tolkien
rather than Douglas Adams, they might have filled this gap. As it was,
"eth" never occurred in any of their works.)
</p>
<p class="commentary">We let the multiplication sign <span class="extract"><span class="extract-syntax">0xd7</span></span> through even though ZSCII doesn't
support it, but convert it to an "x": this is so that we can parse numbers
in scientific notation.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Return non-ASCII codes in the intersection of ISO Latin-1 and ZSCII as literals</span><span class="named-paragraph-number">8.2</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">0xa1</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">0xa3</span><span class="plain-syntax">) || (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">0xbf</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax">; </span><span class="comment-syntax"> inverted !, pound sign, inverted ?</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">0xd7</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'x'</span><span class="plain-syntax">; </span><span class="comment-syntax"> convert multiplication sign to lower case "x"</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0xc0</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &lt;= </span><span class="constant-syntax">0xff</span><span class="plain-syntax">)) { </span><span class="comment-syntax"> accented West European letters, but...</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="constant-syntax">0xd0</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="constant-syntax">0xf0</span><span class="plain-syntax">) &amp;&amp; </span><span class="comment-syntax"> not Icelandic eths</span>
<span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="constant-syntax">0xde</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="constant-syntax">0xfe</span><span class="plain-syntax">) &amp;&amp; </span><span class="comment-syntax"> nor Icelandic thorns</span>
<span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> != </span><span class="constant-syntax">0xf7</span><span class="plain-syntax">)) </span><span class="comment-syntax"> nor division signs</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="identifier-syntax">c</span><span class="plain-syntax">;</span>
<span class="plain-syntax"> }</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="4-tf.html#SP8">&#167;8</a>.</li></ul>
<p class="commentary firstcommentary"><a id="SP8_3"></a><b>&#167;8.3. </b>We err on the safe side, accepting em-rules and non-breaking spaces, etc.,
where we would normally expect hyphens and ordinary spaces: this is intended
for the benefit of users with helpful word-processors which autocorrect
hyphens into em-rules when they are flanked by spaces, and so on.
</p>
<p class="commentary"><span class="named-paragraph-container code-font"><span class="named-paragraph-defn">Return Unicode fancy equivalents as simpler literals</span><span class="named-paragraph-number">8.3</span></span><span class="comment-syntax"> =</span>
</p>
<pre class="displayed-code all-displayed-code code-font">
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">0x85</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'\x0d'</span><span class="plain-syntax">; </span><span class="comment-syntax"> NEL, or "next line"</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> == </span><span class="constant-syntax">0xa0</span><span class="plain-syntax">) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">' '</span><span class="plain-syntax">; </span><span class="comment-syntax"> non-breaking space</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0x2000</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &lt;= </span><span class="constant-syntax">0x200a</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">' '</span><span class="plain-syntax">; </span><span class="comment-syntax"> space variants</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0x2010</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &lt;= </span><span class="constant-syntax">0x2014</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'-'</span><span class="plain-syntax">; </span><span class="comment-syntax"> rules and dashes</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0x2018</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &lt;= </span><span class="constant-syntax">0x2019</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'\''</span><span class="plain-syntax">; </span><span class="comment-syntax"> smart single quotes</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0x201c</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &lt;= </span><span class="constant-syntax">0x201d</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'"'</span><span class="plain-syntax">; </span><span class="comment-syntax"> smart double quotes</span>
<span class="plain-syntax"> </span><span class="reserved-syntax">if</span><span class="plain-syntax"> ((</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &gt;= </span><span class="constant-syntax">0x2028</span><span class="plain-syntax">) &amp;&amp; (</span><span class="identifier-syntax">c</span><span class="plain-syntax"> &lt;= </span><span class="constant-syntax">0x2029</span><span class="plain-syntax">)) </span><span class="reserved-syntax">return</span><span class="plain-syntax"> </span><span class="character-syntax">'\x0d'</span><span class="plain-syntax">; </span><span class="comment-syntax"> fancy newlines</span>
</pre>
<ul class="endnotetexts"><li>This code is used in <a href="4-tf.html#SP8">&#167;8</a>.</li></ul>
<nav role="progress"><div class="progresscontainer">
<ul class="progressbar"><li class="progressprev"><a href="4-sm.html">&#10094;</a></li><li class="progresschapter"><a href="P-abgtf.html">P</a></li><li class="progresschapter"><a href="1-fm.html">1</a></li><li class="progresschapter"><a href="2-dl.html">2</a></li><li class="progresschapter"><a href="3-em.html">3</a></li><li class="progresscurrentchapter">4</li><li class="progresssection"><a href="4-chr.html">chr</a></li><li class="progresssection"><a href="4-cst.html">cst</a></li><li class="progresssection"><a href="4-ws.html">ws</a></li><li class="progresssection"><a href="4-sm.html">sm</a></li><li class="progresscurrent">tf</li><li class="progresssection"><a href="4-taa.html">taa</a></li><li class="progresssection"><a href="4-pm.html">pm</a></li><li class="progresschapter"><a href="5-htm.html">5</a></li><li class="progresschapter"><a href="6-bf.html">6</a></li><li class="progresschapter"><a href="7-vn.html">7</a></li><li class="progresschapter"><a href="8-ws.html">8</a></li><li class="progressnext"><a href="4-taa.html">&#10095;</a></li></ul></div>
</nav><!--End of weave-->
</main>
</body>
</html>