<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
< html >
< head >
< title > C Strings< / title >
< meta name = "viewport" content = "width=device-width, initial-scale=1" >
< meta http-equiv = "Content-Type" content = "text/html; charset=utf-8" >
< meta http-equiv = "Content-Language" content = "en-gb" >
< link href = "../inweb.css" rel = "stylesheet" rev = "stylesheet" type = "text/css" >
< / head >
< body >
< nav role = "navigation" >
< h1 > < a href = "../webs.html" > Sources< / a > < / h1 >
< ul >
< li > < a href = "../inweb/index.html" > inweb< / a > < / li >
< / ul >
< h2 > Foundation< / h2 >
< ul >
< li > < a href = "../foundation-module/index.html" > foundation-module< / a > < / li >
< li > < a href = "../foundation-test/index.html" > foundation-test< / a > < / li >
< / ul >
< / nav >
< main role = "main" >
<!-- Weave of 'C Strings' generated by 7 -->
< p class = "purpose" > A minimal library for handling C-style strings.< / p >
< p class = "inwebparagraph" > < a id = "SP1" > < / a > < b > § 1. < / b > Programs using Foundation store text in < code class = "display" > < span class = "extract" > text_stream< / span > < / code > structures almost all
of the time, but old-style, null-terminated < code class = "display" > < span class = "extract" > char *< / span > < / code > array strings are
still occasionally needed.
< / p >
< p class = "inwebparagraph" > We need to handle C strings long enough to contain any plausible filename, and
any run of a dozen or so lines of code; but we have no real need to handle
strings of unlimited length, nor to be parsimonious with memory.
< / p >
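< p class = "inwebparagraph" > The following defines a type for a string long enough for our purposes: room for < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH< / span > < / code > characters, plus one byte for the terminating null.
That limit should be at least as large as the constant sometimes called < code class = "display" > < span class = "extract" > PATH_MAX< / span > < / code > ,
the maximum length of a pathname, which is 1024 on Mac OS X and 4096 on typical Linux systems; 8192 comfortably covers both.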
< / p >
< pre class = "definitions" >
< span class = "definitionkeyword" > define< / span > < span class = "constant" > MAX_STRING_LENGTH< / span > < span class = "plain" > < / span > < span class = "constant" > 8< / span > < span class = "plain" > *1024< / span >
< / pre >
< pre class = "display" >
< span class = "reserved" > typedef< / span > < span class = "plain" > < / span > < span class = "reserved" > char< / span > < span class = "plain" > < / span > < span class = "identifier" > string< / span > < span class = "plain" > [< / span > < span class = "constant" > MAX_STRING_LENGTH< / span > < span class = "plain" > +1];< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
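< p class = "inwebparagraph" > The requirement on < code class = "display" > < span class = "extract" > PATH_MAX< / span > < / code > can be checked when compiling. What follows is a minimal standalone sketch in plain C, not Foundation's own code: it repeats the two definitions above (with the macro parenthesised, a small liberty) and, only if the platform's limits.h defines < code class = "display" > < span class = "extract" > PATH_MAX< / span > < / code > , which most POSIX systems do but ISO C does not require, asserts at compile time that < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH< / span > < / code > is at least that large. The helper < code class = "display" > < span class = "extract" > string_capacity< / span > < / code > is hypothetical.< / p >

```c
#include <limits.h>
#include <stddef.h>

/* The same definitions as above, in plain C. */
#define MAX_STRING_LENGTH (8*1024)
typedef char string[MAX_STRING_LENGTH+1];   /* one extra byte for the null */

/* PATH_MAX is not guaranteed by ISO C, so only check it where it exists. */
#ifdef PATH_MAX
_Static_assert(MAX_STRING_LENGTH >= PATH_MAX,
    "MAX_STRING_LENGTH must be able to hold any pathname");
#endif

/* Hypothetical helper: the number of characters a string can hold,
   not counting the terminating null. */
size_t string_capacity(void) {
    return sizeof(string) - 1;
}
```

< p class = "inwebparagraph" > Since a < code class = "display" > < span class = "extract" > string< / span > < / code > is a whole array rather than a pointer, < code class = "display" > < span class = "extract" > sizeof(string)< / span > < / code > is 8193 bytes: 8192 characters plus the terminator.< / p >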
< p class = "inwebparagraph" > < a id = "SP2" > < / a > < b > § 2. < / b > Occasionally we need access to the real, unbounded strlen:
< / p >
< pre class = "display" >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "functiontext" > CStrings::strlen_unbounded< / span > < span class = "plain" > (< / span > < span class = "reserved" > const< / span > < span class = "plain" > < / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > p< / span > < span class = "plain" > ) {< / span >
< span class = "reserved" > return< / span > < span class = "plain" > (< / span > < span class = "reserved" > int< / span > < span class = "plain" > ) < / span > < span class = "identifier" > strlen< / span > < span class = "plain" > (< / span > < span class = "identifier" > p< / span > < span class = "plain" > );< / span >
< span class = "plain" > }< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
< p class = "endnote" > The function CStrings::strlen_unbounded appears nowhere else.< / p >
< p class = "inwebparagraph" > < a id = "SP3" > < / a > < b > § 3. < / b > Any out-of-range string length immediately halts the program; this is drastic, but
an attempt to continue execution after a string overflow might conceivably
result in a malformed shell command being passed to the operating system,
which we cannot risk.
< / p >
< pre class = "display" >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "functiontext" > CStrings::check_len< / span > < span class = "plain" > (< / span > < span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "identifier" > n< / span > < span class = "plain" > ) {< / span >
< span class = "reserved" > if< / span > < span class = "plain" > ((< / span > < span class = "identifier" > n< / span > < span class = "plain" > > < / span > < span class = "constant" > MAX_STRING_LENGTH< / span > < span class = "plain" > ) || (< / span > < span class = "identifier" > n< / span > < span class = "plain" > < < / span > < span class = "constant" > 0< / span > < span class = "plain" > )) < / span > < span class = "functiontext" > Errors::fatal< / span > < span class = "plain" > (< / span > < span class = "string" > "String overflow\n"< / span > < span class = "plain" > );< / span >
< span class = "reserved" > return< / span > < span class = "plain" > < / span > < span class = "identifier" > n< / span > < span class = "plain" > ;< / span >
< span class = "plain" > }< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
< p class = "endnote" > The function CStrings::check_len is used in < a href = "#SP5" > § 5< / a > .< / p >
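< p class = "inwebparagraph" > The same policy can be sketched in plain C without Foundation. This is a minimal sketch under stated assumptions: < code class = "display" > < span class = "extract" > fatal< / span > < / code > is a hypothetical stand-in for < code class = "display" > < span class = "extract" > Errors::fatal< / span > < / code > , which may well report and exit differently, and the macro is repeated from above. Because < code class = "display" > < span class = "extract" > check_len< / span > < / code > hands back its argument unchanged, it can wrap a length or index right at the point of use.< / p >

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_STRING_LENGTH (8*1024)

/* Hypothetical stand-in for Errors::fatal: report, then stop at once. */
static void fatal(const char *message) {
    fputs(message, stderr);
    exit(1);
}

/* Same logic as CStrings::check_len: return n unchanged if it is a
   usable string length, and halt the program otherwise. */
int check_len(int n) {
    if ((n > MAX_STRING_LENGTH) || (n < 0)) fatal("String overflow\n");
    return n;
}
```

< p class = "inwebparagraph" > For instance, < code class = "display" > < span class = "extract" > buf[check_len(i)] = c;< / span > < / code > halts instead of writing outside a buffer of < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH+1< / span > < / code > bytes whenever < code class = "display" > < span class = "extract" > i< / span > < / code > is negative or exceeds < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH< / span > < / code > .< / p >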
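< p class = "inwebparagraph" > < a id = "SP4" > < / a > < b > § 4. < / b > The following is then protected from reading out of range if given a
non-terminated string, though this should never actually happen: it examines at most < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH+1< / span > < / code > characters, and if none of them is a null, it writes one into the last of them, truncating the string in place. It must therefore only be given buffers at least as large as a < code class = "display" > < span class = "extract" > string< / span > < / code > .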
< / p >
< pre class = "display" >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "functiontext" > CStrings::len< / span > < span class = "plain" > (< / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > str< / span > < span class = "plain" > ) {< / span >
< span class = "reserved" > for< / span > < span class = "plain" > (< / span > < span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "identifier" > i< / span > < span class = "plain" > =0; < / span > < span class = "identifier" > i< / span > < span class = "plain" > < =< / span > < span class = "constant" > MAX_STRING_LENGTH< / span > < span class = "plain" > ; < / span > < span class = "identifier" > i< / span > < span class = "plain" > ++)< / span >
< span class = "reserved" > if< / span > < span class = "plain" > (< / span > < span class = "identifier" > str< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ] == < / span > < span class = "constant" > 0< / span > < span class = "plain" > ) < / span > < span class = "reserved" > return< / span > < span class = "plain" > < / span > < span class = "identifier" > i< / span > < span class = "plain" > ;< / span >
< span class = "identifier" > str< / span > < span class = "plain" > [< / span > < span class = "constant" > MAX_STRING_LENGTH< / span > < span class = "plain" > ] = < / span > < span class = "constant" > 0< / span > < span class = "plain" > ;< / span >
< span class = "reserved" > return< / span > < span class = "plain" > < / span > < span class = "constant" > MAX_STRING_LENGTH< / span > < span class = "plain" > ;< / span >
< span class = "plain" > }< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
< p class = "endnote" > The function CStrings::len is used in < a href = "#SP5" > § 5< / a > .< / p >
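< p class = "inwebparagraph" > The behaviour on a non-terminated buffer can be seen with the following standalone sketch, which repeats the logic of < code class = "display" > < span class = "extract" > CStrings::len< / span > < / code > under the hypothetical plain-C name < code class = "display" > < span class = "extract" > bounded_len< / span > < / code > . Measuring a < code class = "display" > < span class = "extract" > string< / span > < / code > whose every byte is non-null gives < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH< / span > < / code > , and leaves a null in the final byte.< / p >

```c
#include <string.h>   /* strcpy and memset, for the usage below */

#define MAX_STRING_LENGTH (8*1024)
typedef char string[MAX_STRING_LENGTH+1];

/* Same logic as CStrings::len: look at no more than MAX_STRING_LENGTH+1
   bytes, and if none is a null, terminate the buffer in its last byte. */
int bounded_len(char *str) {
    for (int i = 0; i <= MAX_STRING_LENGTH; i++)
        if (str[i] == 0) return i;
    str[MAX_STRING_LENGTH] = 0;   /* truncates the string in place */
    return MAX_STRING_LENGTH;
}
```

< p class = "inwebparagraph" > In use: after < code class = "display" > < span class = "extract" > strcpy(s, "hello")< / span > < / code > , < code class = "display" > < span class = "extract" > bounded_len(s)< / span > < / code > is 5; after < code class = "display" > < span class = "extract" > memset(s, 'x', sizeof s)< / span > < / code > , it is 8192, and < code class = "display" > < span class = "extract" > s[8192]< / span > < / code > has become a null.< / p >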
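< p class = "inwebparagraph" > < a id = "SP5" > < / a > < b > § 5. < / b > We then have a replacement for < code class = "display" > < span class = "extract" > strcpy< / span > < / code > , identical except that it is
bounds-checked: at most < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH< / span > < / code > characters are copied and the result is always null-terminated, so the destination must be able to hold a < code class = "display" > < span class = "extract" > string< / span > < / code > :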
< / p >
< pre class = "display" >
< span class = "reserved" > void< / span > < span class = "plain" > < / span > < span class = "functiontext" > CStrings::copy< / span > < span class = "plain" > (< / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > to< / span > < span class = "plain" > , < / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > from< / span > < span class = "plain" > ) {< / span >
< span class = "functiontext" > CStrings::check_len< / span > < span class = "plain" > (< / span > < span class = "functiontext" > CStrings::len< / span > < span class = "plain" > (< / span > < span class = "identifier" > from< / span > < span class = "plain" > ));< / span >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "identifier" > i< / span > < span class = "plain" > ;< / span >
< span class = "reserved" > for< / span > < span class = "plain" > (< / span > < span class = "identifier" > i< / span > < span class = "plain" > =0; ((< / span > < span class = "identifier" > from< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ]) & & (< / span > < span class = "identifier" > i< / span > < span class = "plain" > < < / span > < span class = "constant" > MAX_STRING_LENGTH< / span > < span class = "plain" > )); < / span > < span class = "identifier" > i< / span > < span class = "plain" > ++) < / span > < span class = "identifier" > to< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ] = < / span > < span class = "identifier" > from< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ];< / span >
< span class = "identifier" > to< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ] = < / span > < span class = "constant" > 0< / span > < span class = "plain" > ;< / span >
< span class = "plain" > }< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
< p class = "endnote" > The function CStrings::copy appears nowhere else.< / p >
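< p class = "inwebparagraph" > Put together, sections 3 to 5 amount to something like the following sketch, which repeats the earlier pieces so as to compile on its own. The names < code class = "display" > < span class = "extract" > fatal< / span > < / code > , < code class = "display" > < span class = "extract" > check_len< / span > < / code > , < code class = "display" > < span class = "extract" > bounded_len< / span > < / code > and < code class = "display" > < span class = "extract" > bounded_copy< / span > < / code > are hypothetical plain-C stand-ins for < code class = "display" > < span class = "extract" > Errors::fatal< / span > < / code > and the < code class = "display" > < span class = "extract" > CStrings< / span > < / code > functions.< / p >

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* strcmp, strlen and memset, for the usage below */

#define MAX_STRING_LENGTH (8*1024)
typedef char string[MAX_STRING_LENGTH+1];

/* Hypothetical stand-in for Errors::fatal. */
static void fatal(const char *message) {
    fputs(message, stderr);
    exit(1);
}

/* As CStrings::check_len in section 3. */
static int check_len(int n) {
    if ((n > MAX_STRING_LENGTH) || (n < 0)) fatal("String overflow\n");
    return n;
}

/* As CStrings::len in section 4: never returns more than MAX_STRING_LENGTH. */
static int bounded_len(char *str) {
    for (int i = 0; i <= MAX_STRING_LENGTH; i++)
        if (str[i] == 0) return i;
    str[MAX_STRING_LENGTH] = 0;
    return MAX_STRING_LENGTH;
}

/* As CStrings::copy: to must have room for MAX_STRING_LENGTH+1 bytes. */
void bounded_copy(char *to, char *from) {
    check_len(bounded_len(from));
    int i;
    for (i = 0; (from[i]) && (i < MAX_STRING_LENGTH); i++) to[i] = from[i];
    to[i] = 0;
}
```

< p class = "inwebparagraph" > One consequence is worth knowing. Since < code class = "display" > < span class = "extract" > bounded_len< / span > < / code > , like < code class = "display" > < span class = "extract" > CStrings::len< / span > < / code > , never returns more than < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH< / span > < / code > , the < code class = "display" > < span class = "extract" > check_len< / span > < / code > call here can never fire: an overlong source is silently truncated to < code class = "display" > < span class = "extract" > MAX_STRING_LENGTH< / span > < / code > characters, and if it has no null within range, a null is written into the source itself. The same reasoning applies to < code class = "display" > < span class = "extract" > CStrings::copy< / span > < / code > as written above.< / p >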
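< p class = "inwebparagraph" > < a id = "SP6" > < / a > < b > § 6. < / b > String comparisons will be done with the following, not with < code class = "display" > < span class = "extract" > strcmp< / span > < / code > directly; it returns < code class = "display" > < span class = "extract" > TRUE< / span > < / code > if the strings differ: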
< / p >
< pre class = "display" >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "functiontext" > CStrings::ne< / span > < span class = "plain" > (< / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > A< / span > < span class = "plain" > , < / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > B< / span > < span class = "plain" > ) {< / span >
< span class = "reserved" > return< / span > < span class = "plain" > (< / span > < span class = "functiontext" > CStrings::cmp< / span > < span class = "plain" > (< / span > < span class = "identifier" > A< / span > < span class = "plain" > , < / span > < span class = "identifier" > B< / span > < span class = "plain" > ) == < / span > < span class = "constant" > 0< / span > < span class = "plain" > )?< / span > < span class = "identifier" > FALSE:TRUE< / span > < span class = "plain" > ;< / span >
< span class = "plain" > }< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
< p class = "endnote" > The function CStrings::ne appears nowhere else.< / p >
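< p class = "inwebparagraph" > < a id = "SP7" > < / a > < b > § 7. < / b > On the rare occasions when we need to sort alphabetically we'll also call the following. Unlike < code class = "display" > < span class = "extract" > strcmp< / span > < / code > , it accepts < code class = "display" > < span class = "extract" > NULL< / span > < / code > , which it treats as equal to the empty string; both sort before any non-empty string: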
< / p >
< pre class = "display" >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "functiontext" > CStrings::cmp< / span > < span class = "plain" > (< / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > A< / span > < span class = "plain" > , < / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > B< / span > < span class = "plain" > ) {< / span >
< span class = "reserved" > if< / span > < span class = "plain" > ((< / span > < span class = "identifier" > A< / span > < span class = "plain" > == < / span > < span class = "identifier" > NULL< / span > < span class = "plain" > ) || (< / span > < span class = "identifier" > A< / span > < span class = "plain" > [0] == < / span > < span class = "constant" > 0< / span > < span class = "plain" > )) {< / span >
< span class = "reserved" > if< / span > < span class = "plain" > ((< / span > < span class = "identifier" > B< / span > < span class = "plain" > == < / span > < span class = "identifier" > NULL< / span > < span class = "plain" > ) || (< / span > < span class = "identifier" > B< / span > < span class = "plain" > [0] == < / span > < span class = "constant" > 0< / span > < span class = "plain" > )) < / span > < span class = "reserved" > return< / span > < span class = "plain" > < / span > < span class = "constant" > 0< / span > < span class = "plain" > ;< / span >
< span class = "reserved" > return< / span > < span class = "plain" > -1;< / span >
< span class = "plain" > }< / span >
< span class = "reserved" > if< / span > < span class = "plain" > ((< / span > < span class = "identifier" > B< / span > < span class = "plain" > == < / span > < span class = "identifier" > NULL< / span > < span class = "plain" > ) || (< / span > < span class = "identifier" > B< / span > < span class = "plain" > [0] == < / span > < span class = "constant" > 0< / span > < span class = "plain" > )) < / span > < span class = "reserved" > return< / span > < span class = "plain" > < / span > < span class = "constant" > 1< / span > < span class = "plain" > ;< / span >
< span class = "reserved" > return< / span > < span class = "plain" > < / span > < span class = "identifier" > strcmp< / span > < span class = "plain" > (< / span > < span class = "identifier" > A< / span > < span class = "plain" > , < / span > < span class = "identifier" > B< / span > < span class = "plain" > );< / span >
< span class = "plain" > }< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
< p class = "endnote" > The function CStrings::cmp is used in < a href = "#SP6" > § 6< / a > .< / p >
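< p class = "inwebparagraph" > For sorting, a comparison of this kind plugs into the standard library's < code class = "display" > < span class = "extract" > qsort< / span > < / code > . The sketch below is a plain-C version with hypothetical names < code class = "display" > < span class = "extract" > str_cmp< / span > < / code > , < code class = "display" > < span class = "extract" > str_ne< / span > < / code > and < code class = "display" > < span class = "extract" > compare_entries< / span > < / code > ; unlike the originals it takes < code class = "display" > < span class = "extract" > const char *< / span > < / code > . The ordering is that of < code class = "display" > < span class = "extract" > strcmp< / span > < / code > , which compares byte values, so "alphabetical" here means ASCII order, in which every upper-case letter sorts before every lower-case one.< / p >

```c
#include <stdlib.h>
#include <string.h>

#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

/* Same logic as CStrings::cmp: NULL counts as the empty string, and both
   sort before anything non-empty; otherwise defer to strcmp. */
int str_cmp(const char *A, const char *B) {
    if ((A == NULL) || (A[0] == 0)) {
        if ((B == NULL) || (B[0] == 0)) return 0;
        return -1;
    }
    if ((B == NULL) || (B[0] == 0)) return 1;
    return strcmp(A, B);
}

/* Same logic as CStrings::ne: TRUE exactly when str_cmp finds a difference. */
int str_ne(const char *A, const char *B) {
    return (str_cmp(A, B) == 0) ? FALSE : TRUE;
}

/* qsort passes pointers to the array elements, which here are themselves
   pointers to the strings. */
int compare_entries(const void *x, const void *y) {
    return str_cmp(*(const char * const *) x, *(const char * const *) y);
}
```

< p class = "inwebparagraph" > Sorting the array { "pear", NULL, "apple", "" } this way puts the < code class = "display" > < span class = "extract" > NULL< / span > < / code > and the empty string first (in either order, since they compare equal), then "apple", then "pear".< / p >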
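< p class = "inwebparagraph" > < a id = "SP8" > < / a > < b > § 8. < / b > And the following is needed to deal with extension filenames on platforms
whose locale is encoded as UTF-8. It transcodes an ISO 8859-1 (Latin-1) string < code class = "display" > < span class = "extract" > p< / span > < / code > into UTF-8 at < code class = "display" > < span class = "extract" > dest< / span > < / code > : bytes below 128 are copied unchanged, and each byte from 128 to 255 becomes a two-byte sequence. No bounds check is made, so < code class = "display" > < span class = "extract" > dest< / span > < / code > needs room for twice as many bytes as < code class = "display" > < span class = "extract" > p< / span > < / code > has characters, plus one for the terminator.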
< / p >
< pre class = "display" >
< span class = "reserved" > void< / span > < span class = "plain" > < / span > < span class = "functiontext" > CStrings::transcode_ISO_string_to_UTF8< / span > < span class = "plain" > (< / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > p< / span > < span class = "plain" > , < / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > dest< / span > < span class = "plain" > ) {< / span >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "identifier" > i< / span > < span class = "plain" > , < / span > < span class = "identifier" > j< / span > < span class = "plain" > ;< / span >
< span class = "reserved" > for< / span > < span class = "plain" > (< / span > < span class = "identifier" > i< / span > < span class = "plain" > =0, < / span > < span class = "identifier" > j< / span > < span class = "plain" > =0; < / span > < span class = "identifier" > p< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ]; < / span > < span class = "identifier" > i< / span > < span class = "plain" > ++) {< / span >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "identifier" > charcode< / span > < span class = "plain" > = (< / span > < span class = "reserved" > int< / span > < span class = "plain" > ) (((< / span > < span class = "reserved" > unsigned< / span > < span class = "plain" > < / span > < span class = "reserved" > char< / span > < span class = "plain" > *)< / span > < span class = "identifier" > p< / span > < span class = "plain" > )[< / span > < span class = "identifier" > i< / span > < span class = "plain" > ]);< / span >
< span class = "reserved" > if< / span > < span class = "plain" > (< / span > < span class = "identifier" > charcode< / span > < span class = "plain" > > = < / span > < span class = "constant" > 128< / span > < span class = "plain" > ) {< / span >
< span class = "identifier" > dest< / span > < span class = "plain" > [< / span > < span class = "identifier" > j< / span > < span class = "plain" > ++] = (< / span > < span class = "reserved" > char< / span > < span class = "plain" > ) (< / span > < span class = "constant" > 0xC0< / span > < span class = "plain" > + (< / span > < span class = "identifier" > charcode< / span > < span class = "plain" > > > < / span > < span class = "constant" > 6< / span > < span class = "plain" > ));< / span >
< span class = "identifier" > dest< / span > < span class = "plain" > [< / span > < span class = "identifier" > j< / span > < span class = "plain" > ++] = (< / span > < span class = "reserved" > char< / span > < span class = "plain" > ) (< / span > < span class = "constant" > 0x80< / span > < span class = "plain" > + (< / span > < span class = "identifier" > charcode< / span > < span class = "plain" > & < / span > < span class = "constant" > 0x3f< / span > < span class = "plain" > ));< / span >
< span class = "plain" > } < / span > < span class = "reserved" > else< / span > < span class = "plain" > {< / span >
< span class = "identifier" > dest< / span > < span class = "plain" > [< / span > < span class = "identifier" > j< / span > < span class = "plain" > ++] = < / span > < span class = "identifier" > p< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ];< / span >
< span class = "plain" > }< / span >
< span class = "plain" > }< / span >
< span class = "identifier" > dest< / span > < span class = "plain" > [< / span > < span class = "identifier" > j< / span > < span class = "plain" > ] = < / span > < span class = "constant" > 0< / span > < span class = "plain" > ;< / span >
< span class = "plain" > }< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
< p class = "endnote" > The function CStrings::transcode_ISO_string_to_UTF8 appears nowhere else.< / p >
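< p class = "inwebparagraph" > A worked case: the Latin-1 byte 0xE9, which is é, has value 233, so it becomes 0xC0 + (233 >> 6) = 0xC3 followed by 0x80 + (233 & 0x3F) = 0xA9, which is indeed the UTF-8 encoding of U+00E9. The following standalone sketch repeats the transcoder under the hypothetical name < code class = "display" > < span class = "extract" > latin1_to_utf8< / span > < / code > , with a hypothetical helper that computes the worst-case size of < code class = "display" > < span class = "extract" > dest< / span > < / code > .< / p >

```c
#include <string.h>

/* Hypothetical helper: bytes needed for dest in the worst case, where every
   character is 128 or above, including the terminating null. */
size_t utf8_buffer_size(const char *p) {
    return 2 * strlen(p) + 1;
}

/* Same logic as CStrings::transcode_ISO_string_to_UTF8. ISO 8859-1 byte
   values coincide with Unicode code points 0 to 255, so bytes 128 to 255
   need two UTF-8 bytes and the rest are copied. No bounds check is made. */
void latin1_to_utf8(const char *p, char *dest) {
    int i, j;
    for (i = 0, j = 0; p[i]; i++) {
        int charcode = (int) (((const unsigned char *) p)[i]);
        if (charcode >= 128) {
            dest[j++] = (char) (0xC0 + (charcode >> 6));    /* 110xxxxx */
            dest[j++] = (char) (0x80 + (charcode & 0x3f));  /* 10xxxxxx */
        } else {
            dest[j++] = p[i];
        }
    }
    dest[j] = 0;
}
```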
< p class = "inwebparagraph" > < a id = "SP9" > < / a > < b > § 9. < / b > I dislike using < code class = "display" > < span class = "extract" > strncpy< / span > < / code > because, and for some reason this surprises
me every time, it truncates but fails to write a null terminator
if the string to be copied is at least as long as the buffer being written to: the
result is then not a well-formed string, and we have to fix matters by
hand, which I think makes for opaque code. So the following copies at most < code class = "display" > < span class = "extract" > max-1< / span > < / code > characters and always terminates the result:
< / p >
< pre class = "display" >
< span class = "reserved" > void< / span > < span class = "plain" > < / span > < span class = "functiontext" > CStrings::truncated_strcpy< / span > < span class = "plain" > (< / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > to< / span > < span class = "plain" > , < / span > < span class = "reserved" > char< / span > < span class = "plain" > *< / span > < span class = "identifier" > from< / span > < span class = "plain" > , < / span > < span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "identifier" > max< / span > < span class = "plain" > ) {< / span >
< span class = "reserved" > int< / span > < span class = "plain" > < / span > < span class = "identifier" > i< / span > < span class = "plain" > ;< / span >
< span class = "reserved" > for< / span > < span class = "plain" > (< / span > < span class = "identifier" > i< / span > < span class = "plain" > =0; ((< / span > < span class = "identifier" > from< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ]) & & (< / span > < span class = "identifier" > i< / span > < span class = "plain" > < < / span > < span class = "identifier" > max< / span > < span class = "plain" > -1)); < / span > < span class = "identifier" > i< / span > < span class = "plain" > ++) < / span > < span class = "identifier" > to< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ] = < / span > < span class = "identifier" > from< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ];< / span >
< span class = "identifier" > to< / span > < span class = "plain" > [< / span > < span class = "identifier" > i< / span > < span class = "plain" > ] = < / span > < span class = "constant" > 0< / span > < span class = "plain" > ;< / span >
< span class = "plain" > }< / span >
< / pre >
< p class = "inwebparagraph" > < / p >
< p class = "endnote" > The function CStrings::truncated_strcpy is used in 2/dl (< a href = "2-dl.html#SP6" > § 6< / a > ).< / p >
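< p class = "inwebparagraph" > The difference from < code class = "display" > < span class = "extract" > strncpy< / span > < / code > shows up as soon as the source is too long. In this standalone sketch, which repeats the function under the same name but with a < code class = "display" > < span class = "extract" > const< / span > < / code > source and the assumption that < code class = "display" > < span class = "extract" > max< / span > < / code > is at least 1, copying "abcdef" into a four-byte buffer with < code class = "display" > < span class = "extract" > strncpy< / span > < / code > leaves "abcd" and no terminator, whereas < code class = "display" > < span class = "extract" > truncated_strcpy< / span > < / code > leaves "abc" followed by a null.< / p >

```c
#include <string.h>   /* strncpy, memchr and strcmp, for the usage below */

/* Same logic as CStrings::truncated_strcpy: copy at most max-1 characters
   and always write a terminator, so to must have room for max bytes. */
void truncated_strcpy(char *to, const char *from, int max) {
    int i;
    for (i = 0; (from[i]) && (i < max-1); i++) to[i] = from[i];
    to[i] = 0;
}
```

< p class = "inwebparagraph" > The hand repair complained about above is, for a buffer of < code class = "display" > < span class = "extract" > n< / span > < / code > bytes, < code class = "display" > < span class = "extract" > strncpy(to, from, n-1); to[n-1] = 0;< / span > < / code > .< / p >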
< hr class = "tocbar" >
< ul class = "toc" > < li > < a href = "4-chr.html" > Back to 'Characters'< / a > < / li > < li > < a href = "4-ws.html" > Continue with 'Wide Strings'< / a > < / li > < / ul > < hr class = "tocbar" >
<!-- End of weave -->
< / main >
< / body >
< / html >