Preliminaries: Nowebify.

AwesomeAdam54321 2024-03-09 21:59:46 +08:00
parent e44d5e1b34
commit eec9a49d8c


@@ -2,7 +2,7 @@ How This Program Works.
An overview of how Inweb works, with links to all of its important functions.
-@h Prerequisites.
+@ \section{Prerequisites.}
This page is to help readers to get their bearings in the source code for
Inweb, which is a literate program or "web". Before diving in:
(a) It helps to have some experience of reading webs. The short examples
@@ -11,22 +11,22 @@ Inweb, which is a literate program or "web". Before diving in:
fact that it uses some extension syntaxes provided by //inweb// itself.
Turn to //The InC Dialect// for full details, but essentially: it's plain
old C without predeclarations or header files, and where functions have names
-like |Tags::add_by_name| rather than just |add_by_name|.
+like [[Tags::add_by_name]] rather than just [[add_by_name]].
(c) Inweb makes use of a "module" of utility functions called //foundation//.
This is a web in its own right. There's no need to read it, but you may want
to take a quick look at //foundation: A Brief Guide to Foundation// or the
example //eastertide//.
-@h Working out what to do, and what to do it to.
+@ \section{Working out what to do, and what to do it to.}
Inweb is a C program, so it begins at //main//, in //Program Control//. PC
works out where Inweb is installed, then calls //Configuration//, which
//reads the command line options -> Configuration::read//.
The user's choices are stored in an //inweb_instructions// object, and Inweb
-is put into one of four modes: |TANGLE_MODE|, |WEAVE_MODE|, |ANALYSE_MODE|, or
-|TRANSLATE_MODE|.[1] Inweb never changes mode: once set, it remains
+is put into one of four modes: [[TANGLE_MODE]], [[WEAVE_MODE]], [[ANALYSE_MODE]], or
+[[TRANSLATE_MODE]].[1] Inweb never changes mode: once set, it remains
for the rest of the run. Inweb also acts on only one main web in any run,
-unless in |TRANSLATE_MODE|, in which case none.
+unless in [[TRANSLATE_MODE]], in which case none.
Once it has worked through the command line, //Configuration// also calls
//Colonies::load// to read the colony file, if one was given (see
@@ -39,8 +39,8 @@ no traveller returns.
[1] Tangling and weaving are fundamental to all LP tools. Analysis means, say,
reading a web and listing functions in it. Translation is for side-activities
like //making makefiles -> Makefiles// or //gitignores -> Git Support//.
-Strictly speaking there is also |NO_MODE| for runs where the user simply
-asked for |-help| at the command line.
+Strictly speaking there is also [[NO_MODE]] for runs where the user simply
+asked for [[-help]] at the command line.
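For orientation, here is a minimal C sketch of the mode rules just described. It is not taken from Inweb. The enum, [[main_webs_for_mode]] and [[set_mode_once]] are hypothetical, and treating a [[NO_MODE]] run as reading no web is an assumption based on its being a [[-help]] run.

```c
#include <assert.h>

/* Hypothetical re-creation of the run modes named in the text; Inweb's own
   definitions may differ. */
typedef enum {
    NO_MODE,        /* e.g. a run that only asked for -help */
    TANGLE_MODE,
    WEAVE_MODE,
    ANALYSE_MODE,
    TRANSLATE_MODE
} inweb_mode;

static inweb_mode current_mode = NO_MODE;
static int mode_is_set = 0;

/* The mode is written once and never changed: later attempts are refused. */
int set_mode_once(inweb_mode mode) {
    if (mode_is_set) return 0;
    current_mode = mode;
    mode_is_set = 1;
    return 1;
}

/* Number of main webs a run acts on: one, except in TRANSLATE_MODE, where
   it is none. Treating NO_MODE as "none" is an assumption. */
int main_webs_for_mode(inweb_mode mode) {
    switch (mode) {
        case TANGLE_MODE:
        case WEAVE_MODE:
        case ANALYSE_MODE:
            return 1;
        default:
            return 0;
    }
}
```

Keeping the mode in a single write-once variable is one simple way to honour the rule that Inweb never changes mode during a run.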
@ //Program Control// then resumes, calling //Main::follow_instructions// to
act on the //inweb_instructions// object. If the user did specify a web to
@@ -55,26 +55,26 @@ and also references to a //chapter_md// for each chapter, and a //section_md//
for each section. There is always at least one //chapter_md//, each of which
has at least one //section_md//.[1] The "range text" for each chapter and
section is set here, which affects leafnames used in woven websites.[2] The
-optional |build.txt| file for a web is read by //BuildFiles::read//, and the
+optional [[build.txt]] file for a web is read by //BuildFiles::read//, and the
semantic version number determined at //BuildFiles::deduce_semver//.
Where a web imports a module, as for instance the //eastertide// example does,
//WebMetadata::get// creates a //module// object for each import. In any event,
-it also creates a module called |"(main)"| to represent the main, non-imported,
+it also creates a module called [["(main)"]] to represent the main, non-imported,
part of the overall program. Each module object also refers to the //chapter_md//
and //section_md// objects.[3]
The result of //Reader::load_web// is an object called a //web//, which expands
-on the metadata considerably. If |W| is a web, |W->md| produces its //web_md//
-metadata, but |W| also has numerous other fields.
+on the metadata considerably. If [[W]] is a web, [[W->md]] produces its //web_md//
+metadata, but [[W]] also has numerous other fields.
[1] For single-file webs like //twinprimes//, with no contents pages, Inweb
makes what it calls an "implied" chapter and section heading.
-[2] Range texts are used at the command line, and in |-catalogue| output, for
+[2] Range texts are used at the command line, and in [[-catalogue]] output, for
example; and also to determine leafnames of pages in a website being woven.
-A range is really just an abbreviation. For example, |M| is the range for the
-Manual chapter, |2/tp| for the section "The Parser" in Chapter 2.
+A range is really just an abbreviation. For example, [[M]] is the range for the
+Manual chapter, [[2/tp]] for the section "The Parser" in Chapter 2.
[3] The difference is that the //web_md// lists every chapter and section,
imported or not, whereas the //module// lists only those falling under its
@@ -91,21 +91,21 @@ Inweb syntax is heavily line-based, and every line of every section file (except
the Contents page) becomes a //source_line//. In the end, then, Inweb has built
a four-level hierarchy on top of the more basic three-level hierarchy produced
by //foundation//:
= (hyperlinked text as BoxArt)
INWEB //web// ----> //chapter// ----> //section// ----> //source_line//
-| | |
+[[ | ]]
FOUNDATION //web_md// ----> //chapter_md// ----> //section_md//
//module//
=
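To make the INWEB row of that diagram concrete, here is a hypothetical C sketch of the four levels as linked lists, with a walk that counts every line. The structure and field names are assumptions made for illustration. The real //web//, //chapter//, //section// and //source_line// objects hold far more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the INWEB-side objects. */
typedef struct source_line {
    const char *text;
    struct source_line *next_line;
} source_line;

typedef struct section {
    const char *title;
    source_line *first_line;
    struct section *next_section;
} section;

typedef struct chapter {
    const char *title;
    section *first_section;
    struct chapter *next_chapter;
} chapter;

typedef struct web {
    chapter *first_chapter;
} web;

/* Walk web -> chapter -> section -> source_line, counting the lines. */
int count_source_lines(const web *W) {
    int n = 0;
    for (const chapter *C = W->first_chapter; C; C = C->next_chapter)
        for (const section *S = C->first_section; S; S = S->next_section)
            for (const source_line *L = S->first_line; L; L = L->next_line)
                n++;
    return n;
}
```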
@ The third stage is to call //Parser::parse_web//. This is where we check that
the web is syntactically valid line-by-line, reporting any errors
by calling //Main::error_in_web//. Each line is assigned a "category": for
-example, the category |DEFINITIONS_LCAT| is given to lines holding definitions
-made with |@d| or |@e|. See //Line Categories// for the complete roster.[1]
-Running Inweb with the |-scan| switch lists out the lines parsed in this way;
+example, the category [[DEFINITIONS_LCAT]] is given to lines holding definitions
+made with [[@d]] or [[@e]]. See //Line Categories// for the complete roster.[1]
+Running Inweb with the [[-scan]] switch lists out the lines parsed in this way;
for example:
= (text from Figures/scan.txt)
(text from Figures/scan.txt)
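As a rough illustration of categorisation, the following C sketch tags a line as [[DEFINITIONS_LCAT]] when it opens with [[@d]] or [[@e]]. Only that category name comes from the text. [[UNCLASSIFIED_LCAT]], [[categorise_line]] and the exact matching rule are hypothetical, and the real parser distinguishes more than 20 categories.

```c
#include <assert.h>
#include <string.h>

/* DEFINITIONS_LCAT is named in the text; UNCLASSIFIED_LCAT is a
   hypothetical catch-all standing in for every other category. */
typedef enum { UNCLASSIFIED_LCAT, DEFINITIONS_LCAT } line_category;

/* Assumed rule: the command must be followed by whitespace or end of line,
   so that longer words beginning "@d" or "@e" are not mistaken for it. */
static int starts_with_command(const char *line, const char *cmd) {
    size_t n = strlen(cmd);
    return strncmp(line, cmd, n) == 0 &&
           (line[n] == ' ' || line[n] == '\t' || line[n] == '\0');
}

line_category categorise_line(const char *line) {
    if (starts_with_command(line, "@d") || starts_with_command(line, "@e"))
        return DEFINITIONS_LCAT;
    return UNCLASSIFIED_LCAT;
}
```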
[1] There are more than 20, but many are no longer needed in "version 2" of
the Inweb syntax, which is the only one anyone should still use. Continuing
@@ -120,20 +120,20 @@ usually a number like "2.3.1". Those numbers are assigned hierarchically,[1]
which is not a trivial algorithm: see //Numbering::number_web//.
It is the parser which finds all of the "paragraph macros", the term used
-in the source code for named stretches of code in |@<...@>| notation. A
+in the source code for named stretches of code in [[<<...>>]] notation. A
//para_macro// object is created for each one, and every section has its own
-collection, stored in a |linked_list|.[2] Similarly, the parser finds all of
+collection, stored in a [[linked_list]].[2] Similarly, the parser finds all of
the footnote texts, and works out their proper numbering; each becomes a
//footnote// object.[3]
At the end of the third stage, then, everything's ready to go, and in memory
we now have something like this:
= (hyperlinked text as BoxArt)
INWEB //web// ----> //chapter// ----> //section// ----> //paragraph// ----> //source_line//
-| | | //para_macro//
+[[ | ]] //para_macro//
FOUNDATION //web_md// ----> //chapter_md// ----> //section_md//
//module//
=
[1] Unlike in CWEB and other past literate programming tools, in which
paragraphs -- sometimes called "sections" by those programs, a different use
@@ -141,19 +141,19 @@ of the word to ours -- are numbered simply 1, 2, 3, ..., through the entire
program. Doing this would entail some webs in the Inform project running up
to nearly 8000.
-[2] In real-world use, to use a |dictionary| instead would involve more
+[2] In real-world use, to use a [[dictionary]] instead would involve more
overhead than gain: there are never very many paragraph macros per section.
[3] Though the parser is not able to check that the footnotes are all used;
that's done at weaving time instead.
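The per-section list of paragraph macros described above can be pictured with this hypothetical C sketch: a singly linked list searched linearly by name. As footnote [2] notes, this stays cheap because a section never has many macros. The [[para_macro]] here is a stripped-down stand-in, not Inweb's structure, and [[add_macro]] and [[find_macro]] are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical para_macro record; each section keeps its own short list. */
typedef struct para_macro {
    const char *name;          /* the text between the angle brackets */
    int defining_paragraph;    /* hypothetical: where it was defined */
    struct para_macro *next;
} para_macro;

typedef struct {
    para_macro *macros;        /* linked list, most recently added first */
} section_macros;

void add_macro(section_macros *S, para_macro *M) {
    M->next = S->macros;
    S->macros = M;
}

/* Linear search by name; returns NULL if the section has no such macro. */
para_macro *find_macro(const section_macros *S, const char *name) {
    for (para_macro *M = S->macros; M; M = M->next)
        if (strcmp(M->name, name) == 0) return M;
    return NULL;
}
```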
-@h Programming languages.
+@ \section{Programming languages.}
The contents page of a web usually mentions one or more programming languages.
A line at the top like
= (text as Inweb)
Language: C
=
-results in the text "C" being stored in the bibliographic datum |"Language"|,
+results in the text "C" being stored in the bibliographic datum [["Language"]],
and if contents lines for chapters or sections specify other languages,[1]
the loader stores those in the relevant //chapter_md// or //section_md//
objects. But to the loader, these are all just names.
@@ -161,9 +161,9 @@ objects. But to the loader, these are all just names.
The reader then loads in definitions of these programming languages by
calling //Languages::find_by_name//, and the parser does the same when it
finds extract lines like
= (text as Inweb)
= (text as ACME)
=
to say that a passage of text must be syntax-coloured like the ACME language.
//Languages::find_by_name// is thus called at any time when Inweb finds need
@@ -181,17 +181,17 @@ to a //programming_language//.
[1] A little-used feature of Inweb, which should arguably be taken out as
unnecessary now that colonies allow for multiple webs to coexist happily.
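Here is a hypothetical C sketch of a find-by-name lookup in the spirit of //Languages::find_by_name//. The first request for a name creates a //programming_language// record, and later requests get the same record back. Whether Inweb caches in exactly this way is an assumption. The real function also loads the language's definition, which this sketch omits.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical programming_language record: just a name and a list link. */
typedef struct programming_language {
    char *name;
    struct programming_language *next;
} programming_language;

static programming_language *known_languages = NULL;

/* Return the record for this name, creating it on first request.
   Returns NULL only if memory runs out. */
programming_language *find_language_by_name(const char *name) {
    for (programming_language *L = known_languages; L; L = L->next)
        if (strcmp(L->name, name) == 0) return L;
    programming_language *fresh = malloc(sizeof *fresh);
    if (fresh == NULL) return NULL;
    fresh->name = malloc(strlen(name) + 1);
    if (fresh->name == NULL) { free(fresh); return NULL; }
    strcpy(fresh->name, name);
    fresh->next = known_languages;
    known_languages = fresh;
    return fresh;
}
```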
-@h Weaving mode.
+@ \section{Weaving mode.}
Let's get back to //Program Control//, which has now set everything up and is
about to take action. What it does depends on which of the four modes Inweb
-is in; we'll start with |WEAVE_MODE|, the most difficult.
+is in; we'll start with [[WEAVE_MODE]], the most difficult.
Weaves are highly configurable, so they depend on several factors:
(a) Which format is used, as represented by a //weave_format// object. For
example, HTML, ePub and PDF are all formats.
(b) Which pattern is used, as represented by a //weave_pattern// object. A
pattern is a choice of format together with some naming conventions and
-auxiliary files. For example, |GitHubPages| is a pattern which imposes HTML
+auxiliary files. For example, [[GitHubPages]] is a pattern which imposes HTML
format but also throws in, for example, the GitHub logo icon.
(c) Whether a filter to particular tags is used, as represented by a
//theme_tag//.[1]
@@ -200,8 +200,8 @@ thing, but sometimes just one chapter, or just one section, and sometimes
a special setting for "do all chapters one at a time" or "do all sections
one at a time", a procedure called //The Swarm//.
-[1] For example, Inweb automatically applies the |"Functions"| tag to any
-paragraph defining one (see //Types and Functions//), and using |-weave-tag|
+[1] For example, Inweb automatically applies the [["Functions"]] tag to any
+paragraph defining one (see //Types and Functions//), and using [[-weave-tag]]
at the command line filters the weave down to just these. Sing to the tune
of Suzanne Vega's "Freeze Tag".
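The effect of [[-weave-tag]] can be sketched in C as a filter over tagged paragraphs. This is illustrative only. [[tagged_paragraph]], [[filter_by_tag]] and the fixed [[MAX_TAGS]] limit are hypothetical, and Inweb's own tags are //theme_tag// objects rather than plain strings.

```c
#include <assert.h>
#include <string.h>

#define MAX_TAGS 4  /* hypothetical limit, for brevity */

/* Hypothetical paragraph record carrying a few tag names. */
typedef struct {
    const char *tags[MAX_TAGS];
    int tag_count;
} tagged_paragraph;

int has_tag(const tagged_paragraph *P, const char *tag) {
    for (int i = 0; i < P->tag_count; i++)
        if (strcmp(P->tags[i], tag) == 0) return 1;
    return 0;
}

/* Keep only paragraphs carrying the filter tag; a NULL filter keeps all.
   Writes the indices of kept paragraphs to out[] and returns how many. */
int filter_by_tag(const tagged_paragraph *paras, int n, const char *filter, int *out) {
    int kept = 0;
    for (int i = 0; i < n; i++)
        if (filter == NULL || has_tag(&paras[i], filter))
            out[kept++] = i;
    return kept;
}
```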
@@ -247,10 +247,10 @@ The trickiest point of building the weave tree is done by //The Weaver of Text//
which breaks up lines of commentary or code to identify uses of mathematical
notation, footnote cues, function calls, and so on.
-A convenience for testing the weave algorithm is to |-weave-as TestingInweb|.
-|TestingInweb| is a weave pattern that outputs a textual representation of
+A convenience for testing the weave algorithm is to [[-weave-as TestingInweb]].
+[[TestingInweb]] is a weave pattern that outputs a textual representation of
the weave tree. For example:
= (text from Figures/tree.txt)
(text from Figures/tree.txt)
This is a "heterogeneous tree", in that its //tree_node// nodes are annotated
by data structures of different types. For example, a node for a section
heading is annotated with a //weave_section_header_node// structure. The
@@ -263,10 +263,10 @@ it on Spotify.
@ Syntax-colouring is worth further mention. Just as the Weaver tries not to
get itself into fiddly details of formats, it also avoids specifics of
programming languages. It does this by calling //LanguageMethods::syntax_colour//,
-which in turn calls the |SYNTAX_COLOUR_WEA_MTID| method for the relevant
+which in turn calls the [[SYNTAX_COLOUR_WEA_MTID]] method for the relevant
instance of //programming_language//. In effect the weaver sends a snippet
of code and asks to be told how it's to be coloured: not in terms of green
-vs blue, but in terms of |IDENTIFIER_COLOUR| vs |RESERVED_COLOUR| and so on.
+vs blue, but in terms of [[IDENTIFIER_COLOUR]] vs [[RESERVED_COLOUR]] and so on.
Thus, the object representing "the C programming language" can in principle
choose any semantic colouring that it likes. In practice, if (as is usual) it
@@ -281,7 +281,7 @@ in effect, a mini-language of their own, which is compiled by
@ So, then, the weave tree is now made. Just as each programming language
has an object representing it, so does each format, and at render time the
-method call |RENDER_FOR_MTID| is sent to it. This has to turn the tree into
+method call [[RENDER_FOR_MTID]] is sent to it. This has to turn the tree into
HTML, plain text, TeX source, or whatever may be. It's understood that not
every rendering instruction in the weave tree can be fully followed in every
format: for example, there's not much that plain text can do to render an
@@ -298,7 +298,7 @@ has been eclipsed by...
Renderers should make requests for weave plugins or colour schemes if, and
only if, the need arises: for example, the HTML renderer requests the plugin
-|Carousel| only if an image carousel is actually called for. Requests are
+[[Carousel]] only if an image carousel is actually called for. Requests are
made by calling //Swarm::ensure_plugin// or //Swarm::ensure_colour_scheme//,
and see also the underlying code at //Assets, Plugins and Colour Schemes//.
(We want our HTML to run as little JavaScript as necessary at load time, which
@@ -309,7 +309,7 @@ for example, when weaving the text you are currently reading, Inweb has to
decide where to send //text_stream//. This is handled by a suite of useful
functions in //Colonies// which coordinate URLs across websites so
that one web's weave can safely link to another's. In particular, cross-references
-written in |//this notation//| are "resolved" by //Colonies::resolve_reference_in_weave//,
+written in [[//this notation//]] are "resolved" by //Colonies::resolve_reference_in_weave//,
and the function //Colonies::reference_URL// turns them into relative URLs
from any given file. Within the main web being woven, //Colonies::paragraph_URL//
can make a link to any paragraph of your choice.[1]
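One common way to compute relative URLs like those //Colonies::reference_URL// produces is to skip the directories two paths share, then climb out of the remaining ones with [[../]]. The C sketch below shows only that general technique, under simple assumptions: site-relative paths, [[/]] separators, and no [[.]] or [[..]] components. [[relative_url]] is a hypothetical name, not Inweb's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into out the URL of to_file as seen from the page from_file.
   Both paths are site-relative with '/' separators. If out is too small,
   the result is truncated, so callers should pass a generous buffer. */
void relative_url(const char *from_file, const char *to_file, char *out, size_t size) {
    /* Find the longest run of whole directory components the paths share. */
    size_t common = 0, i = 0;
    while (from_file[i] && from_file[i] == to_file[i]) {
        if (from_file[i] == '/') common = i + 1;
        i++;
    }
    /* Each directory left in from_file needs one "../". */
    size_t used = 0;
    out[0] = '\0';
    for (const char *p = from_file + common; *p; p++) {
        if (*p != '/') continue;
        if (used + 3 >= size) break;
        memcpy(out + used, "../", 3);
        used += 3;
        out[used] = '\0';
    }
    /* Then descend into whatever remains of the target path. */
    snprintf(out + used, size - used, "%s", to_file + common);
}
```

For example, a link from [[docs/inweb/M-iti.html]] to [[docs/foundation/index.html]] (hypothetical file names) comes out as [[../foundation/index.html]].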
@@ -323,7 +323,7 @@ subsystem which amounts to a stream editor. Its role is to work through a
"template" and substitute in material from outside -- from the weave rendering,
from the bibliographic data for a web, and so on -- to produce a final file.
For example, a simple use of the collater is to work through the template:
= (text)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
@@ -334,9 +334,9 @@ For example, a simple use of the collater is to work through the template:
[[Weave Content]]
</body>
</html>
=
and to collate material already generated by other parts of Inweb to fill the
-double-squared placeholders, such as |[[Plugins]]|. The Collater, in fact,
+double-squared placeholders, such as [[[[Plugins]]]]. The Collater, in fact,
is ultimately what generates all of the files made in a weave, even though
other parts of Inweb did all of the real work.
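Here is a much-simplified C sketch of what collation does with double-squared placeholders like the ones in the template above: it copies the template and substitutes each bound name. The [[binding]] type and [[collate]] function are hypothetical. The real //Collater::collate// also handles collation commands and nested collation, which this sketch ignores.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One hypothetical placeholder binding, e.g. "Plugins" -> some HTML. */
typedef struct { const char *name; const char *value; } binding;

/* Copy tmpl into out, replacing each double-squared placeholder whose name
   is bound; unknown placeholders are copied through unchanged. Copying
   stops when out is full, and out is always NUL-terminated (size > 0). */
void collate(const char *tmpl, const binding *b, int nb, char *out, size_t size) {
    size_t used = 0;
    out[0] = '\0';
    while (*tmpl && used + 1 < size) {
        const char *end;
        if (tmpl[0] == '[' && tmpl[1] == '[' && (end = strstr(tmpl + 2, "]]")) != NULL) {
            size_t len = (size_t)(end - (tmpl + 2));
            int matched = 0;
            for (int i = 0; i < nb; i++)
                if (strlen(b[i].name) == len && strncmp(b[i].name, tmpl + 2, len) == 0) {
                    used += (size_t)snprintf(out + used, size - used, "%s", b[i].value);
                    matched = 1;
                    break;
                }
            if (matched) { tmpl = end + 2; continue; }
        }
        out[used++] = *tmpl++;
        out[used] = '\0';
    }
}
```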
@@ -348,8 +348,8 @@ some collation commands call for further acts of collation to happen inside
the original. See //Advanced Weaving with Patterns// for more on collation,
and see //Collater::collate// for the machinery.
-@h Tangling mode.
-Alternatively, we're in |TANGLE_MODE|, which is more straightforward.
+@ \section{Tangling mode.}
+Alternatively, we're in [[TANGLE_MODE]], which is more straightforward.
//Program Control// simply works out what we want to tangle, selecting the
appropriate //tangle_target// object, and calls //Tangler::tangle//.
Most webs have just one "tangle target", meaning that the whole web makes
@@ -366,14 +366,14 @@ following instructions from the language definition file.
Languages declaring themselves "C-like" have access to special tangling
facilities, all implemented with non-ACME method calls: see //C-Like Languages//.
-In particular, for coping with how |#ifdef| affects |#include| see
+In particular, for coping with how [[#ifdef]] affects [[#include]] see
//CLike::additional_early_matter//; for predeclaration of functions and
-structs and |typedef|s, see //CLike::additional_predeclarations//.
+structs and [[typedef]]s, see //CLike::additional_predeclarations//.
The language calling itself "InC" gets even more: see //InC Support//, and
-in particular //text_literal// for text constants like |I"banana"|
+in particular //text_literal// for text constants like [[I"banana"]]
and //preform_nonterminal// for Preform grammar notation like
-|<sentence-ending>|.
+[[<sentence-ending>]].
[1] The original intention of this feature was that a program might want
to have, as "appendices", certain configuration files or other extraneous
@@ -384,18 +384,18 @@ it now seems better practice to make such a sidekick file its own web, and
use a colony file to make everything tidy on a woven website. So maybe this
feature can go.
-@h Analysis mode.
-Alternatively, we're in |ANALYSE_MODE|. There's not much to this: //Program Control//
+@ \section{Analysis mode.}
+Alternatively, we're in [[ANALYSE_MODE]]. There's not much to this: //Program Control//
simply calls //Analyser::catalogue_the_sections//, or else makes use of the same
-functions as |TRANSLATE_MODE| would -- but in the context of having read in a
-web. If it makes a |.gitignore| file, for example, it does so for that specific
-web, whereas if the same feature is used in |TRANSLATE_MODE|, it does so in
+functions as [[TRANSLATE_MODE]] would -- but in the context of having read in a
+web. If it makes a [[.gitignore]] file, for example, it does so for that specific
+web, whereas if the same feature is used in [[TRANSLATE_MODE]], it does so in
the abstract and for no particular web.
-@h Translation mode.
-Or, finally, we're in |TRANSLATE_MODE|. We can:
+@ \section{Translation mode.}
+Or, finally, we're in [[TRANSLATE_MODE]]. We can:
(a) make a makefile by calling //Makefiles::write//;
-(b) make a |.gitignore| file by calling //Git::write_gitignore//;
+(b) make a [[.gitignore]] file by calling //Git::write_gitignore//;
(c) advance the build number in a build file, by calling out to the
Foundation code at //BuildFiles::advance//;
(d) run a syntax-colouring test to help debug a programming language definition --
@@ -404,7 +404,7 @@ see //Program Control// itself for details.
And that is essentially it. Inweb winds up by returning exit code 1 if there
were errors, or 0 if not, like a good Unix citizen.
-@h Adding to Inweb.
+@ \section{Adding to Inweb.}
Here's some miscellaneous advice for those who would like to add to Inweb:
1. To add a new command-line switch, declare at //Configuration::read// and