How This Program Works.

An overview of how Inweb works, with links to all of its important functions.

@ \section{Prerequisites.}
This page is to help readers to get their bearings in the source code for
Inweb, which is a literate program or "web". Before diving in:
(a) It helps to have some experience of reading webs. The short examples
//goldbach// and //twinprimes// are enough to give the idea.
(b) Inweb is written in C, in fact ANSI C99, but this is disguised by the
fact that it uses some extension syntaxes provided by //inweb// itself.
Turn to //The InC Dialect// for full details, but essentially: it's plain
old C without predeclarations or header files, and where functions have names
like [[Tags::add_by_name]] rather than just [[add_by_name]].
(c) Inweb makes use of a "module" of utility functions called //foundation//.
This is a web in its own right. There's no need to read it, but you may want
to take a quick look at //foundation: A Brief Guide to Foundation// or the
example //eastertide//.

@ \section{Working out what to do, and what to do it to.}
Inweb is a C program, so it begins at //main//, in //Program Control// (PC for
short). PC works out where Inweb is installed, then calls //Configuration//, which
//reads the command line options -> Configuration::read//.

The user's choices are stored in an //inweb_instructions// object, and Inweb
is put into one of four modes: [[TANGLE_MODE]], [[WEAVE_MODE]], [[ANALYSE_MODE]], or
[[TRANSLATE_MODE]].[1] Inweb never changes mode: once set, the mode stays fixed
for the rest of the run. Inweb also acts on only one main web in any run,
unless in [[TRANSLATE_MODE]], in which case it acts on none.

Once it has worked through the command line, //Configuration// also calls
//Colonies::load// to read the colony file, if one was given (see
//Making Weaves into Websites//), and uses this to preset some settings:
see //Configuration::member_and_colony//.

All errors in configuration are sent to //Errors::fatal//, from whose bourne
no traveller returns.

[1] Tangling and weaving are fundamental to all LP tools. Analysis means, say,
reading a web and listing functions in it. Translation is for side-activities
like //making makefiles -> Makefiles// or //gitignores -> Git Support//.
Strictly speaking there is also [[NO_MODE]] for runs where the user simply
asked for [[-help]] at the command line.

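@ As a rough illustration of the "one mode per run" rule, here is a minimal
sketch in plain C. It is not Inweb's actual code: the structure
[[instructions_sketch]], the functions [[mode_for_switch]] and [[enter_mode]],
and the switch spellings [[-weave]] and [[-tangle]] are assumptions made for
the example. Only the mode names come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mode names as in the text; the numeric values are arbitrary. */
enum inweb_mode { NO_MODE, ANALYSE_MODE, TANGLE_MODE, WEAVE_MODE, TRANSLATE_MODE };

/* Hypothetical, much-reduced stand-in for the inweb_instructions object. */
typedef struct instructions_sketch {
    enum inweb_mode mode;   /* settled while the command line is read */
    const char *web_path;   /* NULL when the run has no main web */
} instructions_sketch;

/* Which mode a switch implies. The spellings are assumed for illustration. */
static enum inweb_mode mode_for_switch(const char *sw) {
    if (strcmp(sw, "-weave") == 0) return WEAVE_MODE;
    if (strcmp(sw, "-tangle") == 0) return TANGLE_MODE;
    if (strcmp(sw, "-catalogue") == 0) return ANALYSE_MODE;
    return NO_MODE;   /* for example -help, which implies no mode at all */
}

/* Enter a mode. Returns 0 on success, or -1 if a different mode was already
   chosen, which a real tool would presumably report as a fatal error. */
static int enter_mode(instructions_sketch *ins, enum inweb_mode m) {
    if (m == NO_MODE || ins->mode == m) return 0;
    if (ins->mode != NO_MODE) return -1;
    ins->mode = m;
    return 0;
}
```

Starting from [[NO_MODE]], a first call such as
[[enter_mode(&ins, mode_for_switch("-weave"))]] succeeds, and any later attempt
to enter a different mode is refused, so the mode is fixed for the whole run.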
@ //Program Control// then resumes, calling //Main::follow_instructions// to
act on the //inweb_instructions// object. If the user did specify a web to
work on, PC then goes through three stages to understand it.

First, PC calls //Reader::load_web// to read the metadata of the web -- that is,
its title and author, how it breaks down into chapters and sections, and what
modules it imports. The real work is done by the Foundation library function
//WebMetadata::get//, which returns a //web_md// object, providing details
such as its declared author and title (see //Bibliographic Data for Webs//),
and also references to a //chapter_md// for each chapter, and a //section_md//
for each section. There is always at least one //chapter_md//, each of which
has at least one //section_md//.[1] The "range text" for each chapter and
section is set here, which affects leafnames used in woven websites.[2] The
optional [[build.txt]] file for a web is read by //BuildFiles::read//, and the
semantic version number determined at //BuildFiles::deduce_semver//.

Where a web imports a module, as for instance the //eastertide// example does,
//WebMetadata::get// creates a //module// object for each import. In any event,
it also creates a module called [["(main)"]] to represent the main, non-imported,
part of the overall program. Each module object also refers to the //chapter_md//
and //section_md// objects.[3]

The result of //Reader::load_web// is an object called a //web//, which expands
on the metadata considerably. If [[W]] is a web, [[W->md]] produces its //web_md//
metadata, but [[W]] also has numerous other fields.

[1] For single-file webs like //twinprimes//, with no contents pages, Inweb
makes what it calls an "implied" chapter and section heading.

[2] Range texts are used at the command line, and in [[-catalogue]] output, for
example; and also to determine leafnames of pages in a website being woven.
A range is really just an abbreviation. For example, [[M]] is the range for the
Manual chapter, [[2/tp]] for the section "The Parser" in Chapter 2.

[3] The difference is that the //web_md// lists every chapter and section,
imported or not, whereas the //module// lists only those falling under its
own aegis.

@ After loading, the second stage is to call //Reader::read_web//. Whereas
loading was rapid and involved looking only at the contents page, reading
takes longer and means extracting every line of commentary or code. Just
as the loader wrapped the //web_md// in a larger //web// object, so too
the reader wraps each //chapter_md// in a //chapter//, and each //section_md//
in a //section//.

Inweb syntax is heavily line-based, and every line of every section file (except
the Contents page) becomes a //source_line//. In the end, then, Inweb has built
a four-level hierarchy on top of the more basic three-level hierarchy produced
by //foundation//:

    INWEB       //web// -------> //chapter// -------> //section// -------> //source_line//
                   |                  |                    |
    FOUNDATION  //web_md// ----> //chapter_md// ----> //section_md//
                //module//

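@ The two layers can be pictured as C structures, with each Inweb object
holding a pointer to the Foundation metadata it wraps. What follows is only a
sketch under stated assumptions: apart from [[md]], which the text mentions as
[[W->md]], every field name is invented for illustration, and the real
structures carry many more fields. The helper [[count_lines_sketch]] is
likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Foundation layer: metadata only. Field names are hypothetical. */
typedef struct section_md { const char *title; } section_md;
typedef struct chapter_md { const char *title; } chapter_md;
typedef struct web_md { const char *title; } web_md;

/* Inweb layer: each object wraps the corresponding metadata object. */
typedef struct source_line {
    const char *text;
    struct source_line *next_line;   /* NULL at the end of the section */
} source_line;

typedef struct section {
    section_md *md;
    source_line *first_line;
} section;

typedef struct chapter {
    chapter_md *md;
    section *sections;     /* array of no_sections sections */
    size_t no_sections;
} chapter;

typedef struct web {
    web_md *md;
    chapter *chapters;     /* array of no_chapters chapters */
    size_t no_chapters;
} web;

/* Walk all four levels, counting every source line in the web. */
static size_t count_lines_sketch(const web *W) {
    size_t n = 0;
    for (size_t c = 0; c < W->no_chapters; c++)
        for (size_t s = 0; s < W->chapters[c].no_sections; s++)
            for (const source_line *L = W->chapters[c].sections[s].first_line;
                L != NULL; L = L->next_line)
                n++;
    return n;
}
```

The point of the design is that the metadata can be read quickly on its own,
while the heavier upper layer is only filled in when lines are actually read.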
@ The third stage is to call //Parser::parse_web//. This is where we check that
the web is syntactically valid line-by-line, reporting any errors by calling
//Main::error_in_web//. Each line is assigned a "category": for
example, the category [[DEFINITIONS_LCAT]] is given to lines holding definitions
made with [[@d]] or [[@e]]. See //Line Categories// for the complete roster.[1]
Running Inweb with the [[-scan]] switch lists out the lines parsed in this way;
for example:
(text from Figures/scan.txt)

[1] There are more than 20, but many are no longer needed in "version 2" of
the Inweb syntax, which is the only one anyone should still use. Continuing
to support version 1 makes //The Parser// much fiddlier, and at some point we
will probably drop this quixotic goal.

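@ To make the idea of a line category concrete, here is a minimal sketch of
how a single line might be classified. It is an illustration rather than the
parser's real logic: [[OTHER_LCAT_SKETCH]] is an invented catch-all standing
in for the twenty-odd real categories, and the function names are assumptions.

```c
#include <assert.h>
#include <string.h>

/* DEFINITIONS_LCAT is named in the text; OTHER_LCAT_SKETCH is invented. */
enum line_category_sketch { OTHER_LCAT_SKETCH, DEFINITIONS_LCAT };

/* Does the line open with the given command, followed by white space or
   the end of the line? */
static int starts_with_command(const char *line, const char *cmd) {
    size_t n = strlen(cmd);
    if (strncmp(line, cmd, n) != 0) return 0;
    return line[n] == ' ' || line[n] == '\t' || line[n] == '\0';
}

/* Assign a category to one line, looking only at how it begins. */
static enum line_category_sketch categorise_line_sketch(const char *line) {
    if (starts_with_command(line, "@d") || starts_with_command(line, "@e"))
        return DEFINITIONS_LCAT;
    return OTHER_LCAT_SKETCH;
}
```

A line such as [[@d MAX_WIDTH 80]] would come out as [[DEFINITIONS_LCAT]],
whereas an ordinary line of code or commentary falls through to the catch-all.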
@ The parser also recognises headings and footnotes, but most importantly, it
introduces an additional concept: the //paragraph//. Each numbered passage
corresponds to one //paragraph// object; it may actually contain several
paragraphs of prose in the everyday English sense, but has just one heading,
usually a number like "2.3.1". Those numbers are assigned hierarchically,[1]
which is not a trivial algorithm: see //Numbering::number_web//.

It is the parser which finds all of the "paragraph macros", the term used
in the source code for named stretches of code in [[<<...>>]] notation. A
//para_macro// object is created for each one, and every section has its own
collection, stored in a [[linked_list]].[2] Similarly, the parser finds all of
the footnote texts, and works out their proper numbering; each becomes a
//footnote// object.[3]

At the end of the third stage, then, everything's ready to go, and in memory
we now have something like this:

    INWEB       //web// -------> //chapter// -------> //section// -------> //paragraph// ----> //source_line//
                   |                  |                    |               //para_macro//
    FOUNDATION  //web_md// ----> //chapter_md// ----> //section_md//
                //module//

[1] This is unlike CWEB and other past literate programming tools, in which
paragraphs -- sometimes called "sections" by those programs, a different use
of the word to ours -- are numbered simply 1, 2, 3, ..., through the entire
program. Doing this would entail some webs in the Inform project running up
to nearly 8000.

[2] In real-world use, a [[dictionary]] would involve more overhead than
gain: there are never very many paragraph macros per section.

[3] Though the parser is not able to check that the footnotes are all used;
that's done at weaving time instead.

@ \section{Programming languages.}
The contents page of a web usually mentions one or more programming languages.
A line at the top like

    Language: C

results in the text "C" being stored in the bibliographic datum [["Language"]],
and if contents lines for chapters or sections specify other languages,[1]
the loader stores those in the relevant //chapter_md// or //section_md//
objects. But to the loader, these are all just names.

The reader then loads in definitions of these programming languages by
calling //Languages::find_by_name//, and the parser does the same when it
finds extract lines like

    = (text as ACME)

to say that a passage of text must be syntax-coloured like the ACME language.

//Languages::find_by_name// is thus called at any time when Inweb finds need
of a language; it looks for a language definition file (see documentation
at //Supporting Programming Languages//), parses it one line at a time using
//Languages::read_definition_line//, and returns a //programming_language//
object. These objects are unique by name: you cannot have two different PL
objects with languages both called "Python", say.

The practical effect is that a web can involve many languages, even though
the main use case is to have just one throughout. //web//, //chapter//,
//section// and even individual //source_line// objects all contain pointers
to a //programming_language//.

[1] A little-used feature of Inweb, which should arguably be taken out as
unnecessary now that colonies allow for multiple webs to coexist happily.

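@ The "one object per language name" rule amounts to a find-or-create lookup.
The sketch below shows that pattern only: it keeps the known languages in a
linked list and skips the reading of the definition file altogether. The names
[[programming_language_sketch]] and [[find_language_sketch]] are hypothetical,
as is the fixed-size name field.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for a programming_language object. */
typedef struct programming_language_sketch {
    char name[64];
    struct programming_language_sketch *next;
} programming_language_sketch;

/* Every language created so far, most recent first. */
static programming_language_sketch *known_languages = NULL;

/* Return the one object for this name, creating it on first request.
   Returns NULL if the name is too long or memory runs out. */
static programming_language_sketch *find_language_sketch(const char *name) {
    programming_language_sketch *pl;
    for (pl = known_languages; pl != NULL; pl = pl->next)
        if (strcmp(pl->name, name) == 0) return pl;   /* already known */
    if (strlen(name) >= sizeof pl->name) return NULL;
    pl = calloc(1, sizeof *pl);
    if (pl == NULL) return NULL;
    strcpy(pl->name, name);   /* a real version would read the definition file here */
    pl->next = known_languages;
    known_languages = pl;
    return pl;
}
```

Because lookups always go through the one function, asking twice for "Python"
yields the same pointer both times, so pointer comparison is enough to tell
whether two lines are in the same language.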
@ \section{Weaving mode.}
Let's get back to //Program Control//, which has now set everything up and is
about to take action. What it does depends on which of the four modes Inweb
is in; we'll start with [[WEAVE_MODE]], the most difficult.

Weaves are highly configurable, so they depend on several factors:
(a) Which format is used, as represented by a //weave_format// object. For
example, HTML, ePub and PDF are all formats.
(b) Which pattern is used, as represented by a //weave_pattern// object. A
pattern is a choice of format together with some naming conventions and
auxiliary files. For example, [[GitHubPages]] is a pattern which imposes HTML
format but also throws in extras such as the GitHub logo icon.
(c) Whether a filter to particular tags is used, as represented by a
//theme_tag//.[1]
(d) What subset of the web the user wants to weave -- by default the whole
thing, but sometimes just one chapter, or just one section, and sometimes
a special setting for "do all chapters one at a time" or "do all sections
one at a time", a procedure called //The Swarm//.

[1] For example, Inweb automatically applies the [["Functions"]] tag to any
paragraph defining one (see //Types and Functions//), and using [[-weave-tag]]
at the command line filters the weave down to just these. Sing to the tune
of Suzanne Vega's "Freeze Tag".

@ //Program Control// begins by attempting to load the weave pattern, with
//Patterns::find//; the syntax of weave pattern files can be found in
//Patterns::scan_pattern_line//.

It then either calls //Swarm::weave_subset// -- meaning, a subset of the
web, going into a single output file -- or //Swarm::weave//, which in turn
splits the web into subsets and sends each of those to //Swarm::weave_subset//.

//Swarm::weave// also causes an "index" to be made, though "index" here is
Inweb jargon for something which is more likely a contents page listing the
sections and linking to them.[1]

Either way, each single weaving operation arrives at //Swarm::weave_subset//,
which consolidates all the settings needed into a //weave_order// object:
it says, in effect, "weave content X into file Y using pattern Z".[2]

[1] No index is made if the user asked for only a single section or chapter
to be woven; only if there was a swarm.

[2] So when Inweb is used to construct the website you are, perhaps, reading
this text on, around 80 //weave_order// objects will be made, one for each
call to //Swarm::weave_subset//, which in turn is one for each section of the
source-code web of Inweb itself.

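@ The relationship between a swarm and its individual weaves can be sketched
in a few lines of C. Everything here is a simplification made for the
example: [[weave_order_sketch]] holds only the "X, Y, Z" of the description
above, the function names are invented, and the rule turning a range like
[[2/tp]] into a leafname like [[2-tp.html]] is an assumption rather than a
statement of how Inweb names its files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* "Weave content X into file Y using pattern Z": a reduced, hypothetical
   stand-in for the real weave_order object. */
typedef struct weave_order_sketch {
    char range[32];        /* X: the range being woven, such as "2/tp" */
    char leafname[48];     /* Y: the output file (naming scheme assumed) */
    const char *pattern;   /* Z: the name of the weave pattern */
} weave_order_sketch;

/* Fill in one order. Returns 0 on success, -1 if the range is too long. */
static int weave_subset_sketch(weave_order_sketch *wo, const char *range,
    const char *pattern) {
    if (strlen(range) >= sizeof wo->range) return -1;
    strcpy(wo->range, range);
    snprintf(wo->leafname, sizeof wo->leafname, "%s.html", range);
    for (char *c = wo->leafname; *c != '\0'; c++)
        if (*c == '/') *c = '-';   /* assumed: "2/tp" becomes "2-tp.html" */
    wo->pattern = pattern;
    return 0;
}

/* A swarm: one order per range, each made by the single-subset function.
   Returns the number of orders successfully made. */
static size_t swarm_sketch(const char *const ranges[], size_t no_ranges,
    const char *pattern, weave_order_sketch orders[]) {
    size_t made = 0;
    for (size_t i = 0; i < no_ranges; i++)
        if (weave_subset_sketch(&orders[made], ranges[i], pattern) == 0) made++;
    return made;
}
```

The swarm function does nothing clever: it only decides what the subsets are,
and leaves the making of each order to the same function a single-section
weave would use.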
@ And so we descend into //The Weaver//, where the function //Weaver::weave//
is given the //weave_order// and told to get on with it.[1]

Rather than directly converting the source to (say) an HTML representation,
the Weaver first produces a "weave tree" which amounts to a format-neutral
list of rendering instructions: it then hands the tree over to
//Formats::render//. In this way, all specifics of individual output formats
are kept at arm's length from the actual weaving algorithm.

The weave tree is a simple business, built in a single pass of a depth-first
traversal of the web. The weaver keeps track of a modicum of "state" as it works,
and these running details are stored in a //weaver_state// object, but this is
thrown away as soon as the weaver finishes.

The trickiest point of building the weave tree is done by //The Weaver of Text//,
which breaks up lines of commentary or code to identify uses of mathematical
notation, footnote cues, function calls, and so on.

A convenience for testing the weave algorithm is to [[-weave-as TestingInweb]].
[[TestingInweb]] is a weave pattern that outputs a textual representation of
the weave tree. For example:
(text from Figures/tree.txt)
This is a "heterogeneous tree", in that its //tree_node// nodes are annotated
by data structures of different types. For example, a node for a section
heading is annotated with a //weave_section_header_node// structure. The
necessary types and object constructors are laid tediously out in
//Weave Tree//, a section which intentionally contains no non-trivial code.

[1] "Weaver, weave" really ought to be a folk song, but if so, I can't find
it on Spotify.

@ Syntax-colouring is worth further mention. Just as the Weaver tries not to
get itself into fiddly details of formats, it also avoids specifics of
programming languages. It does this by calling //LanguageMethods::syntax_colour//,
which in turn calls the [[SYNTAX_COLOUR_WEA_MTID]] method for the relevant
instance of //programming_language//. In effect the weaver sends a snippet
of code and asks to be told how it's to be coloured: not in terms of green
vs blue, but in terms of [[IDENTIFIER_COLOUR]] vs [[RESERVED_COLOUR]] and so on.

Thus, the object representing "the C programming language" can in principle
choose any semantic colouring that it likes. In practice, if (as is usual) it
assigns no particular code to this, what instead happens is that the generic
handler function in //ACME Support// takes on the task.[1] This runs the
colouring program in the language's definition file. Colouring programs are,
in effect, a mini-language of their own, which is compiled by
//Programming Languages// and then run in a low-level interpreter by
//The Painter//.

[1] "ACME" is used here in the sense of "generic".

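@ Semantic colouring is easiest to see as a second string running parallel to
the code, holding one colour code per character. The following is a minimal
sketch of that idea, not the Painter's algorithm: the character values given
to the colours, the extra [[PLAIN_COLOUR]], the tiny list of reserved words
and the function names are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Colour names as in the text, plus an assumed PLAIN_COLOUR for everything
   else. The character values are hypothetical. */
#define PLAIN_COLOUR 'p'
#define IDENTIFIER_COLOUR 'i'
#define RESERVED_COLOUR 'r'

/* Is this word in a (deliberately tiny) list of reserved words? */
static int is_reserved_sketch(const char *word, size_t len) {
    static const char *const reserved[] = { "if", "else", "while", "return", "int" };
    for (size_t k = 0; k < sizeof reserved / sizeof reserved[0]; k++)
        if (strlen(reserved[k]) == len && strncmp(reserved[k], word, len) == 0)
            return 1;
    return 0;
}

/* Write one colour code per character of code into colours, which must
   have room for strlen(code) + 1 characters. */
static void colour_snippet_sketch(const char *code, char *colours) {
    size_t i = 0, n = strlen(code);
    while (i < n) {
        if (isalpha((unsigned char) code[i]) || code[i] == '_') {
            size_t start = i;   /* scan to the end of this word */
            while (i < n && (isalnum((unsigned char) code[i]) || code[i] == '_')) i++;
            char c = is_reserved_sketch(code + start, i - start) ?
                RESERVED_COLOUR : IDENTIFIER_COLOUR;
            memset(colours + start, c, i - start);
        } else {
            colours[i++] = PLAIN_COLOUR;
        }
    }
    colours[n] = '\0';
}
```

Given the snippet [[if (x) return y;]], this produces the colouring
[[rrppipprrrrrrpip]]. A renderer can then decide for itself what "reserved"
or "identifier" should look like in its own format.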
@ So, then, the weave tree is now made. Just as each programming language
has an object representing it, so does each format, and at render time the
method call [[RENDER_FOR_MTID]] is sent to it. This has to turn the tree into
HTML, plain text, TeX source, or whatever it may be. It's understood that not
every rendering instruction in the weave tree can be fully followed in every
format: for example, there's not much that plain text can do to render an
image carousel.

Inweb currently contains four renderers:
(a) //Debugging Format// renders the weave tree as a plain text display, and
is solely for testing.
(b) //TeX Format// renders the weave tree as TeX markup code -- in the early
days of literate programming, this was the sole weave format used; now it
has been eclipsed by...
(c) ...//HTML Formats//, which renders to HTML and also handles ePub ebooks.
(d) There is also //Plain Text Format//, a comically minimal approach.

Renderers should make requests for weave plugins or colour schemes if, and
only if, the need arises: for example, the HTML renderer requests the plugin
[[Carousel]] only if an image carousel is actually called for. Requests are
made by calling //Swarm::ensure_plugin// or //Swarm::ensure_colour_scheme//,
and see also the underlying code at //Assets, Plugins and Colour Schemes//.
(We want our HTML to run as little JavaScript as necessary at load time, which
is why we don't just give every weave every possible facility.)

The most complex issue for HTML rendering is working out the URLs for links:
for example, when weaving the text you are currently reading, Inweb has to
decide where to send //text_stream//. This is handled by a suite of useful
functions in //Colonies// which coordinate URLs across websites so
that one web's weave can safely link to another's. In particular, cross-references
written in [[//this notation//]] are "resolved" by //Colonies::resolve_reference_in_weave//,
and the function //Colonies::reference_URL// turns them into relative URLs
from any given file. Within the main web being woven, //Colonies::paragraph_URL//
can make a link to any paragraph of your choice.[1]

[1] Inweb anchors at paragraphs; it does not anchor at individual lines.
This is intentional, as it's intended to take the reader to just enough
context and explanation to understand what is being linked to.

@ Finally on weaving, special mention should go to //The Collater//, a
subsystem which amounts to a stream editor. Its role is to work through a
"template" and substitute in material from outside -- from the weave rendering,
from the bibliographic data for a web, and so on -- to produce a final file.
For example, a simple use of the collater is to work through the template:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
    <html>
    <head>
    <title>[[Booklet Title]]</title>
    [[Plugins]]
    </head>
    <body>
    [[Weave Content]]
    </body>
    </html>

and to collate material already generated by other parts of Inweb to fill the
double-squared placeholders, such as [[[[Plugins]]]]. The Collater, in fact,
is ultimately what generates all of the files made in a weave, even though
other parts of Inweb did all of the real work.

With that said, it's not a trivial algorithm, because it can also loop through
chapters and sections, as it does when it generates an index page to accompany
a swarm of individual section weaves. The contents pages for typical webs
presented online are made this way. The Collater is also recursive, in that
some collation commands call for further acts of collation to happen inside
the original. See //Advanced Weaving with Patterns// for more on collation,
and see //Collater::collate// for the machinery.

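@ The simplest job the Collater does, filling double-squared placeholders
from material made elsewhere, can be sketched as a small string substitution
routine. This is an illustration only and handles none of the looping or
recursion just described; [[substitution_sketch]] and [[collate_sketch]] are
invented names, and leaving unknown placeholders untouched is a choice made
for the sketch, not a claim about what Inweb does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One placeholder name and the text to put in its place (hypothetical). */
typedef struct substitution_sketch {
    const char *name;
    const char *value;
} substitution_sketch;

/* Copy template_text into out, replacing each known placeholder written in
   double square brackets by its value. Unknown placeholders are copied
   unchanged. Returns 0 on success, -1 if out is too small. */
static int collate_sketch(const char *template_text,
    const substitution_sketch *subs, size_t no_subs, char *out, size_t out_size) {
    size_t used = 0;
    const char *p = template_text;
    if (out_size == 0) return -1;
    while (*p != '\0') {
        const char *piece = p;   /* default: copy one character verbatim */
        size_t piece_len = 1, consumed = 1;
        const char *close = (p[0] == '[' && p[1] == '[') ? strstr(p + 2, "]]") : NULL;
        if (close != NULL) {
            size_t name_len = (size_t) (close - (p + 2));
            for (size_t i = 0; i < no_subs; i++)
                if (strlen(subs[i].name) == name_len &&
                    strncmp(subs[i].name, p + 2, name_len) == 0) {
                    piece = subs[i].value;
                    piece_len = strlen(piece);
                    consumed = name_len + 4;   /* the name and both bracket pairs */
                    break;
                }
        }
        if (used + piece_len + 1 > out_size) return -1;
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        p += consumed;
    }
    out[used] = '\0';
    return 0;
}
```

Called with substitutions for "Booklet Title" and "Weave Content", the routine
turns the template's title and body placeholders into finished text while
passing the rest of the HTML through unaltered.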
@ \section{Tangling mode.}
Alternatively, we're in [[TANGLE_MODE]], which is more straightforward.
//Program Control// simply works out what we want to tangle, selecting the
appropriate //tangle_target// object, and calls //Tangler::tangle//.
Most webs have just one "tangle target", meaning that the whole web makes
a single program -- in that case, the choice is obvious. However, the
contents section can mark certain chapters or sections as being independent
targets.[1]

//Tangler::tangle// works hierarchically, calling down to //Tangler::tangle_paragraph//
and finally //Tangler::tangle_line// on individual lines of code. Throughout
the process, the Tangler makes method calls to the current programming
language; see //Language Methods//. As with syntax-colouring, the default
arrangement is that these methods are handled by the generic "ACME" language,
following instructions from the language definition file.

Languages declaring themselves "C-like" have access to special tangling
facilities, all implemented with non-ACME method calls: see //C-Like Languages//.
In particular, for coping with how [[#ifdef]] affects [[#include]] see
//CLike::additional_early_matter//; for predeclaration of functions and
structs and [[typedef]]s, see //CLike::additional_predeclarations//.

The language calling itself "InC" gets even more: see //InC Support//, and
in particular //text_literal// for text constants like [[I"banana"]]
and //preform_nonterminal// for Preform grammar notation like
[[<sentence-ending>]].

[1] The original intention of this feature was that a program might want
to have, as "appendices", certain configuration files or other extraneous
matter needing explanation. The author was motivated here by the example of
"TeX", which was presented as a literate program, but was difficult fully
to understand without also reading its format files quite carefully. However,
it now seems better practice to make such a sidekick file its own web, and
use a colony file to make everything tidy on a woven website. So maybe this
feature can go.

@ \section{Analysis mode.}
Alternatively, we're in [[ANALYSE_MODE]]. There's not much to this: //Program Control//
simply calls //Analyser::catalogue_the_sections//, or else makes use of the same
functions as [[TRANSLATE_MODE]] would -- but in the context of having read in a
web. If it makes a [[.gitignore]] file, for example, it does so for that specific
web, whereas if the same feature is used in [[TRANSLATE_MODE]], it does so in
the abstract and for no particular web.

@ \section{Translation mode.}
Or, finally, we're in [[TRANSLATE_MODE]]. We can:
(a) make a makefile by calling //Makefiles::write//;
(b) make a [[.gitignore]] file by calling //Git::write_gitignore//;
(c) advance the build number in a build file, by calling out to the
Foundation code at //BuildFiles::advance//;
(d) run a syntax-colouring test to help debug a programming language definition --
see //Program Control// itself for details.

And that is essentially it. Inweb winds up by returning exit code 1 if there
were errors, or 0 if not, like a good Unix citizen.

@ \section{Adding to Inweb.}
Here's some miscellaneous advice for those who would like to add to Inweb:

1. To add a new command-line switch, declare it at //Configuration::read// and
add a field to //inweb_instructions// which holds the setting; don't act on it
then and there, only in //Program Control// later. But we don't want these
settings to proliferate: ask first if adding a feature to, say, //Colonies//
or //weave_pattern// files would meet the same need.

2. To add new programming languages, try if possible to do everything you
need with a new definition file alone: see //Supporting Programming Languages//.
Failing that, see if making definition files more powerful would do it (for
example, by making the ACME support more general-purpose). Failing even that,
follow the model of //C-Like Languages//: that is, add logic to
//Languages::read_definition// which adds method receiver functions
to a language with a given name, or, preferably, some given declaration in
the language definition file. On no account insert any language bias into
//The Weaver// or //The Tangler//.

3. To add new forms of weave output, try if possible to make a new pattern:
see //Advanced Weaving with Patterns//. But this won't always be good enough.
For example, "an HTML website but done differently" should be a pattern based
on HTML, but Markdown would require a genuinely new format. (Though you would
still also create a new pattern in order to use it.) If you go down this road,
make a new section in //Chapter 5// following the model of, say,
//Plain Text Format//, and then add methods gradually.
(But don't forget to call your new format's creator function from
//Formats::create_weave_formats//.)

4. As with any program built on Foundation, if you are creating a new class of
object, don't forget to declare it in //Basics//.