inweb-bootstrap/Chapter_4/Language_Methods.nw

[LanguageMethods::] Language Methods.

To characterise the relevant differences in behaviour between the
various programming languages supported.

@ \section{Introduction.}
The conventions for writing, weaving and tangling a web are really quite
independent of the programming language being written, woven or tangled;
Knuth began literate programming with Pascal, but now uses C, and the original
Pascal webs were mechanically translated into C ones with remarkably little
fuss or bother. Modern LP tools, such as [[noweb]], aim to be language-agnostic.
But of course if you act the same on all languages, you give up the benefits
which might follow from knowing something about the languages you actually
write in.

The idea, then, is that Chapters 1 to 3 of the Inweb code treat all
material the same, and Chapter 4 contains all of the funny little exceptions
and special cases for particular programming languages. (This means Chapter 4
can't be understood without having at least browsed Chapters 1 to 3 first.)

Really all of the functionality of languages is provided through method calls,
all of them made from this section. That means a lot of simple wrapper routines
which don't do very much. This section may still be useful to read, since it
documents what amounts to an API.
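
As a rough illustration of the wrapper pattern used throughout this section,
the sketch below models a language as a table of optional function pointers.
It is not how [[VOID_METHOD_CALL]] or [[INT_METHOD_CALL]] are actually
implemented: the names [[toy_language]], [[toy_tangle_line]] and
[[upper_case_line]] are invented for the example. The point is only that a
method which the language does not provide leaves [[rv]] at its default, so
that the wrapper can fall back on ordinary behaviour.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a programming_language: one optional method. */
typedef struct toy_language {
	int (*tangle_line_unusually)(struct toy_language *pl, char *out,
		size_t cap, const char *original);
} toy_language;

/* Wrapper in the style of LanguageMethods::tangle_line: ask the language
   first, and copy the line through unchanged if it declines, or if it
   provides no method at all. */
void toy_tangle_line(toy_language *pl, char *out, size_t cap, const char *original) {
	int rv = 0;
	if (cap == 0) return;
	if (pl->tangle_line_unusually)
		rv = pl->tangle_line_unusually(pl, out, cap, original);
	if (rv == 0) {
		strncpy(out, original, cap - 1);
		out[cap - 1] = '\0';
	}
}

/* An invented method for an imaginary language which tangles in upper case.
   It returns 1 to say that it has dealt with the line itself. */
int upper_case_line(toy_language *pl, char *out, size_t cap, const char *original) {
	(void) pl;
	if (cap == 0) return 0;
	size_t i = 0;
	for (; original[i] && i + 1 < cap; i++)
		out[i] = (char) toupper((unsigned char) original[i]);
	out[i] = '\0';
	return 1;
}
```

A language declared as [[{ NULL }]] passes each line through untouched, while
one declared as [[{ upper_case_line }]] takes over the output entirely.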

@ \section{Parsing methods.}
We begin with parsing extensions. When these are used, we have already read
the web into chapters, sections and paragraphs, but for some languages we will
need a more detailed picture.

[[PARSE_TYPES_PAR_MTID]] gives a language the opportunity to look for type
declarations.

<<*>>=
enum PARSE_TYPES_PAR_MTID

<<*>>=
VOID_METHOD_TYPE(PARSE_TYPES_PAR_MTID, programming_language *pl, web *W)
void LanguageMethods::parse_types(web *W, programming_language *pl) {
	VOID_METHOD_CALL(pl, PARSE_TYPES_PAR_MTID, W);
}

@ [[PARSE_FUNCTIONS_PAR_MTID]] is, similarly, for function declarations.

<<*>>=
enum PARSE_FUNCTIONS_PAR_MTID

<<*>>=
VOID_METHOD_TYPE(PARSE_FUNCTIONS_PAR_MTID, programming_language *pl, web *W)
void LanguageMethods::parse_functions(web *W, programming_language *pl) {
	VOID_METHOD_CALL(pl, PARSE_FUNCTIONS_PAR_MTID, W);
}

@ [[FURTHER_PARSING_PAR_MTID]] is "further" in that it is called when the main
parser has finished work; it typically looks over the whole web for something
of interest.

<<*>>=
enum FURTHER_PARSING_PAR_MTID

<<*>>=
VOID_METHOD_TYPE(FURTHER_PARSING_PAR_MTID, programming_language *pl, web *W)
void LanguageMethods::further_parsing(web *W, programming_language *pl) {
	VOID_METHOD_CALL(pl, FURTHER_PARSING_PAR_MTID, W);
}

@ [[SUBCATEGORISE_LINE_PAR_MTID]] looks at a single line, after the main parser
has given it a category. The idea is not so much to second-guess the parser
(although we can) but to change to a more exotic category which it would
otherwise never produce.

<<*>>=
enum SUBCATEGORISE_LINE_PAR_MTID

<<*>>=
VOID_METHOD_TYPE(SUBCATEGORISE_LINE_PAR_MTID, programming_language *pl, source_line *L)
void LanguageMethods::subcategorise_line(programming_language *pl, source_line *L) {
	VOID_METHOD_CALL(pl, SUBCATEGORISE_LINE_PAR_MTID, L);
}

@ Comments have different syntax in different languages. The method here is
expected to look for a comment on the [[line]], and if it finds one to return
[[TRUE]], but not before copying the non-comment part of the line into
[[before]] and the text of the comment into [[within]].

<<*>>=
enum PARSE_COMMENT_TAN_MTID

<<*>>=
INT_METHOD_TYPE(PARSE_COMMENT_TAN_MTID, programming_language *pl, text_stream *line, text_stream *before, text_stream *within)

int LanguageMethods::parse_comment(programming_language *pl,
	text_stream *line, text_stream *before, text_stream *within) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, PARSE_COMMENT_TAN_MTID, line, before, within);
	return rv;
}
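
@ To make the contract of [[PARSE_COMMENT_TAN_MTID]] concrete, here is a
minimal sketch of such a method for an imaginary language whose comments run
from [[//]] to the end of the line. It works on plain C strings rather than
on [[text_stream]]s, and [[toy_parse_comment]] and [[copy_span]] are names
invented for the example. It makes no attempt to notice a [[//]] occurring
inside a string literal, which a real implementation would need to do.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most cap-1 characters of src[0..n) into dst, always terminating. */
static void copy_span(char *dst, size_t cap, const char *src, size_t n) {
	if (cap == 0) return;
	if (n > cap - 1) n = cap - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
}

/* Returns 1 and fills in before and within if a comment is found; returns 0,
   leaving both buffers untouched, if not. */
int toy_parse_comment(const char *line, char *before, size_t bcap,
	char *within, size_t wcap) {
	const char *p = strstr(line, "//");
	if (p == NULL) return 0;
	const char *text = p + 2;
	while (*text == ' ') text++; /* skip spaces after the comment marker */
	copy_span(before, bcap, line, (size_t) (p - line));
	copy_span(within, wcap, text, strlen(text));
	return 1;
}
```

Given the line [[x = 1; // set x]], this sketch puts [[x = 1; ]] (with its
trailing space) into [[before]] and [[set x]] into [[within]].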

@ \section{Tangling methods.}
We take these roughly in order of their effects on the tangled output, from
the top to the bottom of the file.

The top of the tangled file is a header called the "shebang". By default,
there's nothing there, but [[SHEBANG_TAN_MTID]] allows the language to add one.
For example, Perl prints [[#!/usr/bin/perl]] here.

<<*>>=
enum SHEBANG_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(SHEBANG_TAN_MTID, programming_language *pl, text_stream *OUT, web *W, tangle_target *target)
void LanguageMethods::shebang(OUTPUT_STREAM, programming_language *pl, web *W, tangle_target *target) {
	VOID_METHOD_CALL(pl, SHEBANG_TAN_MTID, OUT, W, target);
}

@ Next is the disclaimer, text warning the human reader that she is looking
at tangled (therefore not original) material.

<<*>>=
enum SUPPRESS_DISCLAIMER_TAN_MTID

<<*>>=
INT_METHOD_TYPE(SUPPRESS_DISCLAIMER_TAN_MTID, programming_language *pl)
void LanguageMethods::disclaimer(text_stream *OUT, programming_language *pl, web *W, tangle_target *target) {
	int rv = FALSE;
	INT_METHOD_CALL_WITHOUT_ARGUMENTS(rv, pl, SUPPRESS_DISCLAIMER_TAN_MTID);
	if (rv == FALSE)
		LanguageMethods::comment(OUT, pl, I"Tangled output generated by inweb: do not edit");
}

@ Next is any additional early matter: material which the language wants to
place near the top of the tangled file, after the disclaimer.

<<*>>=
enum ADDITIONAL_EARLY_MATTER_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(ADDITIONAL_EARLY_MATTER_TAN_MTID, programming_language *pl, text_stream *OUT, web *W, tangle_target *target)
void LanguageMethods::additional_early_matter(text_stream *OUT, programming_language *pl, web *W, tangle_target *target) {
	VOID_METHOD_CALL(pl, ADDITIONAL_EARLY_MATTER_TAN_MTID, OUT, W, target);
}

@ A tangled file then normally declares "definitions". The following write a
definition of the constant named [[term]] as the value given. If the value spans
multiple lines, the first-line part is supplied to [[START_DEFN_TAN_MTID]] and
then subsequent lines are fed in order to [[PROLONG_DEFN_TAN_MTID]]. At the end,
[[END_DEFN_TAN_MTID]] is called.

<<*>>=
enum START_DEFN_TAN_MTID
enum PROLONG_DEFN_TAN_MTID
enum END_DEFN_TAN_MTID

<<*>>=
INT_METHOD_TYPE(START_DEFN_TAN_MTID, programming_language *pl, text_stream *OUT, text_stream *term, text_stream *start, section *S, source_line *L)
INT_METHOD_TYPE(PROLONG_DEFN_TAN_MTID, programming_language *pl, text_stream *OUT, text_stream *more, section *S, source_line *L)
INT_METHOD_TYPE(END_DEFN_TAN_MTID, programming_language *pl, text_stream *OUT, section *S, source_line *L)

void LanguageMethods::start_definition(OUTPUT_STREAM, programming_language *pl,
	text_stream *term, text_stream *start, section *S, source_line *L) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, START_DEFN_TAN_MTID, OUT, term, start, S, L);
	if (rv == FALSE)
		Main::error_in_web(I"this programming language does not support @d", L);
}

void LanguageMethods::prolong_definition(OUTPUT_STREAM, programming_language *pl,
	text_stream *more, section *S, source_line *L) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, PROLONG_DEFN_TAN_MTID, OUT, more, S, L);
	if (rv == FALSE)
		Main::error_in_web(I"this programming language does not support multiline @d", L);
}

void LanguageMethods::end_definition(OUTPUT_STREAM, programming_language *pl,
	section *S, source_line *L) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, END_DEFN_TAN_MTID, OUT, S, L);
}
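
@ The division of labour between the three definition methods can be sketched
as follows for a C-like language, where a multiline definition becomes a
[[#define]] continued with backslashes. This is an illustration only: the
[[toy_stream]] type and the [[toy_]] functions are invented here, and the
exact text written by the real C-like implementation may differ.

```c
#include <stddef.h>
#include <string.h>

/* A tiny fixed-size output buffer standing in for a text_stream. */
typedef struct toy_stream {
	char text[256];
	size_t len;
} toy_stream;

/* Append s to the stream, silently truncating if the buffer is full. */
static void toy_write(toy_stream *out, const char *s) {
	size_t room = sizeof(out->text) - 1 - out->len;
	size_t n = strlen(s);
	if (n > room) n = room;
	memcpy(out->text + out->len, s, n);
	out->len += n;
	out->text[out->len] = '\0';
}

/* Each method returns 1 to report that the language supports the operation. */
int toy_start_definition(toy_stream *out, const char *term, const char *start) {
	toy_write(out, "#define ");
	toy_write(out, term);
	toy_write(out, " ");
	toy_write(out, start);
	return 1;
}
int toy_prolong_definition(toy_stream *out, const char *more) {
	toy_write(out, "\\\n    "); /* a backslash continues the #define */
	toy_write(out, more);
	return 1;
}
int toy_end_definition(toy_stream *out) {
	toy_write(out, "\n");
	return 1;
}
```

Calling the start method once, the prolong method once for each further line,
and then the end method produces a single well-formed [[#define]].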

@ Then we have some "predeclarations"; for example, for C-like languages we
automatically predeclare all functions, obviating the need for header files.

<<*>>=
enum ADDITIONAL_PREDECLARATIONS_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(ADDITIONAL_PREDECLARATIONS_TAN_MTID, programming_language *pl, text_stream *OUT, web *W)
void LanguageMethods::additional_predeclarations(OUTPUT_STREAM, programming_language *pl, web *W) {
	VOID_METHOD_CALL(pl, ADDITIONAL_PREDECLARATIONS_TAN_MTID, OUT, W);
}

@ So much for the special material at the top of a tangle: now we're into
the more routine matter, tangling ordinary paragraphs into code.

Languages have the ability to suppress paragraph macro expansion:

<<*>>=
enum SUPPRESS_EXPANSION_TAN_MTID

<<*>>=
INT_METHOD_TYPE(SUPPRESS_EXPANSION_TAN_MTID, programming_language *pl, text_stream *material)
int LanguageMethods::allow_expansion(programming_language *pl, text_stream *material) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, SUPPRESS_EXPANSION_TAN_MTID, material);
	return (rv)?FALSE:TRUE;
}

@ Inweb supports very few "tangle commands", that is, instructions written
inside double squares [[[[Thus]]]]. These can be handled by attaching methods
as follows, which return [[TRUE]] if they recognised and acted on the command.

<<*>>=
enum TANGLE_COMMAND_TAN_MTID

<<*>>=
INT_METHOD_TYPE(TANGLE_COMMAND_TAN_MTID, programming_language *pl, text_stream *OUT, text_stream *data)

int LanguageMethods::special_tangle_command(OUTPUT_STREAM, programming_language *pl, text_stream *data) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, TANGLE_COMMAND_TAN_MTID, OUT, data);
	return rv;
}

@ The following methods make it possible for languages to tangle unorthodox
lines into code. Ordinarily, only [[CODE_BODY_LCAT]] lines are tangled, but
we can intervene to say that we want to tangle a different line; and if we
do so, we should then act on that basis.

<<*>>=
enum WILL_TANGLE_EXTRA_LINE_TAN_MTID
enum TANGLE_EXTRA_LINE_TAN_MTID

<<*>>=
INT_METHOD_TYPE(WILL_TANGLE_EXTRA_LINE_TAN_MTID, programming_language *pl, source_line *L)
VOID_METHOD_TYPE(TANGLE_EXTRA_LINE_TAN_MTID, programming_language *pl, text_stream *OUT, source_line *L)
int LanguageMethods::will_insert_in_tangle(programming_language *pl, source_line *L) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, WILL_TANGLE_EXTRA_LINE_TAN_MTID, L);
	return rv;
}
void LanguageMethods::insert_in_tangle(OUTPUT_STREAM, programming_language *pl, source_line *L) {
	VOID_METHOD_CALL(pl, TANGLE_EXTRA_LINE_TAN_MTID, OUT, L);
}

@ In order for C compilers to report C syntax errors on the correct line,
despite rearranging by automatic tools, C conventionally recognises the
preprocessor directive [[#line]] to tell it that a contiguous extract follows
from the given file; we generate this automatically.

<<*>>=
enum INSERT_LINE_MARKER_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(INSERT_LINE_MARKER_TAN_MTID, programming_language *pl, text_stream *OUT, source_line *L)
void LanguageMethods::insert_line_marker(OUTPUT_STREAM, programming_language *pl, source_line *L) {
	VOID_METHOD_CALL(pl, INSERT_LINE_MARKER_TAN_MTID, OUT, L);
}
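
@ For reference, the directive takes the form [[#line]], then a line number,
then a filename in double quotes. The sketch below writes one into a buffer.
The function [[toy_line_marker]] and its parameters are invented for the
example: the real method is handed a [[source_line]] and works out the number
and filename for itself. A filename containing quotes or backslashes would
need escaping, which is not attempted here.

```c
#include <stddef.h>
#include <stdio.h>

/* Write a C line marker into buf, saying that the next line of tangled
   output came from the given line of the given source file. Returns the
   number of characters which snprintf wanted to write. */
int toy_line_marker(char *buf, size_t cap, int line_number, const char *filename) {
	return snprintf(buf, cap, "#line %d \"%s\"\n", line_number, filename);
}
```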

@ The following hooks are provided so that we can top and/or tail the expansion
of paragraph macros in the code. For example, C-like languages use this to
splice [[{]] and [[}]] around the expanded matter.

<<*>>=
enum BEFORE_MACRO_EXPANSION_TAN_MTID
enum AFTER_MACRO_EXPANSION_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(BEFORE_MACRO_EXPANSION_TAN_MTID, programming_language *pl, text_stream *OUT, para_macro *pmac)
VOID_METHOD_TYPE(AFTER_MACRO_EXPANSION_TAN_MTID, programming_language *pl, text_stream *OUT, para_macro *pmac)
void LanguageMethods::before_macro_expansion(OUTPUT_STREAM, programming_language *pl, para_macro *pmac) {
	VOID_METHOD_CALL(pl, BEFORE_MACRO_EXPANSION_TAN_MTID, OUT, pmac);
}
void LanguageMethods::after_macro_expansion(OUTPUT_STREAM, programming_language *pl, para_macro *pmac) {
	VOID_METHOD_CALL(pl, AFTER_MACRO_EXPANSION_TAN_MTID, OUT, pmac);
}

@ It's a sad necessity, but sometimes we have to unconditionally tangle code
for a preprocessor to conditionally read: that is, to tangle code which contains
[[#ifdef]] or a similar preprocessor directive.

<<*>>=
enum OPEN_IFDEF_TAN_MTID
enum CLOSE_IFDEF_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(OPEN_IFDEF_TAN_MTID, programming_language *pl, text_stream *OUT, text_stream *symbol, int sense)
VOID_METHOD_TYPE(CLOSE_IFDEF_TAN_MTID, programming_language *pl, text_stream *OUT, text_stream *symbol, int sense)
void LanguageMethods::open_ifdef(OUTPUT_STREAM, programming_language *pl, text_stream *symbol, int sense) {
	VOID_METHOD_CALL(pl, OPEN_IFDEF_TAN_MTID, OUT, symbol, sense);
}
void LanguageMethods::close_ifdef(OUTPUT_STREAM, programming_language *pl, text_stream *symbol, int sense) {
	VOID_METHOD_CALL(pl, CLOSE_IFDEF_TAN_MTID, OUT, symbol, sense);
}
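
@ A plausible pair of implementations for a C-like language is sketched below.
It assumes that a true [[sense]] means "if defined" and a false one "if not
defined"; that reading is a guess from the parameter name, not something
stated above, and the [[toy_]] function names are invented for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Open a conditional block: #ifdef if sense is true, #ifndef if not. */
int toy_open_ifdef(char *buf, size_t cap, const char *symbol, int sense) {
	return snprintf(buf, cap, "%s %s\n", sense ? "#ifdef" : "#ifndef", symbol);
}

/* Close the block. The sense makes no difference to the text of #endif, but
   the symbol is echoed in a comment to help the human reader. */
int toy_close_ifdef(char *buf, size_t cap, const char *symbol, int sense) {
	(void) sense;
	return snprintf(buf, cap, "#endif /* %s */\n", symbol);
}
```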

@ Now a routine to tangle a comment. Languages without comments should write nothing.

<<*>>=
enum COMMENT_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(COMMENT_TAN_MTID, programming_language *pl, text_stream *OUT, text_stream *comm)
void LanguageMethods::comment(OUTPUT_STREAM, programming_language *pl, text_stream *comm) {
	VOID_METHOD_CALL(pl, COMMENT_TAN_MTID, OUT, comm);
}

@ The inner code tangler now acts on all code known not to contain CWEB
macros or double-square substitutions. In almost every language this simply
passes the code straight through, printing [[original]] to [[OUT]].

<<*>>=
enum TANGLE_LINE_UNUSUALLY_TAN_MTID

<<*>>=
INT_METHOD_TYPE(TANGLE_LINE_UNUSUALLY_TAN_MTID, programming_language *pl, text_stream *OUT, text_stream *original)
void LanguageMethods::tangle_line(OUTPUT_STREAM, programming_language *pl, text_stream *original) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, TANGLE_LINE_UNUSUALLY_TAN_MTID, OUT, original);
	if (rv == FALSE) WRITE("%S", original);
}

@ We finally reach the bottom of the tangled file, a footer called the "gnabehs":

<<*>>=
enum GNABEHS_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(GNABEHS_TAN_MTID, programming_language *pl, text_stream *OUT, web *W)
void LanguageMethods::gnabehs(OUTPUT_STREAM, programming_language *pl, web *W) {
	VOID_METHOD_CALL(pl, GNABEHS_TAN_MTID, OUT, W);
}

@ But we still aren't quite done, because some languages need to produce
sidekick files alongside the main tangle file. This method exists to give
them the opportunity.

<<*>>=
enum ADDITIONAL_TANGLING_TAN_MTID

<<*>>=
VOID_METHOD_TYPE(ADDITIONAL_TANGLING_TAN_MTID, programming_language *pl, web *W, tangle_target *target)
void LanguageMethods::additional_tangling(programming_language *pl, web *W, tangle_target *target) {
	VOID_METHOD_CALL(pl, ADDITIONAL_TANGLING_TAN_MTID, W, target);
}

@ \section{Weaving methods.}
This method shouldn't do any actual weaving: it should simply initialise
anything that the language in question might need later.

<<*>>=
enum BEGIN_WEAVE_WEA_MTID

<<*>>=
VOID_METHOD_TYPE(BEGIN_WEAVE_WEA_MTID, programming_language *pl, section *S, weave_order *wv)
void LanguageMethods::begin_weave(section *S, weave_order *wv) {
	VOID_METHOD_CALL(S->sect_language, BEGIN_WEAVE_WEA_MTID, S, wv);
}

@ This method allows languages to tell the weaver to ignore certain lines.

<<*>>=
enum SKIP_IN_WEAVING_WEA_MTID

<<*>>=
INT_METHOD_TYPE(SKIP_IN_WEAVING_WEA_MTID, programming_language *pl, weave_order *wv, source_line *L)
int LanguageMethods::skip_in_weaving(programming_language *pl, weave_order *wv, source_line *L) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, SKIP_IN_WEAVING_WEA_MTID, wv, L);
	return rv;
}

@ Most languages do syntax colouring by keeping a "state" (we are now inside
a comment, inside quoted text, and so on); the following method is provided
to reset that state, if there is one. Inweb runs it once per paragraph for
safety's sake, which limits the knock-on effect of any colouring mistakes.

<<*>>=
enum RESET_SYNTAX_COLOURING_WEA_MTID

<<*>>=
VOID_METHOD_TYPE(RESET_SYNTAX_COLOURING_WEA_MTID, programming_language *pl)
void LanguageMethods::reset_syntax_colouring(programming_language *pl) {
	VOID_METHOD_CALL_WITHOUT_ARGUMENTS(pl, RESET_SYNTAX_COLOURING_WEA_MTID);
}

@ And this is where colouring is done.

<<*>>=
enum SYNTAX_COLOUR_WEA_MTID

<<*>>=
int colouring_state = PLAIN_COLOUR;

INT_METHOD_TYPE(SYNTAX_COLOUR_WEA_MTID, programming_language *pl,
	weave_order *wv, source_line *L, text_stream *matter, text_stream *colouring)
int LanguageMethods::syntax_colour(programming_language *pl,
	weave_order *wv, source_line *L, text_stream *matter, text_stream *colouring) {
	for (int i=0; i < Str::len(matter); i++) Str::put_at(colouring, i, PLAIN_COLOUR);
	int rv = FALSE;
	programming_language *colour_as = pl;
	if (L->category == TEXT_EXTRACT_LCAT) colour_as = L->colour_as;
	theme_tag *T = Tags::find_by_name(I"Preform", FALSE);
	if ((T) && (Tags::tagged_with(L->owning_paragraph, T))) {
		programming_language *prepl = Languages::find_by_name(I"Preform", wv->weave_web, FALSE);
		if ((L->category == PREFORM_LCAT) || (L->category == PREFORM_GRAMMAR_LCAT))
			if (prepl) colour_as = prepl;
	}
	if (colour_as)
		INT_METHOD_CALL(rv, colour_as, SYNTAX_COLOUR_WEA_MTID, wv, L,
			matter, colouring);
	return rv;
}
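
@ To show what a stateful colourer looks like, and why resetting it matters,
here is a toy one which knows only about [[/* ... */]] comments. As in the
method above, the colouring is a string running parallel to the code, one
colour per character. The colour codes and function names are invented for
the example, and plain C strings stand in for [[text_stream]]s.

```c
#include <stddef.h>
#include <string.h>

/* Invented colour codes, one character per character of code. */
#define TOY_PLAIN   'p'
#define TOY_COMMENT 'c'

/* The colourer's state: are we currently inside a comment? */
static int toy_in_comment = 0;

void toy_reset_colouring(void) {
	toy_in_comment = 0;
}

/* Set colouring[i] to the colour of matter[i]. The caller must supply room
   for strlen(matter)+1 characters. The state persists between calls, so a
   comment opened on one line continues onto the next. */
void toy_colour(const char *matter, char *colouring) {
	size_t i = 0;
	while (matter[i]) {
		if (!toy_in_comment && matter[i] == '/' && matter[i+1] == '*') {
			toy_in_comment = 1;
			colouring[i] = TOY_COMMENT; colouring[i+1] = TOY_COMMENT;
			i += 2;
		} else if (toy_in_comment && matter[i] == '*' && matter[i+1] == '/') {
			toy_in_comment = 0;
			colouring[i] = TOY_COMMENT; colouring[i+1] = TOY_COMMENT;
			i += 2;
		} else {
			colouring[i] = toy_in_comment ? TOY_COMMENT : TOY_PLAIN;
			i++;
		}
	}
	colouring[i] = '\0';
}
```

If a comment is mistakenly left open, every later line is coloured as comment
until [[toy_reset_colouring]] is called, which is the knock-on effect that a
reset once per paragraph is meant to contain.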

@ This method is called for each code line to be woven. If it returns [[FALSE]],
the weaver carries on in the normal way. If it returns [[TRUE]], the weaver does
nothing further with the line, assuming that the method has already woven
something more attractive.

<<*>>=
enum WEAVE_CODE_LINE_WEA_MTID

<<*>>=
INT_METHOD_TYPE(WEAVE_CODE_LINE_WEA_MTID, programming_language *pl, text_stream *OUT, weave_order *wv, web *W,
	chapter *C, section *S, source_line *L, text_stream *matter, text_stream *concluding_comment)
int LanguageMethods::weave_code_line(OUTPUT_STREAM, programming_language *pl, weave_order *wv,
	web *W, chapter *C, section *S, source_line *L, text_stream *matter, text_stream *concluding_comment) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, WEAVE_CODE_LINE_WEA_MTID, OUT, wv, W, C, S, L, matter, concluding_comment);
	return rv;
}

@ When Inweb creates a new [[^"Theme"]], it lets everybody know about that.

<<*>>=
enum NOTIFY_NEW_TAG_WEA_MTID

<<*>>=
VOID_METHOD_TYPE(NOTIFY_NEW_TAG_WEA_MTID, programming_language *pl, theme_tag *tag)
void LanguageMethods::new_tag_declared(theme_tag *tag) {
	programming_language *pl;
	LOOP_OVER(pl, programming_language)
		VOID_METHOD_CALL(pl, NOTIFY_NEW_TAG_WEA_MTID, tag);
}

@ \section{Analysis methods.}
These are really a little miscellaneous, but they all have to do with looking
at the code in a web and working out what's going on, rather than producing
any weave or tangle output.

The "preweave analysis" is an opportunity to look through the code before
any weaving of it occurs. It's never called on a tangle run. These methods
are called first and last in the process, respectively. (What happens in
between is essentially that Inweb looks for identifiers, for later syntax
colouring purposes.)

<<*>>=
enum ANALYSIS_ANA_MTID
enum POST_ANALYSIS_ANA_MTID

<<*>>=
VOID_METHOD_TYPE(ANALYSIS_ANA_MTID, programming_language *pl, web *W)
VOID_METHOD_TYPE(POST_ANALYSIS_ANA_MTID, programming_language *pl, web *W)
void LanguageMethods::early_preweave_analysis(programming_language *pl, web *W) {
	VOID_METHOD_CALL(pl, ANALYSIS_ANA_MTID, W);
}
void LanguageMethods::late_preweave_analysis(programming_language *pl, web *W) {
	VOID_METHOD_CALL(pl, POST_ANALYSIS_ANA_MTID, W);
}

@ And finally: in InC only, a few structure element names are given very slightly
special treatment, and this method decides which.

<<*>>=
enum SHARE_ELEMENT_ANA_MTID

<<*>>=
INT_METHOD_TYPE(SHARE_ELEMENT_ANA_MTID, programming_language *pl, text_stream *element_name)
int LanguageMethods::share_element(programming_language *pl, text_stream *element_name) {
	int rv = FALSE;
	INT_METHOD_CALL(rv, pl, SHARE_ELEMENT_ANA_MTID, element_name);
	return rv;
}

@ \section{What we support.}

<<*>>=
int LanguageMethods::supports_definitions(programming_language *pl) {
	if (Str::len(pl->start_definition) > 0) return TRUE;
	if (Str::len(pl->prolong_definition) > 0) return TRUE;
	if (Str::len(pl->end_definition) > 0) return TRUE;
	return FALSE;
}