As was apparent in VarEvalFlags_ToString, a bit-set was not the best
data type since most of the flags were not freely combinable. The two
flags that could be combined were keepDollar and keepUndef, but even
these have distinguished names in the debug log.
The downside of struct bit-fields is that they need extra helper
functions in C90 (see nonints.h). Exchange these for a few helper
functions in var.c, to keep the code outside var.c simple.
No functional change.
This makes the code easier to read, especially in var.c. It also makes
debugging sessions easier since some debuggers don't show enum
bit-fields symbolically as soon as more than one bit is set.
The code outside var.c is basically unchanged, except that instead of
passing the individual flags, there are 4 predefined evaluation modes.
These suffice for all practical use cases. Only in the implementation
deep inside var.c, the value of the flags keepDollar and keepUndef
differs.
There is no way of passing the struct to EnumFlags_ToString, which means
the ToString function has to be spelled out explicitly. This allows for
fine-tuning the representation in the debug log, to reduce the amount of
uppercae letters.
No functional change.
The name 'NONE' described the bit pattern, which was not useful to
understand its meaning. Omitting VARE_WANTRES only parses the
expression, without evaluating any part of it.
No functional change, not even in debug mode since Enum_FlagsToString
always returns "none" for all-bits-unset.
These variables all belong to a string variable. Connect them using
FStr, which reduces the number of variables to keep track of.
No functional change.
This change provides for a more natural reading order in the code.
Placing the scope first makes it immediately clear in which context the
remaining parameters are interpreted.
No functional change.
This continues the previous commit, in which VAR_GLOBAL, VAR_INTERNAL
and VAR_CMDLINE were renamed.
Renaming the variable 'ctxt' was trivial since that word is used nowhere
else. In the comments though, each occurrence of the word 'context' had
to be checked individually since the word 'context' was not only used
for referring to a variable scope. It is also used to distinguish
different situations where characters are escaped in a certain way
('parsing context') and in a few other expressions.
The word "context" does not fit perfectly to the variables that are
associate with a GNode, as the context is usually something from the
outside and the variables are more like properties inherent to the
GNode.
The term "global context" fits even less. Since the thing where
variables are looked up is commonly named a scope, use that term
instead.
This commit only renames the global variables VAR_GLOBAL, VAR_INTERNAL
and VAR_CMDLINE, plus a few very closely related comments. These are:
GNode.vars (because of line breaks)
GNode_Free (dito)
varname-make_print_var_on_error.mk
varname-make_print_var_on_error-jobs.mk
The debug message in Var_Stats is left as-is since there is no unit test
for it yet.
The other renamings (variable names "context", "ctxt", as well as
further comments) will be done in a follow-up commit.
After doing the textual renaming across all files, I added a new
function Var_Set that does not expand the variable name. I then undid
the renaming for all calls where the variable name cannot ever contain a
dollar sign. I omitted the word "Expand" from the textual references in
the unit tests and in the debug logging messages since the focus is
usually on the "Set" part, not on the "Expand".
No functional change.
Most previous calls to Var_Exists use constant variable names. Only the
two calls in parse.c need to expand the variable name.
It may be a good idea to expand the variable name once in VarAssign_Eval
instead of repeating the expansion in each of its special cases.
No functional change.
All callers with a variable name that is guaranteed to not contain a
dollar sign have been converted to call Global_Append instead of the
previous Global_AppendExpand. After that, Global_AppendExpand was
unused, therefore it was effectively just renamed.
The plain Var_Append now does not expand the variable name anymore. It
is used in situations where the variable name is known to not contain a
dollar sign.
This is a preparation for adding Global_Append, corresponding to
Global_AppendExpand.
There are many places where global variables are set or appended to. To
reduce clutter and code size, encode the VAR_GLOBAL in the function
name.
The word Expand in the function names says that the variable name is
expanded. In most of the cases, this is not necessary, but there are no
corresponding functions Global_Set or Global_Append yet.
Encoding the information whether the name is expanded or not in the
function name will make inconsistencies obvious in future manual code
reviews. Letting the compiler check this by using different types for
unexpanded and expanded variable names is probably not worth the effort.
There are still a few bugs to be fixed, such as in SetVar, which expands
the variable name twice in a row.
The previous error message is not easily understandable since it is
missing a crucial detail, the column where the operator is needed.
Without this information, the author of the makefile gets no useful
hint. Furthermore, there are several types of operators in makefiles:
the dependency operators ':', '!', '::',
the variable assignment operators '=', '!=', '+=', '?=', ':=',
the conditional operators '&&', '||', '!',
the comparison operators '==', '!=', '>', '>=', '<', '<='.
This leaves too much ambiguity.
Replace this error message with "Invalid line type", which is more
generic, more accurate and thus less misleading.
In all cases except one, the boolean argument to Buf_Destroy was
constant. Removing that argument by splitting the function into two
separate functions makes the intention clearer on the call site. It
also removes the possibility for using the return value of Buf_Done,
which would have made no sense.
The function Buf_Done now pairs with Buf_Init, just as in HashTable and
Lst.
Even though Buf_Done is essentially a no-op, it is kept as a function,
both for symmetry with Buf_Init and for clearing the Buffer members
after use (this will be done only in CLEANUP mode, in a follow-up
commit).
This makes it easier to track down where a warning or an error
originated from. This information could be further enriched for .for
loops, to also include the variable names and their values. For now,
it's good enough to replace the large comment describing how a stack
trace _could_ be generated with actual code that implements that idea.
The syntax of the locations is <filename>:<lineno>, which intentionally
differs from the traditional "<filename>" line <lineno>, since the
former is more widely supported by editors and IDEs.
Having this stacktrace information is especially intended for
complicated systems built upon make, such as pkgsrc.
Most of the make code already followed the style of explicitly writing
(ptr != NULL) instead of the shorter (ptr) in conditions.
The remaining 50 instances have been found by an experimental,
unpublished check in lint(1) that treats bool expressions as
incompatible to any other scalar type, just as in Java, C#, Pascal and
several other languages.
The only unsafe operation on Boolean that is left over is (flags &
FLAG), for an enum implementing a bit set. If Boolean is an ordinary
integer type (the default), some high bits may get lost. But if Boolean
is the same as _Bool (by compiling with -DUSE_C99_BOOLEAN), C99 6.3.1.2
defines that a conversion from any scalar to the type _Bool acts as a
comparison to 0, which cannot lose any bits.
Using mmap is beneficial if the loaded data is read-only, or if it is
accessed in random order. Neither of these applies here. When loading
a file, make reads it strictly from top to bottom, once. During
parsing, the loaded data is modified in-place to insert '\0' and '\n'
for terminating strings and lines. Because of all of this, there is no
benefit in using mmap.
Reading the file using 2 calls to read(2) (one for the data, one for
checking for EOF) loads the data in a single pass, instead of producing
a page fault whenever the parser passes another page boundary.
Use a Buffer for loading the file data, to avoid calling bmake_realloc
directly.
Do not resize the loaded buffer at the end. Each loaded file is
short-lived anyway, and only a few files are loaded at the same time, so
there is no point in optimizing this part for low memory usage.
Controlling the expansion of variable expressions using a global
variable and a VARE flag was inconsistent.
Converting the global variable into a flag had to prerequisites:
1. The unintended duplicate variable assignment had to be fixed, as
done in parse.c 1.520 from 2020-12-27. Without this fix, it would have
been necessary to add more flags to Var_Exists and Var_SetWithFlags, and
this would have become too complex.
2. There had to be a unit test demonstrating that VARE_KEEP_DOLLAR only
applies to the top-level expression and is not passed to the
subexpressions, while VARE_KEEP_UNDEF applies to all subexpressions as
well. This test is in var-op-expand.mk 1.10 from 2020-12-28, at least
for the ':@word@' modifier. In ParseModifierPartSubst, VARE_KEEP_UNDEF
is not passed down either, in the same way.
This allows the -q option to distinguish errors from out-of-date
targets. Granted, it's an edge case but it should be solved
consistently anyway.
The majority of cases in which make exits with exit status 1, even in -q
mode, is when there are parse errors. These have been kept as-is for
now as they affect many of the unit tests.
The technical errors, on the other hand, occur so rarely that it's hard
to write reliable tests for them that fail consistently on all platforms
supported by make.
When running lint(1) on the code, it defines the preprocessor macro
"lint" to 1, which generated a syntax error in the declaration "Boolean
lint", as that became "Boolean 1".
Previously, mmapped files didn't always have the final newline added.
Only those that ended at a page boundary did.
This confused ParseRawLine, which assumed (and since parse.c 1.510 from
moments ago also asserted) that every line ends with a newline, which
allows the code to assume that after a backslash, there is at least one
other character in the buffer, thereby preventing an out-of-bounds read.
This bug had been there at least since parse.c 1.170 from 2010-12-25
04:57:07, maybe even earlier, I didn't check.
Now line_end always points to the trailing newline, which allows
ParseGetLine to overwrite that character to end the string.
Passing a struct as printf argument for the %s conversion doesn't work.
On NetBSD-8.0-x86_64, the output looks normal, but on SunOS-5.9, the
output is garbled, containing bytes 0xFF and 0xFE.
This bug had been introduced in parse.c 1.507 from 2020-12-20 14:52:16.
Thanks to sjg for finding this bug so quickly.