Using mmap is beneficial if the loaded data is read-only, or if it is
accessed in random order. Neither of these applies here. When loading
a file, make reads it strictly from top to bottom, once. During
parsing, the loaded data is modified in-place to insert '\0' and '\n'
for terminating strings and lines. Because of all of this, there is no
benefit in using mmap.
Reading the file using 2 calls to read(2) (one for the data, one for
checking for EOF) loads the data in a single pass, instead of producing
a page fault whenever the parser passes another page boundary.
Use a Buffer for loading the file data, to avoid calling bmake_realloc
directly.
Do not resize the loaded buffer at the end. Each loaded file is
short-lived anyway, and only a few files are loaded at the same time, so
there is no point in optimizing this part for low memory usage.
Controlling the expansion of variable expressions using a global
variable and a VARE flag was inconsistent.
Converting the global variable into a flag had to prerequisites:
1. The unintended duplicate variable assignment had to be fixed, as
done in parse.c 1.520 from 2020-12-27. Without this fix, it would have
been necessary to add more flags to Var_Exists and Var_SetWithFlags, and
this would have become too complex.
2. There had to be a unit test demonstrating that VARE_KEEP_DOLLAR only
applies to the top-level expression and is not passed to the
subexpressions, while VARE_KEEP_UNDEF applies to all subexpressions as
well. This test is in var-op-expand.mk 1.10 from 2020-12-28, at least
for the ':@word@' modifier. In ParseModifierPartSubst, VARE_KEEP_UNDEF
is not passed down either, in the same way.
At that point, the expression can never be varUndefined. At the
beginning of ParseVarnameLong, the expression is initialized to a simple
empty string, and that string is only ever converted to varUndefined at
the very end of Var_Parse.
Back in 1993, the variables in a context were stored in a linked list.
Searching such a list indeed required literally thousands of calls to
strcmp. In make.h 1.22 from 1999-09-15, the linked list was replaced
with a hash table, requiring much fewer string comparisons. Since then,
the rationale doesn't apply anymore.
This allows the -q option to distinguish errors from out-of-date
targets. Granted, it's an edge case but it should be solved
consistently anyway.
The majority of cases in which make exits with exit status 1, even in -q
mode, is when there are parse errors. These have been kept as-is for
now as they affect many of the unit tests.
The technical errors, on the other hand, occur so rarely that it's hard
to write reliable tests for them that fail consistently on all platforms
supported by make.
Extracting the character-level details makes the essence of Var_Subst
visible in the code, which is to iterate over the given text, handling a
few types of tokens.
The many constants were invented because at that time I didn't quite
understand the actual outcomes of Var_Parse that need to be
distinguished. There are only a few:
(1) Errors, whether they are parse errors, or evaluation errors or
undefined variables. The old constants VPR_PARSE_MSG and
VPR_UNDEF_MSG are merged into VPR_ERR.
(2) Undefined expressions in a situation in which they are allowed.
Previously the documentation for VPR_UNDEF_SILENT talked about
undefined expressions in situations where they were not allowed.
That case is fully covered by VPR_ERR instead.
(3) Errors that are silently ignored. These are probably bugs.
(4) Everything went fine, the expression has a defined value.
Right now, Var_Subst always returns VPR_OK, even if there had been parse
errors or evaluation errors. If that is no longer true, the errors will
be reported properly.
Since make uses vfork if available, re-exporting the variables happens
in the address space of the main process anyway, so there is no point in
mentioning anything about "our client process" anywhere.
Previously, running lint mode didn't define MAKE_RCSID at all, which
resulted in a syntax error.
While here, reduced the indentation and nesting of the preprocessor
directives.
When running lint(1) on the code, it defines the preprocessor macro
"lint" to 1, which generated a syntax error in the declaration "Boolean
lint", as that became "Boolean 1".
This macro was supposed to return a boolean expression all the time, it
just hadn't been implemented this way. This resulted in wrong output
for the test sh-flags, in compilation modes -DUSE_UCHAR_BOOLEAN and
-DUSE_CHAR_BOOLEAN, since in ParseCommandFlags, the expression
DEBUG(LOUD) didn't fit into a boolean.
Since make doesn't support variable names containing spaces, this edge
case is not enough reason to stop this feature. Having multiple
variable names as arguments nicely aligns with other directives such as
.for and .export.
Previously, mmapped files didn't always have the final newline added.
Only those that ended at a page boundary did.
This confused ParseRawLine, which assumed (and since parse.c 1.510 from
moments ago also asserted) that every line ends with a newline, which
allows the code to assume that after a backslash, there is at least one
other character in the buffer, thereby preventing an out-of-bounds read.
This bug had been there at least since parse.c 1.170 from 2010-12-25
04:57:07, maybe even earlier, I didn't check.
Now line_end always points to the trailing newline, which allows
ParseGetLine to overwrite that character to end the string.
For the modifiers :gmtime and :localtime, the excess newline had been
added in var.c 1.631 from 2020-10-31 21:40:20.
For the modifiers :range and :ts, the excess newline had been added in
var.c 1.635 from 2020-11-01 14:36:25.
It's easier to have both the expressions and the expected values in a
single file. This also allows for flexible handling of multiple
acceptable outputs, in this case for 32-bit time_t.
Passing a struct as printf argument for the %s conversion doesn't work.
On NetBSD-8.0-x86_64, the output looks normal, but on SunOS-5.9, the
output is garbled, containing bytes 0xFF and 0xFE.
This bug had been introduced in parse.c 1.507 from 2020-12-20 14:52:16.
Thanks to sjg for finding this bug so quickly.
This makes all intermediate strings constant. For this simple
search-and-replace refactoring, all intermediate locations where the
"current value of the expression" was stored had to be of the type
MFStr.
Using FStr instead of MFStr allows to save a few memory allocations,
which will be done in the follow-up commits.
Previously, memory management had been split among several variables.
The general idea was very simple though. The current value of the
expression needs to be kept in memory, and each modifier either keeps
that value or replaces it with its own newly allocated result, or
var_Error or varUndefined.
Using MFStr, it does not matter anymore that var_Error and varUndefined
are statically allocated since these are assigned using MFStr_InitRefer.
The complexity of the implementation is now closer to the actual
complexity. Most probably the code can be simplified even more.
Memory management is still complicated in this area. To clean this up,
the previous value of the expression needs to be converted to an MFStr
first, and later to an FStr.
Do not increment a null pointer.
Do not assign to a variable twice in the same statement. To be fair,
this may be safe because of the sequence point when the function is
called, but anyway, it looks too close to undefined behavior.
Before, make printed an "error message" that did not include the word
error and thus was not easily identified as such. This "error message"
also did not influence the exit status in the default mode but only in
-dL mode. The error message also didn't include any line number
information and was thus rude.
The next commit will error out on unknown modifiers and influence the
exit status. The test modmisc.mk contains both parse time tests and run
time tests. To prevent the latter from being run, the parse error is
moved to varmod-indirect.mk, which only contains parse time tests.
Previously, the parameter out_freeIt was not guaranteed to be
initialized in every case, at least when looking only at EvalUndefined.
This contradicted the variable name.
Replace the two parameters with a single FStr to ensure that these
variables are always initialized together.
In GetVarnamesToUnexport, there is no need to free the local FStr since
the only place where it is assigned an allocated string is at the very
end.
Having separate functions for the two main use cases of a possibly
allocated string makes the calling code simpler. This is a preparatory
commit for making the memory allocation in ApplyModifiers easier to
understand.
Put the simple tests at the top, demonstrating that there are already
some cases in which the misspelled directive is detected. It's not
detected though if the surrounding conditional branch is skipped.
The address of readMoreArg is hardly useful when stepping through this
part of the code, therefore omit it. Instead of mentioning the exact
function names of the data source, describe them in words, which helps
especially in the case of .for loops.
This ensures that the line numbers for messages are the expected onces
in .for loops.
While experimenting with the backslash continuation lines, I noticed
that the reported line numbers for these are based on the number of
completely parsed physical lines, which nicely cancels out the + 1 that
has to be added for producing human-readable 1-based line numbers. It
would be more correct to report the parse errors on the first affected
line.
The previous variable names had been chosen at a time when compilers
didn't merge variables into the same registers. Luckily, these times
are gone, and it's no longer necessary to use a variable for 2 or more
completely unrelated purposes.
In function names, the word "get" was not used consistently to look up
or compute data, in several cases "get" was a synonym for "read", just
like in the standard C library (fgetc).
The really confusing part is that there are two functions now, called
ParseGetLine and ParseReadLine, and both were underdocumented.
At that point, the variable expression has already been expanded. To
avoid the impression that the token might be relevant, pass FALSE
instead of TRUE. No change of behavior.
This is needed to compile bmake with GCC 2.8.1 on SunOS 5.9.
To support ancient systems like this, the whole code of usr.bin/make is
supposed to use only ISO C90 features, except for filemon, which is not
used on these systems.
There is no use case for accessing or even modifying the capacity of a
vector, therefore there is no need to hide it using the prefix "priv_".
This way, the member names are aligned between Buffer and Vector.
The function basename from POSIX has a few unfortunate properties, it is
allowed to return a pointer to static memory. This is too unreliable,
therefore this trivial own implementation.
The comment "execute the commands" had once been correct but not
anymore. Since a few years, not only the commands of the .BEGIN and
.END nodes are executed, instead the nodes are made as usual, including
their dependencies.
A race between child and parent means that we cannot
guarantee whether all child output is seen before we call
JobClosePipes, thus intervening debug output can appear
before or after the last child output.
Now that the parsing of the directives is unified and strict, there is
no need anymore for the dispatched functions to check for unknown
directives. These functions don't even get the information to decide
that since this decision is already done.
Since a line is not an iterator and since the expression *line typically
means "the current element", not "the first character", replacing *line
with line[0] more directly expresses the idea of accessing the first
character of a string.
Before, make accepted misspellings like .warnings, .export-literally and
a few others, all of which are unlikely to occur in practice. See the
test directive-misspellings.mk for further details.
This test allows the other directive-* tests to focus on the purpose of
the individual directive, allowing these tests to continue after
parsing, without errors.
It had been conceptually wrong to modify cmdFlags.echo just to suppress
echoing while enabling error checking.
Now the code in JobPrintCommand speaks for itself and no longer needs
any comments. The few lines at the end have the sole purpose of
restoring the default state (echo + errChk) in the shell file.
The field job->echo is initialized in JobStart (and in JobOpenTmpFile).
After that, it is not modified anymore. Therefore it is not necessary
to run these test cases redundantly.
The field job->ignerr, on the other hand, is modified later on. For
these cases, the many remaining test cases are still needed.
This flag was placed wrong in the Job since it is only necessary as long
as the shell commands are written to the shell file.
Resetting it in JobStart and JobExec was completely misguided since that
is far away from writing the shell commands; this should have been done
in JobPrintCommands instead.
The status of this flag doesn't need to be printed in debugging mode
since it is controlled by a single command line option (-dx) and does
not interact with all the other switches.
Right now, the test sh-flags.mk demonstrates many variants to configure
echoing of the shell commands (-s, .SILENT, '@'), error handling (-i,
.IGNORE, '-') and whether the commands are run (-n, -N, .MAKE,
.RECURSIVE, '+').
Even more variants are possible by configuring the shell to have error
control. None of the built-in shell definitions has error control, so
it is unlikely that anybody uses them, but who knows.
Being able to configure these details at 3 levels is good, but what
makes all this really hard to understand is that some of these switches
interact in non-obvious ways. For example, in jobs mode, a single
command can change job->ignerr (in JobPrintSpecialsEchoCtl), which will
affect all further commands of that job.
The goal of this refactoring is to make the code easier to understand by
making the switches on the job level constant and by moving all
modifications to them to the ShellWriter.