This was an easy part since it affects only a few variables. 250 lines
for a single function is still quite a lot, so further refactorings will
follow.
The main property of the former "str" is not being a string but pointing
at the start of the expression to be parsed.
The main property of the former "tstr" is not being a string but being
the moving pointer, the current parsing position. No idea what the "t"
in "tstr" stood for.
There was no need to export Targ_PrintNode at all. All the callers need
is a high-level API for printing a single node or a list of nodes. The
implementation detail that Targ_PrintNode was used as a callback to
Lst_ForEach should have never leaked into the API.
By convention, exported identifiers are written with underscore.
The prototype of an exported function must not use void * just because
it is used in Lst_ForEach. This is an implementation detail and must
remain so.
The prefix "cond" was needed when this struct field was a global
variable. The main name "expr" was not precise enough since this code is
about parsing a condition, not an expression.
During parsing, this variable does not contain the whole expression but
a pointer to the remaining part of the condition, therefore the name
"expr" had been confusing.
Instead of having 3 global variables, the struct clearly communicates
that the 3 variables belong together. During debugging, it's easy to
just "p *lex" instead of remembering the names of the 3 former global
variables.
Converting the global variables into a local variable makes it
immediately clear that the functions in this file operate on this
struct. Keeping the global variables in mind is more difficult. Having
a local variable also gets rid of the 3 sv_* variables in
Cond_EvalExpression, which were also a sign that these "global
variables" were not that global at all.
This commit only contains the minimal code changes for converting the
variables into a local struct. It was tempting to add functions like
CondLexer_SkipWhitespace, but this is better left for a follow-up
commit.
In suff.c r1.144 from yesterday, in the line "cp += nested_p - cp", I
accidentally removed the "- 1". Since these "- 1" lines lead to slow
execution, each branch now increments the pointer separately by the
actually needed amount.
Fixing this bug posed way more new questions than it answered, and it
revealed an inconsistency in the parser about how characters are to be
escaped, and missing details in the documentation of Var_Parse, as well
as a parse error that unexpectedly doesn't stop make from continuing.
This makes the output a bit more reproducible. There are still the file
descriptors, which may differ between different runs, but at least the
nextbuf function is printed using a symbolic name instead of a meaningless address.
Besides loadedfile_nextbuf, the only other function is ForIterate.
Initializing a Buffer or a strlist_t with zero-valued bytes only works
by conincidence, but because it would be the correct way. In the code
path "missing `in' in for", that zero-filled Buffer is freed using
Buf_Destroy, which could have invoked undefined behavior.
This only makes a difference for Hash_Table keys outside the ASCII
character set, and these are barely used in practice, if at all.
The effects of this change can only be seen in debug mode, when printing
the full contents of the variable namespaces. In this output, the order
of the entries might change.
All other use cases stay the same as before.
Hash collisions may slow down make in certain special situations. There
is no point though in maliciously triggering such a situation since
anyone who can inject values into makefiles can easily run shell
commands using the :!cmd! modifier or similar mechanisms. Crafting
variable names just to slow down make is thus not an attack vector.
The word "get" implies a cheap operation without side effects. Parsing
instead has lots of side effects, even if it's only that the parsing
position is updated.
The test had failed in the releng build because it assumed it were run
with .CURDIR == .PARSEDIR. This assumption is true when the tests are
run directly from usr.bin/make, but not when they are run from
tests/usr.bin/make.
By using a Stack instead of a Lst, the available API is reduced to the
very few functions that are really needed for a stack. This prevents
accidental misuse (such as confusing Lst_Append with Lst_Prepend) and
clearly communicates what the expected behavior is.
A stack also needs fewer calls to bmake_malloc than an equally-sized
list, and the memory is contiguous. For the nested include path, all
this doesn't matter, but the type is so generic that it may be used in
other places as well.
In the previous brute force search, it seemed there was no string with
that hash code. That was probably an oversight or a little programming
mistake. Anyway, it's possible to get that hash value, so keep the
example.
The previous test vectors didn't contain any hash with a leading zero.
This could have been a simple programming mistake by using %8x instead
of the intended %08x. Using snprintf wouldn't have been possible anyway
since the hex digits are printed in little-endian order, but without
reversing the bits of each digit. Kind of unusual, but doesn't affect
the distribution of the hashes.
The ApplyModifier functions already use this pattern. For simplicity
and consistency Var_Parse should do the same. This saves a parameter to
be passed.
The migration takes place step by step, just like for the Lst functions
a few days ago.
The previous lenient rule came from the sprite.h header that was not
specific to make. To avoid confusion, only the expected values should
be stored in a Boolean variable. To help find obvious violations and
inconsistencies, there are different possibilities for the Boolean type,
during development.
In C there is no way to actually enforce this restriction at runtime.
It would be possible in C++, but the code is not ready to be compiled
with a C++ compiler.
Lst_IsEmpty does not belong in the "create and destroy" group, but in
"query information without modifying anything".
The functions named LstNode_* all belong together. They do not provide
much abstraction, but still they restrict the API and hide a few struct
fields that are only used internally by Lst_Open/Lst_Close and
Lst_ForEach.
Use consistent wording in the documentation of the functions (list,
node, datum).
For a long time, I had assumed that the iteration variables of a .for
loop are just normal global variables. This assumption was wrong but
didn't have any consequences.
The iteration variables of a .for loop can just be accessed like global
variables, therefore it is not obvious that they are implemented in a
completely different way.
There are some edge cases in conditions used inside .for loops, in which
the iteration variables cannot be used like normal variables. An
example is brought up in https://gnats.netbsd.org/47888, which observes
that the defined() and empty() functions in conditions only work with
variables but ignore the iteration "variables", simply because these are
not variables but only expressions.
When the struct stat was used for both calling the actual stat and for
returning the result, no copying was needed. This also had the side
effect that for the first call of cached_stat, the returned struct stat
included all the fields properly filled in, and on later calls, these
fields were all zeroed out.
These two variables are separate now, thus the fields need to be copied
explicitly. There are no existing unit tests for this, but ./build.sh
failed reliably.
Only st_mtime and st_mode are actually filled, the remaining fields had
been set to zero. To prevent these from ever being accessed, a custom
struct make_stat replaces the previously used struct stat.
The fields in struct make_stat are intentionally named different from
the fields in struct stat because NetBSD and some other operating
systems define st_mtime as a macro, and that would not work in a field
declaration.
Back in the 1980s it made sense to have the type information encoded in
the variable names. At the time when make was imported into the NetBSD
tree (1993-03-21), the functions did indeed not have prototypes, they
only had return types. The void type was already invented at that time.
Since the compiler could not verify the types of function parameters, it
made perfect sense to have each variable tell whether it was a pointer
or not.
Since ISO C90 this is no longer necessary since the compiler checks
this. The variable names can now focus on the application level and
their high-level meaning, expressing the relationship to other
variables instead of encoding redundant type information.
The name HTSIZE didn't provide any explanation for the value 191, and it
is obvious that this is a hash table size. Therefore giving the
constant a name didn't explain anything or make it less magic.
The successful cases can be easily tested in the .if conditions. Around
these conditions, there is enough space for explaining the test cases
and their purpose.
The failure cases have been left in the file for now since they still
produce unwanted characters in the output. These characters are not
produced when the parse error occurs in a conditional.
Big thanks go to sjg, who discovered the bug and did the main work to
track it down.
In the unit tests for the :u modifier from the previous commit, I had
forgotten to actually add the :u modifier at the end. I added it now
and also added a few other tests. It's better to have a few more tests
than too few.
The :u modifier had been broken in var.c 1.479 from 2020.08.30.19.56.02.
The code that implements the :u modifier was well-covered in the unit
tests, except for the single line that actually deals with adjacent
duplicate words.
The "refactoring" commit that replaced brk_string with Str_Words had not
taken into account that the number of words (in ac) had to be passed to
WordList_JoinFree. Instead, the number of words was always preserved,
and the words at the end were therefore duplicated in the result.
The fix for this bug will be in the follow-up commit.
The -O3 option of GCC 5.5 is unsure about whether s and t are always
defined, since SuffParseTransform only defines them if it returns TRUE.
Therefore assert that it does. When compiled with -NDEBUG, this would
result in an unused variable, therefore use it using the well-known
cast-to-void pattern.
Just in case anyone wants to use them for copy-and-paste.
The invocations of these macros are left cautious since the
system-provided definition of these macros may have forgotten the
parentheses as well.
This bug had been there since the initial import of make, on 1993-03-21.
It just never broke the build because of the missing assertion.
https://mail-index.netbsd.org/tech-toolchain/2020/08/30/msg003847.html
The error message "cd: can't cd to include" that I got when I first
applied the fix was unrelated. It was caused by an extra directory
"include" in src/tools/compat that didn't belong there. With that
directory removed, running "./build.sh -j8 tools" succeeds as expected.
At the point where Str_Words is called, the string contains no
meta-characters. This means that the parameter "expand" in Str_Words is
never looked at. This in turn means that this parameter can be set to
FALSE, thereby making it impossible that Str_Words returns NULL.
The API is much simpler, and there is less detail that is exposed by
default and fewer punctuation to type on the caller's side. To see that
there is some memory to be freed, one would have to look into the
struct. Having part of the return value as the actual return value and
the rest in output parameters was unnecessarily asymmetrical.
According to jemalloc(3), the variable must be called _malloc_options,
with a leading underscore, to have an effect.
Renaming the variable indeed enables the option. There's not much point
having this variable around though, since it neither detects a trivial
double-free nor freeing an invalid pointer in the following code
snippet:
char *asdf = bmake_malloc(10);
fprintf(stderr, "%c\n", *asdf);
free(asdf + 8);
free(asdf);
free(asdf);
exit(1);
Instead, it just crashes with a segmentation fault.
For one-liners, Lst_ForEach creates a lot of overhead, both at runtime
and for reading and understanding the code.
In this simple case where the structure of the traversed nodes is not
modified, a simple loop is enough. This avoids a lot of conversions to
void * and thus prevents type mistakes.
This occurred in the posix1.mk test, even though it is disabled in
unit-tests. But in tests/usr.bin/make it still runs. There, it should
have produced an "expected failure" but crashed instead.
The archive-suffix test is the stripped-down version of the posix1 test.
In SuffParseTransform, the parameter names have been renamed to make the
"side effects" comment redundant.
In Suff_AddSuffix and Suff_AddLib, the parameter has been made const.
In SuffRemoveSrc, the unused variable has been removed, and the return
type has been fixed.
Lst_Find is called with a "comparison" function that returns the integer
0 if the desired node is found. This leads to confusion since there are
so many different return value conventions for int, such as 0/1 for
mimicking false/true, -1/0 as in close(2), and the sign as in strcmp(3).
This API is much easier to understand if the "comparison" function is
not called a comparison function (since that is too close to strcmp),
but a "match" function that just returns a boolean.
In Lst_FindFromB, the node argument may be null. This deviates from the
other Lst functions, which require Lst and LstNode to generally be
non-null. In this case it is useful though to make the calling code
simpler.
In arch.c, this makes a lot of the previous documentation redundant.
In cond.c, the documentation is reduced a little bit since it had
already been cleaned up before. It also removes the strange negation
from CondFindStrMatch.
In dir.c, the documentation collapses as well.
In main.c, separating the ReadMakefile function from the callbacks for
Lst_FindB allows the former to get back its natural function signature,
with proper types and no unused parameters.
To catch any accidental mistakes during the migration from Lst_Find to
Lst_FindB, the code can be compiled with -DUSE_DOUBLE_BOOLEAN, which
will complain about incompatible function pointer types.
This is quite hard to trigger in a real-life scenario since it requires
precise timing.
Instead, I created /tmp/sys.mk and ran "./make -m /tmp -f /dev/null" in
the debugger, after setting a breakpoint in this line:
ln = Lst_Find(sysMkPath, ReadMakefile, NULL);
Once this line was reached, I removed /tmp/sys.mk to make ReadMakefile
fail. Just adding a few parse errors won't help since ReadMakefile only
fails if the file cannot be found.
Having Boolean aliased to int creates ambiguities since int is widely
used. Allow to occasionally compile make with -DUSE_DOUBLE_BOOLEAN to
check that the type definitions still agree.
Var_Subst never returns NULL.
In Main_ExportMAKEFLAGS, don't compare ints with booleans.
In MainParseArgs, use char for the current character. First, that's
more precise and correct, and second, it makes debugging easier for
those who don't know the ASCII table by heart.
The old name implied that the function would read multiple files, which
was not the case.
The comment above that function was highly confusing. It's not that the
function returns a boolean, but rather 0 or non-zero, and 0 means that
Lst_Find should stop searching.
One of the next refactorings will be to make Lst_Find return the first
list node for which the function returns TRUE. This will reduce the
confusion about the several functions called SomethingP in suff.c. The
P suffix means to return TRUE or FALSE, not 0 or non-zero.
The functions LstIsValid and LstNodeIsValid are only used in assertions.
Without the always-false assertion, Enum_ValueToString could have
returned undefined behavior.
Remove redundant parts of the function comments. Move the "side
effects" to the main section, since these effects are main effects, not
side effects.
Remove the redundant prototype for ArchFree.
In the first version of this test, I had completely misunderstood the
whole topic.
To test the interrupt, the make process has to be interrupted, not the
shell. This generates the correct message that the target is removed.
The filename for .PHONY targets is removed even though .PHONY targets
usually don't correspond to a file. The message is only printed if
there actually is a corresponding file. That's why this message does
not appear when interrupting "make clean".
Finally, since files get created and removed during a single run of
make, the file cache needs to be disabled. This is done by prefixing
the filenames with "././", see Dir_FindFile.
When I migrated the Lst_FindFrom to the strict API variant, I forgot
that Lst_FindFrom requires both arguments (list and node) to be
non-null. I had only checked that the list is non-null.
There are only very few calls to Lst_FindFrom, and they are all ok now.
Extract the null check for path to the top level. This has the
side-effect of only incrementing dotLast.refCount if that entry is
actually used.
Reduce the indentation of the code by returning early from the simple
branches.
The migration from null-passing Lst functions to argument-checking Lst
functions is completed.
There were 2 surprises: The targets list may be NULL, and in Dir_AddDir,
the path may be NULL. The latter case is especially surprising since
that function turns into an almost-nop in that case. This is another
case where probably 2 independent functions have been squeezed into a
single function. This may be improved in a follow-up commit.
All other lists were fine. They were always defined and thus didn't
need much work.