Initializing a Buffer or a strlist_t with zero-valued bytes only works
by conincidence, but because it would be the correct way. In the code
path "missing `in' in for", that zero-filled Buffer is freed using
Buf_Destroy, which could have invoked undefined behavior.
This only makes a difference for Hash_Table keys outside the ASCII
character set, and these are barely used in practice, if at all.
The effects of this change can only be seen in debug mode, when printing
the full contents of the variable namespaces. In this output, the order
of the entries might change.
All other use cases stay the same as before.
Hash collisions may slow down make in certain special situations. There
is no point though in maliciously triggering such a situation since
anyone who can inject values into makefiles can easily run shell
commands using the :!cmd! modifier or similar mechanisms. Crafting
variable names just to slow down make is thus not an attack vector.
The word "get" implies a cheap operation without side effects. Parsing
instead has lots of side effects, even if it's only that the parsing
position is updated.
The test had failed in the releng build because it assumed it were run
with .CURDIR == .PARSEDIR. This assumption is true when the tests are
run directly from usr.bin/make, but not when they are run from
tests/usr.bin/make.
By using a Stack instead of a Lst, the available API is reduced to the
very few functions that are really needed for a stack. This prevents
accidental misuse (such as confusing Lst_Append with Lst_Prepend) and
clearly communicates what the expected behavior is.
A stack also needs fewer calls to bmake_malloc than an equally-sized
list, and the memory is contiguous. For the nested include path, all
this doesn't matter, but the type is so generic that it may be used in
other places as well.
In the previous brute force search, it seemed there was no string with
that hash code. That was probably an oversight or a little programming
mistake. Anyway, it's possible to get that hash value, so keep the
example.
The previous test vectors didn't contain any hash with a leading zero.
This could have been a simple programming mistake by using %8x instead
of the intended %08x. Using snprintf wouldn't have been possible anyway
since the hex digits are printed in little-endian order, but without
reversing the bits of each digit. Kind of unusual, but doesn't affect
the distribution of the hashes.
The ApplyModifier functions already use this pattern. For simplicity
and consistency Var_Parse should do the same. This saves a parameter to
be passed.
The migration takes place step by step, just like for the Lst functions
a few days ago.
The previous lenient rule came from the sprite.h header that was not
specific to make. To avoid confusion, only the expected values should
be stored in a Boolean variable. To help find obvious violations and
inconsistencies, there are different possibilities for the Boolean type,
during development.
In C there is no way to actually enforce this restriction at runtime.
It would be possible in C++, but the code is not ready to be compiled
with a C++ compiler.
Lst_IsEmpty does not belong in the "create and destroy" group, but in
"query information without modifying anything".
The functions named LstNode_* all belong together. They do not provide
much abstraction, but still they restrict the API and hide a few struct
fields that are only used internally by Lst_Open/Lst_Close and
Lst_ForEach.
Use consistent wording in the documentation of the functions (list,
node, datum).
For a long time, I had assumed that the iteration variables of a .for
loop are just normal global variables. This assumption was wrong but
didn't have any consequences.
The iteration variables of a .for loop can just be accessed like global
variables, therefore it is not obvious that they are implemented in a
completely different way.
There are some edge cases in conditions used inside .for loops, in which
the iteration variables cannot be used like normal variables. An
example is brought up in https://gnats.netbsd.org/47888, which observes
that the defined() and empty() functions in conditions only work with
variables but ignore the iteration "variables", simply because these are
not variables but only expressions.
When the struct stat was used for both calling the actual stat and for
returning the result, no copying was needed. This also had the side
effect that for the first call of cached_stat, the returned struct stat
included all the fields properly filled in, and on later calls, these
fields were all zeroed out.
These two variables are separate now, thus the fields need to be copied
explicitly. There are no existing unit tests for this, but ./build.sh
failed reliably.
Only st_mtime and st_mode are actually filled, the remaining fields had
been set to zero. To prevent these from ever being accessed, a custom
struct make_stat replaces the previously used struct stat.
The fields in struct make_stat are intentionally named different from
the fields in struct stat because NetBSD and some other operating
systems define st_mtime as a macro, and that would not work in a field
declaration.
Back in the 1980s it made sense to have the type information encoded in
the variable names. At the time when make was imported into the NetBSD
tree (1993-03-21), the functions did indeed not have prototypes, they
only had return types. The void type was already invented at that time.
Since the compiler could not verify the types of function parameters, it
made perfect sense to have each variable tell whether it was a pointer
or not.
Since ISO C90 this is no longer necessary since the compiler checks
this. The variable names can now focus on the application level and
their high-level meaning, expressing the relationship to other
variables instead of encoding redundant type information.
The name HTSIZE didn't provide any explanation for the value 191, and it
is obvious that this is a hash table size. Therefore giving the
constant a name didn't explain anything or make it less magic.
The successful cases can be easily tested in the .if conditions. Around
these conditions, there is enough space for explaining the test cases
and their purpose.
The failure cases have been left in the file for now since they still
produce unwanted characters in the output. These characters are not
produced when the parse error occurs in a conditional.
Big thanks go to sjg, who discovered the bug and did the main work to
track it down.
In the unit tests for the :u modifier from the previous commit, I had
forgotten to actually add the :u modifier at the end. I added it now
and also added a few other tests. It's better to have a few more tests
than too few.
The :u modifier had been broken in var.c 1.479 from 2020.08.30.19.56.02.
The code that implements the :u modifier was well-covered in the unit
tests, except for the single line that actually deals with adjacent
duplicate words.
The "refactoring" commit that replaced brk_string with Str_Words had not
taken into account that the number of words (in ac) had to be passed to
WordList_JoinFree. Instead, the number of words was always preserved,
and the words at the end were therefore duplicated in the result.
The fix for this bug will be in the follow-up commit.
The -O3 option of GCC 5.5 is unsure about whether s and t are always
defined, since SuffParseTransform only defines them if it returns TRUE.
Therefore assert that it does. When compiled with -NDEBUG, this would
result in an unused variable, therefore use it using the well-known
cast-to-void pattern.
Just in case anyone wants to use them for copy-and-paste.
The invocations of these macros are left cautious since the
system-provided definition of these macros may have forgotten the
parentheses as well.
This bug had been there since the initial import of make, on 1993-03-21.
It just never broke the build because of the missing assertion.
https://mail-index.netbsd.org/tech-toolchain/2020/08/30/msg003847.html
The error message "cd: can't cd to include" that I got when I first
applied the fix was unrelated. It was caused by an extra directory
"include" in src/tools/compat that didn't belong there. With that
directory removed, running "./build.sh -j8 tools" succeeds as expected.
At the point where Str_Words is called, the string contains no
meta-characters. This means that the parameter "expand" in Str_Words is
never looked at. This in turn means that this parameter can be set to
FALSE, thereby making it impossible that Str_Words returns NULL.
The API is much simpler, and there is less detail that is exposed by
default and fewer punctuation to type on the caller's side. To see that
there is some memory to be freed, one would have to look into the
struct. Having part of the return value as the actual return value and
the rest in output parameters was unnecessarily asymmetrical.
According to jemalloc(3), the variable must be called _malloc_options,
with a leading underscore, to have an effect.
Renaming the variable indeed enables the option. There's not much point
having this variable around though, since it neither detects a trivial
double-free nor freeing an invalid pointer in the following code
snippet:
char *asdf = bmake_malloc(10);
fprintf(stderr, "%c\n", *asdf);
free(asdf + 8);
free(asdf);
free(asdf);
exit(1);
Instead, it just crashes with a segmentation fault.
For one-liners, Lst_ForEach creates a lot of overhead, both at runtime
and for reading and understanding the code.
In this simple case where the structure of the traversed nodes is not
modified, a simple loop is enough. This avoids a lot of conversions to
void * and thus prevents type mistakes.