Only some callers actually needed the updated time, and because of the
many branches, it was difficult to see that the return value was indeed
gn->mtime all the time.
* Replace character literal 0 with '\0'.
* Replace pointer literal 0 with NULL.
* Remove redundant parentheses.
* Parentheses in multi-line conditions are not redundant at the
beginning of a line.
* Replace a few !ptr with ptr == NULL.
* Replace a few ptr with ptr != NULL.
* Replace (expr & mask) == 0 with !(expr & mask).
* Remove redundant braces for blocks in cases where the generated code
stays the same. (Assertions further down in the code would get
different line numbers.)
* Rename parameters in CondParser_String to reflect the data flow.
* Replace #ifdef notdef with #if 0.
The generated code stays exactly the same, at least with GCC 5.5.0 on
NetBSD 8.0 amd64 using the default configuration.
This makes it easier to spot mismatches between the function name and
its first parameter, although the compiler should already catch most of
them. Except for void pointers.
The DirLookup functions work on "name", which may be a complete path,
and on "base" or "cp", which is the basename of the file. Don't use
"name" for the basename, since that would be confusing.
That code is called so seldom that one more memory allocation doesn't
hurt. It needs a wildcard character in a dependency declaration, which
is rare in practice; see dep-wildcards.mk for an example.
This mainly affects the void pointers in callback functions for lists.
These had been necessary once when the parameter type was still
ClientData instead of void pointer.
The function call variant takes more screen space than the direct field
access. Having an abstract API is usually a good idea, in this case of
simple read-only member access it makes the code more difficult to read.
LstNode_Set has been kept as a function since it is not a read-only
accessor function.
While trying to compile the code with GCC's -Wformat-truncation, the
snprintf calls felt quite complicated. The function Dir_FindHereOrAbove
is not in a bottleneck execution path, therefore it doesn't hurt to
dynamically allocate the memory instead of using size-limited stack
memory.
Some types have changed from int to unsigned int, size_t or time_t.
The variable i in hash.c has been kept as int since it counts down to
-1, which generates efficient machine code, at least on x86_64.
The bug had been introduced with dir.c 1.155 on 2020-10-02 22:20:25. In
that commit, openDirectories was replaced with a combination of a list
with a hash table, for more efficient lookup by name.
Upon cleanup, OpenDirs_Done is called, which in turn called
Dir_ClearPath. Dir_ClearPath takes full ownership of the given list and
empties it. This was no problem before since afterwards the list was
empty and calling Lst_Free just frees the remaining list pointer.
With OpenDirs, this list was combined with a hash table, and the hash
table contains the list nodes, assuming that the OpenDirs functions have
full ownership of both the list and the hash table. This assumption was
generally correct, except for the one moment during cleanup where full
ownership of the list was passed to Dir_ClearPath, while the hash table
still contained pointers to the (now freed) list nodes. This by itself
was not a problem since the hash table would be freed afterwards. But
as part of Dir_ClearPath, OpenDirs_Remove was called, which looked up
the freed directory by name and now found the freed list node, trying to
free it again. Boom.
Fixed by replacing the call to Dir_ClearPath with code that only frees
the directories, without giving up control over the list.
The bug had been introduced with dir.c 1.155 on 2020-10-02 22:20:25. In
that commit, openDirectories was replaced with a combination of a list
with a hash table, for more efficient lookup by name.
Upon cleanup, OpenDirs_Done is called, which in turn called
Dir_ClearPath. Dir_ClearPath takes full ownership of the given list and
empties it. This was no problem before since afterwards the list was
empty and calling Lst_Free just frees the remaining list pointer.
With OpenDirs, this list was combined with a hash table, and the hash
table contains the list nodes, assuming that the OpenDirs functions have
full ownership of both the list and the hash table. This assumption was
generally correct, except for the one moment during cleanup where full
ownership of the list was passed to Dir_ClearPath, while the hash table
still contained pointers to the (now freed) list nodes. This by itself
was not a problem since the hash table would be freed afterwards. But
as part of Dir_ClearPath, OpenDirs_Remove was called, which looked up
the freed directory by name and now found the freed list node, trying to
free it again. Boom.
Fixed by replacing the call to Dir_ClearPath with code that only frees
the directories, without giving up control over the list.
As long as there are less than 20 open directories, it's perfectly fine
to use a doubly-linked list for name lookup. A singly linked list or
even an array list would have been better, but anyway.
When the number of directories rises above 1000, which happens with
dirdeps.mk, linear list lookup becomes too expensive, especially since
each list entry is compared using a strcmp call, in a callback function
that is not inlined.
Using a hash table is much more efficient than linear lookup. While
here, abstract all operations regarding the openDirectories list into a
new data type that provides a simple and straight-forward API. This
strongly typed API is especially important since the current
implementation of the list and hash table is weakly typed, using void *
for the actual data, and StringList and CachedDirList refer to the
exactly same type, they just have different names to help the human
readers but don't provide any type safety.
These functions made the code larger than necessary. The prev and next
fields are published intentionally since navigating in a doubly-linked
list is simple to do and there is no need to wrap this in a layer of
function calls, not even syntactically. (On the execution level, the
function calls had been inlined anyway.)
Since the callback function returns a terminating condition, this is not
really a foreach loop.
Many of the calls to Lst_ForEachUntil don't make use of the terminating
condition, and several don't modify the list structurally, which means
they don't need this complicated implementation.
In a follow-up commit, Lst_ForEach will be added back with a much
simpler implementation that iterates over the list naively, without a
terminating condition and without taking the iteration state from
Lst_Open/Lst_Next/Lst_Close into account. The migration to this simpler
implementation will be done step by step since each callback function
needs to be examined closely.
The word "path" is commonly used either as an abbreviation for pathname
(a string consisting of several directory or file names) or as an
abbreviation for search path (a list of directory names used for
searching files), but not for a single directory.
These blocks mostly consisted of redundant structure, following the same
#ifndef pattern over and over, with only minimal variation.
It's easier to maintain if the common structure is only written once and
encapsulated in a macro.
To avoid "defined but unused" warnings from GCC in the case where
MAKE_NATIVE is not defined, I had to add volatile. Adding
MAKE_ATTR_UNUSED alone would not preserve the rcsid variable in the
resulting binary.
When the openDirectories path list is cleaned up, each path from it is
first dequeued and then freed via Dir_Destroy. At this point, the path
is no longer in openDirectories, which triggered an assertion in
Lst_Remove, called by Dir_Destroy.
When the struct stat was used for both calling the actual stat and for
returning the result, no copying was needed. This also had the side
effect that for the first call of cached_stat, the returned struct stat
included all the fields properly filled in, and on later calls, these
fields were all zeroed out.
These two variables are separate now, thus the fields need to be copied
explicitly. There are no existing unit tests for this, but ./build.sh
failed reliably.
Only st_mtime and st_mode are actually filled, the remaining fields had
been set to zero. To prevent these from ever being accessed, a custom
struct make_stat replaces the previously used struct stat.
The fields in struct make_stat are intentionally named different from
the fields in struct stat because NetBSD and some other operating
systems define st_mtime as a macro, and that would not work in a field
declaration.
Back in the 1980s it made sense to have the type information encoded in
the variable names. At the time when make was imported into the NetBSD
tree (1993-03-21), the functions did indeed not have prototypes, they
only had return types. The void type was already invented at that time.
Since the compiler could not verify the types of function parameters, it
made perfect sense to have each variable tell whether it was a pointer
or not.
Since ISO C90 this is no longer necessary since the compiler checks
this. The variable names can now focus on the application level and
their high-level meaning, expressing the relationship to other
variables instead of encoding redundant type information.
Lst_Find is called with a "comparison" function that returns the integer
0 if the desired node is found. This leads to confusion since there are
so many different return value conventions for int, such as 0/1 for
mimicking false/true, -1/0 as in close(2), and the sign as in strcmp(3).
This API is much easier to understand if the "comparison" function is
not called a comparison function (since that is too close to strcmp),
but a "match" function that just returns a boolean.
In Lst_FindFromB, the node argument may be null. This deviates from the
other Lst functions, which require Lst and LstNode to generally be
non-null. In this case it is useful though to make the calling code
simpler.
In arch.c, this makes a lot of the previous documentation redundant.
In cond.c, the documentation is reduced a little bit since it had
already been cleaned up before. It also removes the strange negation
from CondFindStrMatch.
In dir.c, the documentation collapses as well.
In main.c, separating the ReadMakefile function from the callbacks for
Lst_FindB allows the former to get back its natural function signature,
with proper types and no unused parameters.
To catch any accidental mistakes during the migration from Lst_Find to
Lst_FindB, the code can be compiled with -DUSE_DOUBLE_BOOLEAN, which
will complain about incompatible function pointer types.
Extract the null check for path to the top level. This has the
side-effect of only incrementing dotLast.refCount if that entry is
actually used.
Reduce the indentation of the code by returning early from the simple
branches.
The migration from null-passing Lst functions to argument-checking Lst
functions is completed.
There were 2 surprises: The targets list may be NULL, and in Dir_AddDir,
the path may be NULL. The latter case is especially surprising since
that function turns into an almost-nop in that case. This is another
case where probably 2 independent functions have been squeezed into a
single function. This may be improved in a follow-up commit.
All other lists were fine. They were always defined and thus didn't
need much work.
In most cases the Lst functions are only called when the arguments are
indeed valid. It's not guaranteed though, therefore each function call
needs to be analyzed and converted individually.
While here, remove a few statements that were only useful when the Lst
functions handled circular lists.
Lst_Duplicate would have passed through any null pointer, which was not
needed for make. It was the last function that used Lst_AtEnd, which in
turn was the last function that used LstInsertAfter. As a result, these
two functions have been removed.
Since the lists don't contain null pointers, it doesn't make sense to
search for a null pointer. All calls but one already had obviously
non-null arguments. The one remaining call using targ->suff has been
guarded for now.
The code for Lst_Member became much simpler than before. Partly because
the old code had an extra condition for circular lists, which are not
used by make.