This allows to process lib/libc/gen/sysctl.c 1.38 from 2021-03-30, as
well as its precedessor 1.37, which had a workaround just for lint.
While unusual, C99 allows these.
This check is not strictly necessary since any C99 compiler must
diagnose them as well, it is rather meant for demonstrating how to do
the check in lint, and for symmetry with the 'unknown member' error
message. These provide insight into how the data structures in init.c
are meant to be accessed.
The previous implementation had a wrong model of how initialization
happens in C99, its assertions failed in all kind of edge cases and it
was not possible to fix the remaining bugs one at a time without running
into even more obscure assertion failures.
The debug logging was detailed but did not help to clarify the
situation. After about 20 failed attempts at fixing the small details I
decided to start all over and rewrite the initialization code from
scratch. I left the low-level parts of handling designators, the code
that is independent of brace_level and the high-level parts of how the
parser calls into this module. Everything else is completely new.
The concept of a brace level stays since that is how C99 describes
initialization. The previous code could not handle multi-level
designations (see d_init_pop_member.c). There are no more assertion
failures in the initialization code.
Some TODO comments have been left in the tests to keep the line numbers
the same in this commit. These will be cleaned up in a follow-up
commit.
The new implementation does not handle initialization with "missing"
braces. This is an edge case that both GCC and Clang warn about, so it
is not widely used. If necessary, it may be added later.
The new implementation does not use any global variables in the vast
majority of the functions, to make all dependencies and possible
modifications obvious.
For consistency with its type prefix and the other variables. This
variable is used so often that it makes sense to abbreviate it.
No functional change.
The 'cnt = level->bl_type->t_tspec == STRUCT ? 2 : 1;' in
initialization_push_struct_or_union is obviously wrong since not every
struct has exactly 1 remaining member after the first member that has an
initializer with designation.
This bug started its life in init.c 1.12 from 2002-10-21, a little over
18 years ago.
This removes 7 wrong warnings when running lint in -t mode.
Surprisingly, this added a warning that had not been there before in
msg_189.c. This is because check_variable_usage skips the checks when
an error occurred before. All diagnostics that happened were warnings,
but the -w option treats them as errors, see vwarning.
The following code is valid:
int valid = {{{ 3 }}};
C90 3.5.7 and C99 6.7.8 both say that the "initializer for a scalar
shall be a single expression, optionally enclosed in braces". They
don't put any upper bound on the amount of braces, not even in the
"Translation limits" section.
It did not make sense to have this code between the code for the
designation and the brace level. Since it is independent of all these
types, move it to the top.
No functional change.
Previously, the code accessed the global variable for the designator
several times, even though the designator cannot change during this part
of the code. Make this obvious by passing this designator as a
parameter instead.
No functional change.
The '#ifndef' in tyname.c is meant to distinguish between lint1 and
lint2, it is not meant to be defined from anywhere outside the lint code
itself.
No functional change.
In add_directory_replacement, the expression 'r->repl - r->orig' looked
strange, as if two pointers into separate objects were subtracted.
The code was probably optimized to a particular compiler on a particular
platform to generate fast and simple code. Since compilers have made
considerable progress over the last 25 years, optimize the code for
human legibility instead. The compilers will somehow cope with that.
No functional change.
Before: error: expected undefined [99]
After: error: 'expected' undefined [99]
Seen in external/mpl/bind, which for Clang defines in stdatomic.h:
> #define atomic_exchange_explicit(obj, desired, order) \
> __c11_atomic_exchange_explicit(obj, expected, order)
Note the mismatch between 'desired' and 'expected'.
That statement may have been true in 1993, but definitely is not true
anymore, as of 2021.
The part about "needs to be completely redone" is still true though
since indent cannot even format its own source code in an acceptable
way.
The new format follows the common conventions for file locations and
allows quick navigation in IDEs.
To trigger an internal error, it suffices to have 2 tokens in lint1's
input, after preprocessing: 'void __pure'.
C99 does not define names for the head parts of the 'for' statements, it
just calls them clause-1, expression-2 and expression-3. Therefore the
rather abstract name 'expr3'.
No functional change.
The previous names were highly ambiguous. The 'decl' could have meant
'declaration', which would be the usual abbreviation. It could also be
split into 'dec' and 'l', meaning 'declaration level', which would make
more sense in this particular context.
To avoid having to guess anything about these names, rename the
functions. Instead of 'push' and 'pop', I renamed them to 'begin' and
'end' since these are the high-level operation that are of interest.
That the hierarchy of declaration levels is implemented as a stack is
nice to know but not as important to understand the whole situation.
No functional change.
This makes the code several lines longer but way more readable. In the
previous dense expression it was hard to see what was going on at all
and that there are two completely separate situations in which this
warning applies.
No functional change.
From the previous commit, there was an off-by-one error left, which was
due to the interaction between designation_add_subscript and
extend_if_array_of_unknown_size.
The other crucial point was to call initstack_pop_nobrace before
accessing the "current initialization stack element". Without this
call, in msg_168.c the "current element" would point to the initializer
level for 'const char *' instead of the one for 'array of const char *'.
One more step towards supporting C99.
Initialization is still buggy but better than before. The remaining bug
is that only the first designator determines the array size, and after
that, the array is no longer considered of unknown size. This
contradicts C99. More improvements to come.
One of the latest "refactorings" introduced a small and practically
unimportant memory leak. If the last initializer in an initialization
had a designator, that designator was not freed.
When the "current initialization" was still a global variable with
unlimited lifetime, it was freed at the beginning of the next
initialization. After the refactorings, this "next initialization"
could no longer see anything from the previous initialization since all
references have been cleaned up at that point. Freeing the memory so
late and in an almost totally unrelated place was a bad idea anyway.
There are far too many places that modify the top element of the
initializer stack. Each of these places should get its own logging, as
long as the code is still complicated. These places are now clearly
marked with initstk_lvalue.
No functional change.
This makes it possible to accurately model C99 initializers, including
their optional designators. Previously, array subscripts had not been
modeled at all.
In the previous commit, debug_designation crashed immediately since I
had not run the code in debug mode even once. The condition 'name !=
head' was a left-over from the old times where the designator was still
a circular list.
No functional change outside debug mode.
This has been a long-standing limitation of lint. Now it is almost
ready for C99, see the list of "major changes" in the foreword of C99.
One known remaining bug in the area of initialization is designators
with several levels, such as '.member[2].member.member'. Oh, and
designators for arrays are only supported in the parser but not in the
type checker. There's still some work to do.
Missing braces after 'if', since init.c 1.68 from 2021-02-20.
GCC 10 doesn't complain about this even with -Wmisleading-indentation
since at least one of the involved lines is a macro invocation (in this
case both lines). GCC 11 will warn about this.
Clang warns about this, but the regular Clang build currently fails for
other reasons, so this problem didn't show up there either.
Now that the code contains explicit markers for starting and ending an
initialization, and having the guarantee that an assertion fails
whenever some code accesses the state of the "current initialization"
even though there is no ongoing initialization gives me much more
confidence in the correctness of the code. The calls to
begin_initialization and end_initialization always appear in pairs,
enclosing the minimal amount of code necessary for initialization.
In a nutshell, global modifiable state is error-prone and hard to
understand.
A nice side effect is that the grammar no longer needs a special rule
for the outermost initializer since the functions for the debug logging
are now called explicitly.
The code that misuses the initialization state just because it needs to
temporarily store a sym_t somewhere is now clearly marked as such. A
GCC statement expression can appear anywhere and is therefore
independent of the initialization. Most probably the code can simply
refer to the local variable in the grammar rule itself, or this variable
needs to be encoded in the grammar %union. For sure there is a better
way to handle this.
There is no longer a need that the function 'declare' initializes the
initialization state, it was just the wrong place to do this.
This indirection will be needed to handle nested initializations, which
are a new feature of C99. These are currently not handled correctly,
see msg_171.c.
No functional change.
While here, reword the message, avoiding operators and parentheses.
Since 2021-01-02, providing the precise type name is as easy as the
broad type classification (just replace tspec_name with type_name), and
it's definitely more useful to the human readers.
When determining the reachability of a statement, the idea was that
whenever 'reached' was set to false, 'rchflg' (the abbreviation for "do
not warn about unreachable statements") would be reset as well.
In some (trivial) cases, this was done, but many more interesting cases
simply forgot to set this second variable. To prevent this in the
future, encapsulate this in a simple helper function.
Now even if a statement is reachable, 'rchflg' gets reset. This does
not hurt since as long as the current statement is reachable, the value
of 'rchflg' does not matter.
No functional change. There would be quite a big functional change
though if check_statement_reachable were to reset 'rchflg' instead of
'reached', as the comment already suggests. In that case, with the
current code, many legitimate warnings about unreachable statements
would be skipped, especially those involving 'if' statements, since
these didn't reset 'rchflg' properly before.
Previously, only loop statements were considered for reachability. This
ignored the possibility of an early return in an if statement, or
unreachable branches.
This makes it easy to click on the location in the IDE instead of having
to manually parse the location and navigate to it.
No functional change outside debug mode.
Several tokens can only ever map to a single operator and thus do not
need to encode the operator. Indeed, they already encoded it as NOOP,
and it was not used by any grammar rule.
No functional change.
It's enough to have modtab, which describes the properties of the
various operators. There is no need to have a second table imods that
holds the same content. Rather make modtab constant as well.
The only possible functional change is that the names of the internal
operators 'no-op', '++', '--', 'real', 'imag' and 'case' may appear in
diagnostics, where previously lint invoked undefined behavior by passing
a null pointer for a '%s' conversion specifier.
The abbreviations in the table of operator properties had been wrong
since ops.def 1.10 from 2021-01-12, when strict bool mode was added. In
an earlier working draft, I had named that column 'takes_others' instead
of 'requires_bool', that's where the 'o' came from.
The names of the macro arguments had been wrong since op.h 1.11 from
2021-01-09, when the order of the columns changed and the macros were
not adjusted accordingly. Since all the properties of the operator
table are uniform, this didn't result in any bugs, it was just confusing
for human readers.
Clang-tidy suggests to enclose the macro arguments in oper.c in
parentheses but that is not possible since the arguments are either
empty or 1, and the syntactical ambiguity of the '+ 0' being either a
unary or a binary operator is needed here.
No change to the resulting binary.
Since today, lint's strict bool mode requires initializers to have the
correct type. The flags in kwtab are of type bool and were initialized
with an int, for brevity. Keep the brevity and do the conversion from
int to bool in a macro.
By defining several macros for the different kinds of keywords, reduce
the clutter of having 2 additional zeroes per line. The macros also
remove the redundancy and thereby the possible inconsistency of filling
the wrong fields since these depend on the token type.
No functional change. The only change to the binary is due to the
changed line numbers in the calls to lint_assert.
In lint's strict bool mode, initialization must be of the correct type.
This affects the bool fields in ttab_t, which are initialized with int.
To keep the code brief, preserve these ints and let a macro do the
actual work of converting them to bool.
No change to the generated binary.
C99 6.7.8p11 says for initialization that "the same type constraints and
conversions as for simple assignments apply", so actually apply them.
(I had just forgotten this "operator" when I first implemented strict
bool mode.)
Even though clear_warning_flags and its companions are implemented as
macros, they act like ordinary functions. Do not distract the reader by
using uppercase names for them.
No functional change.
The new code may not be the most beautiful, but it fixes all bugs that
occurred while testing message 327. The grammar rules are taken from
C99 6.8.2, so it's no surprise they work well.
The code in the C grammar is generated by yacc and is not checked by
lint's strict bool mode, therefore the replacement was done manually.
No change to the resulting change.
My previous commit message was wrong in saying that the '%prec' was
necessary. It is not necessary.
Most probably I misspelled the name of the grammar rule as opt_comma
instead of comma_opt, which would lead to the same number of conflicts
in the grammar plus a warning, but no build failure.
The '%prec T_COMMA' is necessary to avoid lots of parse errors in the
lint1 unit tests. Curiously, further down in the grammar, for compound
literals, the '%prec T_COMMA' is not necessary, even though the context
looks very similar.
No functional change.
Previously, the grammar syntactically accepted the following code:
int var = .member = 12345;
The designation '.member =' can only be used with brace-enclosed
initializers.
In d_c99_init.c, the initialization of array_with_designator failed.
The designator '.member' from that initialization was not cleaned up
before starting the next initialization.
These two functions are supposed to model the designator that is used
for initializing structs and arrays. The implementation is still buggy
and does not work at all for C99 designators with multiple names, see
d_init_pop_member.c.
For now, just rename the functions to head in the right direction.
No functional change.
The operator NAME has the name 'name', therefore no special case is
needed.
Having the words 'with type' in the message makes the message easier to
find from the debug log. Given that the operator name is used unquoted,
the log message 'name: int value=111' was nearly impossible to find in
the code.
Replace the '()' with an actual word, to avoid any confusion about
whether the type name might be a function type without prototype.
Reduce the amount of '=' signs, instead use commas to separate the
properties of the node.
No functional change outside debug mode.
The previous name could be too easily confused with the type qualifier
'const'. The operator name is mainly used in the debug log, only
occasionally in the output. Since 'constant' is not a "real" operator,
it probably doesn't occur in messages at all.
The debugging code is needed by the soon-to-be-added proper handling of
array subscript initializers such as '.member[123].member = 12345'.
No functional change.
This way, the code is covered by running 'make lint'. The code from the
grammar is not covered, therefore it still uses int instead of bool in a
few places.
Inline the comparison functions for uint64_t. These functions didn't
add any clarity to the code.
No functional change.
The new name accurately describes the structural element that holds such
properties as the separator character and whether the expression value
is considered a single word. The old name ApplyModifiersState was too
long and was meant as a placeholder anyway, when I introduced it in
var.c 1.236 from 2020-07-03.
This is an edge case that doesn't occur in practice since pretty much
nobody dares to use variable names that contain an actual '$' in their
name. This is not about the fairly common VAR.${param} (as written in
the makefile), but instead about the variable whose name is literally
'VAR.${param}'.
The test demonstrates that after the fix, the variable name is taken
exactly as-is for the simple assignment modifier '::='. There are no
such tests for the modifiers '::+=', '::!=' and '::?=', but that's ok.
The code in ApplyModifier_Assign would look assymetrical and suspicious
enough if one of these modifiers would expand its variable name and the
others wouldn't.
In CLEANUP mode, was originally meant to track memory allocations but is
useful during debugging as well, initialize the list. There is no
distinct constant representing an invalid pointer, otherwise that would
have been an even better choice.
An enum with 32 bits would lead to signed integer overflow anyway, so
that definition is not worth keeping even if it works on typical
2-complement platforms.
The definitions for 2, 4 and 8 enum have been unused for several months
now.
No functional change.
This makes the code easier to read, especially in var.c. It also makes
debugging sessions easier since some debuggers don't show enum
bit-fields symbolically as soon as more than one bit is set.
The code outside var.c is basically unchanged, except that instead of
passing the individual flags, there are 4 predefined evaluation modes.
These suffice for all practical use cases. Only in the implementation
deep inside var.c, the value of the flags keepDollar and keepUndef
differs.
There is no way of passing the struct to EnumFlags_ToString, which means
the ToString function has to be spelled out explicitly. This allows for
fine-tuning the representation in the debug log, to reduce the amount of
uppercae letters.
No functional change.
The name 'NONE' described the bit pattern, which was not useful to
understand its meaning. Omitting VARE_WANTRES only parses the
expression, without evaluating any part of it.
No functional change, not even in debug mode since Enum_FlagsToString
always returns "none" for all-bits-unset.
This is just to keep the code consistent among the various variable
modifiers. The performance gain is negligible.
The actual assignment to the variable had already been skipped
previously.
No functional change.
No functional change in practical usage. Theoretically this change can
be observed by looking at the generated random numbers for the ':Ox'
modifier, but the quality or exact sequence of these random numbers is
not guaranteed anyway.
In parse-only mode, variable expressions in the argument to that
modifier are not resolved. This led to the error message about the 'Bad
modifier' in var-eval-short.mk.
This affects the modifiers ':E', ':H', ':P', ':Q', ':R', ':T', ':hash',
':q', ':range', ':tl', ':ts', ':tu', and ':u'. All these modifiers are
side-effect free.
Skipping the evaluation for these modifiers is purely for code
consistency and performance.
No functional change.
No functional change, just a tiny bit of performance improvement,
probably not even measurable. Having the code nevertheless serves as a
copy-and-paste template for implementing other modifiers that might
perform more costly tasks.
The test 'var-eval-short' had produced the output 'unexpected' before,
on stderr. It had been generated by '${:Uword:@${FAIL}@expr@}' by
combining the following obscure "features" of make:
1. the ':@' modifier loops over the words of the variable. This
modifier is not really obscure, it still takes some time to get used
to it.
2. the ':@' modifier allows a '$' sign in the variable name, which is
useless in practice.
3. the ':@' modifier creates a temporary loop variable in the global
namespace. Luckily there are only few collisions with other
variable names since their naming conventions differ.
4. after looping over the words of the expression, the temporary global
loop variable is deleted, and at that point the '$' is expanded,
being interpreted as the start of a variable expression.
5. The ':@' modifier deleted the global variable even when it was
called in parse-only mode (without VARE_WANTRES).
When the modifier ':@' was initially added to make in var.c 1.40 from
2000-04-29, Var_Delete didn't expand the variable name. That feature
was added in var.c 1.174 from 2013-05-18, probably without thinking of
this very edge-casey combination of features.
This commit fixes item 5 from the above list. The other obscurities
remain for now.
This way, parsing and evaluating the modifier is only written once in
the code. The downside is that the variable name is allocated even if
VARE_WANTRES is not set, but since this modifier is so obscure and
seldom used this doesn't matter in practice.
This edge case had been so obscure that even discovering this takes
quite some time and requires reading the source code of make.
The manual page doesn't document whether the variable name is expanded
or not, it doesn't even give an example. When this obscure modifier was
initially added in var.c 1.210 from 2017-01-30, Var_Set always expanded
the variable name once, and there was no way around it. Therefore this
expansion has probably been unintentional.
See var-eval-short.mk:46 for the test demonstrating this change.
Previously, the expression ${:Uword:_=VAR} was evaluated including all
its side effects even though it was in an irrelevant branch of the
condition.
No functional change since the only caller of TryParseIntBase0 already
handles all possible parse errors. Without this check, the code just
looked wrong though.
TryParseIntBase0 wrongly returns successful for a string that does not
start with a number at all. Its only caller, ApplyModifier_Words,
already handles all error cases properly.
No functional change.
This aligns the implementation of these modifiers with the requirements
in the long comment starting with 'The ApplyModifier functions'.
No functional change.