When compiling the grammar with Bison, it complains:
error: $$ for the midrule at $2 of 'struct' has no declared type
Yacc does not complain; instead, it assumes that a midrule action has the
same type as the rule itself.
The assignment '$$ = $1' in the midrule action does not influence the $$
of the whole rule; it only assigns to $2. The assignment to the rule's
own $$ was done via the default action, therefore everything worked as
expected. Any
missing assignment in this rule would have been caught quickly by the
strict assertion in mktag.
No functional change.
Since cgram.y 1.270 from today (a "cleanup" commit), the enum constants
were only registered in the symbol table, but they were not added to the
enum type (en_first_enumerator). That information is used for
validating switch statements on enum types.
The actual bug is an off-by-one error in the grammar, in the grammar
rule 'enum_declaration'. Yacc does not notice this obvious error.
Bison does, but it is not involved in building lint.
In the grammar rule 'enum_declaration', the intended $3 contains the
first enumeration constant of the type, while $2, which yacc interprets
as a symbol, contains a null pointer, at least on x86_64.
The existing tests did not cover this scenario, so the bug went
unnoticed.
Since type attributes (and GCC attributes as well) often modify the
preceding identifier, without referring to $$, the easiest way to
integrate them into the grammar is to define a separate grammar rule
that can be placed wherever these attributes are allowed. This avoids
duplicate actions like in direct_param_decl.
No functional change.
enum_decl_lbrace was only used once and was small enough to be inlined.
Renamed expr_statement and added block_item_list_opt to match the
wording from C99.
Added references to C99.
No functional change.
These cannot be resolved as easily as those from the previous commit.
Anyway, the relevant code from the grammar is not yet covered by the
tests, this needs to be done first.
There is no need for extra rules for '__real__(term)' since that is
already handled by the simpler '__real__ term', just a few lines further
up in the grammar. Likewise for __imag__.
The GCC manual does not mention anything about parentheses either.
In GCC 2.95.3, attributes had already been available for functions,
variables and types. At that time they were indeed related to
declarations, and that's where they ended up in lint's grammar. Later,
attributes were extended to labels, enumerators and statements as well.
To keep the grammar for declarations short and comprehensible, move the
rather large part about __attribute__ to the bottom of the grammar,
creating a new section called "GCC extensions".
The grammar rules are not named accurately (and never were). They are
called "type attributes" but apply not only to types. These names will
be improved in a follow-up commit.
No functional change.
An array size is used in several grammar rules for different types of
declarations, so it doesn't make sense to place that rule
somewhere in the middle, where it disrupts the flow of notype/type
rules. The whole point of having the notype/type rules grouped is to be
able to quickly compare them, since they are almost equal.
No functional change.
In all but one case, the use of type_attribute_list introduced an
unnecessary ambiguity in the grammar. It appeared in a place where it
could be repeated either by the type_attribute_list or by the enclosing
rule. Both variants have the same effect.
No functional change.
When talking about alignment, offset and size of a type, the measurement
unit must be mentioned in the variable name, especially when it differs
from the standard unit of measurement, which is a byte, not a bit.
No functional change.
The plain character strings in strg_t are saved as an array of unsigned
char. When such a character is passed to ch_isdigit, it requires a
cast. This is exactly the situation that ch_isdigit tried to avoid in
the first place.
No functional change.
a regression in sh. In addition to the intended change (based on the
commit message), an apparently unintended change was made, inverting a
comparison. This broke sh builds and our workaround (so far) was to
compile xlint/lint1 with -O0.
Revert the comparison to what it was before and remove the -O0 hack
from xlint/lint1.
[1] https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=91f66e78cc141da77ff9e0e3c8519e1af3f26c07
This theoretically enables strict bool mode for the little code remaining
in scan.l. Since scan.l is not yet detected as generated code, all
interesting errors have to be suppressed, though.
Since err.c 1.12 from 2000-07-06, lint has allowed suppressing individual
error messages. Suppressed error messages do not increment nerr.
Keeping nerr at 0 had triggered the assertion.
If a controlling expression is not of type bool but of any other scalar
type, keep the expression. Its value is still useful for control flow
analysis.
This prevents an assertion failure when running lint on the generated
scan.c, which contains a "while (1)" that does not stem from a system
header. If it did, lint would accept it, see tn_from_system_header. But
"scan.c" is not considered a system header. Maybe lint's definition of
a system header needs to be revisited.
After fixing this, there is another assertion failure though, so scan.c
is not yet ready to be inspected by lint.
Previously, the test depended on implementation details of the system's
printf command.
Thank you sjg for the detailed analysis on macOS, FreeBSD and Linux.
This is a tiny change in an edge case that does not occur in practice,
which is that the left-hand side of the '&' is explicitly cast to an
enum type. The apparent "loss of information" from the deleted comment
has already been explained in the previous commit.
Since tree.c 1.294 from 2021-06-28 (two days ago), lint errored out on
an implicit function declaration. In principle it is correct to do so
since C99 requires it, but in practice there are several functions
that are not declared in the translation unit itself since they are
provided by the compiler. Typical examples for GCC and Clang are the
various functions named '__builtin_*' and '__atomic_*'.
For now, only warn about these but don't error out.
In the regular NetBSD builds, this happened in swab.c:65. That line
contains __predict_false, which may or may not be a macro. In other
cases, there may be more than one function call in a single line.
C99 6.2.5p13 says that LCOMPLEX has the same representation and
alignment requirements as an array type containing exactly two LDOUBLE.
When support for _Complex was added to lint in inittyp.c 1.10 from
2008-09-27, there was no explanation for making the bit-size of LCOMPLEX
different from what C99 says, so it was probably a mistake that went
unnoticed for more than 12 years. It's an edge case anyway.
This is a very small step towards having all shared type_t objects only
referenced via const pointers. Since the types may be shared, it is a
bad idea to try to modify them, so better let the compiler check this.
It's a long way to reach this goal, but this small step is already
possible.
No functional change.
The 'identifier' in type_direct_decl is necessary, as demonstrated in
the test d_typename_as_var. Replacing T_NAME with 'identifier' in
notype_direct_decl would increase the shift/reduce conflicts by 6. To
keep this number low, keep everything as-is.
Some tests simply do not work in some environments.
E.g. shell-ksh on macOS/arm64
Allow local site to set BROKEN_TESTS to skip those they know
will not work.
Reviewed by: rillig
Originally I only needed a message that would output the type name from
an abstract-declarator (C99 6.7.6), to see whether lint interprets the
types correctly.
Message 155 looked like a good candidate, but it only revealed more
incomplete and untested code in lint.
The previous name 'decl' was ambiguous; it could have meant declaration
as well as declarator. The new names are aligned with C99.
No functional change.
The 'inf' from the type name meant 'information' and was redundant. Each
object of that type represents a single pointer level, which made the
documentation about 'pointers' a bit confusing.
The members of struct qual_ptr are now in the canonical reading order,
which is 'const volatile pointer'.
No functional change.
Previously, lint accepted comma-expressions where only
assignment-expressions are allowed.
This change does not make a difference in practice though since lint is
usually only run on source code that properly compiles. Nevertheless,
rather be precise and accurate since the grammar might some day be
reused on less reliable input.
This does not have any effect in practice since the option -g
(originally meant for GCC extensions to the C standards) implicitly
allows all features from C11, since err.c 1.111 from 2021-04-14.
Since the default lint flags for NetBSD builds include the option -g,
this allows all C11 features.
Currently it is not possible to say "allow GNU extensions but not C11".
Previously, selecting the option -Ac11 allowed features from C11 but at
the same time prohibited 'long long', which was added in C99. This was
caused by the option -s, which is interpreted as "allow features from
C90, but no later".
The test for _Generic, which was added in C11, demonstrates that
the current implementation is broken. Lint currently thinks that the
return type of a _Generic selection is the type of the expression, but
it really is the type of the selected expression. In the current tests,
this is always 'const char *', but C11 does not require that the types
of a generic selection are compatible.
After the fix from the previous commit (a missing assignment to $$ in an
error case), make sure that there are no other bugs of the same kind, by
manually checking that each rule with a %type assigns $$ in each and
every case. There is one more instance in block_item_list, but that
does not lead to a crash since it affects only a boolean variable, not a
pointer.
It should not be necessary to check for this class of bugs manually, but
neither BSD yacc nor GNU Bison provides any warning option to help with
this scenario. They should have remarked that the %type for
type_attribute is never used, since that is easy to detect. They should
have also warned that the rule for block_item_list does not mention $$
at all.
Detecting the bug from the previous commit would probably be too much to
ask since it involves control flow analysis in the C code. In this
particular case, it would have been necessary to visit each possible
branch from the 'if' statement and ensure that there is a $$ on the
left-hand side of an assignment.
While here, note down several small inconsistencies in the grammar that
should be fixed in follow-up commits.
When a value of a .for loop contained a literal newline, such as from
the expression ${.newline}, that newline was passed verbatim to the
"expanded current body" of the .for loop. There it was interpreted as a
literal newline, which ended the current line and started a new one.
This resulted in several syntax errors.
In cases like these, print a more precise error message.
While size_t is most appropriate for array indexes, make needs to be
compatible with C90, which does not support the %zu printf conversion.
To avoid type casts, use a simple unsigned int here, which is more than
enough for storing a single decimal digit.
No functional change.
This test lived together with a few unrelated tests in moderrs.mk; it is
better placed in varmod-subst-regex.mk, though.
While here, extend, document and explain the test since its purpose was
not obvious from reading the code alone.
It is not always an error for a subexpression not to have matched,
since the regex library cannot convey how many matches are expected;
only report an error if opts.strict (-dL) is set.
Reviewed by: christos
For the possible operators that occur in message 324, print_tnode is
equivalent to op_name, and the latter is simpler.
When the function print_tnode was added to the code base, it had another
use in init.c, for understanding how initialization works in lint. That
code has since been rewritten completely, therefore print_tnode is no
longer needed. For debugging, display_expression is the better choice
since it has multi-line output and does not suffer from a fixed-length
buffer.
No functional change.
The name v_unsigned suggested that the value would be interpreted as
unsigned, which was wrong. Whether a value is signed or unsigned is
decided by v_tspec instead.
Revert the previous commit for boolean constants since their value is
already interpreted as unsigned, and there is no need for any warning
about differences between traditional C and ANSI C, since the _Bool type
was only added ten years later, in C99.
The code for printing a tree node was also confused by this struct
member, even with its old name v_ansiu. That code will be fixed in a
follow-up commit.
No functional change.
In strict bool mode, bool is not an arithmetic type anyway, therefore it
doesn't matter whether the type is signed or unsigned.
C99 6.2.5p6 defines _Bool as one of the "standard unsigned integer
types", so making the constants unsigned is more accurate.
No functional change.
When lint was written in 1995, traditional C was still a recent memory,
since C90 had been around for only 5 years. 26 years later, almost all code
adheres to C90 or even C99 or C11, therefore "C90 or later" can safely
be assumed as the default.
No functional change.
Before C99, these tokens were only used in member access expressions.
C99 reused the operator '.' in initializations of structs and unions.
Let the grammar check for syntax errors instead of writing custom code.
No functional change.
A translation unit that contains just 'void x' without the trailing
semicolon had crashed lint1 before:
assertion "dcs->d_ctx == AUTO" failed in declare at decl.c:2049
Found using afl.
If the code contained an unfinished string or character literal at the
end of the file, the lexer got into an endless loop. Curiously, inpc()
returned 0 in such a case instead of the usual EOF.
Found by making lint1 with CC=afl-gcc and then running:
afl-fuzz \
-m 200 \
-i in_dir \
-o lint1 \
$src/usr.bin/xlint/lint1/lint1 @@ /dev/null
String literals may contain null bytes, and these must be passed further
on.
This reintroduces the endless loop in the lexer, but that must be fixed
in another way that doesn't destroy the error handling.
This piece of code did not match the function name and thus could not
reasonably be expected in that function.
In job.c 1.399 from 2021-01-29 I missed exactly this little detail when
I added code to skip the apparently unnecessary creation of empty shell
files. The code I added only handled the happy case, not the case where
the target could not be made.
That code path then differed, leading to a much more verbose error
message than before.
before:
don't know how to make ../missing/no-such.o. Stop
after:
don't know how to make ../missing/no-such.o. Stop
...
`../missing/no-such.o' was not built (made BEINGMADE, ...)!
`muck' was not built (made DEFERRED, type OP_DEPENDS|...)!
`muck' has .ORDER dependency against build-all (made DEFERRED, ...)
Thanks to sjg for finding and reproducing this unintended change of
behavior.
The parameter 'flags' was renamed in job.c 1.354 from 2020-12-10 without
adjusting the documentation.
The parameter 'previous' was removed in job.c 1.108 from 2006-03-12,
also without adjusting the documentation of JobStart.
No functional change.
First and foremost, the test d_c99_complex_split accessed the array
qlmasks out-of-bounds, with an index of 128 for the type 'double
_Complex'. This invoked undefined behavior since the maximum allowed
index was 64.
Replacing the raw array accesses with function calls allows for bounds
checks to catch these errors early.
Determining the value bits for a 'double _Complex' does not make sense
at all since it is not an integer type. This means that lint didn't
handle these types correctly for several years. Support for int128_t
has been added in inittyp.c 1.12 from 2018-09-07, support for _Complex
has been added in inittyp.c 1.9 from 2008-04-26.
Determining the value bits for an int128_t would make sense, but the
unit tests don't contain examples for this type since at the moment all
unit tests must produce the same results on 32-bit and 64-bit platforms,
and the 32-bit platforms don't support int128_t.
The grammar above the parsing code says that a Leaf has nothing to do
with function calls, so don't mix these in the actual code.
No functional change.
Adapt the SMALLPROG / -UWITH_SSL build to also use the fetch_*()
methods from ssl.c, instead of using stdio, as stdio isn't robust
when using interruptible signals.
Disable ssl-specific support in the fetch_*() methods if WITH_SSL
isn't defined, so SMALLPROG still doesn't have ssl support (as expected).
The resulting SMALLPROG binary is slightly larger than before
(e.g., 157KiB vs 153KiB on amd64).
Set version to 20210603 for this fix and the SO_KEEPALIVE fix for PR 56129.
PR install/56219
Attempt to prevent timeouts of the control connection by setting SO_KEEPALIVE.
This matches the equivalent behaviour in ftpd.
Note: This is a much simpler change than adding a background polling event
to invoke "STAT" (or "NOOP") on the control connection during a transfer.
(It's unclear from RFC 959 whether "NOOP" is even permitted during a transfer).
PR bin/56129
In 'make test-coverage', the number of uncovered lines for inline
functions in headers was reported too high. The cause for this is that
gcov reports the coverage for these functions multiple times, once per
translation unit. If some of the translation units don't use these
inline functions, summing the lines containing '#####' quickly leads to
numbers that are obviously too high.
The algorithm is easier to understand when each line of code only
focuses on a single topic.
No change to the resulting binary, except for line numbers in assertion
messages.
1. exit(1) with an error message on stderr if an I/O error occurs.
1a. To work properly when built into /bin/sh, sprinkle clearerr() at
appropriate places.
2. Verify that when a 'X data value is used with one of the numeric
conversions, nothing follows the 'X'. It used to be unclear in the
standard whether this was required or not; it is clear that with
numeric conversions the entire data value must be used, or an error
must result, but with string conversions that isn't the case and
unused parts are simply ignored. This one is a numeric conversion
with a string value, so which applies? The standard used to contain
an example of '+3 being converted, producing the same as '+ and
ignoring the '3', with no mention of any error, so that's the approach
we adopted. The forthcoming version now explicitly states that an
error would also be generated in that case, as the '3' was not used by
the numeric conversion.
2a. We support those conversions with floating as well as integer conversions,
as the standard used to suggest that was required (but it makes no sense:
the values are always integers, and printing them in a floating format is
dumb). The standard has been revised to make it clear that only the
integer numeric conversions %d %u %x (etc.) are supposed to handle the 'X
form of data value. We still allow it with the floating formats as an
extension, for backward compat, just in case someone (other than the ATF
tests) is using it. It might go away.
2b. These formats are supposed to convert 'X where 'X' is a character
(perhaps multibyte encoded) in the current LC_CTYPE locale category.
We don't handle that; only single-byte characters are handled currently.
However, the framework is now there to allow code to (one hopes, easily)
be added to handle multibyte locales. (Note that for the purposes of
#2 above, 'X' must be a single character, not a single byte.)