column == 1 + indentation.
In addition, indentation is a relative distance while column is an
absolute position. Therefore, don't confuse these two concepts, to
prevent off-by-one errors.
No functional change.
Previously, the '/*' in the string literal had been interpreted as the
beginning of a comment, which was wrong. Because of that, the variable
declaration in the following line was still interpreted as part of the
comment. The comment even continued until the end of the file.
Due to indent's forgiving nature, it neither complained nor even
mentioned that anything had gone wrong. The decision of rather
producing wrong output than failing early is a dangerous one.
At least, there should have been an error message that at the end of the
file, the parser was still in a a comment, expecting the closing '*/'.
Since several years (maybe even decades) compilers know how to inline
static functions that are only used once. Therefore there is no need to
have overly long functions anymore, especially not 'main', which is only
called a single time and thus does not add any noticeable performance
degradation.
No functional change.
The word 'col' should only be used for the 1-based column number. This
name is completely inappropriate for a line length since that provokes
off-by-one errors. The name 'cols' would be acceptable although
confusing since it sounds so similar to 'col'.
Therefore, rename variables that are related to the maximum line length
to 'line_length' since that makes for obvious code and nicely relates to
the description of the option in the manual page.
No functional change.
These two functions operated on column numbers instead of indentation,
which required adjustments of '+ 1' and '- 1'. Their names were
completely wrong since these functions did not count anything, instead
they computed the column.
No functional change.
The goal is to only ever be concerned about the _indentation_ of a
token, never the _column_ it appears in. Having only one of these
avoids off-by-one errors.
No functional change.
Using the invariant 'column == 1 + indent'. This removes several overly
complicated '+ 1' from the code that are not needed conceptually.
No functional change.
Together with the results of the tokenizer and the 4 buffers for token,
label, code and comment, the debug log now provides a good high-level
view on how the indentation happens and where to look for the many
remaining bugs.
Whenever the code to be output contained the magic byte 0x80, instead of
writing this byte, indent wrote the column number at the beginning of
the code snippet, times 7. Especially the 'times 7' does not make any
sense at all.
In ISO-8859-1, this character position is not assigned. In Microsoft
Codepage 1252 it is the Euro sign. In UTF-8 (which was probably not on
the author's list when the code was originally written) it occurs as the
middle byte for code points like U+2026 (horizontal ellipsis) from the
block General Punctuation.
Remove this strange code, thereby fixing indent for UTF-8 code. The
code had been there since at least 1993-04-09, when it was first
imported to NetBSD.
Calculating the indentation is simpler than calculating the column,
since that saves the constant addition and subtraction of the 1.
No functional change.
Since C90, there is no need to repeat the type of the function
parameters.
In the whole code of indent, there is a lot of confusion between the
concepts of a 'column' (which is a position on the screen, counting
starts at 1) and 'indentation' (which is a length, not a position). To
avoid this confusion, the code will be rewritten anyway very soon.
Repeatedly adding and subtracting 1 from the 'current column' is not
elegant, this should rather be done by consistently measuring only the
indentation from the left border (at offset 0), as a distance, not as an
absolute position.
Column counting starts at 1. This 1 should rather be at the beginning
of the formula since it is thought of being added at the very beginning
of the line, not at the end.
When adding a tab, the newly added tab is added at the end of the
string, therefore that '+ 1' should be at the end of the formula as
well.
No functional change.
This allows to add debug logging to these few functions instead of all
other places that might output something.
Reducing the possible output formats to a few primitives makes dump_line
simpler, especially the fprintf calls. It also removes the non-constant
printf string.
The call to output_int may be meant for debugging, as the character 0x80
is unlikely to appear in any real-world code.
No functional change.
Having it directly below the table makes it easier understandable.
I also tried to omit this function entirely by moving the code into the
initializer itself, but that made the code redundant and furthermore
increased the size of the resulting binary, probably because of the new
relocation records.
No functional change.
This check has been too quick and broke the lint build. Among others,
lib/libpuffs has -w included in LINTFLAGS, which means that the build
can fail even for new warnings, not only for errors.
libpuffs compares a uint16_t with constants from an unnamed enum type.
Since the enum type is completely unnamed (neither a tag nor a typedef),
there is no way to define a struct member having this type. This was a
scenario that I just didn't consider when I added the check to lint.
For now, disable the new check completely. The previously existing lint
checks stay enabled, including the one that warns about mismatched
anonymous enum types in the '==' operator, which is very similar to the
now disabled check.
The previous 'casestmt' was wrong since a case label is not a statement
at all.
The previous 'swstmt' was overly short, and wrong as well, since it
represents only the 'switch (expr)' part, which is not a complete switch
statement. Same for 'ifstmt', 'whilestmt', 'forstmt'.
The previous word 'head' was not precise enough since it didn't specify
exactly where the head ends and the body starts. Especially for
handling the dangling else, this distinction is important.
No functional change.
This refactoring reduces the indentation of the code, as well as
removing any ambiguity as to which 'switch' statement a 'break' belongs,
as there are no more nested 'switch' statements.
No functional change.
It's strange that indent's own code is not formatted by indent itself,
which would be a good demonstration of its capabilities.
In its current state, I don't trust indent to get even the tokenization
correct, therefore the only safe way is to format the code manually.
It may have been a clever trick to use the same memory layout for struct
templ and a string pointer, but it's not worth the extra comment and
difficulty in understanding the code.
No functional change.
No functional change since neither rw_jump nor rw_inline_or_restrict is
mentioned in any switch statement, and lint didn't find any other
suspicious enum operations.
This reduces the magic numbers in the code. Most of these had their
designated constant name written in a nearby comment anyway.
The one instance where arithmetic was performed on this new enum type
(in indent.c) was a bit tricky to understand.
The combination rw_continue_or_inline_or_restrict looks strange, the
'continue' should intuitively belong to the other control flow keywords
in rw_break_or_goto_or_return.
No functional change.
It's a funny idea to do something like 'case a = 13:', but since any
compiler will reject this code, there is no point in lint supporting it.
No functional change since everywhere the grammar allows a constant
expression, there is no ambiguity where an assignment could be
interpreted differently.
C99 6.4.4 already defines a grammar rule named 'constant' for an number
literal or an enum constant, so don't use that name for something else.
No functional change.
According to the GCC documentation[1], the high end of the range is
inclusive as well, which makes sense since otherwise there would be no
way of specifying a range that includes the maximum representable
number.
Since the range is not used at all in the code, none of the tests could
possibly fail.
[1] https://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html
No functional change.
There was no point in having a separate grammar rule for the '3...' part
of a range expression, it just made the code more complicated than
necessary.
No functional change.
before: array of unsigned int[4]
now: array[4] of unsigned int
Listing the array dimension first keeps it in contact with the keyword
'array'. This reduces confusion, especially for nested arrays.
The main ingredient for understanding how indent works is the tokenizer
and the 4 buffers in which the text is collected.
Inspecting this debug log for the test comment-line-end makes it obvious
why indent messes up code that contains '//' comments. The cause is
that indent interprets '//' as an operator, just like '&&' or '||'. The
sequence '/////' is interpreted as a single operator as well, by the
way.
Since '//' is interpreted as an ordinary operator, any words following
it are plain identifiers, usually several of them in a row, which is a
syntax error. Depending on the context, the operator '//' is either a
unary operator (no space around) or a binary operator (space around).
This explains why the word 'line-end' is expanded to 'line - end'.
No functional change outside of debug mode.
This is a prerequisite for converting the token types to an enum instead
of a preprocessor define, since the return type of lexi will become
token_type. Having the enum will make debugging easier.
There was a single naming collision, which forced the variable in
scan_profile to be renamed. All other token names are used nowhere
else.
No change to the resulting binary.
This is something that neither GCC 10 nor Clang 8 do, even though it
seems useful. Lint didn't do it up to now, but that was probably an
oversight since it is easy to miss the implicit '==' operator in the
switch statement.
assuming that everything that isn't a list is a tailq. Fixes random
reads from kmem that either fail or return incorrect data for the vcache
hash table.
Previously, the test msg_056.c warned twice about the integer literal,
but only on 32-bit platforms. On 64-bit platforms, there was only a
single warning since the integer constant was converted to type
__uint128_t, and this prevented the second warning. On 32-bit targets,
there is no __uint128_t though.
Fixes part of PR bin/55976.
The broad type of a value is indeed stored in the value itself, in the
member v_tspec. For nodes that refer to this value, it is redundantly
stored, it always equals tn->tn_type->t_tspec.
After initialization, neither tn->tn_type nor val->v_tspec are modified.
This is not ensured by the compiler but has to be analyzed manually.
No functional change.
Having the measurement unit in the variable name prevents accidental
confusion between bits and bytes, especially since usually the word
'size' means the size in bytes, while 'width' means the size in bits, at
least for integer types.
No functional change.
Each of the tests named msg_*.c repeats the template of the message, to
make the test somewhat self-contained when viewed in isolation.
This creates a redundancy, and keeping track of this manually is next to
impossible. I tried it and failed in 9 cases, even though it has just
been 2 months since I myself created the initial files and I knew all
the time that this redundancy exists.
Be fool-proof for the future by checking this automatically.
These expressions are indeed constant for a specific platform, but on
another platform their value may change. This makes them unsuspicious
and legitimate for portable code.
Seen in rump_syscalls.c, as 'sizeof(int) > sizeof(register_t)'.
Previously, 'typedef enum { E } name' was output as 'name', which
omitted the information that this was an enum type. Now it is output as
'enum typedef name'.
Previously, 'typedef struct { int member; } name' was output as 'struct
<unnamed>', which omitted the typedef name. Now it is output as 'struct
typedef name'.
Message 153 didn't state obviously which of the pointer types was the
one before conversion (or cast) and which was the resulting type.
Message 229 didn't have any type information at all.
This warning occurs more than 7400 times in a regular NetBSD build, and
without giving any type information, leaves the reader clueless about
what the underlying issue might be. Add type information since that is
a no-brainer to implement.
For performance reasons, the implementation of the simple rule "cmdline
overrides global" grew into code that is much more complicated than a
straight-forward implementation. This added complexity made it easy for
bugs to sneak in.
The extra condition had been necessary before FStr made memory
management simpler.
The Coverity annotation got out-of-date when the parameter was converted
to FStr since that type is not allocated on the heap, only its inner
members are.
No functional change.
Breaking the loop once for depth == 0 and once for depth == 1 was
unnecessarily confusing, as was the nested 'if'. Start counting with 0
since there is no reason to start at 1.
Evaluating the common subexpression '*p == endc' is left as an exercise
to the compiler.
No functional change.
In the expression ${:U}, the variable name is empty. Since these
expressions are generated by .for loops, the error messages for them
must not end with a trailing space. Putting the variable name in quotes
helps against that.
Replace "variable specification" with the more modern "variable
expression", reduce the number of parentheses, output more than a single
character for modifiers, make it obvious that in expressions such as
${:Serror}, the "" means a variable name.
Back in 1995, the modifiers were all single-character, and it made sense
to print only the first character. Nowadays, with ':S', ':@var@...@',
'::=' and several others, a little more context is useful to see where
the exact error is. The actual modifier is still guessed, and the guess
may be wrong as soon as backslashes get involved, but it is still better
than before.
For a very long time now, I had thought that it would be impossible to
undefine global variables during the evaluation of variable expressions.
This is something that the memory management in Var_Parse relies upon,
see the comment 'the value of the variable must not change'.
After several unsuccessful attempts at referring to an already freed
previous value of a variable, today I discovered how to unset a global
variable while evaluating an expression, which has the same effect. To
demonstrate that this use-after-free can reliably crash make, it would
need a memory allocator with a debug mode that never re-allocates the
same memory block after it has been used once. This is something that
jemalloc cannot do at the moment. Valgrind would be another idea, but
that has not been ported to NetBSD.
Undefining a global variable while evaluating an expression is made
possible by an implementation detail of the modifier ':@'. That
modifier undefines the loop variable, without restoring its previous
value, see ApplyModifier_Loop.
By the very old conventions of ODE Make, these loop variables are named
'.V.' and thus do not conflict with variables from other naming
conventions. In NetBSD and pkgsrc, these loop variables are typically
called 'var', sometimes '_var' with a leading underscore, which also
doesn't conflict with the typical form 'VAR' of variables in the global
namespace. Therefore, in practice these loop variables don't interfere
with other variables.
One case that can practically arise is when an outer variable has a
modifier ':@word@${VAR.${word}}@' and one of the referenced variables
uses the same variable name in the modifier, see varmod-loop.mk 1.10
line 91 for a detailed explanation.
By using the ${:@VAR@@} modifier in a place that is evaluated with
cmdline scope, it is not only possible to undefine global variables, it
is possible to undefine cmdline variables as well. When evaluated in a
specific make target, the expression ${:@\@@@} can even be used to
undefine the variable '.TARGET', which will probably crash make with an
assertion failure.
The variable name 'arg' was misleading since after a successful
TryParseTime, it would no longer point to the argument of the variable
modifier, but to the _end_ of the argument. To reduce confusion, use p
instead, like everywhere else. This name is less specific, which is
still better than a wrong name.
Addition is easier than subtraction, and the expression 'word + wordLen'
obviously means 'the end of the word', which was not as easy to spot
before.
No functional change.
These variables all belong to a string variable. Connect them using
FStr, which reduces the number of variables to keep track of.
No functional change.
There was only a single case where this parameter was false. Inline
that case. That was the only case that needed the return value, so
remove that as well.
When
1. there is a global variable containing a dollar in its expanded name
(very unlikely since there are lots of undocumented edge cases that
make variable names containing dollar signs fragile), and
2. after that (unlikely since that requires .MAKEFLAGS instead of a
normal command line)
3. there is a command line variable of the same name (again very
unlikely since that variable name would contain a dollar sign as
well in the expanded form),
the global variable would not be undefined as promised by the comments
since its name was expanded once more than intended.
Because of the two 'very unlikely' above, this edge case hopefully does
not affect any practical use cases.
Note that this is not about VAR.${param} (which has a dollar sign in its
unexpanded form), but about the case where param itself would expand to
a dollar sign, such as after param=$$.
This is a preparation to extract the code for exporting a cmdline
variable. That code differs in several details from the other code in
ExportVar.
No functional change.
Make prevents global variables from being or becoming visible when a
command line variable of the same name is already defined.
There is a double safety net here. Even if the call to Var_DeleteExpand
were removed, there would be no noticeable effect, other than one less
line in the debug log.
No functional change.
On NetBSD 8.0 it still worked. Maybe gcov doesn't support .c files as
arguments anymore. Using the .gcda files works and is more reliable
anyway since it covers the inline functions in the headers as well.
It is possible that the type name 'array[unknown_size]' may spill into
the user-visible diagnostics. The current test suite does not cover
such a case. Anyway, saying 'array[unknown_size]' is still better than
saying 'array[0]', which would be misleading.
In initstack_pop_nobrace, if anything happens to the initstack, it will
be logged by initstack_pop_item.
In init_using_expr, the address of the node is irrelevant, the node's
contents has already been logged above.
Before, the caller was responsible for initializing the return values
from the function. This was an unexpected burden.
Ensure that in each branch that returns true, both return values are
properly set.
Strangely, the only caller of that function, init_using_expr, uses
neither of the return values. It just tests whether the expression is
constant or not.
No functional change.
The debug logging contained much redundant information and was
misleading in a few places. For example, "pop" did not actually pop an
item, plus there are several things that could be popped, so that didn't
help either.
Sprinkle some comments in places where the code needs to become clearer.
No functional change outside debug mode. The condition
'istk->i_remaining >= 0' was redundant due to the assertion directly
above it.
The longer name is more expressive and more correct. The previous name
called each stack element a stack itself, which was unnecessarily
confusing.
No functional change.
This is the central data structure of the initializations, it keeps
track of the objects that still need to be initialized. Seeing its
contents in debug mode helps in finding and understanding the still
incomplete C99 support.
The previous code accepted '# 123 "file.c" 23' as specifying a system
header, just because that number ends with '3'. The original intention
was to compare the complete word, not its suffix. Fix that.
No practical change since the only flags that are used by GCC are all
single-digit.
This also changes the conditions to their positive form, which is easier
to read.
No functional change. The resulting binary would have been the same as
before, were it not for the changed line numbers in the lint_assert
calls further down in the code.
Internally the code confuses the concept of "the user doesn't want
a backup file" and "the user hasn't defined a type of backup file".
Introduce a new "undefined" backup type to serve the purpose "none"
previously did, and make "none" not generate backup files, as expected.
http://mail-index.netbsd.org/tech-userlevel/2021/02/19/msg012901.html
XXX pullup?