Previously, the option '-ts30000000000000000' resulted in the error
message 'must be an integer' on ILP32 platforms but in 'must be between
1 and 80' on LP64 platforms. Remove this unnecessary difference.
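A minimal sketch of platform-independent range checking for such an
option. The names parse_int_option and opt_result are hypothetical and
not taken from indent's source. The idea is to treat an overflow reported
by strtol the same way as any other out-of-range value, so that the width
of 'long' no longer influences the message.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, not taken from indent's source. */
enum opt_result { OPT_OK, OPT_NOT_INTEGER, OPT_OUT_OF_RANGE };

/*
 * Parse a decimal option argument and check it against [min, max].
 * An overflow in strtol (ERANGE) counts as "out of range", no matter
 * how wide 'long' is on the platform.
 */
static enum opt_result
parse_int_option(const char *arg, long min, long max, long *out)
{
	char *end;
	long num;

	assert(min <= max);
	errno = 0;
	num = strtol(arg, &end, 10);
	if (end == arg || *end != '\0')
		return OPT_NOT_INTEGER;	/* no digits or trailing junk */
	if (errno == ERANGE || num < min || num > max)
		return OPT_OUT_OF_RANGE;
	*out = num;
	return OPT_OK;
}
```

With this approach, the argument "30000000000000000" is out of range on
both ILP32 and LP64, either because strtol overflows or because the
parsed value exceeds the maximum.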
This saves 3 kB of binary size since the parser state is rather large
and only very few members are initialized to non-zero values.
No functional change.
Adding an arbitrary integer to a pointer may result in an out-of-bounds
pointer, so replace the addition with a pointer subtraction.
In the buffer handling functions, handle 'buf' and 'l' before 's' and
'e', since they are pairs.
In inbuf_read_line, use 's' instead of 'buf' to make the code easier to
understand for human readers.
No functional change.
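The replacement of pointer addition by pointer subtraction can be
sketched as follows. The helper buf_has_room is hypothetical. The
parameter names 'e' and 'l' follow the message above, and their meaning
(end of the content, limit of the allocation) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return whether 'n' more bytes fit between the write position 'e' and
 * the limit 'l' of the buffer.  Both pointers must point into the same
 * array (or one past its end), with e <= l.
 *
 * Testing 'e + n <= l' instead would compute 'e + n' first, which is
 * undefined behavior as soon as it points further than one past the end
 * of the array.  The difference 'l - e' is always well defined.
 */
static bool
buf_has_room(const char *e, const char *l, size_t n)
{
	assert(e <= l);
	return n <= (size_t)(l - e);
}
```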
The buffer 'inp' comes first. From there, a single token is read into
the buffer 'token'. From there, it usually ends up in 'code'. The buffer
'token' does not belong to the group of the other 3 buffers, which
together make up a line of formatted output.
No functional change.
The word 'dump' in 'ps.dumped_decl_indent' was too close to dump_line,
which led to confusion since the variable controls whether the
indentation has been added to the code buffer, which happens way before
actually dumping the current line to the output file.
The function name 'indent_declaration' was too unspecific; it did not
reveal where the indentation of the declaration actually happened.
No functional change.
Since the previous commit, lexi is always called with the same argument,
so remove that parameter.
The previous commit broke the debug logging by not printing "transient
state" anymore. Replace this with "rolled back parser state" at the
caller's site.
No functional change.
Having accurate names for the lexer symbols and the parser symbols makes
most of the comments redundant. Remove these.
Rename process_decl to process_type, to match the name of the
corresponding lexer symbol. In this phase, it's just a single type
token, not a whole declaration.
No functional change.
NetBSD's indent has deviated enough from FreeBSD's indent to warrant a
different product name. When indent was copied from FreeBSD in 2019,
that update introduced several new bugs, some of which have been fixed
in the NetBSD version.
NetBSD indent, unlike FreeBSD indent, supports C99 comments and C99
initializer designators.
There is no undefined behavior since the compared characters are always
from the basic execution character set. All other cases are covered by
the condition above for now_len.
Fix debug logging for non-ASCII characters. Previously, such a character
was output in sign-extended form, for example as \xffffffc3.
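A sketch of how such output comes about and how to avoid it. The
function debug_format_char is hypothetical and not indent's actual
logging code. Where plain 'char' is signed, the byte 0xc3 has a negative
value, and passing it to a "%x" conversion prints the sign-extended int.
Converting to 'unsigned char' first keeps the value in the range 0 to
255.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one character for a debug log.  Printable ASCII other than the
 * backslash is copied as-is, everything else becomes a \xNN escape.
 * The destination must have room for at least 5 bytes.
 */
static void
debug_format_char(char *dst, size_t dst_size, char ch)
{
	/* Going through unsigned char prevents the sign extension. */
	unsigned char uch = (unsigned char)ch;

	assert(dst_size >= 5);
	if (uch >= 0x20 && uch < 0x7f && uch != '\\')
		snprintf(dst, dst_size, "%c", ch);
	else
		snprintf(dst, dst_size, "\\x%02x", (unsigned int)uch);
}
```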
Previously, ps.keyword did not have any documentation and was not
straightforward. In some cases it was reset to kw_0, in others it was
set to an interesting value. The idea behind it was to remember the kind
of word of the previous token, to decide whether to have a space between
sizeof or offsetof and a following '('.
No functional change.
With manual corrections afterwards, to compensate for the remaining bugs
in indent.
Without the type definitions in .indent.pro, the opening braces of the
functions kw_name and lexi_alnum would not be at the beginning of the
line.
When the parse stack is manipulated, the text of the token is not
relevant anymore and may even be confusing, for example when parsing
if_expr, the token may contain "}".
The 'sp' probably meant 'space-enclosed'; no idea what 'sw' was meant to
mean. Maybe 'switch', but that would have been rather ambiguous when
talking about control flow statements.
No functional change.
Previously, token_type was used for 3 different purposes:
1. symbol types from the lexer
2. symbol types on the parser stack
3. kind of control statement for 'if (expr)' and similar statements
Splitting the 41 constants into separate types makes it immediately
clear that the parser stack never handles comments, preprocessing lines,
newlines, form feeds, or the inner structure of expressions.
Previously, the constant switch_expr was especially confusing since it
was used for 3 different purposes: when returned from lexi, it
represented the keyword 'switch', in the parser stack it represented
'switch (expr)', and it was used for a statement head as well.
The only overlap between the lexer symbols and the parser symbols
consists of '{' and '}' and the keywords 'do' and 'else'. To increase
confusion, the constants of the previous token_type were in apparently
random order, and before 2021, they had cryptic, highly abbreviated
names.
No functional change.
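The split into separate types can be sketched like this. The enum
names, the constant names and the helper lsym_to_psym are illustrative
assumptions, and the lists are small subsets rather than the actual 41
constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the lexer returns (illustrative subset). */
typedef enum lexer_symbol {
	lsym_comment,
	lsym_newline,
	lsym_lbrace,
	lsym_rbrace,
	lsym_do,
	lsym_else,
	lsym_switch		/* only the keyword 'switch' */
} lexer_symbol;

/* What may be on the parser stack (illustrative subset). */
typedef enum parser_symbol {
	psym_lbrace,
	psym_rbrace,
	psym_do,
	psym_else,
	psym_switch_expr	/* a complete 'switch (expr)' */
} parser_symbol;

/*
 * Map a lexer symbol to the parser symbol of the same meaning.  Only
 * the overlap ('{', '}', 'do', 'else') has such a counterpart; for all
 * other lexer symbols, return false and leave *psym unchanged.
 */
static bool
lsym_to_psym(lexer_symbol lsym, parser_symbol *psym)
{
	assert(psym != NULL);
	switch (lsym) {
	case lsym_lbrace:	*psym = psym_lbrace;	return true;
	case lsym_rbrace:	*psym = psym_rbrace;	return true;
	case lsym_do:		*psym = psym_do;	return true;
	case lsym_else:		*psym = psym_else;	return true;
	default:		return false;
	}
}
```

With distinct types, a comment or a newline simply has no parser symbol,
and the keyword 'switch' is a different constant from 'switch (expr)'.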
It was unnecessarily confusing to have the token types keyword_do_else,
keyword_do and keyword_else at the same time, without any hint as to how
they differed.
Some of the token types seem to be used by the lexer while others are
used in the parse stack. Maybe all token types can be partitioned into
these groups, which would suggest using two different types for them.
And if not, it's still clearer to have this distinction in the names of
the constants.
No functional change.
The new name aligns with other similar variables like ind_level,
case_ind_level and ifdef_level. The old name 'seen' is mainly used for
bool variables.
No functional change.
With manual corrections afterwards. Indent still does not get
extra_expr_indent right, and it indents global variables after tagged
declarations too deep.
No functional change.
Previously, warnings and errors had the form of C block comments. Before
NetBSD io.c 1.20 from 2019-10-19, this format made sense because the
diagnostics could end up in the same output stream as the formatted
output.
Since NetBSD io.c 1.20 from 2019-10-19, all diagnostics are redirected
to stderr. This change was not mentioned in the commit message back
then, but it makes sense nevertheless. Since stdout and stderr now are
properly separated, there is no need anymore to keep the weird format
for warnings and errors. Switch to the standard 'error: file:line'
format.
Move the function 'diag' to indent.c to have access to the name of the
current input file.
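A sketch of the new message layout. The function format_diag and its
parameters are hypothetical. Appending the message text after the
'file:line' part, separated by ": ", is an assumption as well, since
only the prefix is given above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a diagnostic as 'error: file:line: message' or
 * 'warning: file:line: message'.  A level of 0 means warning, anything
 * else means error.  Return the length of the complete message, as
 * snprintf does.
 */
static int
format_diag(char *dst, size_t dst_size, int level,
    const char *file, int line, const char *msg)
{
	assert(dst_size > 0);
	return snprintf(dst, dst_size, "%s: %s:%d: %s",
	    level == 0 ? "warning" : "error", file, line, msg);
}
```

A caller would then write the formatted text followed by a newline to
stderr, for example using fprintf(stderr, "%s\n", buf).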