Commit Graph

350 Commits

Author SHA1 Message Date
rillig 9115a16335 indent: fix missing blank before binary operator 2021-10-29 21:56:36 +00:00
rillig 9261ba3854 indent: in debug mode, log only differences for most ps members 2021-10-29 21:31:29 +00:00
rillig a7188a1bf1 indent: add detailed debug logging for the parser state 2021-10-29 21:22:05 +00:00
rillig f9d6328469 indent: merge isblank and is_hspace into ch_isblank
No functional change.
2021-10-29 20:27:42 +00:00
rillig 8add9435a6 indent: replace segmentation fault with assertion 2021-10-29 20:05:58 +00:00
rillig d271515dfe indent: parse options in a platform-independent way
Previously, on an ILP32 platform, the option '-ts30000000000000000'
resulted in the error message 'must be an integer', while on LP64
platforms it resulted in the error message 'must be between 1 and 80'.
Remove this unnecessary difference.
2021-10-29 19:52:59 +00:00
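A minimal sketch of the idea, assuming the option value is parsed with strtol and then range-checked. The names parse_int_option and enum opt_result are hypothetical and not taken from indent. The point is that an oversized value like the one above ends up in the same error category whether 'long' is 32 or 64 bits wide.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, one per kind of error message. */
enum opt_result {
	OPT_OK,
	OPT_NOT_INTEGER,
	OPT_OUT_OF_RANGE,
};

/*
 * Hypothetical helper for options like '-ts'. On ILP32, an oversized
 * value makes strtol report ERANGE; on LP64, it parses fine but exceeds
 * 'max'. Both cases are reported as OPT_OUT_OF_RANGE.
 */
static enum opt_result
parse_int_option(const char *arg, int min, int max, int *out)
{
	char *end;

	errno = 0;
	long num = strtol(arg, &end, 10);
	if (end == arg || *end != '\0')
		return OPT_NOT_INTEGER;
	if (errno == ERANGE || num < min || num > max)
		return OPT_OUT_OF_RANGE;
	*out = (int)num;
	return OPT_OK;
}
```

Checking the range on the 'long' before converting to 'int' avoids the implementation-defined narrowing that would otherwise differ between platforms.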
rillig 861a8afcf9 indent: initialize 'ps' via code
This saves 3 kB of binary size since the parser state is rather large
and only very few members are initialized to non-zero values.

No functional change.
2021-10-29 19:31:24 +00:00
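A sketch of why this can shrink the binary, assuming a typical ELF toolchain: a static object with any non-zero initializer is usually emitted in full into the .data section of the executable, while an object without an initializer is zero-initialized and placed in .bss, which takes no file space. The reduced struct, its members and the function ps_init below are hypothetical.

```c
#include <stdbool.h>

enum { STACK_SIZE = 256 };

/* Hypothetical, heavily reduced parser state. */
struct parser_state {
	int stack[STACK_SIZE];	/* large, all zero at startup */
	int tos;		/* index of the top of the stack */
	bool last_nl;		/* hypothetical: starts out as true */
	int comment_column;	/* hypothetical: starts out as 33 */
};

/*
 * No initializer: static storage is zero-initialized, and such objects
 * typically go to .bss. An initializer with non-zero members would
 * typically put the whole object, including 'stack', into .data.
 */
struct parser_state ps;

/* Assign only the members whose initial value is not zero. */
static void
ps_init(void)
{
	ps.last_nl = true;
	ps.comment_column = 33;
}
```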
rillig b9453ae00b indent: clean up main_init_globals
No functional change.
2021-10-29 19:22:55 +00:00
rillig e710fb7cfc indent: fix undefined behavior in buffer handling
Adding an arbitrary integer to a pointer may result in an out-of-bounds
pointer, which is undefined behavior, so replace the addition with a
pointer subtraction.

In the buffer handling functions, handle 'buf' and 'l' before 's' and
'e', since they are pairs.

In inbuf_read_line, use 's' instead of 'buf' to make the code easier to
understand for human readers.

No functional change.
2021-10-29 19:12:48 +00:00
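A sketch of the overflow check, assuming 'buf' and 'l' delimit the allocated memory and 's' and 'e' delimit the text stored in it. This layout and the functions buf_init, buf_reserve and buf_add_chars are assumptions for illustration, not indent's actual code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: two pairs of pointers into one allocation. */
struct buffer {
	char *buf;	/* start of the allocated memory */
	char *l;	/* limit: one past the end of the allocated memory */
	char *s;	/* start of the text */
	char *e;	/* end of the text */
};

static int
buf_init(struct buffer *b, size_t cap)
{
	if (cap == 0)
		cap = 1;
	b->buf = malloc(cap);
	if (b->buf == NULL)
		return -1;
	b->l = b->buf + cap;
	b->s = b->e = b->buf;
	return 0;
}

/*
 * Ensure there is room for 'n' more characters after 'e'.
 *
 * A check like 'b->e + n > b->l' has undefined behavior for large 'n',
 * since 'b->e + n' may point outside the allocation. The difference of
 * two pointers into the same allocation is always well-defined.
 */
static int
buf_reserve(struct buffer *b, size_t n)
{
	if (n <= (size_t)(b->l - b->e))
		return 0;

	/* Save offsets before realloc invalidates the old pointers. */
	size_t s_off = (size_t)(b->s - b->buf);
	size_t e_off = (size_t)(b->e - b->buf);
	size_t cap = (size_t)(b->l - b->buf);
	while (cap - e_off < n)
		cap *= 2;	/* overflow of 'cap' is not handled here */

	char *p = realloc(b->buf, cap);
	if (p == NULL)
		return -1;
	b->buf = p;
	b->l = p + cap;
	b->s = p + s_off;
	b->e = p + e_off;
	return 0;
}

static int
buf_add_chars(struct buffer *b, const char *p, size_t n)
{
	if (buf_reserve(b, n) != 0)
		return -1;
	memcpy(b->e, p, n);
	b->e += n;
	return 0;
}
```

Handling 'buf' and 'l' together, then 's' and 'e', mirrors the pairing described in the commit message.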
rillig 1759bd2fc2 indent: mark obviously broken code 2021-10-29 18:50:52 +00:00
rillig 94bded1f14 indent: reorder global variables to be more intuitive
The buffer 'inp' comes first. From there, a single token is read into
the buffer 'token'. From there, it usually ends up in 'code'. The buffer
'token' does not belong to the group of the other 3 buffers, which
together make up a line of formatted output.

No functional change.
2021-10-29 18:18:03 +00:00
rillig 1c800bc064 indent: use prev/curr/next to refer to the current token
The word 'last' just didn't match with 'next'.

No functional change.
2021-10-29 17:50:37 +00:00
rillig 4f8fa1b77b indent: group members of parser_state by topic
No functional change.
2021-10-29 17:41:56 +00:00
rillig e741110689 indent: rename ps.dumped_decl_indent and indent_declaration
The word 'dump' in 'ps.dumped_decl_indent' was too close to dump_line,
which led to confusion, since the variable controls whether the
indentation has been added to the code buffer, which happens long before
the current line is actually dumped to the output file.

The function name 'indent_declaration' was too unspecific; it did not
reveal where the indentation of the declaration actually happened.

No functional change.
2021-10-29 17:32:22 +00:00
rillig a28555e377 indent: keep p_l_follow nonnegative, use consistent comparison
No functional change.
2021-10-29 16:59:35 +00:00
rillig 8c5268b68e indent: spell 'parentheses' properly in messages and comments 2021-10-29 16:54:51 +00:00
rillig 534324ea5b indent: clean up indentation, comments, reduce
No functional change.
2021-10-28 22:20:08 +00:00
rillig da03beab3b indent: remove unused local variable in lexi
Since the previous commit, lexi is always called with the same argument,
so remove that parameter.

The previous commit broke the debug logging by not printing "transient
state" anymore. Replace this with "rolled back parser state" at the
caller's site.

No functional change.
2021-10-28 22:06:23 +00:00
rillig f6487c67ee indent: reduce negations in search_stmt_lookahead
No functional change.
2021-10-28 21:56:26 +00:00
rillig 49e9bdaa70 indent: clean up comments and function names
Having accurate names for the lexer symbols and the parser symbols makes
most of the comments redundant. Remove these.

Rename process_decl to process_type, to match the name of the
corresponding lexer symbol. In this phase, it's just a single type
token, not a whole declaration.

No functional change.
2021-10-28 21:51:43 +00:00
rillig 31ac01f81d indent: fix error message for buffer overflow during option parsing
At this early stage, the input file has not been opened yet, so there is
no reason to output either the input file name or the line number.
2021-10-28 21:35:57 +00:00
rillig d89bec6832 indent: make error messages for option parsing more precise 2021-10-28 21:32:48 +00:00
rillig 499bc8b313 indent: parse option '-cli' strictly 2021-10-28 21:02:04 +00:00
rillig bdd9debcf6 indent: topologically sort functions
No functional change.
2021-10-28 20:49:36 +00:00
rillig 486efd1088 indent: change product name, update version number
NetBSD's indent has deviated enough from FreeBSD's indent to warrant a
different product name. When indent was copied from FreeBSD in 2019,
that update introduced several new bugs, some of which have been fixed
in the NetBSD version.

NetBSD indent, unlike FreeBSD indent, supports C99 comments and C99
initializer designators.
2021-10-28 20:31:17 +00:00
rillig 49d4a9675b indent: fix indentation of local variable declarations
This had been broken since the import of FreeBSD indent in 2019.
2021-10-27 00:04:51 +00:00
rillig f2bfa6df8e indent: clean up process_comment
There is no undefined behavior since the compared characters are always
from the basic execution character set. All other cases are covered by
the condition above for now_len.

Fix debug logging for non-ASCII characters; previously, the byte 0xc3
was output as \xffffffc3.
2021-10-26 21:37:27 +00:00
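The \xffffffc3 output is what sign extension produces: where plain 'char' is signed, the byte 0xc3 has a negative value, and after promotion to int it prints as ffffffc3 with "%x". A minimal sketch of the fix, converting through 'unsigned char' first; the function name debug_char is hypothetical.

```c
#include <stdio.h>

/*
 * Format one character for debug output, escaping bytes outside
 * printable ASCII. Without the conversion to 'unsigned char', a signed
 * 'char' holding 0xc3 would be printed as "\xffffffc3".
 */
static void
debug_char(char *out, size_t size, char ch)
{
	unsigned char uch = (unsigned char)ch;

	if (uch >= 0x20 && uch < 0x7f)
		snprintf(out, size, "%c", uch);
	else
		snprintf(out, size, "\\x%02x", (unsigned int)uch);
}
```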
rillig 3d85947add indent: reduce indentation in process_comment
No functional change.
2021-10-26 21:23:52 +00:00
rillig d4b3b19ada indent: make reformatting of comments simpler
No functional change.
2021-10-26 21:04:03 +00:00
rillig 3ce143e32b indent: make ps.keyword easier to understand
Previously, ps.keyword did not have any documentation and was not
straightforward. In some cases it was reset to kw_0, in others it was
set to an interesting value. The idea behind it was to remember the kind
of word of the previous token, to decide whether to have a space between
sizeof or offsetof and a following '('.

No functional change.
2021-10-26 20:43:35 +00:00
rillig 0d39b02648 indent: fix debug logging
The parser state is not always 'ps', so the debug logging must use the
correct state as well.
2021-10-26 20:17:42 +00:00
rillig 65460602c1 indent: run indent on its own source code
With manual corrections afterwards, to compensate for the remaining bugs
in indent.

Without the type definitions in .indent.pro, the opening braces of the
functions kw_name and lexi_alnum would not be at the beginning of the
line.
2021-10-26 19:36:30 +00:00
rillig 48417ce252 indent: merge duplicate code in lexi_alnum 2021-10-26 18:36:25 +00:00
rillig e6dc41fbfc indent: improve debug logging
Output the various details in chronological order.
2021-10-25 21:33:24 +00:00
rillig bb9d519138 indent: do not output token in debug mode
When the parse stack is manipulated, the text of the token is not
relevant anymore and may even be confusing, for example when parsing
if_expr, the token may contain "}".
2021-10-25 20:32:38 +00:00
rillig 0394a46876 indent: rename search_brace to search_stmt
No functional change.
2021-10-25 19:56:03 +00:00
rillig 87cf165700 indent: rename local variable sp_sw to spaced_expr
The 'sp' probably meant 'space-enclosed'; it is unclear what 'sw' was
supposed to mean. Maybe 'switch', but that would have been rather
ambiguous when talking about control flow statements.

No functional change.
2021-10-25 01:06:13 +00:00
rillig 294e9d799c indent: split type token_type into 3 separate types
Previously, token_type was used for 3 different purposes:

1. symbol types from the lexer
2. symbol types on the parser stack
3. kind of control statement for 'if (expr)' and similar statements

Splitting the 41 constants into separate types makes it immediately
clear that the parser stack never handles comments, preprocessing lines,
newlines, form feeds, or the inner structure of expressions.

Previously, the constant switch_expr was especially confusing since it
was used for 3 different purposes: when returned from lexi, it
represented the keyword 'switch', in the parser stack it represented
'switch (expr)', and it was used for a statement head as well.

The only overlaps between the lexer symbols and the parser symbols are
'{' and '}', and the keywords 'do' and 'else'. To increase confusion,
the constants of the previous token_type were in apparently random
order and before 2021, they had cryptic, highly abbreviated names.

No functional change.
2021-10-25 00:54:37 +00:00
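A sketch of what such a split can look like, with three separate enum types and a mapping for the symbols that occur in both the lexer and the parser. All constant names and the function lexer_to_parser_symbol are illustrative assumptions, and the lists are heavily abbreviated compared to the 41 constants mentioned above.

```c
#include <stdbool.h>

/* Abbreviated, hypothetical symbols returned by the lexer. */
enum lexer_symbol {
	lsym_comment,
	lsym_preprocessing,
	lsym_newline,
	lsym_form_feed,
	lsym_lbrace,
	lsym_rbrace,
	lsym_do,
	lsym_else,
	lsym_switch,		/* the keyword 'switch' alone */
	lsym_semicolon,
};

/* Abbreviated, hypothetical symbols on the parser stack. */
enum parser_symbol {
	psym_lbrace,
	psym_rbrace,
	psym_do,
	psym_else,
	psym_switch_expr,	/* 'switch (expr)' */
	psym_stmt,
};

/* Hypothetical kinds of statement heads such as 'if (expr)'. */
enum stmt_head {
	hd_if,
	hd_for,
	hd_while,
	hd_switch,
};

/*
 * Only '{', '}', 'do' and 'else' exist in both symbol sets; all other
 * lexer symbols have no direct counterpart on the parser stack.
 */
static bool
lexer_to_parser_symbol(enum lexer_symbol lsym, enum parser_symbol *psym)
{
	switch (lsym) {
	case lsym_lbrace:
		*psym = psym_lbrace;
		return true;
	case lsym_rbrace:
		*psym = psym_rbrace;
		return true;
	case lsym_do:
		*psym = psym_do;
		return true;
	case lsym_else:
		*psym = psym_else;
		return true;
	default:
		return false;
	}
}
```

With separate types, compilers such as GCC and Clang can warn (for example via -Wenum-compare) when a lexer symbol is compared with a parser symbol.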
rillig f33923b1d5 indent: rename form_feed to tt_lex_form_feed
No functional change.
2021-10-24 22:44:13 +00:00
rillig 9f86a545bb indent: split kw_for_or_if_or_while into separate constants
No functional change.
2021-10-24 22:38:20 +00:00
rillig 848e9c2333 indent: split kw_do_or_else into separate constants
It was unnecessarily confusing to have the token types keyword_do_else,
keyword_do and keyword_else at the same time, without any hint as to how
they differed.

Some of the token types seem to be used by the lexer while others are
used in the parse stack. Maybe all token types can be partitioned into
these groups, which would suggest using two different types for them.
And if not, it's still clearer to have this distinction in the names of
the constants.

No functional change.
2021-10-24 22:28:06 +00:00
rillig 24a6d2ea17 indent: rename seen_quest to quest_level
The new name aligns with other similar variables like ind_level,
case_ind_level and ifdef_level. The word 'seen' is mainly used in the
names of bool variables.

No functional change.
2021-10-24 20:57:11 +00:00
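A level rather than a flag can track nested conditional expressions, so that each ':' is matched with its '?'. The following is a minimal sketch under that assumption; the function count_conditional_colons is hypothetical and, unlike a real lexer, ignores string literals and comments.

```c
/*
 * Count the colons that belong to '?:' operators. A ':' seen while
 * quest_level > 0 closes the innermost '?'; any other ':' is taken to
 * belong to a label, a 'case', or a bit-field.
 */
static int
count_conditional_colons(const char *text)
{
	int quest_level = 0;
	int n = 0;

	for (const char *p = text; *p != '\0'; p++) {
		if (*p == '?')
			quest_level++;
		else if (*p == ':' && quest_level > 0) {
			quest_level--;
			n++;
		}
	}
	return n;
}
```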
rillig 50ae70092f indent: define lexi_end as function instead of macro 2021-10-24 20:47:00 +00:00
rillig 471ee99278 indent: fix indentation of ad-hoc tagged variables
Seen among others in usr.bin/indent/lexi.c, variable 'keywords'.
2021-10-24 20:43:27 +00:00
rillig fd452b015e indent: initialize variables in main_loop in declaration
No functional change.
2021-10-24 19:33:26 +00:00
rillig 3acd5f6fd1 indent: run indent on its own source code
With manual corrections afterwards. Indent still does not get
extra_expr_indent right, and it also indents global variables after
tagged declarations too deeply.

No functional change.
2021-10-24 19:14:33 +00:00
rillig ccfddf6ada indent: clean up format of warnings and errors
Previously, warnings and errors had the form of C block comments. Before
NetBSD io.c 1.20 from 2019-10-19, this format made sense because the
diagnostics could end up in the same output stream as the formatted
output.

Since NetBSD io.c 1.20 from 2019-10-19, all diagnostics are redirected
to stderr. This change was not mentioned in the commit message back
then, but it makes sense nevertheless. Since stdout and stderr are now
properly separated, there is no need anymore to keep the weird format
for warnings and errors. Switch to the standard 'error: file:line'
format.

Move the function 'diag' to indent.c to have access to the name of the
current input file.
2021-10-24 17:19:48 +00:00
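A sketch of a formatter for the 'error: file:line' style, assuming the message text follows after another ': '. When no input file is open yet, as during option parsing (see 31ac01f81d above), the caller passes NULL and the location is omitted. The function format_diag is hypothetical; a caller would write the resulting buffer to stderr, for example with fputs.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Format a diagnostic as "level: file:line: message", or as
 * "level: message" if 'file' is NULL. Returns the total length that
 * snprintf/vsnprintf report, or a negative value on error.
 */
static int
format_diag(char *out, size_t size, const char *level,
    const char *file, int line, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (file != NULL)
		n = snprintf(out, size, "%s: %s:%d: ", level, file, line);
	else
		n = snprintf(out, size, "%s: ", level);
	if (n < 0 || (size_t)n >= size)
		return n;

	va_start(ap, fmt);
	int m = vsnprintf(out + n, size - (size_t)n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}
```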
rillig 08637ac7ec indent: fix line number counting at beginning of function body 2021-10-24 16:51:44 +00:00
rillig 2ae20d0041 indent: rename nitems to array_length 2021-10-24 11:19:25 +00:00
rillig 7016f86c95 indent: replace global variable use_ff with function parameter 2021-10-24 11:17:05 +00:00