ecpg: clean up documentation of parse.pl, and add more input checking.

README.parser is the user's manual, such as it is, for parse.pl.
It's rather poorly written if you ask me; so try to improve it.
(More could be written here, but this at least covers the same
info in a more organized fashion.)

Also, the single solitary line of usage info in parse.pl itself
was a lie.  Replace.

Add some error checks that the ecpg.addons entries meet the syntax
rules set forth in README.parser.  One of them didn't, but
accidentally worked anyway because the logic in include_addon is
such that 'block' is the default behavior.

Also add a cross-check that each ecpg.addons entry is matched exactly
once in the backend grammar.  This exposed that there are two dead
entries there --- they are dead because the %replace_types table in
parse.pl causes their nonterminals to be ignored altogether.
Removing them doesn't change the generated preproc.y file.

(This implies that check_rules.pl is completely worthless and should
be nuked: it adds build cycles and maintenance effort while failing
to reliably accomplish its one job of detecting dead rules.  I'll
do that separately.)

Discussion: https://postgr.es/m/2011420.1713493114@sss.pgh.pa.us
This commit is contained in:
Tom Lane 2024-10-14 13:29:36 -04:00
parent 7be4ba4a9d
commit 00b0e7204d
3 changed files with 121 additions and 63 deletions

View File

@ -1,42 +1,77 @@
ECPG modifies and extends the core grammar in a way that
1) every token in ECPG is <str> type. New tokens are
defined in ecpg.tokens, types are defined in ecpg.type
2) most tokens from the core grammar are simply converted
to literals concatenated together to form the SQL string
passed to the server, this is done by parse.pl.
3) some rules need side-effects, actions are either added
or completely overridden (compared to the basic token
concatenation) for them, these are defined in ecpg.addons,
the rules for ecpg.addons are explained below.
4) new grammar rules are needed for ECPG metacommands.
These are in ecpg.trailer.
5) ecpg.header contains common functions, etc. used by
actions for grammar rules.
ECPG's grammar (preproc.y) is built by parse.pl from the
backend's grammar (gram.y) plus various add-on rules.
Some notes:
In "ecpg.addons", every modified rule follows this pattern:
ECPG: dumpedtokens postfix
where "dumpedtokens" is simply tokens from core gram.y's
rules concatenated together. e.g. if gram.y has this:
ruleA: tokenA tokenB tokenC {...}
then "dumpedtokens" is "ruleAtokenAtokenBtokenC".
"postfix" above can be:
a) "block" - the automatic rule created by parse.pl is completely
overridden, the code block has to be written completely as
it were in a plain bison grammar
b) "rule" - the automatic rule is extended on, so new syntaxes
are accepted for "ruleA". E.g.:
ECPG: ruleAtokenAtokenBtokenC rule
| tokenD tokenE { action_code; }
...
It will be substituted with:
ruleA: <original syntax forms and actions up to and including
"tokenA tokenB tokenC">
| tokenD tokenE { action_code; }
...
c) "addon" - the automatic action for the rule (SQL syntax constructed
from the tokens concatenated together) is prepended with a new
action code part. This code part is written as is's already inside
the { ... }
1) Most input matching core grammar productions is simply converted
to strings and concatenated together to form the SQL string
passed to the server. parse.pl can automatically build the
grammar actions needed to do this.
2) Some grammar rules need special actions that are added to or
completely override the default token-concatenation behavior.
This is controlled by ecpg.addons as explained below.
3) Additional grammar rules are needed for ECPG's own commands.
These are in ecpg.trailer, as is the "epilogue" part of preproc.y.
4) ecpg.header contains the "prologue" part of preproc.y, including
support functions, Bison options, etc.
5) Additional terminals added by ECPG must be defined in ecpg.tokens.
Additional nonterminals added by ECPG must be defined in ecpg.type.
Multiple "addon" or "block" lines may appear together with the
new code block if the code block is common for those rules.
ecpg.header, ecpg.tokens, ecpg.type, and ecpg.trailer are just
copied verbatim into preproc.y at appropriate points.
ecpg.addons contains entries that begin with a line like
ECPG: concattokens ruletype
and typically have one or more following lines that are the code
for a grammar action. Any line not starting with "ECPG:" is taken
to be part of the code block for the preceding "ECPG:" line.
"concattokens" identifies which gram.y production this entry affects.
It is simply the target nonterminal and the tokens from the gram.y rule
concatenated together. For example, to modify the action for a gram.y
rule like this:
target: tokenA tokenB tokenC {...}
"concattokens" would be "targettokenAtokenBtokenC". If we want to
modify a non-first alternative for a nonterminal, we still write the
nonterminal. For example, "concattokens" should be "targettokenDtokenE"
to affect the second alternative in:
target: tokenA tokenB tokenC {...}
| tokenD tokenE {...}
"ruletype" is one of:
a) "block" - the automatic action that parse.pl would create is
completely overridden. Instead the entry's code block is emitted.
The code block must include the braces ({}) needed for a Bison action.
b) "addon" - the entry's code block is inserted into the generated
action, ahead of the automatic token-concatenation code.
In this case the code block need not contain braces, since
it will be inserted within braces.
c) "rule" - the automatic action is emitted, but then the entry's
code block is added verbatim afterwards. This typically is
used to add new alternatives to a nonterminal of the core grammar.
For example, given the entry:
ECPG: targettokenAtokenBtokenC rule
| tokenD tokenE { custom_action; }
what will be emitted is
target: tokenA tokenB tokenC { automatic_action; }
| tokenD tokenE { custom_action; }
Multiple "ECPG:" entries can share the same code block, if the
same action is needed for all. When an "ECPG:" line is immediately
followed by another one, it is not assigned an empty code block;
rather the next nonempty code block is assumed to apply to all
immediately preceding "ECPG:" entries.
In addition to the modifications specified by ecpg.addons,
parse.pl contains some tables that list backend grammar
productions to be ignored or modified.
Nonterminals that construct strings (as described above) should be
given <str> type, which is parse.pl's default assumption for
nonterminals found in gram.y. That can be overridden at need by
making an entry in parse.pl's %replace_types table. %replace_types
can also be used to suppress output of a nonterminal's rules
altogether (in which case ecpg.trailer had better provide replacement
rules, since the nonterminal will still be referred to elsewhere).

View File

@ -497,7 +497,7 @@ ECPG: opt_array_boundsopt_array_bounds'['']' block
$$.index2 = mm_strdup($3);
$$.str = cat_str(4, $1.str, mm_strdup("["), $3, mm_strdup("]"));
}
ECPG: opt_array_bounds
ECPG: opt_array_bounds block
{
$$.index1 = mm_strdup("-1");
$$.index2 = mm_strdup("-1");
@ -510,15 +510,6 @@ ECPG: IconstICONST block
ECPG: AexprConstNULL_P rule
| civar { $$ = $1; }
| civarind { $$ = $1; }
ECPG: ColIdcol_name_keyword rule
| ECPGKeywords { $$ = $1; }
| ECPGCKeywords { $$ = $1; }
| CHAR_P { $$ = mm_strdup("char"); }
| VALUES { $$ = mm_strdup("values"); }
ECPG: type_function_nametype_func_name_keyword rule
| ECPGKeywords { $$ = $1; }
| ECPGTypeName { $$ = $1; }
| ECPGCKeywords { $$ = $1; }
ECPG: VariableShowStmtSHOWALL block
{
mmerror(PARSE_ERROR, ET_ERROR, "SHOW ALL is not implemented");

View File

@ -1,7 +1,13 @@
#!/usr/bin/perl
# src/interfaces/ecpg/preproc/parse.pl
# parser generator for ecpg version 2
# call with backend parser as stdin
# parser generator for ecpg
#
# See README.parser for some explanation of what this does.
#
# Command-line options:
# --srcdir: where to find ecpg-provided input files (default ".")
# --parser: the backend gram.y file to read (required, no default)
# --output: where to write preproc.y (required, no default)
#
# Copyright (c) 2007-2024, PostgreSQL Global Development Group
#
@ -148,6 +154,14 @@ dump_buffer('trailer');
close($parserfh);
# Cross-check that we don't have dead or ambiguous addon rules.
foreach (keys %addons)
{
die "addon rule $_ was never used\n" if $addons{$_}{used} == 0;
die "addon rule $_ was matched multiple times\n" if $addons{$_}{used} > 1;
}
sub main
{
line: while (<$parserfh>)
@ -487,7 +501,10 @@ sub include_addon
my $rec = $addons{$block};
return 0 unless $rec;
my $rectype = (defined $rec->{type}) ? $rec->{type} : '';
# Track usage for later cross-check
$rec->{used}++;
my $rectype = $rec->{type};
if ($rectype eq 'rule')
{
dump_fields($stmt_mode, $fields, ' { ');
@ -668,10 +685,10 @@ sub dump_line
}
=top
load addons into cache
load ecpg.addons into %addons hash. The result is something like
%addons = {
stmtClosePortalStmt => { 'type' => 'block', 'lines' => [ "{", "if (INFORMIX_MODE)" ..., "}" ] },
stmtViewStmt => { 'type' => 'rule', 'lines' => [ "| ECPGAllocateDescr", ... ] }
stmtClosePortalStmt => { 'type' => 'block', 'lines' => [ "{", "if (INFORMIX_MODE)" ..., "}" ], 'used' => 0 },
stmtViewStmt => { 'type' => 'rule', 'lines' => [ "| ECPGAllocateDescr", ... ], 'used' => 0 }
}
=cut
@ -681,17 +698,25 @@ sub preload_addons
my $filename = $srcdir . "/ecpg.addons";
open(my $fh, '<', $filename) or die;
# there may be multiple lines starting ECPG: and then multiple lines of code.
# the code need to be add to all prior ECPG records.
my (@needsRules, @code, $record);
# There may be multiple "ECPG:" lines and then multiple lines of code.
# The block of code needs to be added to each of the consecutively-
# preceding "ECPG:" records.
my (@needsRules, @code);
# there may be comments before the first ECPG line, skip them
# there may be comments before the first "ECPG:" line, skip them
my $skip = 1;
while (<$fh>)
{
if (/^ECPG:\s(\S+)\s?(\w+)?/)
if (/^ECPG:\s+(\S+)\s+(\w+)\s*$/)
{
# Found an "ECPG:" line, so we're done skipping the header
$skip = 0;
# Validate record type and target
die "invalid record type $2 in addon rule for $1\n"
unless ($2 eq 'block' or $2 eq 'addon' or $2 eq 'rule');
die "duplicate addon rule for $1\n" if (exists $addons{$1});
# If we had some preceding code lines, attach them to all
# as-yet-unfinished records.
if (@code)
{
for my $x (@needsRules)
@ -701,20 +726,27 @@ sub preload_addons
@code = ();
@needsRules = ();
}
$record = {};
my $record = {};
$record->{type} = $2;
$record->{lines} = [];
if (exists $addons{$1}) { die "Ga! there are dups!\n"; }
$record->{used} = 0;
$addons{$1} = $record;
push(@needsRules, $record);
}
elsif (/^ECPG:/)
{
# Complain if preceding regex failed to match
die "incorrect syntax in ECPG line: $_\n";
}
else
{
# Non-ECPG line: add to @code unless we're still skipping
next if $skip;
push(@code, $_);
}
}
close($fh);
# Deal with final code block
if (@code)
{
for my $x (@needsRules)