matches the internal CTL* chars.
The earlier fixes handled CTL* char values in var expansions,
but not in various other places they can occur (positional
parameters, $@ $* -- even potentially $0 and ~ expansions,
as well as byte strings generated from a \u in a $'' string).
These should all be correctly handled now. There is a new
ISCTL() macro to make the test, rather than using the old
BASESYNTAX[c]==CCTL form (which is still a viable alternative),
as the new way allows compiler optimisations and fewer memory
references, so it should be smaller and faster.
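(For reference, the point of ISCTL() is that a contiguous range
check compiles to a compare or two with no table load.  A minimal
sketch - the names and values here are assumed, the real ones are
in parser.h:

    #define CTL_FIRST 0x81      /* assumed lowest internal CTL* value */
    #define CTL_LAST  0x88      /* assumed highest internal CTL* value */
    #define ISCTL(c) \
        ((unsigned char)(c) >= CTL_FIRST && (unsigned char)(c) <= CTL_LAST)

which a compiler can reduce to one unsigned subtract and compare.)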
Also, be sure in all cases to remove any CTLESC (or other)
CTL* chars from all strings before they are made available
for any external use (there was one case missed - which didn't
matter when we weren't bothering to escape the CTL* chars at
all.)
XXX pullup-8 (will need to be via a patch) along with the Feb 4 fixes.
to have them, they should work as documented: no core dumps,
no references after free, no incorrect replacements, no failing
to implement alias after alias, ...
The big comment that ended:
This is a good idea ------- ***NOT***
and the hack it was describing are gone.
Note that most of this was from original CVS version 1.1
code (ie: came from the original import, even before 4.4-Lite
was merged - that is, May 1994) and no-one in 24.5 years
noticed (or at least complained about) all the bugs (or at
least, most of them).
With these changes, aliases ought to work (if you can call it
that) as they are expected to by POSIX. Now if only we could
get POSIX to delete them (or make them optional)...
Changes partly inspired by similar changes made by FreeBSD
(as was the previous change to alias.c - forgot the ack in the
commit log for that one, apologies) but done a little differently,
and perhaps with a slightly better outcome.
to hide meta-characters in the result when the expansion was
in (double) quotes, and so should not be further processed.
Most of this has been OK for a long while, but \ needs hiding
as well, which complicates things, as \ cannot simply be hidden
in the syntax tables as one of the group of random special characters.
This was fixed earlier for simple variable expansions, but
every variety has its own code path ($var uses different code
than $n which is different than $(...), which is different
again from ~ expansions, and also from what $'...' produces).
This could be fixed by moving them all to a common code path,
but that's harder than it seems. The form in which the data
is made available differs, so one common routine would need
a whole bunch of different "get the next char or indicate end"
methods - probably via passing in an accessor function.
That's all a lot of churn, and would probably slow the shell.
Instead, just make macros for doing the standard tests, and
use those instead of open coding (differently) each time.
This way some of the code paths don't end up forgetting to
handle '\' (which is different than all the others).
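A sketch of the kind of macros meant - the names here are
hypothetical, and the real tests (in expand.c) may differ in
detail:

    /* does this char need a CTLESC in front of it when quoted?
       (chars magic to later processing, plus \ and chars that
       collide with the internal CTL* values themselves) */
    #define NEEDESC(c) \
            (ISCTL(c) || (c) == '\\' || (c) == '*' || (c) == '?' || \
             (c) == '[' || (c) == ']')

    /* append a char to the expansion, escaping it if required */
    #define PUTESC(c) do {                      \
            if (NEEDESC(c))                     \
                    STPUTC(CTLESC, expdest);    \
            STPUTC((c), expdest);               \
    } while (0)

With every code path going through the same test, '\' cannot be
forgotten in some of them.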
This removes one optimisation ... when no escaping is needed
(like just $var (unquoted) where magic chars (think '*') in
the value are intended to remain magic), the code avoided doing
two tests for each char ("do we need escapes" and "is this char
one that needs escaping") by choosing two different syntax
tables (choice made outside the loop) - one of which never
returns the magic "needs escaping" result, and the other does
when appropriate, and then just avoiding the "do we need escapes"
test for each character processed. Then when '\' was fixed,
there needed to be another test for it, as it cannot (for other
reasons) be the same as all the others for which "this char
needs escaping" is true. So that added a 2nd test for each char...
Not all the code paths were updated. Hence the bugs...
nb: this is all rarely seen in the wild, so it is no big
surprise that no-one ever noticed.
Now the "use two different syntax tables" is gone (the two
returned the same for '\' which is why '\' needed special
processing) - and in order to avoid two tests for each
char (plus the \ test) we duplicate the loops, one of which
tests each char to see if it needs an escape, the 2nd just
copies them. This should be faster in the "no escapes"
code path (though that is not the point) and perhaps also
in the "escapes needed" path (no indirect reference to
the syntax table - though that would probably be in a
register) but makes the code slightly bigger. For /bin/sh
the text segment (on amd64) has grown by 48 bytes. But
it still uses the same number of 512 byte pages (and hence
also any bigger page size). The resulting file size
(/bin/sh) is identical before and after. So is /rescue/sh
(or /rescue/anything-else).
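The duplicated loop structure is roughly this (a sketch only,
reusing the hypothetical NEEDESC() from above; the real code in
expand.c carries more context):

    if (!quotes) {
            /* no escapes needed: plain copy, one test per char */
            while ((c = *p++) != '\0')
                    STPUTC(c, expdest);
    } else {
            /* escapes needed: one extra test per char, and no
               syntax-table reference */
            while ((c = *p++) != '\0') {
                    if (NEEDESC(c))
                            STPUTC(CTLESC, expdest);
                    STPUTC(c, expdest);
            }
    }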
Implementation largely obtained from FreeBSD, with adaptations to meet the
needs and style of this sh, some updates to agree with the current POSIX spec,
and a few other minor changes.
The POSIX spec for this ( http://austingroupbugs.net/view.php?id=249 )
[see note 2809 for the current proposed text] is yet to be approved,
so might change. It currently leaves several aspects as unspecified,
this implementation handles those as:
Where more than 2 hex digits follow \x, this implementation processes the
first two as hex; the following characters are processed as if the \x
sequence was not present. The value obtained from a \nnn octal sequence
is truncated to the low 8 bits (if a bigger value is written, eg: \456.)
Invalid escape sequences are errors. Invalid \u (or \U) code points are
errors if known to be invalid; otherwise they can generate a '?' character.
Where any escape sequence generates nul ('\0'), that char and the rest of
the $'...' string are discarded, but anything remaining in the word is
processed, ie: aaa$'bbb\0ccc'ddd produces the same as aaa'bbb'ddd.
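A sketch of the \x and \nnn decoding described above - hypothetical,
self-contained code, not the actual implementation:

    #include <ctype.h>

    /* *pp points just past the '\\'; returns the byte generated
       (0 means: discard the rest of the $'...' string) */
    static int
    esc_value(const char **pp)
    {
            const char *p = *pp;
            int i, v = 0;

            if (*p == 'x') {        /* \xH[H] - at most 2 hex digits */
                    for (p++, i = 0; i < 2 && isxdigit((unsigned char)*p);
                        i++, p++)
                            v = 16 * v + (isdigit((unsigned char)*p) ?
                                *p - '0' :
                                tolower((unsigned char)*p) - 'a' + 10);
            } else if (*p >= '0' && *p <= '7') {  /* up to 3 octal digits */
                    for (i = 0; i < 3 && *p >= '0' && *p <= '7'; i++, p++)
                            v = 8 * v + (*p - '0');
                    v &= 0xFF;      /* eg: \456 keeps only the low 8 bits */
            }
            /* other escapes (\n, \t, \u, ...) omitted from this sketch */
            *pp = p;        /* anything left is processed as ordinary chars */
            return v;
    }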
Differences from FreeBSD:
FreeBSD allows only exactly 4 or 8 hex digits for \u and \U (as does C,
but the current sh proposal differs.) FreeBSD also continues consuming
as many hex digits as exist after \x (permitted by the spec, but insane),
and rejects \u0000 as invalid. Some of this is possibly because
their implementation is based upon an earlier proposal, perhaps note 590 -
though that has been updated several times.
Differences from the current POSIX proposal:
We currently always generate UTF-8 for the \u & \U escapes. We should
generate the equivalent character from the current locale's character set
(and UTF-8 only if that is what the current locale uses.)
If anyone would like to correct that, go ahead.
We (and FreeBSD) generate (X & 0x1F) for \cX escapes where we should generate
the appropriate control character (SOH for \cA for example) with whatever
value that has in the current character set. Apart from EBCDIC, which
we do not support, I've never seen a case where they differ, so ...
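(In code terms the difference is tiny; what we and FreeBSD do is
essentially, assuming ASCII:

    case 'c':                       /* \cX */
            v = *p++ & 0x1F;        /* \cA -> 0x01 (SOH), \cZ -> 0x1A (SUB) */
            break;

which happens to produce the correct control character everywhere
except EBCDIC.)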
the LINENO hack, and uses the LINENO var for both ${LINENO} and $((LINENO)).
(Code to invert the LINENO hack when required, like when de-compiling the
execution tree to provide the "jobs" command strings, is still included;
that can be deleted when the LINENO hack is completely removed - look for
refs to VSLINENO throughout the code. The var funclinno in parser.c can
also be removed, it is used only for the LINENO hack.)
This version produces accurate results: $((LINENO)) was made as accurate
as the LINENO hack made ${LINENO} which is very good. That's why the
LINENO hack is not yet completely removed, so it can be easily re-enabled.
If you can tell the difference when it is in use, or not in use, then
something has broken (or I managed to miss a case somewhere.)
The way that LINENO works is documented in its own (new) section in the
man page, so nothing more about that, or the new options, etc, here.
This version introduces the possibility of having a "reference" function
associated with a variable, which gets called whenever the value of the
variable is required (that's what implements LINENO). There is just
one function pointer however, so any particular variable gets at most
one of the set function (as used for PATH, etc) or the reference function.
The VFUNCREF bit in the var flags indicates which func the variable in
question uses (if any - the func ptr, as before, can be NULL).
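In outline, the arrangement looks something like this - a sketch
with assumed field names and flag value, var.h has the real
definitions:

    #include <string.h>

    #define VFUNCREF 0x0400         /* assumed value of the flag bit */

    struct var {
            int flags;              /* VFUNCREF selects which func below */
            char *text;             /* "NAME=value" */
            union {                 /* just the one function pointer ... */
                    void (*set)(const char *);        /* set func (PATH &c) */
                    const char *(*ref)(struct var *); /* reference func (LINENO) */
            } func;
    };

    static const char *
    value_of(struct var *vp)
    {
            if ((vp->flags & VFUNCREF) && vp->func.ref != NULL)
                    return vp->func.ref(vp);    /* computed when required */
            return strchr(vp->text, '=') + 1;   /* ordinary stored value */
    }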
I would not call the results of this perfect yet, but it is close.
compiled for DEBUG.)
Add debug builtin command, and corresponding -D command line option.
As usual, for DEBUG related stuff, read the source for info, that's
all there is about this.
This completes the infrastructure changes for the updated DEBUG TRACE
mechanism, so now converting the rest of the shell's internal tracing
can happen as desired - piecemeal.
(undoing the effect of that commit on syntax.h when it was
being dynamically generated) from 1996. This means that the shell
parser is now locale independent, so scripts that work anywhere will
work consistently everywhere. Inspired by a similar change in
FreeBSD's sh (from 2010) - the original change in the other direction
came from FreeBSD as well.... Note that this does not in any way
add any kind of support for locales to sh (which is a whole different
problem.) (from kre)
Kill mksyntax.c - no longer possible to get the 'wrong sort of chars'.
/bin/sh now has no helper binaries.
syntax.c uses C99 initialisers; run time initialisation could be used
for systems where the compiler doesn't support them.
I've used some #defines to help make this possible - but writing the code
starts making it rather messy.
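The flavour of it, much abbreviated (the syntax class names are from
syntax.h, the sizes and offset here are assumed):

    #define SYNBASE 130             /* assumed offset, see below */
    #define SYNTAX_TABLE_SIZE (SYNBASE + 256 + 1)

    /* entries default to 0 (CWORD); indexed by char + SYNBASE */
    const char basesyntax[SYNTAX_TABLE_SIZE] = {
            ['\'' + SYNBASE] = CSQUOTE,
            ['"'  + SYNBASE] = CDQUOTE,
            ['\\' + SYNBASE] = CBACK,
            ['$'  + SYNBASE] = CVAR,
            /* ... */
    };

For a pre-C99 compiler the same table would have to be filled in by a
run time loop at shell startup instead.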
Use CHAR_MIN (from limits.h) to determine whether the target's chars are
signed or unsigned - without allowing for that, the syntax tables would
not be indexed properly.
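That is, something like this sketch (the offset values are assumed,
not the real ones):

    #include <limits.h>

    #if CHAR_MIN < 0                /* plain char is signed on this target */
    #define SYNBASE 130             /* assumed: maps PEOF..0xff into >= 0 */
    #else                           /* plain char is unsigned */
    #define SYNBASE 2               /* assumed: only PEOF etc are negative */
    #endif
    /* lookups are then always table[(int)(c) + SYNBASE], never negative */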
Rip out all the stuff from mksyntax.c that wrote syntax.h.
syntax.c can still be generated incorrectly...