Implementation largely obtained from FreeBSD, with adaptations to meet the
needs and style of this sh, some updates to agree with the current POSIX spec,
and a few other minor changes.
The POSIX spec for this ( http://austingroupbugs.net/view.php?id=249 )
[see note 2809 for the current proposed text] is yet to be approved,
so might change. It currently leaves several aspects as unspecified,
this implementation handles those as:
Where more than 2 hex digits follow \x this implementation processes the
first two as hex, the following characters are processed as if the \x
sequence was not present. The value obtained from a \nnn octal sequence
is truncated to the low 8 bits (if a bigger value is written, eg: \456.)
Invalid escape sequences are errors. Invalid \u (or \U) code points are
errors if known to be invalid, otherwise can generate a '?' character.
Where any escape sequence generates nul ('\0') that char, and the rest of
the $'...' string is discarded, but anything remaining in the word is
processed, ie: aaa$'bbb\0ccc'ddd produces the same as aaa'bbb'ddd.
Differences from FreeBSD:
FreeBSD allows only exactly 4 or 8 hex digits for \u and \U (as does C,
but the current sh proposal differs.) reeBSD also continues consuming
as many hex digits as exist after \x (permitted by the spec, but insane),
and reject \u0000 as invalid). Some of this is possibly because that
their implementation is based upon an earlier proposal, perhaps note 590 -
though that has been updated several times.
Differences from the current POSIX proposal:
We currently always generate UTF-8 for the \u & \U escapes. We should
generate the equivalent character from the current locale's character set
(and UTF8 only if that is what the current locale uses.)
If anyone would like to correct that, go ahead.
We (and FreeBSD) generate (X & 0x1F) for \cX escapes where we should generate
the appropriate control character (SOH for \cA for example) with whatever
value that has in the current character set. Apart from EBCDIC, which
we do not support, I've never seen a case where they differ, so ...
Avoid mangling history when editing is enabled, and the prompt contains a \n
Also, allow empty input lines into history when they are being appended to
a previous (partial) command (but not when they would just make an empty entry).
For all the gory details, see the PR.
Note nothing here actually makes prompts containing \n work correctly
when editing is enabled, that's a libedit issue, which will be addressed
some other time.
Don't ignore unexpected reserved words after ';'
Don't allow any random token type as a case stmt pattern, only a word.
Those are ancient ash bugs and do not affect correct scripts.
Don't ignore redirects in a case stmt list where the list is nothing but
redirects (if the pattern matches, the redirects should be performed).
That was introduced when a redirect only case stmt list was allowed
(older shells had generated a syntax error.)
Random cleanups/refactoring taken from or inspired by the FreeBSD sh
parser ... use makename() consistently to create a NARG node - we
were using it in a couple of places but most NARG node creation was open
coded. Introduce consumetoken() (from FreeBSD) to handle the fairly
common case where exactly one token type must come next, and we need to
check that, and skip past it when found (or error) and linebreak() (new)
to handle places where optional \n's are permitted.
Both previously open coded.
Simplify list() by removing its second arg, which was only ever used when
handling the end of a `` (old style command substitution). Simply move
the code from inside list() to just after its call in the `` case (from
FreeBSD.)
(operators all come first, then TWORD, then keywords), and switch
from using TIF to define KWDOFFSET to using TWORD (the barrier,
rather than the token that happens to be first after it.)
to cause (when set, which it is not by default) the exit status of a
pipe to be 0 iff all commands in the pipe exited with status 0, and
otherwise, the status of the rightmost command to exit with a non-0
status.
In the doc, while describing this, also reword some of the text about
commands in general, how they are structured, and when they are executed.
by rudolf at eq.cz on tech-userlevel (July 15, 2017.)
Also correct a typo, de-correct some entirely proper English so
the doc remains written in American instead. And note that
interactive mode is set when stdin & stderr are terminals, not
stding and stdout.
Absent other information, the shell should be interactive if reading
from stdin, and stdin and stderr are ttys, not stdin and stdout.
So sayeth the great lord posix.
Silence nuisance testing environments - avoid << of a negative number
(a signed char -- in a hash function, the result is irrelevant, as long
as it is repeatable).
ALso, cause exec failures to always cause the shell to exit with
status 126 or 127, whatever the cause. 127 is intended for lookup
failures (and is used that way), 126 is used for anything else that
goes wrong (as in several other shells.) We no longer use 2 (more easily
confused with an exit status of the command exec'd) for shell exec failures.
the new format. Also #if 0 a function definition that is used nowhere.
While here, change the function of pushfile() slightly - it now sets
the buf pointer in the top (new) input descriptor to NULL, instead of
simply leaving it - code that needs a buffer always (before and after)
must malloc() one and assign it after the call. But code which does not
(which will be reading from a string or similar) now does not have to
explicitly set it to NULL (cleaner interface.) NFC intended (or observed.)
configure script, ie: $(( which is intended to be a sub-shell in a
command substitution, but is an arith subst instead, it needs to be
written $( ( to do as intended. Instead of just blindly carrying on to
find the missing )) somewhere, anywhere, give up as soon as we have seen
an unbalanced ')' that isn't immediately followed by another ')' which
in a valid arith subst it always would be.
While here, there has been a comment in the code for quite a while noting a
difference in the standard between the text descr & grammar when it comes to
the syntax of case statements. Add more comments to explain why parsing it
as we do is in fact definitely the correct way (ie: the grammar wins arguments
like this...).
in prompts when expanded at prompt time, but all available for general use.
Many of the new ones are not available in SMALL shells (they work as normal
if assigned, but the shell does not set or use them - and there is no magic
in a SMALL shell (usually for install media.))
This fallback code wouldn't work anyway.
times(3) is an obsolete interface by getrusage(2) and gettimeofday(2).
In future it will be swiched to more modern interfaces.
No functional change intended.