
170 Commits

Author SHA1 Message Date
msaitoh 8012ca3f0e Remove extra semicolon. 2020-05-14 08:34:17 +00:00
kre 7af1d9b731 Correct a typo in a comment, 08x0 was meant to be 0x80 (duh!). NFC. 2019-12-10 09:18:37 +00:00
kre b7fc669e75 Fix an (apparent) ancient ash bug, that was apparently fixed sometime
in the past, but managed to re-surface...

The expression "${0+\}}" should expand to "}" not "\}"
Almost all other shells handle it that way (incl FreeBSD & dash).

Issue pointed out by Martijn Dekker.

Add ATF sub-tests for the 4 old var expand operators (${var+word}
${var-word} ${var=word} and ${var?word} - including the forms
with the ':' included) and amongst those tests include test cases
for this issue, so if the bug tries to appear again, we can squash
it quicker.   (The newer pattern matching operators are already
well tested as part of testing patterns.)
2019-05-04 02:52:22 +00:00
kre 256d645df3 Finish the fixes from Feb 4 for handling of random data that
matches the internal CTL* chars.

The earlier fixes handled CTL* char values in var expansions,
but not in various other places they can occur (positional
parameters, $@ $* -- even potentially $0 and ~ expansions,
as well as byte strings generated from a \u in a $'' string).

These should all be correctly handled now.   There is a new
ISCTL() macro to make the test, rather than using the old
BASESYNTAX[c]==CCTL form (which is still a viable alternative)
as the new way allows compiler optimisations, and fewer mem
references, so it should be smaller and faster.

Also, be sure in all cases to remove any CTLESC (or other)
CTL* chars from all strings before they are made available
for any external use (there was one case missed - which didn't
matter when we weren't bothering to escape the CTL* chars at
all.)

XXX pullup-8 (will need to be via a patch) along with the Feb 4 fixes.
2019-02-27 04:10:56 +00:00
kre 4d2988311a Add a check that the file descriptor mentioned in a N> or N< type
redirect operator is within range of what the code tree node can
hold.   Currently this is a no-op change (the new error can never
occur) as the code already checks that N is in range for an int
(and errors if not) and the field in the node in which we store N
is also an int, so we cannot overflow - but fd's do not really need
to be that big (the max a typical kernel supports is < 10000) so
this just adds validation in case it ever happens that we decide we
can save some node size (ie: sh memory) by making that field smaller.

Note this is parse time error detection, and has no bearing upon
the execution time error that will occur if a script attempts to use
an fd that exceeds the process's max fd limit.

NFCI (for now anyway.)
2019-02-09 09:50:31 +00:00
kre 4084f829ec PR bin/53919
Suppress shell error messages while expanding $ENV (which also causes
errors while expanding $PS1 $PS2 and $PS4 to be suppressed as well).

This allows any random garbage that happens to be in ENV to not
cause noise when the shell starts (which is effectively all it did).

On a parse error (for any of those vars) we also use "" as the result,
which will be a null prompt, and avoid attempting to open any file for ENV.

This does not in any way change what happens for a correctly parsed command
substitution (either when it is executed when permitted for one of the
prompts, or when it is not (which is always for ENV)) and commands run
from those can still produce error output (but shell errors remain suppressed).
2019-02-04 11:16:41 +00:00
kre 4e25d54034 lexical analysis fixes. This fixes the tests just committed in
src/tests/bin/sh/t_here.sh

The "magicq" magic was all wrong - it cannot simply be a parameter
to readtoken1() as its value needs to alter during that routine
(eg: when magicq is set - processing here doc text, or whatever -
and we encounter ${var%pattern}, "magicq" needs to be off for
"pattern" - and it wasn't).

To handle this magicq needs to be included in the token stack struct,
and simply init'd from the arg to readtoken1 (which we rename).
Then it can be manipulated as required.

Once we no longer have that problem, some other issues can be cleaned
up as well (some of this unbelievably fragile code was attempting to
cope with this in various ad-hoc - and mostly broken - ways).

Also, remove the magicq parameter from parsebackq() - it was not
used (at all) and never should be: a command substitution, wherever
it appears, always starts a new parsing context.  How that applies
to old style command substitutions is less clear, but until we see
some real examples where we're not doing the right thing (slightly
less likely now than before ... nothing has changed here in the
way command substitutions are parsed, but quoting in general is
slightly better) I don't plan on worrying about it.

There are a couple of other minor cleanups, which make no actual
difference (like adding () around the use of the parameter in the
RETURN macro ... which is generally better, but makes no difference
here as the param is always a simple constant).

All the current ATF tests pass.
2019-01-22 14:32:17 +00:00
kre a672c6e148 NFCI - DEBUG mode only change.
Add tracing of lexical analyser operations.   This is deliberately
kept out of the normal "all on" set as it makes a *lot* of noise
when enabled (especially in verbose mode) - but when needed, it
helps (evidence for which is coming soon).

As usual, no doc, you need the sources (and of course, a specially
built sh to even be able to enable it.)
2019-01-22 13:48:28 +00:00
kre 9cef82b269 Fix an amazing crazy botch (of mine) when expanding prompt strings
(PS1 etc) which, if the shell were already exiting, and a prompt
were to be expanded (which only really happens if -x is enabled,
and an exit trap is set, so the commands in the trap need PS4
expanded and written, last thing, before the shell exits) the shell
would instead simply exit when it finished expanding PS4 (before
even writing it, or the xtrace output).

There were more conditions required to set up the environment for
this to actually occur (it seems to only happen when the exit trap
is set in a function, called in a command substitution) but that's
unimportant, the code was nonsense.

Problem noticed by Martijn Dekker.

XXX pullup -8
2019-01-21 14:24:44 +00:00
kre be7c7b5cbb pgetc_linecont() needs to use pgetc() rather than pgetc_macro()
so the fake char returned by the latter when an alias ends (which
is there so we can correctly avoid alias recursion) is correctly
ignored where it is not wanted.
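A toy model of why the filtered reader matters, with hypothetical names and values: pgetc_macro() may return a fake end-of-alias marker (PEOA_SKETCH), pgetc() filters it out, and pgetc_linecont() is built on pgetc() so that a marker sitting between a backslash and a newline cannot defeat line-continuation removal. The input is a plain int array, and "push back" is simply stepping the array pointer back.

```c
#include <assert.h>
#include <stddef.h>

#define PEOF_SKETCH (-1)   /* end of input */
#define PEOA_SKETCH (-2)   /* hypothetical "an alias ended here" marker */

/* Hypothetical input source: ints terminated by PEOF_SKETCH. */
static const int *input;

/* Raw reader: returns everything, including the fake alias marker. */
static int pgetc_macro(void)
{
    assert(input != NULL);
    if (*input == PEOF_SKETCH)
        return PEOF_SKETCH;
    return *input++;
}

/* Filtered reader: callers never see the alias marker. */
static int pgetc(void)
{
    int c;

    while ((c = pgetc_macro()) == PEOA_SKETCH)
        continue;
    return c;
}

/* Next char, with backslash-newline pairs silently removed. */
static int pgetc_linecont(void)
{
    int c, c2;

    while ((c = pgetc()) == '\\') {
        c2 = pgetc();
        if (c2 == '\n')
            continue;           /* line continuation: drop both */
        if (c2 != PEOF_SKETCH)
            input--;            /* push c2 back (it is input[-1]) */
        break;
    }
    return c;
}
```

Had pgetc_linecont() used pgetc_macro() instead, the sequence backslash, marker, newline would return the marker as the character after the backslash and the continuation would be missed.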
2019-01-15 14:17:49 +00:00
kre a559cfeaae A similar fix to that added in 1.169 of eval.c, but here for when
processing command substitutions.   If there is an error while processing,
then any pending queued input should be discarded.   From FreeBSD.
2019-01-09 10:59:20 +00:00
kre e3847ee4a9 PR standards/42829
Implement parameter and arithmetic expansion of $ENV
before using it as the name of a file from which to
read startup commands for the shell.   This continues
to happen for all interactive shells, and non-interactive
shells for which the posix option is not set (-o posix).

On any actual error, or if an attempt is made to use
command substitution, then the value of ENV is used
unchanged as the file name.

The expansion complies with POSIX XCU 2.5.3, though that
only requires parameter expansion - arithmetic expansion
is an extension (but for us, it is much easier to do, than
not to do, and it allows some weird stuff, if you're so
inclined....)   Note that there is no ~ expansion (use $HOME).
2018-12-11 13:31:20 +00:00
christos eb31074ab5 comment out unused. 2018-12-09 17:33:38 +00:00
kre c9f333ad14 Yet another foray into the mysterious world of $@ -- this time
to fix the (unusual) idiom "${1+$@}"  (the quotes are part of it).
This seems to have broken about 5 or 6 years ago (somewhere
between -6 and -7), I believe.

Note this is not the same as "$@" and also not the same as ${1+"$@"}
(much more common idioms) which both worked.

Also attempt to deal with "" more correctly, especially when it
appears adjacent to "$@" (or one of the similar constructs.)

This stuff is still all as ugly and hackish (and fragile) as is
possible to imagine, but in an effort to allow some of the weirdness
to eventually go away, the parser output has been made more
regular and all quoted (parts of) words always now start with
CTLQUOTEMARK and end with CTLQUOTEEND regardless of where the
quotes appear.

This allows us to tell the difference between """$@" and "$@"
which was impossible before - yet they are required to generate
different output when there are no args (when "$@" simply vanishes).

Needless to say that change had ramifications all over the place.
To simplify any similar change in the future, there are some new
macros that can generally be used to detect the "noise" data when
processing words, rather than open coding that every time (which
meant that there would *always* be one which missed getting
updated...)

Several other bugs (of my making, and older ones) are also fixed.

The aim is that (aside from anything that is detecting the cases
that were broken before - which were all unlikely uses of sh
syntax) these changes should have no external visible impact.

Sure...
2018-12-03 06:41:30 +00:00
kre 021ba5091c Revamp aliases - as dumb an idea as they are, if we're going
to have them, they should work as documented, not cause core
dumps, reference after free, incorrect replacements, failing
to implement alias after alias, ...

The big comment that ended:
	  This is a good idea ------- ***NOT***
and the hack it was describing are gone.

Note that most of this was from original CVS version 1.1
code (ie: it came from the original import, even before 4.4-Lite
was merged, that is, May 1994).  And no-one in 24.5 years
noticed (or at least complained about)  all the bugs (or at
least, most of them)).

With these changes, aliases ought to work (if you can call it
that) as they are expected to by POSIX.   Now if only we could
get POSIX to delete them (or make them optional)...

Changes partly inspired by similar changes made by FreeBSD,
(as was the previous change to alias.c, forgot ack in commit
log for that one, apologies) but done a little differently,
and perhaps with a slightly better outcome.
2018-12-03 06:40:26 +00:00
kre 17e8cd6768 Rename the internal function "makename" to "makeword" to better reflect
what it actually does (makearg would have been an alternative).
While here, notice the one remaining place where it should have been
used, but was left open coded, and consume that one.

NFCI.
2018-12-01 07:02:23 +00:00
kre ae1a47886f NFC. Need a grain of const 2018-12-01 01:21:06 +00:00
kre df073671e8 Rationalise (slightly) the way that expansions are processed
to hide meta-characters in the result when the expansion was
in (double) quotes, and so should not be further processed.

Most of this has been OK for a long while, but \ needs hiding
as well, which complicates things, as \ cannot simply be hidden
in the syntax tables as one of the group of random special characters.

This was fixed earlier for simple variable expansions, but
every variety has its own code path ($var uses different code
than $n which is different than $(...), which is different
again from ~ expansions, and also from what $'...' produces).

This could be fixed by moving them all to a common code path,
but that's harder than it seems.  The form in which the data
is made available differs, so one common routine would need
a whole bunch of different "get the next char or indicate end"
methods - probably via passing in an accessor function.
That's all a lot of churn, and would probably slow the shell.

Instead, just make macros for doing the standard tests, and
use those instead of open coding (differently) each time.
This way some of the code paths don't end up forgetting to
handle '\' (which is different than all the others).

This removes one optimisation ... when no escaping is needed
(like just $var (unquoted) where magic chars (think '*') in
the value are intended to remain magic), the code avoided doing
two tests for each char ("do we need escapes" and "is this char
one that needs escaping") by choosing two different syntax
tables (choice made outside the loop) - one of which never
returns the magic "needs escaping" result, and the other does
when appropriate, and then just avoiding the "do we need escapes"
test for each character processed.   Then when '\' was fixed,
there needed to be another test for it, as it cannot (for other
reasons) be the same as all the others for which "this char
needs escaping" is true.   So that added a 2nd test for each char...
Not all the code paths were updated.   Hence the bugs...

nb: this is all rarely seen in the wild, so it is no big
surprise that no-one ever noticed.

Now the "use two different syntax tables" is gone (the two
returned the same for '\' which is why '\' needed special
processing) - and in order to avoid two tests for each
char (plus the \ test) we duplicate the loops, one of which
tests each char to see if it needs an escape, the 2nd just
copies them.   This should be faster in the "no escapes"
code path (though that is not the point) and perhaps also
in the "escapes needed" path (no indirect reference to
the syntax table - though that would probably be in a
register) but makes the code slightly bigger.  For /bin/sh
the text segment (on amd64) has grown by 48 bytes.  But
it still uses the same number of 512 byte pages (and hence
also any bigger page size).  The resulting file size
(/bin/sh) is identical before and after.  So is /rescue/sh
(or /rescue/anything-else).
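A sketch of the "duplicate the loop" approach, assuming a hypothetical escape marker (CTLESC_SKETCH) and a hypothetical needs_escape() predicate standing in for the syntax-table test (here: a few glob magic chars plus backslash). The "do we need escapes" decision is taken once, outside the loops, so neither loop tests it per character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CTLESC_SKETCH '\x81'   /* hypothetical escape marker byte */

/* Hypothetical predicate: chars that must be hidden when quoted. */
static int needs_escape(char c)
{
    return c != '\0' && strchr("*?[]\\", c) != NULL;
}

/*
 * Copy src to dst (dst must hold 2 * strlen(src) + 1 bytes).  When
 * quoted, each magic char is preceded by the escape marker; otherwise
 * bytes are copied untouched, so unquoted magic chars stay magic.
 * Returns the number of bytes written, excluding the final nul.
 */
static size_t copy_expansion(char *dst, const char *src, int quoted)
{
    char *d = dst;

    assert(dst != NULL && src != NULL);
    if (quoted) {
        for (; *src != '\0'; src++) {
            if (needs_escape(*src))
                *d++ = CTLESC_SKETCH;
            *d++ = *src;
        }
    } else {
        for (; *src != '\0'; src++)
            *d++ = *src;        /* plain copy, no per-char test */
    }
    *d = '\0';
    return (size_t)(d - dst);
}
```

Because backslash goes through the same needs_escape() test as the other magic chars, no code path can handle '*' but forget '\'.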
2018-11-18 17:23:37 +00:00
kre 7ba0d30a60 PR bin/53712
Avoid crash from redirect on null compound command.
2018-11-09 02:11:04 +00:00
kre 375f4ceb14 Allow shells forked to run command substitutions while expanding
prompts to exit when they're done, rather than forcing them to
turn into interactive shells and start reading input ...

Completes a part of the previous changes (just 10+ weeks late...)

Should fix the prompt expansion issue reported by Caóc on
current-users.
2018-11-08 18:37:42 +00:00
kre 8a9a96192a PR bin/48875 (is related, and ameliorated, but not exactly "fixed")
Import a whole set of tree evaluation enhancements from FreeBSD.

With these, before forking, the shell predicts (often) when all it will
have to do after forking (in the parent) is wait for the child and then
exit with the status from the child, and in such a case simply does not
fork, but rather allows the child to take over the parent's role.

This turns out to handle the particular test case from PR bin/48875 in
such a way that it works as hoped, rather than as it did (the delay there
was caused by an extra copy of the shell hanging around waiting for the
background child to complete ... and keeping the command substitution
stdout open, so the "real" parent had to wait in case more output appeared).

As part of doing this, redirection processing for compound commands gets
moved out of evalsubshell() and into a new evalredir(), which allows us
to properly handle errors occurring while performing those redirects,
and not mishandle (as in simply forget) fd's which had been moved out
of the way temporarily.

evaltree() has its degree of recursion reduced by making it loop to
handle the subsequent operation: that is instead of (for any binop
like ';' '&&' (etc)) where it used to
	evaltree(node->left);
	evaltree(node->right);
	return;
it now does (kind of)
	next = node;
	while ((node = next) != NULL) {
		next = NULL;

		if (node is a binary op) {
			evaltree(node->left);
			if appropriate /* if && test for success, etc */
				next = node->right;
			continue;
		}
		/* similar for loops, etc */
	}
which can be a good saving, as while the left side (now) tends to be
(usually) a simple (or simpleish) command, the right side can be many
commands (in a command sequence like a; b; c; d; ...  the node at the
top of the tree will now have "a" as its left node, and the tree for
b; c; d; ... as its right node - until now everything was evaluated
recursively so it made no difference, and the tree was constructed
the other way).

if/while/... statements are done similarly, recurse to evaluate the
condition, then if the (or one of the) body parts is to be evaluated,
set next to that, and loop (previously it recursed).
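A runnable toy version of the loop-instead-of-recursion scheme, assuming a hypothetical node type with only three kinds: a leaf "command" that just returns a fixed status, ';' and '&&'. As described above, the left side is evaluated recursively and the right side becomes the next iteration, so a long a; b; c; ... chain does not deepen the C stack.

```c
#include <assert.h>
#include <stddef.h>

enum kind { N_CMD, N_SEMI, N_AND };

struct node {
    enum kind type;
    int status;             /* N_CMD: exit status the "command" returns */
    int *ran;               /* N_CMD: counter bumped when executed */
    struct node *left, *right;
};

static int exitstatus;

static void evaltree(struct node *n)
{
    struct node *next = n;

    while ((n = next) != NULL) {
        next = NULL;
        switch (n->type) {
        case N_CMD:
            if (n->ran != NULL)
                (*n->ran)++;
            exitstatus = n->status;
            break;
        case N_SEMI:
            evaltree(n->left);      /* recurse on the (small) left side */
            next = n->right;        /* iterate on the (long) right side */
            break;
        case N_AND:
            evaltree(n->left);
            if (exitstatus == 0)    /* &&: continue only on success */
                next = n->right;
            break;
        }
    }
}
```

For example, the tree for "a; false && b; c" runs a and c but not b.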

There is more to do in this area (particularly in the way that case
statements are processed - we can avoid recursion there as well) but
that can wait for another day.

While doing all of this we keep much better track of when the shell is
just going to exit once the current tree is evaluated (with a new
predicate at_eof() to tell us that we have, for sure, reached the end
of the input stream, that is, this shell will, for certain, not be reading
more command input) and use that info to avoid unneeded forks.   For that
we also need another new predicate (have_traps()) to determine whether there
are any caught traps which might occur - if there are, we need to remain
to (potentially) handle them, so these optimisations will not occur (to
make the issue in PR 48875 appear again, run the same code, but with a
trap set to execute some code when a signal (or EXIT) occurs - note that
the trap must be set in the appropriate level of sub-shell to have this
effect, any caught traps are cleared in a subshell whenever one is created).
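The fork-avoidance decision can be sketched as a predicate. at_eof() and have_traps() are the names used above, but their bodies here are stubs over hypothetical flags, and can_skip_fork() is an illustrative name rather than the function in the sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shell state, standing in for the real bookkeeping. */
static bool input_exhausted;   /* no more command input will be read */
static int  caught_traps;      /* number of traps with a handler set */

static bool at_eof(void)     { return input_exhausted; }
static bool have_traps(void) { return caught_traps > 0; }

/*
 * The fork may be skipped (the command exec'd directly) only when the
 * parent would do nothing afterwards except wait and exit with the
 * child's status: this is the last command, no more input will be
 * read, and no caught trap could still need this shell to run it.
 */
static bool can_skip_fork(bool is_last_command)
{
    return is_last_command && at_eof() && !have_traps();
}
```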

There is still work to be done to handle traps properly, whatever
weirdness they do (some of which is related to some of this.)

These changes do not need man page updates, but 48875 does - an update
to sh.1 will be forthcoming once it is decided what it should say...

Once again, all the heavy lifting for this set of changes comes directly
(with thanks) from the FreeBSD shell.

XXX pullup-8 (but not very soon)
2018-08-19 23:50:27 +00:00
kre 14482abc9a Part 2 of pattern matching (glob etc) fixes.
Attempt to correctly deal with \ (both when it is a literal,
in appropriate cases, and when it appears as CTLESC when it was
detected as a quoting character during parsing).

In a pattern, in sh, no quoted character can ever be anything other
than a literal character.   This is quite different than regular
expressions, and even different than other uses of glob matching,
where shell quoting is not an issue.

In something like

	ls ?\*.c

the ? is a meta-character, the * is a literal (it was quoted).  This
is nothing new, sh has handled that properly for ever.

But the same happens with
	VAR='?\*.c'
and
	ls $VAR

which has not always been handled correctly.   Of course, in

	ls "$VAR"

nothing in VAR is a meta-character (the entire expansion is quoted)
so even the '\' must match literally (or more accurately, no matching
happens - VAR simply contains an "unusual" filename).  But if it had
been

	ls *"$VAR"

then we would be looking for filenames that end with the literal 5
characters that make up $VAR.

The same kinds of things are required of matching patterns in case
statements, and sub-strings with the % and # operators in variable
expansions.

While here, the final remnant of the ancient !! pattern matching
hack has been removed (the code that actually implemented it was
long gone, but one small piece remained, not doing any real harm,
but potentially wasting time - if someone gave a pattern which would
once have invoked that hack.)
2018-07-22 23:07:48 +00:00
kre c83568a7dc First pass at fixing some of the more arcane pattern matching
possibilities that we do not currently handle all that well.

This mostly means (for now) making sure that quoted pattern
magic characters (as well as quoted sh syntax magic chars)
are properly marked, so they remain known as being quoted,
and do not turn into pattern magic.   Also, make sure that an
unquoted \ in a pattern always quotes whatever comes next
(which, unlike in regular expressions, includes inside []
matches).
2018-07-20 22:47:26 +00:00
kre c6c29888c4 Remove atoi()
Mostly use number() (no longer implemented using atoi()) when an
unsigned integer is required, but use strtoXXX() when a conversion
is wanted, without the possibility of error (like setting OPTIND
and RANDOM).   Always init OPTIND to 1 when sh starts (overriding
anything in environ.)
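A sketch of the two conversion styles, with hypothetical helper names. number_sketch() accepts only a plain unsigned decimal integer and returns -1 where the real number() would raise a shell error; lenient_number() uses strtol() in the OPTIND/RANDOM style, never failing and falling back to a default.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Strict: digits only, no sign or leading space, no trailing junk,
 * must fit in an int.  -1 stands in for a shell error. */
static int number_sketch(const char *s)
{
    char *end;
    unsigned long v;

    assert(s != NULL);
    if (*s < '0' || *s > '9')
        return -1;          /* strtoul itself would accept " 5", "+5", "-5" */
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > INT_MAX)
        return -1;
    return (int)v;
}

/* Lenient: never an error; unusable input yields dflt. */
static int lenient_number(const char *s, int dflt)
{
    char *end;
    long v;

    assert(s != NULL);
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || errno != 0 || v < INT_MIN || v > INT_MAX)
        return dflt;
    return (int)v;
}
```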
2018-07-13 22:43:44 +00:00
kre d6d059edc2 PR bin/53201
Don't synerr on
	${var-anything
	more}

The newline in the middle of the var expansion is permitted.

Bug reported by Martijn Dekker from his modernish tests.

XXX pullup-8
2018-04-21 21:32:14 +00:00
kre d1b3ee239b PR bin/52715
Correct a (relatively harmless) use after free in prompt expansion
processing [detected by asan.]

Relatively harmless: as (while incorrect) the way the data is (was)
used more or less guaranteed that the buffer contents would be
unaltered until well after they are (were) no longer wanted (this
is the expanded prompt string, it is just output (or copied into
libedit internal storage) and forgotten).

This should make no visible difference to anyone (not using asan or
similar.)

XXX pullup -8
2017-11-10 17:31:12 +00:00
kre 5f92382c9a Add support for $'...' quoting (based upon C "..." strings, with \ expansions.)
Implementation largely obtained from FreeBSD, with adaptations to meet the
needs and style of this sh, some updates to agree with the current POSIX spec,
and a few other minor changes.

The POSIX spec for this ( http://austingroupbugs.net/view.php?id=249 )
[see note 2809 for the current proposed text] is yet to be approved,
so might change.  It currently leaves several aspects as unspecified,
this implementation handles those as:

Where more than 2 hex digits follow \x this implementation processes the
first two as hex, the following characters are processed as if the \x
sequence was not present.  The value obtained from a \nnn octal sequence
is truncated to the low 8 bits (if a bigger value is written, eg: \456.)
Invalid escape sequences are errors.  Invalid \u (or \U) code points are
errors if known to be invalid, otherwise can generate a '?' character.
Where any escape sequence generates nul ('\0') that char, and the rest of
the $'...' string is discarded, but anything remaining in the word is
processed, ie: aaa$'bbb\0ccc'ddd produces the same as aaa'bbb'ddd.

Differences from FreeBSD:
  FreeBSD allows only exactly 4 or 8 hex digits for \u and \U (as does C,
  but the current sh proposal differs.)  FreeBSD also continues consuming
  as many hex digits as exist after \x (permitted by the spec, but insane),
  and rejects \u0000 as invalid.  Some of this is possibly because
  their implementation is based upon an earlier proposal, perhaps note 590 -
  though that has been updated several times.

Differences from the current POSIX proposal:
  We currently always generate UTF-8 for the \u & \U escapes.   We should
  generate the equivalent character from the current locale's character set
  (and UTF8 only if that is what the current locale uses.)
  If anyone would like to correct that, go ahead.

  We (and FreeBSD) generate (X & 0x1F) for \cX escapes where we should generate
  the appropriate control character (SOH for \cA for example) with whatever
  value that has in the current character set.   Apart from EBCDIC, which
  we do not support, I've never seen a case where they differ, so ...
2017-08-21 13:20:49 +00:00
kre 70a37837ef PR bin/52458
Avoid mangling history when editing is enabled, and the prompt contains a \n

Also, allow empty input lines into history when they are being appended to
a previous (partial) command (but not when they would just make an empty entry).

For all the gory details, see the PR.

Note nothing here actually makes prompts containing \n work correctly
when editing is enabled, that's a libedit issue, which will be addressed
some other time.
2017-08-05 11:33:05 +00:00
kre fab2da9b21 PR bin/48498 PR bin/52426
Don't ignore unexpected reserved words after ';'
Don't allow any random token type as a case stmt pattern, only a word.
	Those are ancient ash bugs and do not affect correct scripts.

Don't ignore redirects in a case stmt list where the list is nothing but
redirects (if the pattern matches, the redirects should be performed).
	That was introduced when a redirect only case stmt list was allowed
	(older shells had generated a syntax error.)

Random cleanups/refactoring taken from or inspired by the FreeBSD sh
parser ...  use makename() consistently to create a NARG node - we
were using it in a couple of places but most NARG node creation was open
coded.  Introduce consumetoken() (from FreeBSD) to handle the fairly
common case where exactly one token type must come next, and we need to
check that, and skip past it when found (or error) and linebreak() (new)
to handle places where optional \n's are permitted.
Both previously open coded.

Simplify list() by removing its second arg, which was only ever used when
handling the end of a `` (old style command substitution).  Simply move
the code from inside list() to just after its call in the `` case (from
FreeBSD.)
2017-07-26 23:09:41 +00:00
kre a64e63ed28 Do a better job of detecting the error in pkgsrc/devel/libbson-1.6.3's
configure script, ie: $(( which is intended to be a sub-shell in a
command substitution, but is an arith subst instead, it needs to be
written $( ( to do as intended.   Instead of just blindly carrying on to
find the missing )) somewhere, anywhere, give up as soon as we have seen
an unbalanced ')' that isn't immediately followed by another ')' which
in a valid arith subst it always would be.

While here, there has been a comment in the code for quite a while noting a
difference in the standard between the text descr & grammar when it comes to
the syntax of case statements.   Add more comments to explain why parsing it
as we do is in fact definitely the correct way (ie: the grammar wins arguments
like this...).
2017-07-03 20:16:44 +00:00
kre 1fca9bbf62 Implement PS1, PS2 and PS4 expansions (variable expansions, arithmetic
expansions, and if enabled by the promptcmds option, command substitutions.)
2017-06-30 23:02:56 +00:00
kre 7f75cc46b5 Another ancient (highly improbable) bug bites the dust. This one
caused by incorrect macro usage (ie: using the wrong one) which has
been in the sources since version 1.1 (ie: forever).

Like the previous (STACKSTRNUL) bug, the probability of this one
actually occurring has been infinitesimal but the LINENO code increases
that to infinitesimal and a smidgen... (or a few, depending upon usage).

Still, apparently that was enough, Kamil Rytarowski discovered that the
zsh configure script (damn competition!) managed to trigger this problem.
2017-06-24 11:23:35 +00:00
kre ee3b307fc5 Many internal memory management type fixes.
PR bin/52302   (core dump with interactive shell, here doc and error
on same line) is fixed.   (An old bug.)

echo "$( echo x; for a in $( seq 1000 ); do printf '%s\n'; done; echo y )"
consistently prints 1002 lines (x, 1000 empty ones, then y) as it should
(And you don't want to know what it did before, or why.) (Another old one.)

(Recently added) Problems with ~ expansion fixed (mem management related).

Proper fix for the cwrappers configure problem (which includes the quick
fix that was done earlier, but extends upon that to be correct). (This was
another newly added problem.)

And the really devious (and rare) old bug - if STACKSTRNUL() needs to
allocate a new buffer in which to store the \0, calculate the size of
the string space remaining correctly.  Unlike when SPUTC() grows the
buffer, there is no actual data being stored in the STACKSTRNUL()
case - the string space remaining was calculated as one byte too few.
That would be harmless, unless the next buffer also filled, in which
case it was assumed that it was really full, not one byte less, meaning
one junk char (a nul, or anything) was being copied into the next (even
bigger) buffer, corrupting the data.
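A toy growable string buffer showing the invariant involved: writing the terminating nul must make room for it without counting it as stored data, so the space remaining stays cap - len. struct sstr, sputc_sketch() and stnul_sketch() are hypothetical stand-ins, not the sh macros, and the growth policy is arbitrary.

```c
#include <assert.h>
#include <stdlib.h>

struct sstr {
    char   *buf;
    size_t  len;    /* bytes of real data stored */
    size_t  cap;    /* bytes allocated */
};

static size_t space_left(const struct sstr *s)
{
    return s->cap - s->len;
}

/* Double the allocation (or start at 8 bytes). */
static void grow(struct sstr *s)
{
    size_t ncap = s->cap ? s->cap * 2 : 8;
    char *nb = realloc(s->buf, ncap);

    if (nb == NULL)
        abort();
    s->buf = nb;
    s->cap = ncap;
}

/* Store a data byte: len advances. */
static void sputc_sketch(struct sstr *s, char c)
{
    assert(s != NULL);
    if (space_left(s) == 0)
        grow(s);
    s->buf[s->len++] = c;
}

/* Terminate without consuming space: len, and so space_left(), stay
 * exactly as they were, even if a new buffer had to be allocated. */
static void stnul_sketch(struct sstr *s)
{
    assert(s != NULL);
    if (space_left(s) == 0)
        grow(s);
    s->buf[s->len] = '\0';
}
```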

Consistent use of stalloc() to allocate a new block of (stack) memory,
and grabstackstr() to claim a block of (stack) memory that had already
been occupied but not claimed as in use.  Since grabstackstr is implemented
as just a call to stalloc() this is a no-op change in practice, but makes
it much easier to comprehend what is really happening.  Previous code
sometimes used stalloc() when the use case was really for grabstackstr().
Change grabstackstr() to actually use the arg passed to it, instead of
(not much better than) guessing how much space to claim.

More care when using unstalloc()/ungrabstackstr() to return space.  In
particular, when the stack must be returned to its previous state (rather
than just returning no-longer-needed space), neither of those works.  They
also don't work properly if there have been (really, even might have been)
any stack mem allocations since the last stalloc()/grabstackstr().   (If we
know there cannot have been, then the alloc/release sequence is kind of pointless.)
To work correctly in general we must use setstackmark()/popstackmark() so
do that when needed.  Have those also save/restore the top of stack string
space remaining.

	[Aside: for those reading this, the "stack" mentioned is not
	in any way related to the thing used for maintaining the C
	function call state, ie: the "stack segment" of the program,
	but the shell's internal memory management strategy.]
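A minimal arena with mark/release, assuming hypothetical names (struct arena, arena_mark(), arena_pop()) and a fixed-size block with no growth. As described above, the mark records both the top of the stack and the string space remaining, so popping restores the exact earlier state no matter what was allocated in between.

```c
#include <assert.h>
#include <stddef.h>

struct arena {
    char   mem[1024];
    size_t top;          /* next free byte */
    size_t strspace;     /* space remaining for the string being built */
};

struct mark {
    size_t top;
    size_t strspace;
};

static void *arena_alloc(struct arena *a, size_t n)
{
    void *p;

    assert(a->top + n <= sizeof a->mem);   /* sketch: no growth */
    p = a->mem + a->top;
    a->top += n;
    return p;
}

static struct mark arena_mark(const struct arena *a)
{
    struct mark m = { a->top, a->strspace };
    return m;
}

/* Undo everything allocated since the mark, however much that was. */
static void arena_pop(struct arena *a, struct mark m)
{
    a->top = m.top;
    a->strspace = m.strspace;
}
```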

More comments to better explain what is happening in some cases.
Also cleaned up some hopelessly broken DEBUG mode data that were
recently added (no effect on anyone but the poor semi-human attempting
to make sense of it...).

User visible changes:

Proper counting of line numbers when a here document is delimited
by a multi-line end-delimiter, as in

	cat << 'REALLY
	END'
	here doc line 1
	here doc line 2
	REALLY
	END

(which is an obscure case, but nothing says it should not work.)  The \n
in the end-delimiter of the here doc (the last one) was not incrementing
the line number, which from that point on in the script would be 1 too
low (or more, for end-delimiters with more than one \n in them.)

With tilde expansion:
	unset HOME; echo ~
changed to return getpwuid(getuid())->pw_dir instead of failing (returning ~)

POSIX says this is unspecified, which makes it difficult for a script to
compensate for being run without HOME set (as in env -i sh script), so
while not able to be used portably, this seems like a useful extension
(and is implemented the same way by some other shells).

Further, with
	HOME=; printf %s ~
we now write nothing (which is required by POSIX, which requires ~ to
expand to the value of $HOME if it is set).  Previously, if $HOME (in this
case) or a user's directory in the passwd file (for ~user) was a null
string, we failed the ~ expansion and left behind '~' or '~user'.
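A sketch of the two rules for a bare ~, using the standard getenv(), getuid() and getpwuid() interfaces; tilde_home() is a hypothetical name, and ~user lookups are not covered. The home directory member of struct passwd is pw_dir.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <pwd.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Value for a bare "~": $HOME whenever it is set (even to ""),
 * otherwise the invoking user's passwd entry.  NULL means no value
 * was found, and a caller would leave "~" unexpanded.
 */
static const char *tilde_home(void)
{
    const char *home = getenv("HOME");
    const struct passwd *pw;

    if (home != NULL)
        return home;            /* HOME="" expands to nothing */
    pw = getpwuid(getuid());
    return pw != NULL ? pw->pw_dir : NULL;
}
```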
2017-06-17 07:22:12 +00:00
kre e4db6fa481 (Perhaps) temporary fix to pkgtools (cwrappers) build (configure).
Expanding  `` containing \ \n sequences looks to have been giving
problems.   I don't think this is the correct fix, but it will do
no worse harm than (perhaps) incorrectly calculating LINENO in this
kind of (rare) circumstance.   I'll look and see if there should be
a better fix later.
2017-06-08 22:10:39 +00:00
kre 15de6ce7d7 Remove some left over baggage from the LINENO v1 implementation that
didn't get removed with v2, and should have.   This would have had
(I think, without having tested it) one very minor effect on the way
LINENO worked in the v2 implementation, but my guess is it would have
taken a long time before anyone noticed...
2017-06-08 13:12:17 +00:00
kre f7d07fc011 Undo some over agressive fixes for a (pre-commit) bug that did not
need these changes to be fixed - and these cause problems in another
absurd use case.   Either of these issues is unlikely to be seen by
anyone who isn't an idiot masochist...
2017-06-07 08:10:31 +00:00
kre 727a69dc1d A better LINENO implementation. This version deletes (well, #if 0's out)
the LINENO hack, and uses the LINENO var for both ${LINENO} and $((LINENO)).
(Code to invert the LINENO hack when required, such as when de-compiling the
execution tree to provide the "jobs" command strings, is still included; it
can be deleted when the LINENO hack is completely removed: look for
refs to VSLINENO throughout the code. The var funclinno in parser.c can
also be removed, as it is used only for the LINENO hack.)

This version produces accurate results: $((LINENO)) is now as accurate
as the LINENO hack made ${LINENO}, which is very good. The LINENO hack
is not yet completely removed, so that it can easily be re-enabled for
comparison. If you can tell the difference between having it enabled or
disabled, then something has broken (or I managed to miss a case somewhere).
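
A minimal sketch of that check, assuming only a shell that maintains LINENO (as this implementation does): expand it both ways on a single line and compare. The variable names a and b are illustrative only.

```sh
# Expand LINENO both ways within one command on one line.
# With this implementation the two values should be identical.
a=${LINENO} b=$((LINENO))
if [ "$a" = "$b" ]; then
    echo "LINENO agrees: $a"
else
    echo "LINENO mismatch: \${LINENO}=$a \$((LINENO))=$b"
fi
```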

The way that LINENO works is documented in its own (new) section in the
man page, so nothing more about that, or the new options, etc, here.

This version introduces the possibility of having a "reference" function
associated with a variable, which gets called whenever the value of the
variable is required (that's what implements LINENO). There is just
one function pointer, however, so any particular variable can have either
a set function (as used for PATH, etc.) or a reference function, but not both.
The VFUNCREF bit in the var flags indicates which func the variable in
question uses (if any - the func ptr, as before, can be NULL).

I would not call the results of this perfect yet, but it is close.
2017-06-07 05:08:32 +00:00
kre fd38bbe2e4 An initial attempt at implementing LINENO to meet the specs.
Aside from one problem (not too hard to fix if it were ever needed), this version
does about as well as most other shell implementations when expanding
$((LINENO)) and better for ${LINENO} as it retains the "LINENO hack" for the
latter, and that is very accurate.

Unfortunately that means that ${LINENO} and $((LINENO)) do not always produce
the same value when used on the same line (a defect that other shells do not
share - aside from the FreeBSD sh as it is today, where only the LINENO hack
exists and so (like for us before this commit) $((LINENO)) is always either
0, or at least whatever value was last set, perhaps by
	LINENO=${LINENO}
which does actually work ... for that one line...)

This could be corrected by simply removing the LINENO hack (look for the string
LINENO in parser.c) in which case ${LINENO} and $((LINENO)) would give the
same (not perfectly accurate) values, as do most other shells.

POSIX requires that LINENO be set before each command, and this implementation
does that fairly literally - except that we only bother before the commands
which actually expand words (for, case and simple commands). Unfortunately,
this overlooked the fact that expansions also occur in redirects, and the other compound
commands can also have redirects, so if a redirect on one of the other compound
commands wants to use the value of $((LINENO)) as a part of a generated file
name, then it will get an incorrect value.  This is the "one problem" above.
(Because the LINENO hack is still enabled, using ${LINENO} works.)

This could be fixed, but as this version of the LINENO implementation is just
for reference purposes (it will be superseded within minutes by a better one)
I won't bother. However, should anyone else decide that this is a better choice
(it is probably a smaller implementation, in terms of code and data space, than
the replacement, but I would also expect it to be slower, and it is definitely
less accurate), this defect is something to bear in mind, and fix.

This version retains the *BSD historical practice that line numbers in functions
(all functions) count from 1 from the start of the function, and elsewhere,
start from 1 from where the shell started reading the input file/stream in
question.  In an "eval" expression the line number starts at the line of the
"eval" (and then increases if the input is a multi-line string).

Note: this version is not documented (beyond as much as LINENO was before)
hence this slightly longer than usual commit message.
2017-06-07 04:44:17 +00:00
kre 375027ae61 When we record an arithmetic expression ($(( ))) as being quoted,
what matters is the quoting state just before we switch into arithmetic
syntax parsing mode, not the state after...
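
To see why the recorded quoting state matters, here is a small sketch in portable sh. Setting IFS to the digit 1 (chosen purely for demonstration) makes word splitting of an arithmetic result observable: a quoted $(( )) should yield exactly one field, while an unquoted one is split. Note that it overwrites the positional parameters.

```sh
# Make the digit 1 a field separator so that splitting becomes visible.
oldIFS=$IFS
IFS=1

# Quoted: never subject to field splitting, so one field, "101".
set -- "$((101))"
quoted=$#

# Unquoted: "101" is split on "1", giving the fields "" and "0".
set -- $((101))
unquoted=$#

IFS=$oldIFS
echo "quoted: $quoted field(s), unquoted: $unquoted field(s)"
```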

This fixes the regression introduced earlier today (UTC) where
quoted arithmetic expressions were being subjected to word splitting.
2017-06-03 18:31:35 +00:00
kre dd6b641408 Fixes to shell expand (that is, $ stuff) from FreeBSD (implemented
differently...)

In particular	${01} is now $1 not $0  (for ${0any-digits})

		${4294967297} is most probably now ""
			(unless you have a very large number of params)
		it is no longer an alias for $1  (4294967297 & 0xFFFFFFFF) == 1

		$(( expr $(( more )) stuff )) is no longer the same as
		$(( expr (( more )) stuff )) which was sometimes OK, as in:
			$(( 3 + $(( 2 - 1 )) * 3 ))
		but not always as in:
			$(( 1$((1 + 1))1 ))
		which should be 121, but was an arith syntax error as
			1((1 + 1))1
		is meaningless.
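
The corrected nesting rule can be checked with a short sketch: the inner $(( )) is a complete arithmetic expansion whose result is pasted as text into the outer expression before that is evaluated, not a parenthesised sub-expression. This should hold in any POSIX-conforming shell.

```sh
# Inner expansion yields 1, so the outer expression is 3 + 1 * 3.
x=$(( 3 + $(( 2 - 1 )) * 3 ))
# Inner expansion yields 2, so the outer expression sees the text "121".
y=$(( 1$((1 + 1))1 ))
echo "$x $y"    # prints "6 121"
```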

Probably some more.   This also sprinkles a little const, splits a big
func that had 2 (kind of unrelated) purposes into two simpler ones,
and avoids some (semi-dubious) modifications (and restores) in the input
string to insert \0's when they were needed.
2017-06-03 10:31:16 +00:00
kre 9167dc7b19 NFC (normal builds): DEBUG only change - convert parser to newer trace method.
Parser tracing is useful when debugging the parser (which admittedly is
fairly often...) but there is a lot of it, and it gets in the way when
looking at something else.   Now we can turn it off when not wanted.
2017-05-29 10:43:27 +00:00
kre 2d8874d9a7 More standard (and saner) implementation of the ! reserved word.
Unless the shell is compiled with the (compilation time) option
BOGUS_NOT_COMMAND (as in CFLAGS+=-DBOGUS_NOT_COMMAND) which it
will not normally be, the ! command (reserved word) will only
be permitted at the start of a pipeline (which includes the
degenerate pipeline with no '|'s in it of course - ie: a simple cmd)
and not in the middle of a pipeline sequence (no "cmd | ! cmd" nonsense.)
If the latter is really required, then "cmd | { ! cmd; }" works as
a standard equivalent.

In POSIX mode, permit only one !  ("! pipeline" is ok. "! ! pipeline" is not).
Again, if needed (and POSIX conformance is wanted) "! { ! pipeline; }"
works as an alternative - and is safer, some shells treat "! ! cmd" as
being identical to "cmd" (this one did until recently.)
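
The standard alternatives suggested above can be sketched as follows. grep and the sample input are arbitrary stand-ins for any commands; both forms are valid in POSIX mode.

```sh
# Negating a command in the middle of a pipeline: use a brace group
# instead of the non-standard "cmd | ! cmd".
echo data | { ! grep -q missing; }
# grep finds no match (status 1) and ! inverts that, so this is 0.
mid=$?

# Double negation without "! !": the inner ! turns 5 into 0 and the
# outer ! turns 0 into 1, so the original status 5 is not preserved.
! { ! (exit 5); }
dbl=$?
echo "$mid $dbl"    # prints "0 1"
```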
2017-05-27 11:19:57 +00:00
kre db7849a108 NFC: changes to comments only - expand/add comments relating to ${#...}
parsing, and all its peculiarities.
2017-05-14 11:17:04 +00:00
kre 26a83a43ec Fix some parser weirdness...
${#VAR:-foo} (or any other modifier on ${#VAR}) is a syntax error.
On the other hand ${##} is not, nor is ${##13} though they mean
quite different things (the latter is an idiom everyone should learn:
$#, except that we refuse to admit the possibility that it is 13...).
Even I cannot explain what ${#-foo} used to do, but it wasn't sane!
(It should be just $# as $# is never unset, but ...)
Shell syntax is truly a wondrous thing!
2017-05-11 15:07:37 +00:00
kre e64f57b73d NFC: Whitespace, KNF, and (some) consistency. 2017-05-10 11:06:47 +00:00
kre aa563ca425 If we are going to permit
! ! pipeline
(and, for now, the other places where ! is permitted)
we should at least generate the logically correct exit
status:
	! ! (exit 5); echo $?
should print 1, not 5.   ksh and bosh do it this way - and it makes sense.
bash and the FreeBSD sh echo "5" (as did we until now.)
dash, zsh, yash all enforce the standard syntax, and prohibit this.
2017-05-09 05:14:03 +00:00
kre 09a5470484 Remove bogus extra \n from syntax error message. 2017-05-09 02:47:47 +00:00
kre 7d41ae4eb6 Implement the ';&' (used instead of ';;') case statement list terminator
which causes fall through the to command list of the following pattern
(wuthout evaluating that pattern).   This has been approved for inclusion
in the next major version of the POSIX standard (Issue 8), and is
implemented by most other shells.

Now all form a circle and together attempt to summon the great wizd
in the hopes that his magic spells can transform the poor attempt
at documenting this feature into something rational...
2017-05-04 04:37:51 +00:00
kre 1b5660ae57 Fix the heredoc line counting bug that I caused when the heredoc
processing was changed just over a year ago (rev 1.111).
2017-05-03 21:36:16 +00:00
kre eaa91315bd Deal with \newline line continuations more correctly.
They can occur anywhere (*anywhere*) not only where it
happens to be convenient to the parser...

This fix from FreeBSD (thanks again folks).

To make this work, pushstring()'s signature needed to change to allow a
const char * as its string arg, which meant sprinkling some const other
places for a brighter appearance (and handling fallout).

All this because I wanted to see what number would come from

echo $\
{\
L\
I\
N\
E\
N\
O\
}

and was surprised at the result!    That works now...

The bug would also affect stuff like

true &\
& false

and all kinds of other uses where the \newline occurred in the
"wrong" place.
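
A smaller sketch of the same idea, using an ordinary variable (name, chosen only for illustration) in place of LINENO so that the result is predictable. It assumes a shell that removes every backslash-newline before tokenizing, as this fix does.

```sh
name=value
# "&\<newline>&" is reassembled into the && operator.
true &\
& joined=yes
# "$\<newline>{..." is reassembled into the expansion ${name}.
v=$\
{\
name\
}
echo "$joined $v"    # prints "yes value"
```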

An ATF test for sh syntax is coming... (sometime.)
2017-05-03 04:51:04 +00:00