Commit Graph

127 Commits

Author SHA1 Message Date
kre 14482abc9a Part 2 of pattern matching (glob etc) fixes.
Attempt to correctly deal with \ (both when it is a literal,
in appropriate cases, and when it appears as CTLESC when it was
detected as a quoting character during parsing).

In a pattern, in sh, no quoted character can ever be anything other
than a literal character.   This is quite different than regular
expressions, and even different than other uses of glob matching,
where shell quoting is not an issue.

In something like

	ls ?\*.c

the ? is a meta-character, the * is a literal (it was quoted).  This
is nothing new, sh has handled that properly for ever.

But the same happens with
	VAR='?\*.c'
and
	ls $VAR

which has not always been handled correctly.   Of course, in

	ls "$VAR"

nothing in VAR is a meta-character (the entire expansion is quoted)
so even the '\' must match literally (or more accurately, no matching
happens - VAR simply contains an "unusual" filename).  But if it had
been

	ls *"$VAR"

then we would be looking for filenames that end with the literal 5
characters that make up $VAR.

The same kinds of things are requires of matching patterns in case
statements, and sub-strings with the % and # operators in variable
expansions.

While here, the final remnant of the ancient !! pattern matching
hack has been removed (the code that actually implemented it was
long gone, but one small piece remained, not doing any real harm,
but potentially wasting time - if someone gave a pattern which would
once have invoked that hack.)
2018-07-22 23:07:48 +00:00
kre d211c89f40 NFC: Whitespace cleanups 2018-07-22 21:16:58 +00:00
kre bcacfd9a45 DEBUG mode only change (ie: no effect to any normal shell).
Add tracing of pattern matching (aid in debugging various issues.)
2018-07-22 20:38:06 +00:00
kre c83568a7dc First pass at fixing some of the more arcane pattern matching
possibilities that we do not currently handle all that well.

This mostly means (for now) making sure that quoted pattern
magic characters (as well as quoted sh syntax magic chars)
are properly marked, so they remain known as being quoted,
and do not turn into pattern magic.   Also, make sure that an
unquoted \ in a pattern always quotes whatever comes next
(which, unlike in regular expressions, includes inside []
matches),
2018-07-20 22:47:26 +00:00
kre b81009ce62 When processing character classes ([:xxx:] inside []), treat a class name
that is longer than we can handle the same way we treat an unknown
class name (as a valid char class which contains nothing, so never
matches).   Previously a "too long" class name invalidated the
class, so [:very-long-name:] would match any of  '[' ':' 'v'  ...
(note: "very-long-name" is not long enough to trigger this, but you
get the idea!)

However, the name itself has a restricted syntax ([[:***:]] is not a
character class, it is a match for one of a '[' ':' or '*', followed by
a ']') which we did not implement - check the syntax of the name before
treating it as a character class (but we do add '_' to alphanumerics
as legal class name characters).
2018-06-22 18:19:41 +00:00
kre 829cc62a58 When matching a char class ([[:name:]]) in a pattern (for filename
expansion, case patterrns, etc) do not force '[' to be a member of
every class.

Before this fix, try:
	case [ in [[:alpha:]]) echo Huh\?;; esac

XXX pullup-8    (Perhaps -7 as well, though that shell version has
much more relevant bugs than this one.)  This bug is not in -6 as
that has no charclass support.
2018-06-22 17:22:34 +00:00
kre bd208c6933 Three fixes and a change to ~ expansions
1. A serious bug introduced 3 1/2 months ago (approx) (rev 1.116) which
   broke all but the simple cases of ~ expansions is fixed (amazingly,
   given the magnitude of this problem, no-one noticed!)

2. An ancient bug (probably from when ~ expansion was first addedin 1994, and
   certainly is in NetBSD-6 vintage shells) where ${UnSeT:-~} (and similar)
   does not expand the ~ is fixed (note that ${UnSeT:-~/} does expand,
   this should give a clue to the cause of the problem.

3. A fix/change to make the effects of ~ expansions on ${UnSeT:=whatever}
   identical to those in UnSeT=whatever   In particular, with HOME=/foo
   ${UnSeT:=~:~} now assigns, and expands to, /foo:/foo rather than ~:~
   just as VAR=~:~ assigns /foo:/foo to VAR.   Note this is even after the
   previous fix (ie: appending a '/' would not change the results here.)

   It is hard to call this one a bug fix for certain (though I believe it is)
   as many other shells also produce different results for the ${V:=...}
   expansions than  they do for V=... (though not all the same as we did).

   POSIX is not clear about this, expanding ~ after : in VAR=whatever
   assignments is clear, whether ${U:=whatever} assignments should be
   treated the same way is not stated, one way or the other.

4. Change to make ':' terminate the user name in a ~ expansion in all cases,
   not only in assignments.   This makes sense, as ':' is one character that
   cannot occur in user names, no matter how otherwise weird they become.
   bash (incl in posix mode) ksh93 and bosh all act this way, whereas most
   other shells (and POSIX) do not.   Because this is clearly an extension
   to POSIX, do this one only when not in posix mode (not set -o posix).
2017-10-06 21:09:45 +00:00
kre 5f92382c9a Add support for $'...' quoting (based upon C "..." strings, with \ expansions.)
Implementation largely obtained from FreeBSD, with adaptations to meet the
needs and style of this sh, some updates to agree with the current POSIX spec,
and a few other minor changes.

The POSIX spec for this ( http://austingroupbugs.net/view.php?id=249 )
[see note 2809 for the current proposed text] is yet to be approved,
so might change.  It currently leaves several aspects as unspecified,
this implementation handles those as:

Where more than 2 hex digits follow \x this implementation processes the
first two as hex, the following characters are processed as if the \x
sequence was not present.  The value obtained from a \nnn octal sequence
is truncated to the low 8 bits (if a bigger value is written, eg: \456.)
Invalid escape sequences are errors.  Invalid \u (or \U) code points are
errors if known to be invalid, otherwise can generate a '?' character.
Where any escape sequence generates nul ('\0') that char, and the rest of
the $'...' string is discarded, but anything remaining in the word is
processed, ie: aaa$'bbb\0ccc'ddd produces the same as aaa'bbb'ddd.

Differences from FreeBSD:
  FreeBSD allows only exactly 4 or 8 hex digits for \u and \U (as does C,
  but the current sh proposal differs.) reeBSD also continues consuming
  as many hex digits as exist after \x (permitted by the spec, but insane),
  and reject \u0000 as invalid).  Some of this is possibly because that
  their implementation is based upon an earlier proposal, perhaps note 590 -
  though that has been updated several times.

Differences from the current POSIX proposal:
  We currently always generate UTF-8 for the \u & \U escapes.   We should
  generate the equivalent character from the current locale's character set
  (and UTF8 only if that is what the current locale uses.)
  If anyone would like to correct that, go ahead.

  We (and FreeBSD) generate (X & 0x1F) for \cX escapes where we should generate
  the appropriate control character (SOH for \cA for example) with whatever
  value that has in the current character set.   Apart from EBCDIC, which
  we do not support, I've never seen a case where they differ, so ...
2017-08-21 13:20:49 +00:00
kre 1fca9bbf62 Implement PS1, PS2 and PS4 expansions (variable expansions, arithmetic
expansions, and if enabled by the promptcmds option, command substitutions.)
2017-06-30 23:02:56 +00:00
kre 701ac13230 Now that excessive use of STACKSTRNUL has served its purpose (well, accidental
purpose) in exposing the bug in its implementation, go back to not using
it when not needed for DEBUG TRACE purposes.   This change should have no
practical effect on either a DEBUG shell (where the STACKSTRNUL() calls
remain) or a non DEBUG shell where they are not needed.
2017-06-19 02:46:50 +00:00
kre f72bc19e74 NFC: DEBUG mode only change. Fix botched cleanup of one TRACE(). 2017-06-18 07:50:46 +00:00
kre ee3b307fc5 Many internal memory management type fixes.
PR bin/52302   (core dump with interactive shell, here doc and error
on same line) is fixed.   (An old bug.)

echo "$( echo x; for a in $( seq 1000 ); do printf '%s\n'; done; echo y )"
consistently prints 1002 lines (x, 1000 empty ones, then y) as it should
(And you don't want to know what it did before, or why.) (Another old one.)

(Recently added) Problems with ~ expansion fixed (mem management related).

Proper fix for the cwrappers configure problem (which includes the quick
fix that was done earlier, but extends upon that to be correct). (This was
another newly added problem.)

And the really devious (and rare) old bug - if STACKSTRNUL() needs to
allocate a new buffer in which to store the \0, calculate the size of
the string space remaining correctly, unlike when SPUTC() grows the
buffer, there is no actual data being stored in the STACKSTRNUL()
case - the string space remaining was calculated as one byte too few.
That would be harmless, unless the next buffer also filled, in which
case it was assumed that it was really full, not one byte less, meaning
one junk char (a nul, or anything) was being copied into the next (even
bigger buffer) corrupting the data.

Consistent use of stalloc() to allocate a new block of (stack) memory,
and grabstackstr() to claim a block of (stack) memory that had already
been occupied but not claimed as in use.  Since grabstackstr is implemented
as just a call to stalloc() this is a no-op change in practice, but makes
it much easier to comprehend what is really happening.  Previous code
sometimes used stalloc() when the use case was really for grabstackstr().
Change grabstackstr() to actually use the arg passed to it, instead of
(not much better than) guessing how much space to claim,

More care when using unstalloc()/ungrabstackstr() to return space, and in
particular when the stack must be returned to its previous state, rather than
just returning no-longer needed space, neither of those work.  They also don't
work properly if there have been (really, even might have been) any stack mem
allocations since the last stalloc()/grabstackstr().   (If we know there
cannot have been then the alloc/release sequence is kind of pointless.)
To work correctly in general we must use setstackmark()/popstackmark() so
do that when needed.  Have those also save/restore the top of stack string
space remaining.

	[Aside: for those reading this, the "stack" mentioned is not
	in any way related to the thing used for maintaining the C
	function call state, ie: the "stack segment" of the program,
	but the shell's internal memory management strategy.]

More comments to better explain what is happening in some cases.
Also cleaned up some hopelessly broken DEBUG mode data that were
recently added (no effect on anyone but the poor semi-human attempting
to make sense of it...).

User visible changes:

Proper counting of line numbers when a here document is delimited
by a multi-line end-delimiter, as in

	cat << 'REALLY
	END'
	here doc line 1
	here doc line 2
	REALLY
	END

(which is an obscure case, but nothing says should not work.)  The \n
in the end-delimiter of the here doc (the last one) was not incrementing
the line number, which from that point on in the script would be 1 too
low (or more, for end-delimiters with more than one \n in them.)

With tilde expansion:
	unset HOME; echo ~
changed to return getpwuid(getuid())->pw_home instead of failing (returning ~)

POSIX says this is unspecified, which makes it difficult for a script to
compensate for being run without HOME set (as in env -i sh script), so
while not able to be used portably, this seems like a useful extension
(and is implemented the same way by some other shells).

Further, with
	HOME=; printf %s ~
we now write nothing (which is required by POSIX - which requires ~ to
expand to the value of $HOME if it is set) previously if $HOME (in this
case) or a user's directory in the passwd file (for ~user) were a null
STRING, We failed the ~ expansion and left behind '~' or '~user'.
2017-06-17 07:22:12 +00:00
kre 51791eabef PR bin/52280
removescapes_nl in expari() even when not quoted,
CRTNONL's appear regardless of quoting (unlike CTLESC).
2017-06-07 09:31:30 +00:00
kre 69c48e3253 Set the line number before expanding args, not after. As the line_number
would have usually been set earlier, this change is mostly an effective
no-op, but it is better this way (just in case) - not observed to have
caused any problems.
2017-06-07 08:07:50 +00:00
kre 727a69dc1d A better LINENO implementation. This version deletes (well, #if 0's out)
the LINENO hack, and uses the LINENO var for both ${LINENO} and $((LINENO)).
(Code to invert the LINENO hack when required, like when de-compiling the
execution tree to provide the "jobs" command strings, is still included,
that can be deleted when the LINENO hack is completely removed - look for
refs to VSLINENO throughout the code.  The var funclinno in parser.c can
also be removed, it is used only for the LINENO hack.)

This version produces accurate results: $((LINENO)) was made as accurate
as the LINENO hack made ${LINENO} which is very good.  That's why the
LINENO hack is not yet completely removed, so it can be easily re-enabled.
If you can tell the difference when it is in use, or not in use, then
something has broken (or I managed to miss a case somewhere.)

The way that LINENO works is documented in its own (new) section in the
man page, so nothing more about that, or the new options, etc, here.

This version introduces the possibility of having a "reference" function
associated with a variable, which gets called whenever the value of the
variable is required (that's what implements LINENO).  There is just
one function pointer however, so any particular variable gets at most
one of the set function (as used for PATH, etc) or the reference function.
The VFUNCREF bit in the var flags indicates which func the variable in
question uses (if any - the func ptr, as before, can be NULL).

I would not call the results of this perfect yet, but it is close.
2017-06-07 05:08:32 +00:00
kre 58810bf006 Another arithmetic expansion recordregion() fix, this time
calculate the lenght (used to calculate the end) based upon the
correct starting point.

Thanks to John Klos for finding and reporting this one.
2017-06-05 02:15:55 +00:00
kre aa681add59 PR bin/52272 - fix an off-by one that broke ~ expansions. 2017-06-04 23:40:31 +00:00
kre ea89b1308b DEBUG mode only change. Convert old trace style to new, and add some more.
NFC for any non-DEBUG shell.
2017-06-03 21:52:05 +00:00
kre d3573c91dc NFC: Code style only. Rather than being perverse and adding the
negative of a negative number, just add a positive number instead...
(the previous version came about purely as an accident of the way the
relevant piece of code was added and debugged.... that's my story anyway!)
2017-06-03 20:55:53 +00:00
kre 4dd9bbfe51 The correct usage of recordregion() is (begin, end) not (begin, length).
Fixing this fixes a regression introduced earlier today (UTC) where
arithmetic expressions would be split correctly when the arithmetic
started at the beginning of a word:
	echo $(( expression ))
where "begin" is 0, and so (begin, length) is the same as (begin, begin+length)
(aka: (begin,end) - and yes, "end" means 1 after last to consider).
but did not work correctly when the usage was
	echo XXX$( expression ))
(begin !+ 0) and would only split (some part of) the result of the expression.

This regression was also foung by the new t_fsplit:split_arith
test case added earlier to the ATF tests for sh.
2017-06-03 18:37:37 +00:00
kre dd6b641408 Fixes to shell expand (that is, $ stuff) from FreeBSD (implemented
differently...)

In particular	${01} is now $1 not $0  (for ${0any-digits})

		${4294967297} is most probably now ""
			(unless you have a very large number of params)
		it is no longer an alias for $1  (4294967297 & 0xFFFFFFFF) == 1

		$(( expr $(( more )) stuff )) is no longer the same as
		$(( expr (( more )) stuff )) which was sometimes OK, as in:
			$(( 3 + $(( 2 - 1 )) * 3 ))
		but not always as in:
			$(( 1$((1 + 1))1 ))
		which should be 121, but was an arith syntax error as
			1((1 + 1))1
		is meaningless.

Probably some more.   This also sprinkles a little const, splits a big
func that had 2 (kind of unrelated) purposes into two simpler ones,
and avoids some (semi-dubious) modifications (and restores) in the input
string to insert \0's when they were needed.
2017-06-03 10:31:16 +00:00
kre f359a311bc Arrange for set -o and $- output to be sorted, rather than more
or less random (and becoming worse as more options are added.)
Since the data is known at compile time, sort at compile time,
rather than at run time.
2017-05-28 00:38:01 +00:00
christos 8e4a570f78 Convert the pattern matcher from recursive to backtracking (from FreeBSD). 2017-04-26 17:43:33 +00:00
kre bda1f28d90 PR bin/52090 - fix expansion of unquoted $* 2017-03-20 11:48:01 +00:00
kre cc8e58edf3 Finish support for all required $(( )) (shell arithmetic) operators,
closing PR bin/50958

That meant adding the assignment operators ('=', and all of the +=, *= ...)
Currently, ++, --, and ',' are not implemented (none of those are required
by posix) but support for them (most likely ',' first) might be added later.

To do this, I removed the yacc/lex arithmetic parser completely, and
replaced it with a hand written recursive descent parser, that I obtained
from FreeBSD, who earlier had obtained it from dash (Herbert Xu).

While doing the import, I cleaned up the sources (changed some file names
to avoid requiring a clean build, or signifigant surgery to the obj
directories if "build.sh -u" was to be used - "build.sh -u" should work
fine as it is now) removed some dashisms, applied some KNF, ...
2017-03-20 11:26:07 +00:00
kre df029fcc1b Fix for the "${unset-var#$(cmd1)}$(cmd2)" runs the wrong command bug.
... From FreeBSD
2017-03-12 04:19:29 +00:00
christos 9302f8ef1a Implement the NETBSD_SHELL readonly unexportable unimportable
variable (with its current value set at 20160401) as discussed on
current-users and tech-userlevel. This also includes the necessary
support to implement it properly (particularly the unexportable
part) and adds options to the export command to support unexportable
variables. Also implement the "posix" option (no single letter
equivalent) which gets its default value from whether or not
POSIXLY_CORRECT is set in the environment when the shell starts
(but can be changed just like any other option using -o and +o on
the command line, or the set builtin command.) While there, fix
all uses of options so it is possible to have options that have a
short (one char) name, and no long name, just as it has been possible
to have options with a long name and no short name, though there
are currently none (with no long name).  For now, the only use of
the posix option is to control whether ${ENV} is read at startup
by a non-interactive shell, so changing it with set is not usful
- that might change in the future. (from kre@)
2016-03-31 16:16:35 +00:00
christos b322b670f0 After discussions with Jilles Tjoelker (FreeBSD shell) and following
a suggestion from him, the way the fix to PR bin/50993 was implemented
has changed a little.   There are three steps involved in processing
a here document, reading it, parsing it, and then evaluating it
before applying it to the correct file descriptor for the command
to use.  The third of those is not related to this problem, and
has not changed.  The bug was caused by combining the first two
steps into one (and not doing it correctly - which would be hard
that way.)  The fix is to split the first two stages into separate
events.   The original fix moved the 2nd stage (parsing) to just
immediately before the 3rd stage (evaluation.)  Jilles pointed out
some unwanted side effects from doing it that way, and suggested
moving the 2nd stage to immediately after the first.  This commit
makes that change.  The effect is to revert the changes to expand.c
and parser.h (which are no longer needed) and simplify slightly
the change to parser.c. (from kre@)
2016-03-31 13:27:44 +00:00
christos 1d1484aa26 PR bin/50993 - this is a significant rewrite of the way that here
documents are processed.  Now, when first detected, they are
simply read (the only change made to the text is to join lines
ended with a \ to the subsequent line, otherwise end marker detection
does not work correctly (for here docs with an unquoted endmarker
only of course.)  This patch also moves the "internal subroutine"
for looking for the end marker out of readtoken1() (which had to
happen as readtoken1 is no longer reading the here doc when it is
needed) - that uses code mostly taken from FreeBSD's sh (thanks!)
and along the way results in some restrictions on what the end
marker can be being removed.   We still do not allow all we should.
(from kre@)
2016-03-27 14:39:33 +00:00
christos ca12a0b88a General KNF and source code cleanups, avoid scattering the
magic string " \t\n" all over the place, slightly improved
syntax error messages, restructured some of the code for
clarity, don't allow IFS to be imported through the environment,
and remove the (never) conditionally compiled ATTY option.
Apart from one or two syntax error messages, and ignoring IFS
if present in the environment, this is intended to have no
user visible changes. (from kre@)
2016-03-27 14:34:46 +00:00
christos 8c844a2ebd PR/19832, PR/35423: Fix handling 0x81 and 0x82 characters in expansions
($VAR etc) that are used to generate filenames for redirections. (from kre)
2016-03-16 15:44:35 +00:00
christos dec2ca90b9 PR bin/50834o: fix expansions of (unquoted) ${unset_var-} and ""$@ (from kre) 2016-03-08 14:09:07 +00:00
christos 78204bbf10 remove useless casts 2016-02-27 16:28:50 +00:00
christos 606614c83d PR bin/43469 - correctly handle quoting of the pattern part of ${var%pat}
type expansions. (from kre)
2016-02-22 20:02:00 +00:00
christos 19d6b8392c PR/50179: Timo Buhrmester: sh(1) variable expansion bug 2015-08-27 07:46:47 +00:00
joerg 3eb04a3615 Use an explicit body for a "until not EINTR" loop. 2015-06-06 15:22:58 +00:00
roy 7969ec4d55 Add wctype(3) support to Shell Patterns.
Obtained from FreeBSD.
2014-01-20 14:05:51 +00:00
ast 83d9b54597 Fix PR bin/48202 [non-critical/low]:
sh +nounset and `for X; do` iteration fails if parameter set empty
by applying and testing FreeBSD's patch of Oct 24 2009 for this; see
  http://svnweb.freebsd.org/base/head/bin/sh/expand.c?r1=198453&r2=198454
Also created an ATF test in tests/bin/sh/t_expand.sh for this error and
corrected a space->tabs problem there as well.
2013-10-06 21:05:50 +00:00
christos 018a6f7864 add crude $LINENO support for FreeBSD 2013-10-02 19:52:58 +00:00
dsl 7d60739ae7 Fix the expansion of "$(foo-$bar}" so that IFS isn't applied when
expanding $bar.
Noted by Greg Troxel on tech-userlevel running some 'git' tests.
Should fix PR bin/47361
2012-12-22 20:15:22 +00:00
christos a080d61232 include <limits.h> for CHAR_MIN/CHAR_MAX 2012-03-28 20:11:25 +00:00
plunky 9f61b80465 NULL does not need a cast 2011-08-31 16:24:54 +00:00
christos 69a4e2ee5b PR/45269: Andreas Gustafsson: Instead of falling off the edge when eating trailing newlines
if the block has moved, arrange so that trailing newlines are never placed in the string
in the first place, by accumulating them and adding them only after we've encountered a
non-newline character. This allows also for more efficient appending since we know how much
we need beforehand. From FreeBSD.
2011-08-23 10:04:39 +00:00
christos 4fc4fe2edf PR/45069: Henning Petersen: Use prototypes from builtins.h . 2011-06-18 21:18:46 +00:00
tsutsui 49ee47d09d Use %zu in printf format for size_t value. 2009-11-27 10:50:04 +00:00
lukem c6144e484f fix -Wsign-compare issues 2009-01-18 00:24:29 +00:00
christos 271febebf6 use EXP_CASE only when trimming and unquoted. 2008-12-21 17:15:09 +00:00
christos 26edf84a4b PR/36954: Roland Illig: don't eat backlash escapes in variable patterns.
Makes ${line%%\**} work.
2008-12-20 20:36:44 +00:00
dholland 7fb5a8c68e The field width passed for a %.*s printf format is supposed to be int, not
ptrdiff_t; on 64-bit platforms the latter will be too wide.
Adjust accordingly.
2008-10-16 17:58:29 +00:00
apb 91ce988bec Make /bin/sh use intmax_t (instead of int) for arithmetic in $((...)). 2007-03-25 06:29:26 +00:00