matches the internal CTL* chars.
The earlier fixes handled CTL* char values in var expansions,
but not in various other places they can occur (positional
parameters, $@ $* -- even potentially $0 and ~ expansions,
as well as byte strings generated from a \u in a $'' string).
These should all be correctly handled now. There is a new
ISCTL() macro to make the test, rather than using the old
BASESYNTAX[c]==CCTL form (which is still a viable alternative),
as the new way allows compiler optimisations and fewer memory
references, so it should be smaller and faster.
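(For reference, the point of ISCTL() is that a contiguous range
check compiles to a compare or two with no table load.  A minimal
sketch - the names and values here are assumed, the real ones are
in parser.h:

    #define CTL_FIRST 0x81      /* assumed lowest internal CTL* value */
    #define CTL_LAST  0x88      /* assumed highest internal CTL* value */
    #define ISCTL(c) \
        ((unsigned char)(c) >= CTL_FIRST && (unsigned char)(c) <= CTL_LAST)

which a compiler can reduce to one unsigned subtract and compare.)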
Also, be sure in all cases to remove any CTLESC (or other)
CTL* chars from all strings before they are made available
for any external use (there was one case missed - which didn't
matter when we weren't bothering to escape the CTL* chars at
all.)
XXX pullup-8 (will need to be via a patch) along with the Feb 4 fixes.
to have them, they should work as documented: no core dumps,
no references after free, no incorrect replacements, no failing
to implement alias after alias, ...
The big comment that ended:
This is a good idea ------- ***NOT***
and the hack it was describing are gone.
Note that most of this was from original CVS version 1.1
code (ie: came from the original import, even before 4.4-Lite
was merged - that is, May 1994) and no-one in 24.5 years
noticed (or at least complained about) all the bugs (or at
least, most of them).
With these changes, aliases ought to work (if you can call it
that) as they are expected to by POSIX. Now if only we could
get POSIX to delete them (or make them optional)...
Changes partly inspired by similar changes made by FreeBSD
(as was the previous change to alias.c - forgot the ack in the
commit log for that one, apologies) but done a little differently,
and perhaps with a slightly better outcome.
to hide meta-characters in the result when the expansion was
in (double) quotes, and so should not be further processed.
Most of this has been OK for a long while, but \ needs hiding
as well, which complicates things, as \ cannot simply be hidden
in the syntax tables as one of the group of random special characters.
This was fixed earlier for simple variable expansions, but
every variety has its own code path ($var uses different code
than $n which is different than $(...), which is different
again from ~ expansions, and also from what $'...' produces).
This could be fixed by moving them all to a common code path,
but that's harder than it seems. The form in which the data
is made available differs, so one common routine would need
a whole bunch of different "get the next char or indicate end"
methods - probably via passing in an accessor function.
That's all a lot of churn, and would probably slow the shell.
Instead, just make macros for doing the standard tests, and
use those instead of open coding (differently) each time.
This way some of the code paths don't end up forgetting to
handle '\' (which is different than all the others).
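A sketch of the kind of macros meant - the names here are
hypothetical, and the real tests (in expand.c) may differ in
detail:

    /* does this char need a CTLESC in front of it when quoted?
       (chars magic to later processing, plus \ and chars that
       collide with the internal CTL* values themselves) */
    #define NEEDESC(c) \
            (ISCTL(c) || (c) == '\\' || (c) == '*' || (c) == '?' || \
             (c) == '[' || (c) == ']')

    /* append a char to the expansion, escaping it if required */
    #define PUTESC(c) do {                      \
            if (NEEDESC(c))                     \
                    STPUTC(CTLESC, expdest);    \
            STPUTC((c), expdest);               \
    } while (0)

With every code path going through the same test, '\' cannot be
forgotten in some of them.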
This removes one optimisation ... when no escaping is needed
(like just $var (unquoted) where magic chars (think '*') in
the value are intended to remain magic), the code avoided doing
two tests for each char ("do we need escapes" and "is this char
one that needs escaping") by choosing two different syntax
tables (choice made outside the loop) - one of which never
returns the magic "needs escaping" result, and the other does
when appropriate, and then just avoiding the "do we need escapes"
test for each character processed. Then when '\' was fixed,
there needed to be another test for it, as it cannot (for other
reasons) be the same as all the others for which "this char
needs escaping" is true. So that added a 2nd test for each char...
Not all the code paths were updated. Hence the bugs...
nb: this is all rarely seen in the wild, so it is no big
surprise that no-one ever noticed.
Now the "use two different syntax tables" is gone (the two
returned the same for '\' which is why '\' needed special
processing) - and in order to avoid two tests for each
char (plus the \ test) we duplicate the loops, one of which
tests each char to see if it needs an escape, the 2nd just
copies them. This should be faster in the "no escapes"
code path (though that is not the point) and perhaps also
in the "escapes needed" path (no indirect reference to
the syntax table - though that would probably be in a
register) but makes the code slightly bigger. For /bin/sh
the text segment (on amd64) has grown by 48 bytes. But
it still uses the same number of 512 byte pages (and hence
also any bigger page size). The resulting file size
(/bin/sh) is identical before and after. So is /rescue/sh
(or /rescue/anything-else).
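The duplicated loop structure is roughly this (a sketch only,
reusing the hypothetical NEEDESC() from above; the real code in
expand.c carries more context):

    if (!quotes) {
            /* no escapes needed: plain copy, one test per char */
            while ((c = *p++) != '\0')
                    STPUTC(c, expdest);
    } else {
            /* escapes needed: one extra test per char, and no
               syntax-table reference */
            while ((c = *p++) != '\0') {
                    if (NEEDESC(c))
                            STPUTC(CTLESC, expdest);
                    STPUTC(c, expdest);
            }
    }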
Implementation largely obtained from FreeBSD, with adaptations to meet the
needs and style of this sh, some updates to agree with the current POSIX spec,
and a few other minor changes.
The POSIX spec for this ( http://austingroupbugs.net/view.php?id=249 )
[see note 2809 for the current proposed text] is yet to be approved,
so might change. It currently leaves several aspects as unspecified,
this implementation handles those as:
Where more than 2 hex digits follow \x, this implementation processes the
first two as hex; the following characters are processed as if the \x
sequence was not present. The value obtained from a \nnn octal sequence
is truncated to the low 8 bits (if a bigger value is written, eg: \456.)
Invalid escape sequences are errors. Invalid \u (or \U) code points are
errors if known to be invalid; otherwise they can generate a '?' character.
Where any escape sequence generates nul ('\0'), that char and the rest of
the $'...' string are discarded, but anything remaining in the word is
processed, ie: aaa$'bbb\0ccc'ddd produces the same as aaa'bbb'ddd.
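A sketch of the \x and \nnn decoding described above - hypothetical,
self-contained code, not the actual implementation:

    #include <ctype.h>

    /* *pp points just past the '\\'; returns the byte generated
       (0 means: discard the rest of the $'...' string) */
    static int
    esc_value(const char **pp)
    {
            const char *p = *pp;
            int i, v = 0;

            if (*p == 'x') {        /* \xH[H] - at most 2 hex digits */
                    for (p++, i = 0; i < 2 && isxdigit((unsigned char)*p);
                        i++, p++)
                            v = 16 * v + (isdigit((unsigned char)*p) ?
                                *p - '0' :
                                tolower((unsigned char)*p) - 'a' + 10);
            } else if (*p >= '0' && *p <= '7') {  /* up to 3 octal digits */
                    for (i = 0; i < 3 && *p >= '0' && *p <= '7'; i++, p++)
                            v = 8 * v + (*p - '0');
                    v &= 0xFF;      /* eg: \456 keeps only the low 8 bits */
            }
            /* other escapes (\n, \t, \u, ...) omitted from this sketch */
            *pp = p;        /* anything left is processed as ordinary chars */
            return v;
    }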
Differences from FreeBSD:
FreeBSD allows only exactly 4 or 8 hex digits for \u and \U (as does C,
but the current sh proposal differs.) FreeBSD also continues consuming
as many hex digits as exist after \x (permitted by the spec, but insane),
and rejects \u0000 as invalid. Some of this is possibly because
their implementation is based upon an earlier proposal, perhaps note 590 -
though that has been updated several times.
Differences from the current POSIX proposal:
We currently always generate UTF-8 for the \u & \U escapes. We should
generate the equivalent character from the current locale's character set
(and UTF-8 only if that is what the current locale uses.)
If anyone would like to correct that, go ahead.
We (and FreeBSD) generate (X & 0x1F) for \cX escapes where we should generate
the appropriate control character (SOH for \cA for example) with whatever
value that has in the current character set. Apart from EBCDIC, which
we do not support, I've never seen a case where they differ, so ...
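(In code terms the difference is tiny; what we and FreeBSD do is
essentially, assuming ASCII:

    case 'c':                       /* \cX */
            v = *p++ & 0x1F;        /* \cA -> 0x01 (SOH), \cZ -> 0x1A (SUB) */
            break;

which happens to produce the correct control character everywhere
except EBCDIC.)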
the LINENO hack, and uses the LINENO var for both ${LINENO} and $((LINENO)).
(Code to invert the LINENO hack when required, like when de-compiling the
execution tree to provide the "jobs" command strings, is still included;
that can be deleted when the LINENO hack is completely removed - look for
refs to VSLINENO throughout the code. The var funclinno in parser.c can
also be removed, it is used only for the LINENO hack.)
This version produces accurate results: $((LINENO)) was made as accurate
as the LINENO hack made ${LINENO} which is very good. That's why the
LINENO hack is not yet completely removed, so it can be easily re-enabled.
If you can tell the difference when it is in use, or not in use, then
something has broken (or I managed to miss a case somewhere.)
The way that LINENO works is documented in its own (new) section in the
man page, so nothing more about that, or the new options, etc, here.
This version introduces the possibility of having a "reference" function
associated with a variable, which gets called whenever the value of the
variable is required (that's what implements LINENO). There is just
one function pointer however, so any particular variable gets at most
one of the set function (as used for PATH, etc) or the reference function.
The VFUNCREF bit in the var flags indicates which func the variable in
question uses (if any - the func ptr, as before, can be NULL).
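In outline, the arrangement looks something like this - a sketch
with assumed field names and flag value, var.h has the real
definitions:

    #include <string.h>

    #define VFUNCREF 0x0400         /* assumed value of the flag bit */

    struct var {
            int flags;              /* VFUNCREF selects which func below */
            char *text;             /* "NAME=value" */
            union {                 /* just the one function pointer ... */
                    void (*set)(const char *);        /* set func (PATH &c) */
                    const char *(*ref)(struct var *); /* reference func (LINENO) */
            } func;
    };

    static const char *
    value_of(struct var *vp)
    {
            if ((vp->flags & VFUNCREF) && vp->func.ref != NULL)
                    return vp->func.ref(vp);    /* computed when required */
            return strchr(vp->text, '=') + 1;   /* ordinary stored value */
    }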
I would not call the results of this perfect yet, but it is close.
compiled for DEBUG.)
Add debug builtin command, and corresponding -D command line option.
As usual, for DEBUG related stuff, read the source for info, that's
all there is about this.
This completes the infrastructure changes for the updated DEBUG TRACE
mechanism, so now converting the rest of the shell's internal tracing
can happen as desired - piecemeal.
(undoing the effect of that commit on syntax.h when it was
being dynamically generated) from 1996. This means that the shell
parser is now locale independent, so scripts that work anywhere will
work consistently everywhere. Inspired by a similar change in
FreeBSD's sh (from 2010) - the original change in the other direction
came from FreeBSD as well.... Note that this does not in any way
add any kind of support for locales to sh (which is a whole different
problem.) (from kre)
Kill mksyntax.c - no longer possible to get the 'wrong sort of chars'.
/bin/sh now has no helper binaries.
syntax.c uses C99 initialisers; run time initialisation could be used
for systems where the compiler doesn't support them.
I've used some #defines to help make this possible - but writing the code
starts making it rather messy.
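The flavour of it, much abbreviated (the syntax class names are from
syntax.h, the sizes and offset here are assumed):

    #define SYNBASE 130             /* assumed offset, see below */
    #define SYNTAX_TABLE_SIZE (SYNBASE + 256 + 1)

    /* entries default to 0 (CWORD); indexed by char + SYNBASE */
    const char basesyntax[SYNTAX_TABLE_SIZE] = {
            ['\'' + SYNBASE] = CSQUOTE,
            ['"'  + SYNBASE] = CDQUOTE,
            ['\\' + SYNBASE] = CBACK,
            ['$'  + SYNBASE] = CVAR,
            /* ... */
    };

For a pre-C99 compiler the same table would have to be filled in by a
run time loop at shell startup instead.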
Use CHAR_MIN (from limits.h) to determine whether the target's chars are
signed or unsigned - without allowing for that, the syntax tables would
not be indexed properly.
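That is, something like this sketch (the offset values are assumed,
not the real ones):

    #include <limits.h>

    #if CHAR_MIN < 0                /* plain char is signed on this target */
    #define SYNBASE 130             /* assumed: maps PEOF..0xff into >= 0 */
    #else                           /* plain char is unsigned */
    #define SYNBASE 2               /* assumed: only PEOF etc are negative */
    #endif
    /* lookups are then always table[(int)(c) + SYNBASE], never negative */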
Rip out all the stuff from mksyntax.c that wrote syntax.h.
syntax.c can still be generated incorrectly...