NetBSD/bin/sh/expand.h

/*	$NetBSD: expand.h,v 1.26 2021/11/22 05:17:43 kre Exp $	*/

/*-
 * Copyright (c) 1991, 1993
 *	The Regents of the University of California.  All rights reserved.
 *
 * This code is derived from software contributed to Berkeley by
 * Kenneth Almquist.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 *
 *	@(#)expand.h	8.2 (Berkeley) 5/4/95
 */

#include <inttypes.h>

struct strlist {
	struct strlist *next;
	char *text;
};


struct arglist {
	struct strlist *list;
	struct strlist **lastp;
};

/*
 * expandarg() flags
 */
#define EXP_SPLIT	0x1	/* perform word splitting */
#define EXP_TILDE	0x2	/* do normal tilde expansion */
#define EXP_VARTILDE	0x4	/* expand tildes in an assignment */
#define EXP_REDIR	0x8	/* file glob for a redirection (1 match only) */
#define EXP_CASE	0x10	/* keeps quotes around for CASE pattern */
#define EXP_IFS_SPLIT	0x20	/* need to record arguments for ifs breakup */
#define EXP_IN_QUOTES	0x40	/* don't set EXP_IFS_SPLIT again */
#define EXP_GLOB	0x80	/* perform filename globbing */
#define EXP_NL		0x100	/* keep CRTNONL in output */

#define EXP_FULL	(EXP_SPLIT | EXP_GLOB)
#define EXP_QNEEDED	(EXP_GLOB | EXP_CASE | EXP_REDIR)

union node;

char *expandhere(union node *);
void expandarg(union node *, struct arglist *, int);
int rmescapes(char *);
int casematch(union node *, char *);
PR bin/53550 Here we go again... One more time to redo how here docs are processed (it has been a few years since the last time!) This is actually a relatively minor change, mostly to timimg (to just when things happen). Now here docs are expanded at the same time the "filename" word in a redirect is expanded, rather than later when the heredoc was being sent to its process. This actually makes things more consistent - but does break one of the ATF tests which was testing that we were (effectively) internally inconsistent in this area. Not all shells agree on the context in which redirection expansions should happen, some make any side effects visible to the parent shell (the majority do) others do the redirection expansions in a subshell so any side effcts are lost. We used to have a foot in each camp, with the majority for everything but here docs, and the minority for here docs. Now we're all the way with LBJ ... (or something like that). 2021-11-22 08:17:43 +03:00			`/* $NetBSD: expand.h,v 1.26 2021/11/22 05:17:43 kre Exp $ */`
convert to new RCS id conventions. 1995-03-21 12:01:59 +03:00
initial import of 386bsd-0.1 sources 1993-03-21 12:45:37 +03:00			`/*-`
sync with 4.4lite 1994-05-11 21:09:42 +04:00			`* Copyright (c) 1991, 1993`
			`* The Regents of the University of California. All rights reserved.`
initial import of 386bsd-0.1 sources 1993-03-21 12:45:37 +03:00			`*`
			`* This code is derived from software contributed to Berkeley by`
			`* Kenneth Almquist.`
			`*`
			`* Redistribution and use in source and binary forms, with or without`
			`* modification, are permitted provided that the following conditions`
			`* are met:`
			`* 1. Redistributions of source code must retain the above copyright`
			`* notice, this list of conditions and the following disclaimer.`
			`* 2. Redistributions in binary form must reproduce the above copyright`
			`* notice, this list of conditions and the following disclaimer in the`
			`* documentation and/or other materials provided with the distribution.`
Move UCB-licensed code from 4-clause to 3-clause licence. Patches provided by Joel Baker in PR 22249, verified by myself. 2003-08-07 13:05:01 +04:00			`* 3. Neither the name of the University nor the names of its contributors`
initial import of 386bsd-0.1 sources 1993-03-21 12:45:37 +03:00			`* may be used to endorse or promote products derived from this software`
			`* without specific prior written permission.`
			`*`
			* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
			`* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE`
			`* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE`
			`* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE`
			`* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL`
			`* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS`
			`* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)`
			`* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT`
			`* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY`
			`* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF`
			`* SUCH DAMAGE.`
			`*`
Merge in my changes from vangogh, and fix the x=`false`; echo $? == 0 bug. 1995-05-12 01:28:33 +04:00			`* @(#)expand.h 8.2 (Berkeley) 5/4/95`
initial import of 386bsd-0.1 sources 1993-03-21 12:45:37 +03:00			`*/`

Make /bin/sh use intmax_t (instead of int) for arithmetic in $((...)). 2007-03-25 10:29:26 +04:00			`#include <inttypes.h>`

initial import of 386bsd-0.1 sources 1993-03-21 12:45:37 +03:00			`struct strlist {`
			`struct strlist *next;`
			`char *text;`
			`};`


			`struct arglist {`
			`struct strlist *list;`
			`struct strlist **lastp;`
			`};`

sync with 4.4lite 1994-05-11 21:09:42 +04:00			`/*`
			`* expandarg() flags`
			`*/`
Fixes to shell expand (that is, $ stuff) from FreeBSD (implemented differently...) In particular ${01} is now $1 not $0 (for ${0any-digits}) ${4294967297} is most probably now "" (unless you have a very large number of params) it is no longer an alias for $1 (4294967297 & 0xFFFFFFFF) == 1 $(( expr $(( more )) stuff )) is no longer the same as $(( expr (( more )) stuff )) which was sometimes OK, as in: $(( 3 + $(( 2 - 1 )) * 3 )) but not always as in: $(( 1$((1 + 1))1 )) which should be 121, but was an arith syntax error as 1((1 + 1))1 is meaningless. Probably some more. This also sprinkles a little const, splits a big func that had 2 (kind of unrelated) purposes into two simpler ones, and avoids some (semi-dubious) modifications (and restores) in the input string to insert \0's when they were needed. 2017-06-03 13:31:16 +03:00			`#define EXP_SPLIT 0x1 /* perform word splitting */`
sync with 4.4lite 1994-05-11 21:09:42 +04:00			`#define EXP_TILDE 0x2 /* do normal tilde expansion */`
Fixes to shell expand (that is, $ stuff) from FreeBSD (implemented differently...) In particular ${01} is now $1 not $0 (for ${0any-digits}) ${4294967297} is most probably now "" (unless you have a very large number of params) it is no longer an alias for $1 (4294967297 & 0xFFFFFFFF) == 1 $(( expr $(( more )) stuff )) is no longer the same as $(( expr (( more )) stuff )) which was sometimes OK, as in: $(( 3 + $(( 2 - 1 )) * 3 )) but not always as in: $(( 1$((1 + 1))1 )) which should be 121, but was an arith syntax error as 1((1 + 1))1 is meaningless. Probably some more. This also sprinkles a little const, splits a big func that had 2 (kind of unrelated) purposes into two simpler ones, and avoids some (semi-dubious) modifications (and restores) in the input string to insert \0's when they were needed. 2017-06-03 13:31:16 +03:00			`#define EXP_VARTILDE 0x4 /* expand tildes in an assignment */`
			`#define EXP_REDIR 0x8 /* file glob for a redirection (1 match only) */`
sync with 4.4lite 1994-05-11 21:09:42 +04:00			`#define EXP_CASE 0x10 /* keeps quotes around for CASE pattern */`
No functional changes (intended). Rename some variables, add some comments, and restructure a little. In preparation for fixing "set ${x-a b c}" and friends. 2004-06-26 18:09:58 +04:00			`#define EXP_IFS_SPLIT 0x20 /* need to record arguments for ifs breakup */`
Fix the expansion of "$(foo-$bar}" so that IFS isn't applied when expanding $bar. Noted by Greg Troxel on tech-userlevel running some 'git' tests. Should fix PR bin/47361 2012-12-23 00:15:22 +04:00			`#define EXP_IN_QUOTES 0x40 /* don't set EXP_IFS_SPLIT again */`
Fixes to shell expand (that is, $ stuff) from FreeBSD (implemented differently...) In particular ${01} is now $1 not $0 (for ${0any-digits}) ${4294967297} is most probably now "" (unless you have a very large number of params) it is no longer an alias for $1 (4294967297 & 0xFFFFFFFF) == 1 $(( expr $(( more )) stuff )) is no longer the same as $(( expr (( more )) stuff )) which was sometimes OK, as in: $(( 3 + $(( 2 - 1 )) * 3 )) but not always as in: $(( 1$((1 + 1))1 )) which should be 121, but was an arith syntax error as 1((1 + 1))1 is meaningless. Probably some more. This also sprinkles a little const, splits a big func that had 2 (kind of unrelated) purposes into two simpler ones, and avoids some (semi-dubious) modifications (and restores) in the input string to insert \0's when they were needed. 2017-06-03 13:31:16 +03:00			`#define EXP_GLOB 0x80 /* perform filename globbing */`
A better LINENO implementation. This version deletes (well, #if 0's out) the LINENO hack, and uses the LINENO var for both ${LINENO} and $((LINENO)). (Code to invert the LINENO hack when required, like when de-compiling the execution tree to provide the "jobs" command strings, is still included, that can be deleted when the LINENO hack is completely removed - look for refs to VSLINENO throughout the code. The var funclinno in parser.c can also be removed, it is used only for the LINENO hack.) This version produces accurate results: $((LINENO)) was made as accurate as the LINENO hack made ${LINENO} which is very good. That's why the LINENO hack is not yet completely removed, so it can be easily re-enabled. If you can tell the difference when it is in use, or not in use, then something has broken (or I managed to miss a case somewhere.) The way that LINENO works is documented in its own (new) section in the man page, so nothing more about that, or the new options, etc, here. This version introduces the possibility of having a "reference" function associated with a variable, which gets called whenever the value of the variable is required (that's what implements LINENO). There is just one function pointer however, so any particular variable gets at most one of the set function (as used for PATH, etc) or the reference function. The VFUNCREF bit in the var flags indicates which func the variable in question uses (if any - the func ptr, as before, can be NULL). I would not call the results of this perfect yet, but it is close. 2017-06-07 08:08:32 +03:00			`#define EXP_NL 0x100 /* keep CRTNONL in output */`
sync with 4.4lite 1994-05-11 21:09:42 +04:00
Fixes to shell expand (that is, $ stuff) from FreeBSD (implemented differently...) In particular ${01} is now $1 not $0 (for ${0any-digits}) ${4294967297} is most probably now "" (unless you have a very large number of params) it is no longer an alias for $1 (4294967297 & 0xFFFFFFFF) == 1 $(( expr $(( more )) stuff )) is no longer the same as $(( expr (( more )) stuff )) which was sometimes OK, as in: $(( 3 + $(( 2 - 1 )) * 3 )) but not always as in: $(( 1$((1 + 1))1 )) which should be 121, but was an arith syntax error as 1((1 + 1))1 is meaningless. Probably some more. This also sprinkles a little const, splits a big func that had 2 (kind of unrelated) purposes into two simpler ones, and avoids some (semi-dubious) modifications (and restores) in the input string to insert \0's when they were needed. 2017-06-03 13:31:16 +03:00			`#define EXP_FULL (EXP_SPLIT \| EXP_GLOB)`
Rationalise (slightly) the way that expansions are processed to hide meta-characters in the result when the expansion was in (double) quotes, and so should not be further processed. Most of this has been OK for a long while, but \ needs hiding as well, which complicates things, as \ cannot simply be hidden in the syntax tables as one of the group of random special characters. This was fixed earlier for simple variable expansions, but every variety has its own code path ($var uses different code than $n which is different than $(...), which is different again from ~ expansions, and also from what $'...' produces). This could be fixed by moving them all to a common code path, but that's harder than it seems. The form in which the data is made available differs, so one common routine would need a whole bunch of different "get the next char or indicate end" methods - probably via passing in an accessor function. That's all a lot of churn, and would probably slow the shell. Instead, just make macros for doing the standard tests, and use those instead of open coding (differently) each time. This way some of the code paths don't end up forgetting to handle '\' (which is different than all the others). This removes one optimisation ... when no escaping is needed (like just $var (unquoted) where magic chars (think '*') in the value are intended to remain magic), the code avoided doing two tests for each char ("do we need escapes" and "is this char one that needs escaping") by choosing two different syntax tables (choice made outside the loop) - one of which never returns the magic "needs escaping" result, and the other does when appropriate, and then just avoiding the "do we need escapes" test for each character processed. Then when '\' was fixed, there needed to be another test for it, as it cannot (for other reasons) be the same as all the others for which "this char need escaping" is true. So that added a 2nd test for each char... Not all the code paths were updated. Hence the bugs... nb: this is all rarely seen in the wild, so it is no big surprised that no-one ever noticed. Now the "use two different syntax tables" is gone (the two returned the same for '\' which is why '\' needed special processing) - and in order to avoid two tests for each char (plus the \ test) we duplicate the loops, one of which tests each char to see if it needs an escape, the 2nd just copies them. This should be faster in the "no escapes" code path (though that is not the point) and perhaps also in the "escapes needed" path (no indirect reference to the syntax table - though that would probably be in a register) but makes the code slightly bigger. For /bin/sh the text segment (on amd64) has grown by 48 bytes. But it still uses the same number of 512 byte pages (and hence also any bigger page size). The resulting file size (/bin/sh) is identical before and after. So is /rescue/sh (or /rescue/anything-else). 2018-11-18 20:23:37 +03:00			`#define EXP_QNEEDED (EXP_GLOB \| EXP_CASE \| EXP_REDIR)`
sync with 4.4lite 1994-05-11 21:09:42 +04:00
initial import of 386bsd-0.1 sources 1993-03-21 12:45:37 +03:00			`union node;`
Fixes to shell expand (that is, $ stuff) from FreeBSD (implemented differently...) In particular ${01} is now $1 not $0 (for ${0any-digits}) ${4294967297} is most probably now "" (unless you have a very large number of params) it is no longer an alias for $1 (4294967297 & 0xFFFFFFFF) == 1 $(( expr $(( more )) stuff )) is no longer the same as $(( expr (( more )) stuff )) which was sometimes OK, as in: $(( 3 + $(( 2 - 1 )) * 3 )) but not always as in: $(( 1$((1 + 1))1 )) which should be 121, but was an arith syntax error as 1((1 + 1))1 is meaningless. Probably some more. This also sprinkles a little const, splits a big func that had 2 (kind of unrelated) purposes into two simpler ones, and avoids some (semi-dubious) modifications (and restores) in the input string to insert \0's when they were needed. 2017-06-03 13:31:16 +03:00
PR bin/53550 Here we go again... One more time to redo how here docs are processed (it has been a few years since the last time!) This is actually a relatively minor change, mostly to timimg (to just when things happen). Now here docs are expanded at the same time the "filename" word in a redirect is expanded, rather than later when the heredoc was being sent to its process. This actually makes things more consistent - but does break one of the ATF tests which was testing that we were (effectively) internally inconsistent in this area. Not all shells agree on the context in which redirection expansions should happen, some make any side effects visible to the parent shell (the majority do) others do the redirection expansions in a subshell so any side effcts are lost. We used to have a foot in each camp, with the majority for everything but here docs, and the minority for here docs. Now we're all the way with LBJ ... (or something like that). 2021-11-22 08:17:43 +03:00			`char expandhere(union node );`
Fixes from David Laight: - ansification - format of output of jobs command (etc) - job identiers %+, %- etc - $? and $(...) - correct quoting of output of set, export -p and readonly -p - differentiation between nornal and 'posix special' builtins - correct behaviour (posix) for errors on builtins and special builtins - builtin printf and kill - set -o debug (if compiled with DEBUG) - cd src obj (as ksh - too useful to do without) - unset -e name, remove non-readonly variable from export list. (so I could unset -e PS1 before running the test shell...) 2002-11-25 01:35:38 +03:00			`void expandarg(union node , struct arglist , int);`
When expanding a here-doc (NXHERE - the type with an unquoted end delim) the output will not be further processed (at all) so there is no need to escape magic chars in the output, and doing so leaves stray CTLESC chars in the here doc text. Not good. So don't do that... To save a strlen() of the result, to determine the size of the here doc, make rmescapes() return the length of the resulting string (this isn't needed for other uses, so didn't happen previously). Reported on current-users@ (2020-02-06) by Jun Ebihara XXX pullup -9 2020-02-13 08:19:05 +03:00			`int rmescapes(char *);`
Fixes from David Laight: - ansification - format of output of jobs command (etc) - job identiers %+, %- etc - $? and $(...) - correct quoting of output of set, export -p and readonly -p - differentiation between nornal and 'posix special' builtins - correct behaviour (posix) for errors on builtins and special builtins - builtin printf and kill - set -o debug (if compiled with DEBUG) - cd src obj (as ksh - too useful to do without) - unset -e name, remove non-readonly variable from export list. (so I could unset -e PS1 before running the test shell...) 2002-11-25 01:35:38 +03:00			`int casematch(union node , char );`