NetBSD/bin/sh/syntax.h

Add support for $'...' quoting (based upon C "..." strings, with \ expansions).

Implementation largely obtained from FreeBSD, with adaptations to meet the needs and style of this sh, some updates to agree with the current POSIX spec, and a few other minor changes.

The POSIX spec for this ( http://austingroupbugs.net/view.php?id=249 ) [see note 2809 for the current proposed text] is yet to be approved, so it might change. It currently leaves several aspects unspecified; this implementation handles those as follows:

Where more than 2 hex digits follow \x, this implementation processes the first two as hex; the following characters are processed as if the \x sequence were not present.

The value obtained from a \nnn octal sequence is truncated to the low 8 bits (if a bigger value is written, e.g. \456).

Invalid escape sequences are errors.

Invalid \u (or \U) code points are errors if known to be invalid; otherwise they can generate a '?' character.

Where any escape sequence generates nul ('\0'), that char and the rest of the $'...' string are discarded, but anything remaining in the word is processed, i.e. aaa$'bbb\0ccc'ddd produces the same as aaa'bbb'ddd.

Differences from FreeBSD: FreeBSD allows only exactly 4 or 8 hex digits for \u and \U (as does C, but the current sh proposal differs). FreeBSD also continues consuming as many hex digits as exist after \x (permitted by the spec, but insane), and rejects \u0000 as invalid. Some of this is possibly because their implementation is based upon an earlier proposal, perhaps note 590, though that has been updated several times.

Differences from the current POSIX proposal: we currently always generate UTF-8 for the \u and \U escapes. We should generate the equivalent character from the current locale's character set (and UTF-8 only if that is what the current locale uses). If anyone would like to correct that, go ahead.
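The \x and \nnn rules above can be illustrated with a short C sketch. It is not the sh's actual lexer code: `hexval`, `decode_hex_escape` and `decode_octal_escape` are hypothetical names, and each decoder is assumed to receive a pointer just past the backslash (or the `x`), which it advances past the digits it consumes.

```c
#include <assert.h>

/* Hypothetical helper: value of a hex digit, or -1 if c is not one. */
static int
hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Hypothetical: decode the digits after "\x".  At most two hex digits
 * are consumed; anything after them is left for the caller to treat
 * as ordinary text.  Returns -1 (an invalid escape) if no hex digit
 * follows.
 */
static int
decode_hex_escape(const char **pp)
{
	const char *p;
	int v = 0, n;

	assert(pp != 0 && *pp != 0);
	p = *pp;
	for (n = 0; n < 2 && hexval((unsigned char)*p) >= 0; n++, p++)
		v = v * 16 + hexval((unsigned char)*p);
	*pp = p;
	return n == 0 ? -1 : v;
}

/*
 * Hypothetical: decode one to three octal digits (the caller has
 * checked that *pp starts with one) and keep only the low 8 bits,
 * so "\456" (octal 456 == 302) yields 302 & 0xFF == 46, '.' in ASCII.
 */
static int
decode_octal_escape(const char **pp)
{
	const char *p;
	int v = 0, n;

	assert(pp != 0 && *pp != 0);
	p = *pp;
	for (n = 0; n < 3 && *p >= '0' && *p <= '7'; n++, p++)
		v = v * 8 + (*p - '0');
	*pp = p;
	return v & 0xFF;
}
```

Under this sketch, $'\x41BC' decodes to `A` followed by the literal text `BC`, matching the "first two digits only" rule.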
We (and FreeBSD) generate (X & 0x1F) for \cX escapes, whereas we should generate the appropriate control character (SOH for \cA, for example) with whatever value that has in the current character set. Apart from EBCDIC, which we do not support, I've never seen a case where they differ, so ...
2017-08-21 16:20:49 +03:00
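A sketch of the two encodings this message describes: \cX mapped to (X & 0x1F), and a \u or \U code point always emitted as UTF-8. The names `ctrl_escape` and `utf8_encode` are hypothetical, and treating UTF-16 surrogates and values above U+10FFFF as "known to be invalid" is an assumption, not something the message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: \cX yields X & 0x1F, so \cA and \ca both give 0x01 (SOH). */
static int
ctrl_escape(int x)
{
	return x & 0x1F;
}

/*
 * Hypothetical: write code point cp into out[] as UTF-8 and return the
 * number of bytes written (1 to 4), or 0 if cp is a surrogate or lies
 * beyond U+10FFFF (assumed here to be "known invalid").
 */
static size_t
utf8_encode(unsigned long cp, unsigned char out[4])
{
	assert(out != NULL);
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

Because 'A' (0x41) and 'a' (0x61) differ only in bit 0x20, masking with 0x1F makes \cA and \ca equivalent in ASCII, which is why the shortcut holds outside character sets such as EBCDIC.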
/* $NetBSD: syntax.h,v 1.9 2017/08/21 13:20:49 kre Exp $ */
/*-
* Copyright (c) 1991, 1993
* The Regents of the University of California. All rights reserved.
*
* This code is derived from software contributed to Berkeley by
* Kenneth Almquist.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include <sys/cdefs.h>
#include <ctype.h>
/* Syntax classes */
#define CWORD 0 /* character is nothing special */
#define CNL 1 /* newline character */
#define CBACK 2 /* a backslash character */
#define CSQUOTE 3 /* single quote */
#define CDQUOTE 4 /* double quote */
#define CBQUOTE 5 /* backwards single quote */
#define CVAR 6 /* a dollar sign */
#define CENDVAR 7 /* a '}' character */
#define CLP 8 /* a left paren in arithmetic */
#define CRP 9 /* a right paren in arithmetic */
#define CEOF 10 /* end of file */
#define CCTL 11 /* like CWORD, except it must be escaped */
#define CSPCL 12 /* these terminate a word */
#define CSBACK 13 /* a backslash in a single quote syntax */
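To show how these classes drive the lexer, here is a hypothetical stand-alone classifier for unquoted text. The character-to-class mapping is an assumption loosely modeled on the traditional ash base table (newline, backslash, the three quote characters, `$`, `}`, and the word-terminating `<>();&|`, space and tab); the real `basesyntax[]` may differ in detail. `EOF` from <stdio.h> stands in for PEOF, and `base_class` and `plain_run` are illustrative names.

```c
#include <assert.h>
#include <stdio.h>	/* EOF, used here as a stand-in for PEOF */
#include <string.h>

/* Syntax classes, copied from syntax.h. */
#define CWORD 0
#define CNL 1
#define CBACK 2
#define CSQUOTE 3
#define CDQUOTE 4
#define CBQUOTE 5
#define CVAR 6
#define CENDVAR 7
#define CEOF 10
#define CSPCL 12

/* Hypothetical: class of c outside any quotes (assumed mapping). */
static int
base_class(int c)
{
	if (c == EOF)
		return CEOF;
	switch (c) {
	case '\n':	return CNL;
	case '\\':	return CBACK;
	case '\'':	return CSQUOTE;
	case '"':	return CDQUOTE;
	case '`':	return CBQUOTE;
	case '$':	return CVAR;
	case '}':	return CENDVAR;
	}
	if (c != '\0' && strchr("<>();&| \t", c) != NULL)
		return CSPCL;
	return CWORD;
}

/*
 * Hypothetical: length of the leading run of CWORD characters, i.e.
 * how far a lexer could copy text before it must act on a special one.
 */
static size_t
plain_run(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	while (s[n] != '\0' && base_class((unsigned char)s[n]) == CWORD)
		n++;
	return n;
}
```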
/* Syntax classes for is_ functions */
#define ISDIGIT 01 /* a digit */
#define ISUPPER 02 /* an upper case letter */
#define ISLOWER 04 /* a lower case letter */
#define ISUNDER 010 /* an underscore */
#define ISSPECL 020 /* the name of a special parameter */
#define ISSPACE 040 /* a white space character */
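The ISxxx values are distinct bits, so one `is_type[]` entry can carry several of them and each `is_` macro tests a mask. The sketch below computes such a flag set directly instead of looking it up; `ctype_bits` and `valid_name` are hypothetical, ASCII letter ranges are assumed, and the ISSPECL members (`#?$!-*@`) and ISSPACE members (space, tab, newline) are assumptions rather than a copy of the real table.

```c
#include <assert.h>
#include <string.h>

#define ISDIGIT 01
#define ISUPPER 02
#define ISLOWER 04
#define ISUNDER 010
#define ISSPECL 020
#define ISSPACE 040

/* Hypothetical: the flag set an is_type[] entry for c might hold. */
static int
ctype_bits(int c)
{
	int bits = 0;

	if (c >= '0' && c <= '9')
		bits |= ISDIGIT;
	if (c >= 'A' && c <= 'Z')	/* ASCII assumed */
		bits |= ISUPPER;
	if (c >= 'a' && c <= 'z')
		bits |= ISLOWER;
	if (c == '_')
		bits |= ISUNDER;
	if (c != '\0' && strchr("#?$!-*@", c) != NULL)
		bits |= ISSPECL;
	if (c == ' ' || c == '\t' || c == '\n')
		bits |= ISSPACE;
	return bits;
}

/*
 * Hypothetical: a variable name starts with a character in the is_name
 * mask and continues with characters in the is_in_name mask.
 */
static int
valid_name(const char *s)
{
	assert(s != NULL);
	if (!(ctype_bits((unsigned char)*s) & (ISUPPER|ISLOWER|ISUNDER)))
		return 0;
	for (s++; *s != '\0'; s++)
		if (!(ctype_bits((unsigned char)*s) &
		    (ISUPPER|ISLOWER|ISUNDER|ISDIGIT)))
			return 0;
	return 1;
}
```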
#define PEOF (CHAR_MIN - 1)
#define SYNBASE (-PEOF)
#define BASESYNTAX (basesyntax + SYNBASE)
#define DQSYNTAX (dqsyntax + SYNBASE)
#define SQSYNTAX (sqsyntax + SYNBASE)
#define ARISYNTAX (arisyntax + SYNBASE)
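PEOF sits one below the smallest `char` value, so adding SYNBASE maps every possible input, PEOF included, to a non-negative array index; the xxxSYNTAX macros simply pre-add that offset to the table base. A minimal sketch of the arithmetic, using a hypothetical `demo_table` filled at run time (the real tables are constant data):

```c
#include <assert.h>
#include <limits.h>

#define CWORD 0
#define CNL 1
#define CEOF 10

#define PEOF (CHAR_MIN - 1)
#define SYNBASE (-PEOF)

/* One slot for PEOF plus one for every char value CHAR_MIN..CHAR_MAX. */
#define DEMO_SIZE (SYNBASE + CHAR_MAX + 1)

static char demo_table[DEMO_SIZE];		/* hypothetical table */
#define DEMOSYNTAX (demo_table + SYNBASE)	/* same trick as BASESYNTAX */

static void
demo_init(void)
{
	int c;

	/* PEOF lands on index 0 whether char is signed or unsigned. */
	assert(&DEMOSYNTAX[PEOF] == &demo_table[0]);
	for (c = PEOF; c <= CHAR_MAX; c++)
		DEMOSYNTAX[c] = CWORD;
	DEMOSYNTAX[PEOF] = CEOF;
	DEMOSYNTAX['\n'] = CNL;
}
```

With 8-bit `char` the table has 257 entries whichever signedness `char` has. Indexes must be PEOF or a value that fits in `char`: on a signed-char platform an unsigned byte such as 255 would index past the end.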
/* These defines assume that the digits are contiguous (which is guaranteed) */
#define is_digit(c) ((unsigned)((c) - '0') <= 9)
#define sh_ctype(c) (is_type+SYNBASE)[(int)(c)]
#define is_upper(c) (sh_ctype(c) & ISUPPER)
#define is_lower(c) (sh_ctype(c) & ISLOWER)
2016-03-16 18:48:01 +03:00
#define is_alpha(c) (sh_ctype(c) & (ISUPPER|ISLOWER))
#define is_name(c) (sh_ctype(c) & (ISUPPER|ISLOWER|ISUNDER))
#define is_in_name(c) (sh_ctype(c) & (ISUPPER|ISLOWER|ISUNDER|ISDIGIT))
#define is_special(c) (sh_ctype(c) & (ISSPECL|ISDIGIT))
#define is_space(c) (sh_ctype(c) & ISSPACE)
#define digit_val(c) ((c) - '0')
extern const char basesyntax[];
extern const char dqsyntax[];
extern const char sqsyntax[];
extern const char arisyntax[];
extern const char is_type[];
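The header only declares these tables; their definitions live elsewhere in the sh sources. As a hypothetical illustration, a cut-down `is_type[]` can be defined with C99 designated initializers offset by SYNBASE, so that the header's `sh_ctype`, `is_name` and `is_in_name` macros (copied verbatim) index it directly. Only a handful of entries are filled in, chosen for illustration rather than taken from the real table, and `name_len` is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define ISDIGIT 01
#define ISUPPER 02
#define ISLOWER 04
#define ISUNDER 010

#define PEOF (CHAR_MIN - 1)
#define SYNBASE (-PEOF)

/* Copied from syntax.h. */
#define sh_ctype(c)	(is_type+SYNBASE)[(int)(c)]
#define is_name(c)	(sh_ctype(c) & (ISUPPER|ISLOWER|ISUNDER))
#define is_in_name(c)	(sh_ctype(c) & (ISUPPER|ISLOWER|ISUNDER|ISDIGIT))

/* Hypothetical, cut-down table: unnamed entries (and PEOF at [0]) are 0. */
const char is_type[SYNBASE + CHAR_MAX + 1] = {
	[SYNBASE + '0'] = ISDIGIT, [SYNBASE + '1'] = ISDIGIT,
	[SYNBASE + 'A'] = ISUPPER, [SYNBASE + 'B'] = ISUPPER,
	[SYNBASE + 'a'] = ISLOWER, [SYNBASE + 'b'] = ISLOWER,
	[SYNBASE + '_'] = ISUNDER,
};

/* Hypothetical: length of the variable name at the start of s, or 0. */
static size_t
name_len(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	if (!is_name(s[0]))
		return 0;
	while (is_in_name(s[n]))
		n++;
	return n;
}
```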