NetBSD

christos 0e1288d7c8 From Ingo Schwarze:

If CHARSET_IS_UTF8 is not set, read_char() is broken in a large number of ways:

1. The isascii(3) check can yield false positives. If a string in an arbitrary encoding contains a byte in the range 0..127, that does not at all imply that it forms a character all by itself, and even less that it represents the same character as in ASCII. Consequently, read_char() may return characters the user never typed. Even if the encoding is not state dependent, the assumption that bytes in the range 0..127 represent ASCII characters is broken. Consider UTF-16, for example.

2. The reverse problem can also occur. In an arbitrary encoding, there is no guarantee that a character that can be represented by ASCII is represented by a seven-bit byte, and even less by the same byte as in ASCII. Even for single-byte encodings, these assumptions are broken. Consider the ISO 646 national variants, for example. Consequently, the current code is insufficient to keep ASCII characters working even for single-byte encodings.

3. The condition "++cbp != 1" can never trigger (because initially, cbp is 0, and the code can only go back up via the final goto, which has another cbp = 0 right before it) and it has no effect (because cbp isn't used afterwards).

4. bytes = ct_mbtowc(cp, cbuf, cbp) is broken. If this returns -1, the code assumes that it can just call mbtowc(3) again for later input bytes. In some implementations, that may even be broken for state-independent encodings, but trying again after mbtowc(3) failure certainly produces completely erratic and meaningless results in state-dependent encodings.

5. The assignment "cp = (Char)(unsigned char)cbuf[0]" is completely bogus. Even if the byte cbuf[0] represents a character all by itself, which it usually will not, whether or not the cast produces the desired result depends on the internal representation of wchar_t in the C library, which the application program can know nothing about. Even for ASCII in the C/POSIX locale, an ASCII character other than '\0' == L'\0' == 0 need not have the same numeric value as a char and as a wchar_t.

To summarize, this code only works if all of the following conditions hold:

- The encoding is a single-byte encoding.
- ASCII is a subset of the encoding.
- The implementation of mbtowc(3) in the C library does not require re-initialization after encoding errors.
- The implementation of wchar_t in the C library uses the same numerical values as ASCII.

Otherwise, it silently produces wrong results.

The simplest way to fix this is to just use the same code as for UTF-8 (right above). Of course, that causes functional changes but that shouldn't matter since current behaviour is undefined.

The patch below provides the following improvements:

- It works for all stateless single-byte encodings, no matter whether they are somehow related to ASCII, no matter how mb[r]towc(3) are internally implemented, and no matter how wchar_t is internally represented.
- Instead of producing unpredictable and definitely wrong results for non-UTF-8 multibyte characters, it behaves in a well-defined way: It aborts input processing, sets errno, and returns failure. Note that short of providing full support for arbitrary locales, it is impossible to do better. We cannot know whether a given unsupported locale is state-dependent, and for a state-dependent locale, it makes no sense to retry parsing after an encoding error, so the best we can do is abort processing for *any* unsupported multi-byte character.
- Note that single-byte characters in arbitrary state-independent locales still work, even in locales that may potentially also contain multibyte characters, as long as those don't occur in input. I'm not sure whether any such locales exist in practice...

Tested with UTF-8 and C/POSIX on OpenBSD. Also tested that in the C/POSIX locale, non-ASCII bytes get through unmangled. You may wish to test with ISO-LATIN on NetBSD if NetBSD supports that.

----

Also use a constant for meta to avoid warnings.

2016-02-12 15:11:09 +00:00
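
The behaviour the commit message asks for (convert through the C library instead of casting bytes, and fail cleanly instead of retrying after an encoding error) can be illustrated with a small sketch. This is not the actual libedit patch: decode_one() is a hypothetical helper invented for this example, and it works on a byte buffer rather than on terminal input. It assumes only the standard mbrtowc(3) interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, NOT libedit's read_char(): decode one character
 * from buf[0..len) according to the current locale.
 *
 * Returns the number of bytes consumed, or -1 with errno set to EILSEQ
 * if the bytes do not form one complete, valid character.
 */
int
decode_one(const char *buf, size_t len, wchar_t *out)
{
	mbstate_t st;
	size_t n;

	if (len == 0) {
		/* Nothing to decode: report failure rather than guess. */
		*out = L'\0';
		errno = EILSEQ;
		return -1;
	}

	/* Start from the initial conversion state on every call. */
	memset(&st, 0, sizeof(st));

	/* Let the C library do the conversion; never cast a byte to wchar_t. */
	n = mbrtowc(out, buf, len, &st);
	if (n == (size_t)-1 || n == (size_t)-2) {
		/*
		 * Invalid or truncated sequence. The locale may be
		 * state-dependent, so retrying with later bytes makes no
		 * sense: set errno and fail.
		 */
		*out = L'\0';
		errno = EILSEQ;
		return -1;
	}

	/* mbrtowc(3) returns 0 for the NUL character, which is one byte. */
	return n == 0 ? 1 : (int)n;
}
```

A caller would typically run setlocale(LC_CTYPE, "") once at startup and then call decode_one() on the bytes it has read, stopping input processing when the function returns -1. In the default C/POSIX locale, decode_one("a", 1, &wc) consumes one byte and stores L'a' in wc.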
bin	PR/50747: David Binderman: check bounds before dereference.	2016-02-03 05:26:16 +00:00
common	whitespace	2016-02-08 05:27:24 +00:00
compat	remove the xfree86 reachover makefiles and the vast majority of	2015-07-23 08:03:24 +00:00
crypto	Fix signing of in-memory data with SSH keys	2016-02-07 05:03:36 +00:00
dist/pf	Fix obviously broken condition.	2015-08-28 12:17:41 +00:00
distrib	add pcpp binary, now in pcc-20160208. Also, p++ in debug set	2016-02-09 20:42:44 +00:00
doc	texinfo-6.1 and grep-2.23 out.	2016-02-11 13:36:00 +00:00
etc	Drop almost unnecessary devices for floppy to shrink sysinst.fs.	2016-01-29 18:03:16 +00:00
external	update build machinery for pcc-20160208	2016-02-09 20:40:45 +00:00
extsrc
games	PR/50411: Rin Okuyama: fix two bugs:	2015-11-06 19:53:37 +00:00
gnu	has moved to external/gpl3	2016-01-16 18:41:12 +00:00
include	disable dso protected to work around binutils bug	2016-01-29 15:18:33 +00:00
lib	From Ingo Schwarze:	2016-02-12 15:11:09 +00:00
libexec	Actually, descsz should not contain the padding. The note still needs to	2016-02-09 10:20:03 +00:00
regress
rescue
sbin	fix usage message	2016-02-06 10:35:58 +00:00
share	use pcpp front end rather than libexec/cpp directly, since commandline	2016-02-09 20:44:26 +00:00
sys	Fix the bitmask of MVXPE_PMACC0_FRAMESIZELIMIT. It did no harm.	2016-02-12 09:24:15 +00:00
tests	Add tests for a gateway not on the local subnet	2016-01-29 04:15:46 +00:00
tools	silent when we don't have -ldl	2016-02-01 14:18:16 +00:00
usr.bin	use sizeof() and array notation.	2016-02-06 21:23:09 +00:00
usr.sbin	Document file format better. From Travis Paul and Matthew Bauer.	2016-02-09 14:14:02 +00:00
build.sh	Make evbarm64 (little endian) the default for aarch64.	2015-06-27 06:00:28 +00:00
BUILDING	Document MKREPRO_TIMESTAMP.	2016-01-29 13:51:13 +00:00
Makefile	fix direct reference to texinfo, bleh	2016-01-14 02:51:25 +00:00
Makefile.inc
UPDATING	Note that update builds are broken if MKDTRACE got enabled for your	2016-01-25 09:24:29 +00:00