el_getc() for the WIDECHAR case, that is, the version in eln.c.
For a UTF-8 locale, it is broken in four ways:
1. If the character read is outside the ASCII range, the function
does an undefined cast from wchar_t to char. Even if wchar_t
is internally represented as UCS-4, that is wrong and dangerous
because characters beyond codepoint U+0255 get their high bits
truncated, meaning that perfectly valid printable Unicode
characters get mapped to arbitrary bytes, even the ASCII escape
character for some Unicode characters. But wchar_t need not
be implemented in terms of UCS-4, so the outcome of this function
is undefined for any and all input.
2. If insufficient space is available for the result, the function
fails to detect failure and returns garbage rather than -1 as
specified in the documentation.
3. The documentation says that errno will be set on failure, but
that doesn't happen either in the above case.
4. Even for ASCII characters, the results may be wrong if wchar_t
is not using UCS-4.
As we have seen before, "histedit.h" can never get rid of including
the <wchar.h> header because using the data types defined there is
deeply ingrained in the public interfaces of libedit.
Now POSIX unconditionally requires that <wchar.h> defines the type
wint_t. Consequently, it can be used unconditionally, no matter
whether WIDECHAR is active or not. Consequently, the #define Int
is pointless.
Note that removing it is not gratuitious churn. Auditing for
integer signedness problems is already hard when only fundamental
types like "int" and "unsigned" are involved. It gets very hard
when types come into the picture that have platform-dependent
signedness, like "char" and "wint_t". Adding yet another layer
on top, changing both the signedness and the width in a platform-
dependent way, makes auditing yet harder, which IMHO is really
dangerous. Note that while removing the #define, i already found
one bug caused by this excessive complication - in the function
re_putc() in refresh.c. If WIDECHAR was defined, it printed an
Int = wint_t value with %c. Fortunately, that bug only affects
debugging, not production. The fix is contained in the patch.
With WIDECHAR, this doesn't change anything. For the case without
WIDECHAR, i checked that none of the places wants to store values
that might not fit in wint_t.
This only changes internal interfaces; public ones remain unchanged.
Next step: Remove #ifdef'ing in read_char(), in the same style
as we did for setlocale(3) in el.c.
A few remarks are required to explain the choices made.
* On first sight, handling mbrtowc(3) seems a bit less trivial
than handling setlocale(3) because its prototype uses the data
type mbstate_t from <wchar.h>. However, it turns out that
"histedit.h" already includes <wchar.h> unconditionally (i don't
like headers including other headers, but that ship has sailed,
people are by now certainly used to the fact that including
"histedit.h" doesn't require including <wchar.h> before), and
"histedit.h" is of course included all over the place. So from
that perspective, there is no problem with using mbrtowc(3)
unconditionally ever for !WIDECHAR.
* However, <wchar.h> also defines the mbrtowc(3) prototype,
so we cannot just #define mbrtowc away, or including the header
will break. It would also be a bad idea to porovide a local
implementation of mbrtowc() and hope that it overrides the one
in libc. Besides, the required prototype is subtly different:
While mbrtowc(3) takes "wchar_t *" as its first argument, we
need a function that takes "Char *". So unfortunately, we have
to keep a ct_mbrtowc #define, at least until we can maybe get
rid of "Char *" in the more remote future.
* After getting rid of the #else clause in read_char(), we can
pull "return 1;" into the default: clause. After that, we can
get rid of the ugly "goto again_lastbyte;" and just "break;".
As a bonus, that also gets rid of the ugly CONSTCOND.
* While here, delete the unused ct_mbtowc() from chartype.h.
Return error code, not 0 (!), on bus_space_subregion failure.
In answer to `XXX error branch' comment: if nouveau_barobj_ctor
fails, then the caller will call nouveau_barobj_dtor too. So there's
no leak here.
Unlikely to fix any observed bugs with nouveau -- there's no error
branch in the Linux side here. But maybe it will catch some other
bug earlier.
queue 7 is not default, it is caused by the filter tables.
The fields are including queue number, not bitfields.
So MVXPE_DF_QUEUE_ALL (b111) means queue 7.
And also, pass all unicast addresses if it is promisc mode.
MVXPE_PXC_UPM is working in almost cases,
but this change is needed for some cases; bridging frames through inter units,
using products have consecutive MAC addresses.
If CHARSET_IS_UTF8 is not set, read_char() is broken in a large
number of ways:
1. The isascii(3) check can yield false positives. If a string in
an arbitrary encoding contains a byte in the range 0..127,
that does not at all imply that it forms a character all by
itself, and even less that it represents the same character
as in ASCII. Consequently, read_char() may return characters
the user never typed.
Even if the encoding is not state dependent, the assumption that
bytes in the range 0..127 represent ASCII characters is broken.
Consider UTF-16, for example.
2. The reverse problem can also occur. In an arbitrary encoding,
there is no guarantee that a character that can be represented
by ASCII is represented by a seven-bit byte, and even less by
the same byte as in ASCII.
Even for single-byte encodings, these assumptions are broken.
Consider the ISO 646 national variants, for example.
Consequently, the current code is insufficient to keep ASCII
characters working even for single-byte encodings.
3. The condition "++cbp != 1" can never trigger (because initially,
cbp is 0, and the code can only go back up via the final goto,
which has another cbp = 0 right before it) and it has no effect
(because cbp isn't used afterwards).
4. bytes = ct_mbtowc(cp, cbuf, cbp) is broken. If this returns -1,
the code assumes that is can just call mbtowc(3) again for later
input bytes. In some implementations, that may even be broken
for state-independent encodings, but trying again after mbtowc(3)
failure certainly produces completely erratic and meaningless
results in state-dependent encodings.
5. The assignment "*cp = (Char)(unsigned char)cbuf[0]" is
completely bogus. Even if the byte cbuf[0] represents a
character all by itself, which it usually will not, whether
or not the cast produces the desired result depends on the
internal representation of wchar_t in the C library, which
the application program can know nothing about. Even for ASCII
in the C/POSIX locale, an ASCII character other than '\0' ==
L'\0' == 0 need not have the same numeric value as a char and
as a wchar_t.
To summarize, this code only works if all of the following
conditions hold:
- The encoding is a single-byte encoding.
- ASCII is a subset of the encoding.
- The implementation of mbtowc(3) in the C library does not
require re-initialization after encoding errors.
- The implementation of wchar_t in the C library uses the
same numerical values as ASCII.
Otherwise, it silently produces wrong results.
The simplest way to fix this is to just use the same code as for
UTF-8 (right above). Of course, that causes functional changes
but that shouldn't matter since current behaviour is undefined.
The patch below provides the following improvements:
- It works for all stateless single-byte encodings, no matter
whether they are somehow related to ASCII, no matter how
mb[r]towc(3) are internally implemented, and no matter how
wchar_t is internally represented.
- Instead of producing unpredictable and definitely wrong
results for non-UTF-8 multibyte characters, it behaves in
a well-defined way: It aborts input processing, sets errno,
and returns failure.
Note that short of providing full support for arbitrary locales,
it is impossible to do better. We cannot know whether a given
unsupported locale is state-dependent, and for a state-dependent
locale, it makes no sense to retry parsing after an encoding
error, so the best we can do is abort processing for *any*
unsupported multi-byte character.
- Note that single-byte characters in arbitrary state-independent
locales still work, even in locales that may potentially also
contain multibyte characters, as long as those don't occur in
input. I'm not sure whether any such locales exist in practice...
Tested with UTF-8 and C/POSIX on OpenBSD. Also tested that in the
C/POSIX locale, non-ASCII bytes get through unmangled. You may
wish to test with ISO-LATIN on NetBSD if NetBSD supports that.
----
Also use a constant for meta to avoid warnings.
Not optimal though - for some reason the framebuffer's endianness in 32bit
colour is wrong and I have no idea (yet) how to change that, so many apps
using xrender will crash.
list of changes, they can be found at
http://pcc.ludd.ltu.se/fisheye/changelog/pcc
Along with numerous bug fixes, the highlights might be a rewrite
of the CPP parser, updated backends for arm, pdp11, m68k, vax and
mips along with new backend for 8086. PCC now builds itself as a
2-pass compiler. There have been fixes for use with musl, C11
support added and use of UTF8 internally. PE/COFF target was fixed,
and Minix target added.