christos 6b42622b31 UTF-8 fixes from Ingo Schwarze:
 1. Assume that errno is non-zero when entering read_char()
    and that read(2) returns 0 (indicating end of file).
    Then the code clears errno before returning.
    (Obviously, the statement "errno = 0" is almost always
     a bug unless there is save_errno = errno right before it
     and the previous value is properly restored later,
     in all reachable code paths.)

 2. When encountering an invalid byte sequence, the code discards
    all following bytes until MB_LEN_MAX overflows; consider, for
    example, 0xc2 immediately followed by a few valid ASCII bytes.
    Three of those ASCII bytes will be discarded.

 3. On a POSIX system, EILSEQ will always be set after reading a
    valid (yes, valid, not invalid!) multibyte UTF-8 character.  The
    reason is that mbtowc(3) will first be called with a length limit
    (third argument) of 1, which will fail, return -1 and, on
    a POSIX system, set errno to EILSEQ.
    This third bug is somewhat mitigated because I couldn't find any
    system that actually conforms to POSIX in this respect:  None
    of OpenBSD, NetBSD, FreeBSD, Solaris 11, and glibc set errno
    when an incomplete character is passed to mbtowc(3), even though
    POSIX requires it.
    Anyway, that mbtowc(3) bug will be fixed at least in OpenBSD
    after release unlock, so it would be good to fix this bug in
    libedit before fixing the bug in mbtowc(3).

How can these three bugs be fixed?

 1. As far as I understand it, the intention of the bogus errno = 0
    is to undo the effects of failing system calls in el_wset(),
    sig_set(), and read__fixio() if the subsequent read(2) indicates
    end of file.  So, restoring errno has to be moved right after
    read__fixio().  Of course, neither 0 nor e is the right value
    to restore: 0 is wrong if errno happened to be set on entry, and
    e (the errno of the failed read(2)) would be wrong because, if
    one read(2) fails but a second attempt succeeds after
    read__fixio(), errno should not be touched.  So, the errno to be
    restored in this case has to be saved before calling read(2) for
    the first time.

 2. Solving the second issue requires distinguishing invalid and
    incomplete characters, but that is impossible with the function
    mbtowc(3) because it returns -1 in both cases and sets errno
    to EILSEQ in both cases (once properly implemented).

    It is vital that each input character is processed right away.
    It is not acceptable to wait for the next input character before
    processing the previous one because this is an interactive
    library, not a batch system.  Consequently, the only situation
    where it is acceptable to wait for the next byte without first
    processing the previous one(s) is when the previous one(s) form
    an incomplete sequence that can be continued to form a valid
    character.

    Therefore, short of reimplementing a full UTF-8 state machine
    by hand, the only correct way forward is to use mbrtowc(3).
    Even then, care is needed to always have the state object
    properly initialized before using it, and to not discard a valid
    ASCII or UTF-8 lead byte if it happens to follow an invalid
    sequence.

 3. Fortunately, the solution to issue 2 also solves issue 3 as a
    side effect, because it no longer uses mbtowc(3) at all.
2016-02-08 17:18:43 +00:00
bin PR/50747: David Binderman: check bounds before dereference. 2016-02-03 05:26:16 +00:00
common whitespace 2016-02-08 05:27:24 +00:00
compat remove the xfree86 reachover makefiles and the vast majority of 2015-07-23 08:03:24 +00:00
crypto Fix signing of in-memory data with SSH keys 2016-02-07 05:03:36 +00:00
dist/pf Fix obviously broken condition. 2015-08-28 12:17:41 +00:00
distrib Add capability to attach external memory to files on rumpfs. This 2016-02-02 12:22:23 +00:00
doc new openssl 2016-01-30 17:00:53 +00:00
etc Drop almost unnecessary devices for floppy to shrink sysinst.fs. 2016-01-29 18:03:16 +00:00
external don't re-define _KERNTYPES 2016-02-07 21:03:49 +00:00
extsrc
games PR/50411: Rin Okuyama: fix two bugs: 2015-11-06 19:53:37 +00:00
gnu has moved to external/gpl3 2016-01-16 18:41:12 +00:00
include disable dso protected to work around binutils bug 2016-01-29 15:18:33 +00:00
lib UTF-8 fixes from Ingo Schwarze: 2016-02-08 17:18:43 +00:00
libexec Fix .note.netbsd.march by ensuring correct padding 2016-02-08 11:59:39 +00:00
regress moved to tests/net/in_cksum. 2015-01-05 22:39:29 +00:00
rescue Remove rtsol(8) and rtsold(8) as their functionality is in dhcpcd(8). 2014-09-11 13:10:03 +00:00
sbin fix usage message 2016-02-06 10:35:58 +00:00
share Remove the .SUNW_ctf sections when converting form ELF -> a.out by 2016-02-08 10:39:09 +00:00
sys PR/50783: David Binderman: Indent switch properly, add missing break. 2016-02-08 16:44:45 +00:00
tests Add tests for a gateway not on the local subnet 2016-01-29 04:15:46 +00:00
tools silent when we don't have -ldl 2016-02-01 14:18:16 +00:00
usr.bin use sizeof() and array notation. 2016-02-06 21:23:09 +00:00
usr.sbin Split case folding table into separate source file and add full 2016-02-06 10:40:58 +00:00
build.sh Make evbarm64 (little endian) the default for aarch64. 2015-06-27 06:00:28 +00:00
BUILDING Document MKREPRO_TIMESTAMP. 2016-01-29 13:51:13 +00:00
Makefile fix direct reference to texinfo, bleh 2016-01-14 02:51:25 +00:00
Makefile.inc
UPDATING Note that update builds are broken if MKDTRACE got enabled for your 2016-01-25 09:24:29 +00:00