NetBSD/lib
christos 6b42622b31 UTF-8 fixes from Ingo Schwarze:
 1. Assume that errno is non-zero when entering read_char()
    and that read(2) returns 0 (indicating end of file).
    Then, the code will clear errno before returning.
    (Obviously, the statement "errno = 0" is almost always
     a bug unless there is save_errno = errno right before it
     and the previous value is properly restored later,
     in all reachable code paths.)

 2. When encountering an invalid byte sequence, the code discards
    all following bytes until MB_LEN_MAX overflows; consider, for
    example, 0xc2 immediately followed by a few valid ASCII bytes.
    Three of those ASCII bytes will be discarded.

 3. On a POSIX system, EILSEQ will always be set after reading a
    valid (yes, valid, not invalid!) UTF-8 character.  The reason
    is that mbtowc(3) will first be called with a length limit
    (third argument) of 1, which will fail, return -1, and - on
    a POSIX system - set errno to EILSEQ.
    This third bug is mitigated a bit because I couldn't find any
    system that actually conforms to POSIX in this respect:  None
    of OpenBSD, NetBSD, FreeBSD, Solaris 11, and glibc set errno
    when an incomplete character is passed to mbtowc(3), even though
    that is required by POSIX.
    Anyway, that mbtowc(3) bug will be fixed at least in OpenBSD
    after release unlock, so it would be good to fix this bug in
    libedit before fixing the bug in mbtowc(3).
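
The save-and-restore idiom mentioned in the parenthesis of item 1 can be
sketched as follows.  This is a minimal illustration and not the libedit
code: read_byte() and the fixio_fn callback type are hypothetical names,
with the callback standing in for a repair routine such as read__fixio().
The sketch assumes that errno should be left as the caller had it unless
read(2) itself fails for good.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a repair routine such as read__fixio();
 * it returns 0 if retrying the read makes sense.  May be NULL.
 */
typedef int (*fixio_fn)(int fd, int err);

/* Read one byte; returns 1 on success, 0 at end of file, -1 on error. */
static ssize_t
read_byte(int fd, unsigned char *cp, fixio_fn fixio)
{
	int save_errno = errno;	/* caller's value, taken before the first read(2) */
	int tried = 0;
	ssize_t n;

	assert(cp != NULL);
	while ((n = read(fd, cp, 1)) == -1) {
		int e = errno;	/* why this particular read(2) failed */

		if (!tried && fixio != NULL && fixio(fd, e) == 0) {
			errno = save_errno;	/* forget the repaired failure */
			tried = 1;
			continue;
		}
		errno = e;	/* report the read(2) error, not one from fixio */
		return -1;
	}
	/* Success or end of file: this function never stores 0 in errno. */
	return n;
}
```

With this shape, a caller that enters with errno set and hits end of file
still sees its own errno value afterwards.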

How can these three bugs be fixed?

 1. As far as I understand it, the intention of the bogus errno = 0
    is to undo the effects of failing system calls in el_wset(),
    sig_set(), and read__fixio() if the subsequent read(2) indicates
    end of file.  So, restoring errno has to be moved right after
    read__fixio().  Of course, neither 0 nor e (the errno value
    left by the failed read(2)) is the right value to restore: 0
    is wrong if errno happened to be set on entry, and e is wrong
    because if one read(2) fails but a second attempt succeeds
    after read__fixio(), errno should not be touched.  So,
    the errno to be restored in this case has to be saved before
    calling read(2) for the first time.

 2. Solving the second issue requires distinguishing invalid and
    incomplete characters, but that is impossible with the function
    mbtowc(3) because it returns -1 in both cases and sets errno
    to EILSEQ in both cases (once properly implemented).

    It is vital that each input character is processed right away.
    It is not acceptable to wait for the next input character before
    processing the previous one because this is an interactive
    library, not a batch system.  Consequently, the only situation
    where it is acceptable to wait for the next byte without first
    processing the previous one(s) is when the previous one(s) form
    an incomplete sequence that can be continued to form a valid
    character.

    Consequently, short of reimplementing a full UTF-8 state machine
    by hand, the only correct way forward is to use mbrtowc(3).
    Even then, care is needed to always have the state object
    properly initialized before using it, and to not discard a valid
    ASCII or UTF-8 lead byte if it happens to follow an invalid
    sequence.

 3. Fortunately, solution 2 also solves issue 3 as a side effect,
    by no longer using mbtowc(3) in the first place.
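
The byte-at-a-time use of mbrtowc(3) described in item 2 might look
roughly like the sketch below.  It is not the code from this commit:
struct decoder, decoder_init(), decoder_feed(), and select_utf8_locale()
are hypothetical names chosen for illustration, and the locale names
tried are an assumption about what the host provides.  The sketch relies
on mbrtowc(3) returning (size_t)-2 for an incomplete sequence and
(size_t)-1 for an invalid one, and it retries a byte that broke an
earlier sequence so that a valid ASCII or lead byte is not lost.

```c
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

enum feed_result { FEED_CHAR, FEED_MORE, FEED_INVALID };

/* Hypothetical decoder state, not libedit's actual data structure. */
struct decoder {
	mbstate_t st;	/* conversion state; all zeroes means "initial" */
	int pending;	/* nonzero while an incomplete sequence is buffered */
};

/* Try a few common UTF-8 locale names; returns 1 if one was selected. */
static int
select_utf8_locale(void)
{
	static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "C.utf8" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (setlocale(LC_CTYPE, names[i]) != NULL)
			return 1;
	return 0;
}

static void
decoder_init(struct decoder *d)
{
	memset(d, 0, sizeof(*d));	/* always start from the initial state */
}

/* Feed one input byte; on FEED_CHAR the character is stored in *out. */
static enum feed_result
decoder_feed(struct decoder *d, unsigned char byte, wchar_t *out)
{
	char c = (char)byte;
	int save_errno = errno;

	assert(d != NULL && out != NULL);
	for (;;) {
		size_t n = mbrtowc(out, &c, 1, &d->st);
		int had_pending = d->pending;

		if (n == (size_t)-2) {	/* incomplete but continuable */
			d->pending = 1;
			return FEED_MORE;
		}
		d->pending = 0;
		if (n != (size_t)-1)	/* complete character, possibly L'\0' */
			return FEED_CHAR;

		/* Invalid: the state is unspecified now, so reset it. */
		memset(&d->st, 0, sizeof(d->st));
		errno = save_errno;	/* drop the EILSEQ set by mbrtowc(3) */
		if (!had_pending)
			return FEED_INVALID;	/* this byte cannot start a character */
		/* The byte broke an earlier sequence; retry it as a fresh start. */
	}
}
```

In a UTF-8 locale, feeding 0xc2 and then an ASCII letter yields FEED_MORE
followed by FEED_CHAR for the letter, so only the stray 0xc2 is dropped.
A real caller would probably also want to report the discarded bytes,
which this sketch silently ignores.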
2016-02-08 17:18:43 +00:00
csu Undo previous; the lossage is more basic. 2016-01-24 16:47:32 +00:00
i18n_module
libarch Define _KERNTYPES for things that need it. 2016-01-23 21:22:45 +00:00
libbluetooth correct comment in literal section 2016-01-22 08:51:40 +00:00
libbpfjit
libbsdmalloc
libbz2 Reorg docs, part 1: 2014-07-05 19:22:41 +00:00
libc Avoid shadowing global. 2016-02-06 19:33:07 +00:00
libc_vfp Possibly build libc_vfp if MACHINE_CPU is aarch64 too. 2015-07-08 01:08:24 +00:00
libcompat PR/50711: David Binderman: Fix memory leak on error 2016-01-26 16:05:18 +00:00
libcrypt fix error messages 2015-06-17 00:15:26 +00:00
libcurses Clear the "forced" flag after updating a line, otherwise we'll always do 2016-01-10 08:11:06 +00:00
libdm The actual header file for these functions is dm.h, not libdm.h. 2016-01-22 22:12:40 +00:00
libedit UTF-8 fixes from Ingo Schwarze: 2016-02-08 17:18:43 +00:00
libexecinfo Fix typo, from FreeBSD. 2015-12-26 10:34:36 +00:00
libform Counting from 0 to n-1 can go wrong badly, if n is unsigned and zero and 2015-12-11 21:22:57 +00:00
libintl back to the defines (fixing a typo -- extra 'g') 2015-06-08 15:04:20 +00:00
libipsec
libisns If a library needs a symbol from another library, pull that library in 2013-09-11 23:04:09 +00:00
libkern Define _KERNTYPES for things that need it. 2016-01-23 21:22:45 +00:00
libkvm mips needs _KMEMUSER for label_t in pcb.h 2016-01-24 16:07:48 +00:00
liblwres
libm Fix incorrect magic numbers in scaling. From FreeBSD commit 23397, by 2016-01-24 20:34:30 +00:00
libmenu fix unused warnings 2013-10-18 19:53:59 +00:00
libnpf - Change LDADD/DPADD in library dependencies to LIBDPLIBS 2016-01-05 13:07:46 +00:00
libossaudio Add missing defines for 16, 24 and 32 bit NE and OE formats. 2014-09-09 10:45:18 +00:00
libp2k Don't include <rump/rumpvnode_if.h> from rump.h. It's not needed 2016-01-25 11:45:57 +00:00
libpam Adapt to the new API. 2015-04-04 02:51:10 +00:00
libpanel Specify path of a local internal header of libpanel 2015-11-22 04:30:33 +00:00
libpci unsigned -> unsigned int 2016-01-23 07:21:18 +00:00
libperfuse Define _KERNTYPES for things that need it. 2016-01-23 21:22:45 +00:00
libpmc pmc_evid_, pmc_ctr_t etc are defined in <machine/types.h> but are not exposed 2016-01-23 21:44:55 +00:00
libposix MKCOMPAT fixes for when compat MACHINE_CPU != normal MACHINE_CPU 2014-08-10 23:25:49 +00:00
libppath If a library needs a symbol from another library, pull that library in 2013-09-11 23:04:09 +00:00
libprop
libpthread Fix PTHREAD_FOO_INITIALIZER for C++ by not using volatile in the relevant 2015-08-27 12:30:50 +00:00
libpthread_dbg don't use kernel types. 2016-01-23 14:02:21 +00:00
libpuffs Define _KERNTYPES for things that need it. 2016-01-23 21:22:45 +00:00
libquota Some NFS servers return RPC_PROGNOTREGISTERED instead of RPC_PROGVERSMISMATCH 2016-01-30 16:31:28 +00:00
libradius
librefuse Define _KERNTYPES for things that need it. 2016-01-23 21:22:45 +00:00
libresolv src is too big these days to tolerate superfluous apostrophes. It's 2014-10-18 08:33:23 +00:00
librmt
librpcsvc remove __P 2013-12-20 21:04:09 +00:00
librt Bump date for previous. 2015-11-19 07:03:13 +00:00
librump This is not needed anymore. 2015-08-21 06:56:35 +00:00
librumpclient Define _KERNTYPES for things that need it. 2016-01-23 21:22:45 +00:00
librumpdev
librumphijack Define _KERNTYPES for things that need it. 2016-01-23 21:22:45 +00:00
librumpnet
librumpuser Move librumpuser compile-time options into the librumpuser source 2016-01-25 00:24:23 +00:00
librumpvfs Move rump kernel man pages from various sources to sys/rump 2014-11-09 17:39:37 +00:00
libskey Uses FILE *, needs stdio.h. 2016-01-22 23:25:51 +00:00
libss
libtelnet Avoid enum type mismatch. 2014-04-26 22:10:40 +00:00
libterminfo Always copy the area buffer, even when the length was the same 2015-11-26 01:03:22 +00:00
libukfs Don't include <rump/rumpvnode_if.h> from rump.h. It's not needed 2016-01-25 11:45:57 +00:00
libusbhid Uses __BEGIN_DECLS so needs sys/cdefs.h; also needs stdint.h. 2016-01-22 23:51:23 +00:00
libutil prefer <sys/cpu.h> instead of <machine/cpu.h> 2016-01-25 18:14:04 +00:00
libwrap these are syslog-like 2015-10-14 15:54:21 +00:00
liby
libz Merge riastradh-drm2 to HEAD. 2014-03-18 18:20:35 +00:00
lua lua: updated from 5.3 work3 to 5.3.0 2015-02-02 14:03:05 +00:00
npf If a library needs a symbol from another library, pull that library in 2013-09-11 23:04:09 +00:00
bumpversion
checkoldver
checkver remove -'s from options 2013-02-17 02:36:21 +00:00
checkvers
Makefile use EXTERNAL_BINUTILS_SUBDIR 2016-01-26 17:47:35 +00:00
Makefile.inc