NetBSD

Author	SHA1	Message	Date
christos	bc4f01ce82	- don't set _GNU_SOURCE. We are not supposed to make decisions for others. - don't special-case wcsdup() From Ingo Schwarze.	2016-02-16 19:29:51 +00:00
christos	23f3e7075d	get rid of bool_t (Ingo Schwarze)	2016-02-16 19:11:25 +00:00
christos	747f681109	more include file cleanup (Ingo Schwarze)	2016-02-16 19:08:41 +00:00
christos	a539b892c3	include errno.h	2016-02-16 15:54:15 +00:00
christos	aefc1e4460	From Ingo Schwarze: Let "el.h" include everything needed for struct editline, and don't include that stuff multiple times. That also improves consistency, also avoids circular inclusions, and also makes it easier to follow what is going on, even though not quite as nice. But it seems like the best we can do...	2016-02-16 15:53:48 +00:00
christos	f09cb8c626	cleanup chartype.h includes (Ingo Schwarze)	2016-02-16 14:08:25 +00:00
christos	c807fdff98	one more	2016-02-16 14:07:47 +00:00
christos	40850369f8	cleanup inclusion of histedit.h (Ingo Schwarze)	2016-02-16 14:06:05 +00:00
christos	89cffc1532	include explicitly errno.h since we use it.	2016-02-16 14:04:58 +00:00
christos	67bb823526	No need to include "sys.h" from here; it is included from config.h	2016-02-16 14:04:24 +00:00
christos	d8252c8b23	attribute unused	2016-02-15 23:36:30 +00:00
christos	67b10d3e9e	OpenBSD term.c rev. 1.7 2002/11/29 20:13:39 deraadt spelling	2016-02-15 22:53:38 +00:00
christos	f91f480498	OpenBSD readline.c rev. 1.14 2015/02/06 23:21:58 millert use SIZE_MAX	2016-02-15 22:48:59 +00:00
christos	92b1772005	OpenBSD readline.c rev. 1.13 2015/01/13 08:33:12 reyk rl_set_keyboard_input_timeout() for readline 4.2 compat	2016-02-15 21:58:37 +00:00
christos	f8ee3c5528	OpenBSD eln.c rev. 1.3 2011/11/27 21:46:44 pascal kill a C++-style comment	2016-02-15 21:56:35 +00:00
christos	42e2a4d875	Compile with WIDECHAR the same way the main Makefile does (Ingo Schwarze)	2016-02-15 21:38:07 +00:00
christos	1e12a8d1ca	Don't free getline memory (Ingo Schwarze).	2016-02-15 21:35:52 +00:00
christos	e8d0e8c012	forgot one fgetln define	2016-02-15 17:35:39 +00:00
christos	5390c8faa5	change tests for fgetln.	2016-02-15 16:14:39 +00:00
christos	a7ab79fbe5	Use getline for better portability.	2016-02-15 15:53:45 +00:00
christos	c0d16449e0	OpenBSD tokenizer.c rev. 1.8 2003/08/11 18:21:40 deraadt don't increase amax on realloc failure	2016-02-15 15:37:20 +00:00
christos	efeef4e587	OpenBSD term.c rev. 1.13 2009/12/11 18:58:59 jacekm fix two memory leaks	2016-02-15 15:35:03 +00:00
christos	c825536317	Change the test for the size of encoded buffer to include the NULL, from OpenBSD; no functional change.	2016-02-15 15:30:50 +00:00
christos	5367da5f9e	OpenBSD sig.c rev. 1.6 2001/12/06 04:26:00 deraadt save and restore errno in signal handler	2016-02-15 15:29:25 +00:00
christos	87240809e9	Use fparseln to avoid newline hacks.	2016-02-15 15:26:48 +00:00
christos	70a36d136a	use fparseln() to avoid needing to deal with missing \n in the last line and also to handle comments automatically.	2016-02-15 15:18:01 +00:00
christos	2884af9fee	From Ingo Schwarze: el_getc() for the WIDECHAR case, that is, the version in eln.c. For a UTF-8 locale, it is broken in four ways: 1. If the character read is outside the ASCII range, the function does an undefined cast from wchar_t to char. Even if wchar_t is internally represented as UCS-4, that is wrong and dangerous because characters beyond codepoint U+0255 get their high bits truncated, meaning that perfectly valid printable Unicode characters get mapped to arbitrary bytes, even the ASCII escape character for some Unicode characters. But wchar_t need not be implemented in terms of UCS-4, so the outcome of this function is undefined for any and all input. 2. If insufficient space is available for the result, the function fails to detect failure and returns garbage rather than -1 as specified in the documentation. 3. The documentation says that errno will be set on failure, but that doesn't happen either in the above case. 4. Even for ASCII characters, the results may be wrong if wchar_t is not using UCS-4.	2016-02-14 17:06:24 +00:00
christos	f54e4f97f9	From Ingo Schwarze: As we have seen before, "histedit.h" can never get rid of including the <wchar.h> header because using the data types defined there is deeply ingrained in the public interfaces of libedit. Now POSIX unconditionally requires that <wchar.h> defines the type wint_t. Consequently, it can be used unconditionally, no matter whether WIDECHAR is active or not. Consequently, the #define Int is pointless. Note that removing it is not gratuitous churn. Auditing for integer signedness problems is already hard when only fundamental types like "int" and "unsigned" are involved. It gets very hard when types come into the picture that have platform-dependent signedness, like "char" and "wint_t". Adding yet another layer on top, changing both the signedness and the width in a platform-dependent way, makes auditing yet harder, which IMHO is really dangerous. Note that while removing the #define, i already found one bug caused by this excessive complication - in the function re_putc() in refresh.c. If WIDECHAR was defined, it printed an Int = wint_t value with %c. Fortunately, that bug only affects debugging, not production. The fix is contained in the patch. With WIDECHAR, this doesn't change anything. For the case without WIDECHAR, i checked that none of the places wants to store values that might not fit in wint_t. This only changes internal interfaces; public ones remain unchanged.	2016-02-14 14:49:34 +00:00
christos	61ee30487d	From Ingo Schwarze: Next step: Remove #ifdef'ing in read_char(), in the same style as we did for setlocale(3) in el.c. A few remarks are required to explain the choices made. * On first sight, handling mbrtowc(3) seems a bit less trivial than handling setlocale(3) because its prototype uses the data type mbstate_t from <wchar.h>. However, it turns out that "histedit.h" already includes <wchar.h> unconditionally (i don't like headers including other headers, but that ship has sailed, people are by now certainly used to the fact that including "histedit.h" doesn't require including <wchar.h> before), and "histedit.h" is of course included all over the place. So from that perspective, there is no problem with using mbrtowc(3) unconditionally even for !WIDECHAR. * However, <wchar.h> also defines the mbrtowc(3) prototype, so we cannot just #define mbrtowc away, or including the header will break. It would also be a bad idea to provide a local implementation of mbrtowc() and hope that it overrides the one in libc. Besides, the required prototype is subtly different: While mbrtowc(3) takes "wchar_t *" as its first argument, we need a function that takes "Char *". So unfortunately, we have to keep a ct_mbrtowc #define, at least until we can maybe get rid of "Char *" in the more remote future. After getting rid of the #else clause in read_char(), we can pull "return 1;" into the default: clause. After that, we can get rid of the ugly "goto again_lastbyte;" and just "break;". As a bonus, that also gets rid of the ugly CONSTCOND. * While here, delete the unused ct_mbtowc() from chartype.h.	2016-02-14 14:47:48 +00:00
christos	cc7f005f24	Avoid c99 for now.	2016-02-12 17:23:21 +00:00
christos	57c556fd79	GC IGNORE_EXTCHARS and simplify code (Ingo Schwarze)	2016-02-12 15:36:08 +00:00
christos	0e1288d7c8	From Ingo Schwarze: If CHARSET_IS_UTF8 is not set, read_char() is broken in a large number of ways: 1. The isascii(3) check can yield false positives. If a string in an arbitrary encoding contains a byte in the range 0..127, that does not at all imply that it forms a character all by itself, and even less that it represents the same character as in ASCII. Consequently, read_char() may return characters the user never typed. Even if the encoding is not state dependent, the assumption that bytes in the range 0..127 represent ASCII characters is broken. Consider UTF-16, for example. 2. The reverse problem can also occur. In an arbitrary encoding, there is no guarantee that a character that can be represented by ASCII is represented by a seven-bit byte, and even less by the same byte as in ASCII. Even for single-byte encodings, these assumptions are broken. Consider the ISO 646 national variants, for example. Consequently, the current code is insufficient to keep ASCII characters working even for single-byte encodings. 3. The condition "++cbp != 1" can never trigger (because initially, cbp is 0, and the code can only go back up via the final goto, which has another cbp = 0 right before it) and it has no effect (because cbp isn't used afterwards). 4. bytes = ct_mbtowc(cp, cbuf, cbp) is broken. If this returns -1, the code assumes that it can just call mbtowc(3) again for later input bytes. In some implementations, that may even be broken for state-independent encodings, but trying again after mbtowc(3) failure certainly produces completely erratic and meaningless results in state-dependent encodings. 5. The assignment "cp = (Char)(unsigned char)cbuf[0]" is completely bogus. Even if the byte cbuf[0] represents a character all by itself, which it usually will not, whether or not the cast produces the desired result depends on the internal representation of wchar_t in the C library, which the application program can know nothing about.
Even for ASCII in the C/POSIX locale, an ASCII character other than '\0' == L'\0' == 0 need not have the same numeric value as a char and as a wchar_t. To summarize, this code only works if all of the following conditions hold: - The encoding is a single-byte encoding. - ASCII is a subset of the encoding. - The implementation of mbtowc(3) in the C library does not require re-initialization after encoding errors. - The implementation of wchar_t in the C library uses the same numerical values as ASCII. Otherwise, it silently produces wrong results. The simplest way to fix this is to just use the same code as for UTF-8 (right above). Of course, that causes functional changes but that shouldn't matter since current behaviour is undefined. The patch below provides the following improvements: - It works for all stateless single-byte encodings, no matter whether they are somehow related to ASCII, no matter how mb[r]towc(3) are internally implemented, and no matter how wchar_t is internally represented. - Instead of producing unpredictable and definitely wrong results for non-UTF-8 multibyte characters, it behaves in a well-defined way: It aborts input processing, sets errno, and returns failure. Note that short of providing full support for arbitrary locales, it is impossible to do better. We cannot know whether a given unsupported locale is state-dependent, and for a state-dependent locale, it makes no sense to retry parsing after an encoding error, so the best we can do is abort processing for any unsupported multi-byte character. - Note that single-byte characters in arbitrary state-independent locales still work, even in locales that may potentially also contain multibyte characters, as long as those don't occur in input. I'm not sure whether any such locales exist in practice... Tested with UTF-8 and C/POSIX on OpenBSD. Also tested that in the C/POSIX locale, non-ASCII bytes get through unmangled.
You may wish to test with ISO-LATIN on NetBSD if NetBSD supports that. ---- Also use a constant for meta to avoid warnings.	2016-02-12 15:11:09 +00:00
christos	6af8d6733f	- Add some more Char casts - reduce ifdefs by providing empty defs for nls functions (Ingo Schwarze)	2016-02-11 19:21:04 +00:00
christos	28c0290948	remove unused wrapper (Ingo Schwarze)	2016-02-11 19:10:18 +00:00
christos	3ae44d1033	Remove utf8_islead() mbrtowc() handles this just fine (Ingo Schwarze)	2016-02-11 16:08:47 +00:00
christos	6b42622b31	UTF-8 fixes from Ingo Schwarze: 1. Assume that errno is non-zero when entering read_char() and that read(2) returns 0 (indicating end of file). Then, the code will clear errno before returning. (Obviously, the statement "errno = 0" is almost always a bug unless there is save_errno = errno right before it and the previous value is properly restored later, in all reachable code paths.) 2. When encountering an invalid byte sequence, the code discards all following bytes until MB_LEN_MAX overflows; consider, for example, 0xc2 immediately followed by a few valid ASCII bytes. Three of those ASCII bytes will be discarded. 3. On a POSIX system, EILSEQ will always be set after reading a valid (yes, valid, not invalid!) UTF-8 character. The reason is that mbtowc(3) will first be called with a length limit (third argument) of 1, which will fail, return -1, and - on a POSIX system - set errno to EILSEQ. This third bug is mitigated a bit because i couldn't find any system that actually conforms to POSIX in this respect: None of OpenBSD, NetBSD, FreeBSD, Solaris 11, and glibc set errno when an incomplete character is passed to mbtowc(3), even though that is required by POSIX. Anyway, that mbtowc(3) bug will be fixed at least in OpenBSD after release unlock, so it would be good to fix this bug in libedit before fixing the bug in mbtowc(3). How can these three bugs be fixed? 1. As far as i understand it, the intention of the bogus errno = 0 is to undo the effects of failing system calls in el_wset(), sig_set(), and read__fixio() if the subsequent read(2) indicates end of file. So, restoring errno has to be moved right after read__fixio(). Of course, neither 0 nor e is the right value to restore: 0 is wrong if errno happened to be set on entry, e would be wrong because if one read(2) fails but a second attempt succeeds after read__fixio(), errno should not be touched. 
So, the errno to be restored in this case has to be saved before calling read(2) for the first time. 2. Solving the second issue requires distinguishing invalid and incomplete characters, but that is impossible with the function mbtowc(3) because it returns -1 in both cases and sets errno to EILSEQ in both cases (once properly implemented). It is vital that each input character is processed right away. It is not acceptable to wait for the next input character before processing the previous one because this is an interactive library, not a batch system. Consequently, the only situation where it is acceptable to wait for the next byte without first processing the previous one(s) is when the previous one(s) form an incomplete sequence that can be continued to form a valid character. Consequently, short of reimplementing a full UTF-8 state machine by hand, the only correct way forward is to use mbrtowc(3). Even then, care is needed to always have the state object properly initialized before using it, and to not discard a valid ASCII or UTF-8 lead byte if it happens to follow an invalid sequence. 3. Fortunately, solution 2. also solves issue 3. as a side effect, by no longer using mbtowc(3) in the first place.	2016-02-08 17:18:43 +00:00
christos	ef555cf8bb	Whitespace fix (Ingo Schwarze)	2016-01-30 15:05:27 +00:00
christos	7ce9f672f2	Fix misplaced parentheses (Ingo Schwarze)	2016-01-30 04:02:51 +00:00
christos	65691b0e16	One macro is enough (Ingo Schwarze)	2016-01-29 19:59:11 +00:00
gson	07d2388506	unbreak the build	2015-12-08 16:53:27 +00:00
christos	8d14d38c26	If we did not setup the tty, don't reset it.	2015-12-08 12:57:16 +00:00
christos	a2993d741e	Only reset the terminal if we have a tty (Boris Ranto)	2015-12-08 12:56:55 +00:00
christos	8ec268554e	Fix descriptions of el_set functions. Americanise initialise :-)	2015-11-03 21:36:59 +00:00
christos	0fe5419e98	Use the full buffer for the conversion; ideally we should be dynamically allocating this. From Jilles Tjoelker	2015-10-21 21:45:30 +00:00
christos	234792da04	make sure we have space for NUL and NUL terminate buffer array (Jilles Tjoelker)	2015-10-19 00:36:27 +00:00
christos	14ccb7c1cc	remove duplicate declaration	2015-06-02 15:36:45 +00:00
christos	0804279dff	Adjust API to a more modern readline (Ryo Onodera)	2015-06-02 15:35:31 +00:00
christos	0b61093115	- fix types of rl_completion_entry_function and rl_add_defun - call update pos before completion to refresh the screen From Thomas Eriksson	2015-05-26 19:59:21 +00:00
christos	bdf16bca92	make el_gets() return the number of characters read in wide mode (not the number of wide characters) From khorben@ by FreeBSD: https://svnweb.freebsd.org/ports/head/devel/libedit/files/patch-src_eln.c?revision=382458&view=markup XXX: Pullup-7	2015-05-18 15:07:04 +00:00
christos	5113710e5b	add FreeBSD	2015-05-17 13:14:41 +00:00
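
Commit 6b42622b31 above argues that mbtowc(3) cannot tell an invalid byte sequence from an incomplete one, while mbrtowc(3) can. A minimal sketch of that distinction follows. It is not the libedit code: the enum and the helper name classify_prefix() are hypothetical and only illustrate the two distinct mbrtowc(3) return values.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical names for illustration; not part of libedit. */
enum mb_status { MB_COMPLETE, MB_INCOMPLETE, MB_INVALID };

/*
 * Classify the first len bytes of buf in the current LC_CTYPE locale.
 * On MB_COMPLETE, *out holds the decoded wide character.
 */
static enum mb_status
classify_prefix(const char *buf, size_t len, wchar_t *out)
{
	mbstate_t st;
	size_t rc;

	/* Always start from a properly initialized conversion state. */
	memset(&st, 0, sizeof(st));

	rc = mbrtowc(out, buf, len, &st);
	if (rc == (size_t)-2)
		return MB_INCOMPLETE;	/* valid so far; wait for more bytes */
	if (rc == (size_t)-1)
		return MB_INVALID;	/* encoding error; errno is EILSEQ */
	return MB_COMPLETE;		/* rc bytes consumed (0 for a NUL) */
}
```

Under this sketch, an interactive reader would fetch another byte only on MB_INCOMPLETE and would handle MB_COMPLETE and MB_INVALID immediately, which is the behaviour the commit message asks for.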

661 Commits