37 Commits

Author SHA1 Message Date
Szabolcs Nagy
f1471d3216 fix an overflow in wcsxfrm when n==0
posix allows zero length destination
2014-01-23 03:24:54 +01:00
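
A minimal sketch of the zero-length case addressed by the commit above, assuming the transform is a plain copy (as it would be when only the C locale is supported). The name sketch_wcsxfrm is hypothetical and used only for illustration. The idea is that when n is 0 nothing may be written, not even the terminator, while the required length is still returned:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy-based wcsxfrm for a C-locale-only library. */
size_t sketch_wcsxfrm(wchar_t *dest, const wchar_t *src, size_t n)
{
    size_t len = wcslen(src);
    if (len < n) {
        /* whole string plus terminator fits */
        wmemcpy(dest, src, len + 1);
    } else if (n) {
        /* truncate; computing n-1 is only safe because n is nonzero */
        wmemcpy(dest, src, n - 1);
        dest[n - 1] = 0;
    }
    /* n == 0: dest is never touched */
    return len;
}
```

A caller can pass n == 0 to learn the buffer size it needs, then call again with a buffer of len + 1 elements.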
Szabolcs Nagy
571744447c include cleanups: remove unused headers and add feature test macros 2013-12-12 05:09:18 +00:00
Szabolcs Nagy
2b1f2f146d remove duplicate includes from dynlink.c, strfmon.c and getaddrinfo.c 2013-11-25 23:34:10 +00:00
Rich Felker
37c25065be remove spurious tmp file present since initial git check-in 2013-08-17 22:28:50 -04:00
Rich Felker
109bd65acf add hkscs/big5-2003/eten extensions to iconv big5
with these changes, the character set implemented as "big5" in musl is
a pure superset of cp950, the canonical "big5", and agrees with the
normative parts of Unicode. this means it has minor differences from
both hkscs and big5-2003:

- the range A2CC-A2CE maps to CJK ideographs rather than numerals,
  contrary to changes made in big5-2003.

- C6CD maps to a CJK ideograph rather than its corresponding Kangxi
  radical character, contrary to changes made in hkscs.

- F9FE maps to U+2593 rather than U+FFED.

of these differences, none but the last are visually distinct, and the
last is a character used purely for text-based graphics, not to convey
linguistic content.

should there be future demand for strict conformance to big5-2003 or
hkscs mappings, the present charset aliases can be replaced with
distinct variants.

reportedly there are other non-standard big5 extensions in common use
in Taiwan and perhaps elsewhere, which could also be added as layers
on top of the existing big5 support.

there may be additional characters which should be added to the hkscs
table: the whatwg standard for big5 defines what appears to be a
superset of hkscs.
2013-08-17 16:23:22 -04:00
Rich Felker
19b4a0a20e add Big5 charset support to iconv
at this point, it is just the common base charset equivalent to
Windows CP 950, with no further extensions. HKSCS and possibly other
supersets will be added later. other aliases may need to be added too.
2013-08-07 13:16:14 -04:00
Rich Felker
734062b298 iconv support for legacy Korean encodings
like for other character sets, stateful iso-2022 form is not supported
yet but everything else should work. all charset aliases are treated
the same, as Windows codepage 949, because reportedly the EUC-KR
charset name is in widespread (mis?)usage in email and on the web for
data which actually uses the extended characters outside the standard
93x94 grid. this could easily be changed if desired.

the principle of this converter for handling the giant bulk of rare
Hangul syllables outside of the standard KS X 1001 93x94 grid is the
same as the GB18030 converter's treatment of non-explicitly-coded
Unicode codepoints: sequences in the extension range are mapped to an
integer index N, and the converter explicitly computes the Nth Hangul
syllable not explicitly encoded in the character map. empirically,
this requires at most 7 passes over the grid. this approach reduces
the table size required for Korean legacy encodings from roughly 44k
to 17k and should have minimal performance impact on real-world text
conversions since the "slow" characters are rare. where it does have
impact, the cost is merely a large constant time factor.
2013-08-05 13:14:17 -04:00
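
The "Nth syllable not explicitly encoded" computation described above can be sketched as follows. This is an illustration under stated assumptions, not the converter's actual code: nth_uncoded, the flat table layout, and the parameter names are all hypothetical, and the table entries are assumed to be distinct. Each pass counts table entries inside the current window and slides the candidate forward by that count, stopping once a pass finds nothing new:

```c
#include <stddef.h>

/* Hypothetical helper: return the n-th (0-based) value >= base that does
 * not appear in table[]. Entries in table[] must be distinct. */
unsigned nth_uncoded(const unsigned short *table, size_t len,
                     unsigned base, unsigned n)
{
    unsigned c = base + n;   /* candidate if nothing were explicitly coded */
    unsigned lo = base;      /* start of the window not yet counted */
    while (lo <= c) {
        unsigned k = 0;
        for (size_t i = 0; i < len; i++)
            /* unsigned wraparound makes this test lo <= table[i] <= c */
            if (table[i] - lo <= c - lo) k++;
        lo = c + 1;
        c += k;              /* skip over the coded entries just found */
    }
    return c;
}
```

Each pass is a full scan of the table, which is why the cost is a large constant factor per rare character rather than a larger table.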
Rich Felker
1ae4bc4280 fix semantically incorrect use of LC_GLOBAL_LOCALE
LC_GLOBAL_LOCALE refers to the global locale, controlled by setlocale,
not the thread-local locale in effect which these functions should be
using. neither LC_GLOBAL_LOCALE nor 0 as an argument to the *_l
functions has behavior defined by the standard, but 0 is a more
logical choice for requesting the callee to look up the current locale.
in the future I may move the current locale lookup to the caller (the
non-_l-suffixed wrapper).

at this point, all of the locale logic is dummied out, so no harm was
done, but it should at least avoid misleading usage.
2013-07-28 03:41:01 -04:00
Rich Felker
87be54a135 rework langinfo code for ABI compat and for use by time code 2013-07-24 18:52:02 -04:00
Rich Felker
ad4a536769 update strxfrm/wcsxfrm for future LC_COLLATE support and ABI compat 2013-07-24 18:44:31 -04:00
Rich Felker
4350935ca4 add ABI compat aliases for a number of locale_t functions 2013-07-24 18:40:52 -04:00
Rich Felker
4b0306c83c prepare strcoll/wcscoll for LC_COLLATE support and add ABI symbols 2013-07-24 18:17:09 -04:00
Rich Felker
0a37d99547 move strftime_l into strftime.c and add __-prefixed version
the latter is both for ABI purposes, and to facilitate eventually
adding LC_TIME support. it's also nice to eliminate an extra source
file.
2013-07-24 17:58:31 -04:00
Rich Felker
6a4cfbdbe7 fix iconv conversion to legacy 8bit codepages
this seems to have been a simple copy-and-paste error from the code
for converting from legacy codepages.
2013-06-26 14:27:45 -04:00
Rich Felker
400c5e5c83 use restrict everywhere it's required by c99 and/or posix 2008
to deal with the fact that the public headers may be used with pre-c99
compilers, __restrict is used in place of restrict, and defined
appropriately for any supported compiler. we also avoid the form
[restrict] since older versions of gcc rejected it due to a bug in the
original c99 standard, and instead use the form *restrict.
2012-09-06 22:44:55 -04:00
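
A rough sketch of the header pattern described above. The macro name SKETCH_RESTRICT and the function sketch_copy are hypothetical stand-ins chosen so the example does not redefine a reserved identifier. The commit itself uses __restrict. The qualifier expands to the keyword on C99 compilers, to the GNU spelling on older GNU compilers, and to nothing elsewhere:

```c
#include <stddef.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define SKETCH_RESTRICT restrict
#elif defined(__GNUC__)
#define SKETCH_RESTRICT __restrict
#else
#define SKETCH_RESTRICT
#endif

/* Pointer form (*restrict). The array form, `char dest[restrict]`,
 * is the one the commit message says is avoided. */
char *sketch_copy(char *SKETCH_RESTRICT dest,
                  const char *SKETCH_RESTRICT src, size_t n);

char *sketch_copy(char *SKETCH_RESTRICT dest,
                  const char *SKETCH_RESTRICT src, size_t n)
{
    /* the qualifier promises the caller passes non-overlapping buffers */
    for (size_t i = 0; i < n; i++)
        dest[i] = src[i];
    return dest;
}
```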
Rich Felker
b3d7d062af duplocale: don't crash when called with LC_GLOBAL_LOCALE
posix has resolved to add this usage; for now, we just avoid writing
anything to the new locale object since it's not used anyway.
2012-06-20 13:48:57 -04:00
Rich Felker
85a3ba3a28 fix localeconv values and implementation
dynamic-allocation of the structure is not valid; it can crash an
application if malloc fails. since localeconv is not specified to have
failure conditions, the object needs to have static storage duration.

still need to review whether all the values are right.
2012-06-19 22:44:08 -04:00
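
A minimal sketch of the static-storage approach argued for above, assuming C-locale values. The names sketch_c_lconv and sketch_localeconv are hypothetical, and only a subset of fields is filled in: the int_* positioning fields are left at zero for brevity, whereas a real table would set them to CHAR_MAX as well. Because the object has static storage duration, the function has no allocation and therefore no failure path:

```c
#include <limits.h>
#include <locale.h>

/* Hypothetical table of C-locale values with static storage duration. */
static const struct lconv sketch_c_lconv = {
    .decimal_point = ".",
    .thousands_sep = "",
    .grouping = "",
    .int_curr_symbol = "",
    .currency_symbol = "",
    .mon_decimal_point = "",
    .mon_thousands_sep = "",
    .mon_grouping = "",
    .positive_sign = "",
    .negative_sign = "",
    .int_frac_digits = CHAR_MAX,
    .frac_digits = CHAR_MAX,
    .p_cs_precedes = CHAR_MAX,
    .p_sep_by_space = CHAR_MAX,
    .n_cs_precedes = CHAR_MAX,
    .n_sep_by_space = CHAR_MAX,
    .p_sign_posn = CHAR_MAX,
    .n_sign_posn = CHAR_MAX,
};

struct lconv *sketch_localeconv(void)
{
    /* the interface returns a non-const pointer; callers must not write */
    return (struct lconv *)&sketch_c_lconv;
}
```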
Rich Felker
26710be714 fix multiple iconv bugs reading utf-16/32 and wchar_t 2012-06-18 21:41:38 -04:00
Rich Felker
673633c689 fix iconv dest utf-16: unavailable chars must be replaced; EILSEQ is wrong 2012-06-18 20:43:21 -04:00
Rich Felker
a2f149b5d1 fix erroneous utf-16 encoding with surrogates in iconv
apparently this was never tested before.
2012-06-18 20:29:41 -04:00
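
For reference, a sketch of standard UTF-16 surrogate-pair encoding. The commit above does not say what the bug was, so this only illustrates the encoding rule itself. The function name sketch_utf16_encode and its return convention are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: encode code point c as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if c cannot be
 * encoded (a lone surrogate value or beyond U+10FFFF). */
int sketch_utf16_encode(uint32_t c, uint16_t out[2])
{
    if (c < 0x10000) {
        /* unsigned wraparound: true only for 0xD800..0xDFFF */
        if (c - 0xD800 < 0x800) return 0;
        out[0] = (uint16_t)c;
        return 1;
    }
    if (c > 0x10FFFF) return 0;
    c -= 0x10000;                                 /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (c >> 10));      /* high: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (c & 0x3FF));    /* low: bottom 10 bits */
    return 2;
}
```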
Rich Felker
80d7859f32 fix major breakage in iconv, bogus rejecting of dest charsets 2012-04-21 14:46:40 -04:00
Rich Felker
bff650df9f add strfmon_l variant (still mostly incomplete) 2012-03-25 00:21:20 -04:00
Rich Felker
25501c1079 initial, very primitive strfmon 2012-03-21 00:47:37 -04:00
Rich Felker
e0614f7cd4 add all missing wchar functions except floating point parsers
these are mostly untested and adapted directly from corresponding byte
string functions and similar.
2012-03-01 23:24:45 -05:00
Rich Felker
36bf56940a more locale_t interfaces (string stuff) and header updates
this should be everything except for some functions where the non-_l
version isn't even implemented yet (mainly some non-ISO-C wcs*
functions).
2012-02-06 21:51:02 -05:00
Rich Felker
c09b6f8ab6 fix some omissions and mistakes in locale_t interface definitions 2012-02-06 21:33:40 -05:00
Rich Felker
e5a7f14c81 add more of the locale_t interfaces, all dummied out to ignore the locale 2012-02-06 21:29:31 -05:00
Rich Felker
0e2331c9b6 gb18030 support in iconv (only from, not to)
also support (and restrict to subsets) older chinese sets, and
explicitly refuse to convert to cjk (since there's no code for it yet)
2011-07-12 20:30:04 -04:00
Rich Felker
95a85e047e legacy japanese charset support in iconv (only from, not to) 2011-07-12 02:43:24 -04:00
Rich Felker
594b16e004 simplify iconv and support more legacy codepages 2011-07-12 00:31:39 -04:00
Rich Felker
2f0c415ceb iconv was not returning -1 on most failures
this broke most uses of iconv in real-world programs, especially
glib's iconv wrappers.
2011-07-03 19:26:12 -04:00
Rich Felker
11c531e21d implement uselocale function (minimal) 2011-05-30 01:41:23 -04:00
Rich Felker
bb8d3d00e2 fix breakage due to converting a return type to size_t in iconv... 2011-04-07 16:10:44 -04:00
Rich Felker
5600088d38 fix nl_langinfo to actually use the existing, correct internal version 2011-04-03 19:51:14 -04:00
Rich Felker
9ae8d5fc71 fix all implicit conversion between signed/unsigned pointers
sadly the C language does not specify any such implicit conversion, so
this is not a matter of just fixing warnings (as gcc treats it) but
actual errors. i would like to revisit a number of these changes and
possibly revise the types used to reduce the number of casts required.
2011-03-25 16:34:03 -04:00
Rich Felker
7fe308eb9f use a more-correct integer type, and silence 64-bit warnings as a bonus 2011-02-13 23:38:21 -05:00
Rich Felker
0b44a0315b initial check-in, version 0.5.0 2011-02-12 00:22:29 -05:00