mirror of
https://git.musl-libc.org/git/musl
synced 2025-01-09 00:02:17 +03:00
make nl_langinfo(CODESET) always return "UTF-8"
this restores the original behavior prior to the addition of the
byte-based C locale and fixes what is effectively a regression in
musl's property of always providing working UTF-8 support.
commit 1507ebf837
introduced the codeset
name "UTF-8-CODE-UNITS" for the byte-based C locale to represent that
the semantic content is UTF-8 but that it is being processed as code
units (bytes) rather than whole multibyte characters. however, many
programs assume that the codeset name is usable with iconv and/or
comes from a set of standard/widely-used names known to the
application. such programs are likely to produce warnings or errors,
run with reduced functionality, or mangle character data when run
explicitly in the C locale.
the standard places basically no requirements for the string returned
by nl_langinfo(CODESET) and how it interacts with other interfaces, so
returning "UTF-8" is permissible. moreover, it seems like the right
thing to do, since the identity of the character encoding as "UTF-8"
is independent of whether it is being processed as bytes of characters
by the standard library functions.
This commit is contained in:
parent
426a0e2912
commit
844212d94f
@ -33,8 +33,7 @@ char *__nl_langinfo_l(nl_item item, locale_t loc)
|
||||
int idx = item & 65535;
|
||||
const char *str;
|
||||
|
||||
if (item == CODESET)
|
||||
return MB_CUR_MAX==1 ? "UTF-8-CODE-UNITS" : "UTF-8";
|
||||
if (item == CODESET) "UTF-8";
|
||||
|
||||
switch (cat) {
|
||||
case LC_NUMERIC:
|
||||
|
Loading…
Reference in New Issue
Block a user