514 lines
15 KiB
Groff
514 lines
15 KiB
Groff
.\" $NetBSD: nls.7,v 1.11 2003/06/26 11:55:56 wiz Exp $
|
|
.\"
|
|
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
|
|
.\" All rights reserved.
|
|
.\"
|
|
.\" This code is derived from software contributed to The NetBSD Foundation
|
|
.\" by Gregory McGarry.
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that the following conditions
|
|
.\" are met:
|
|
.\" 1. Redistributions of source code must retain the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer.
|
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
|
.\" notice, this list of conditions and the following disclaimer in the
|
|
.\" documentation and/or other materials provided with the distribution.
|
|
.\" 3. All advertising materials mentioning features or use of this software
|
|
.\" must display the following acknowledgement:
|
|
.\" This product includes software developed by the NetBSD
|
|
.\" Foundation, Inc. and its contributors.
|
|
.\" 4. Neither the name of The NetBSD Foundation nor the names of its
|
|
.\" contributors may be used to endorse or promote products derived
|
|
.\" from this software without specific prior written permission.
|
|
.\"
|
|
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
|
|
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
|
|
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
|
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
|
|
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
|
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
|
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
|
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
|
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
|
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
|
.\" POSSIBILITY OF SUCH DAMAGE.
|
|
.\"
|
|
.Dd May 17, 2003
|
|
.Dt NLS 7
|
|
.Os
|
|
.Sh NAME
|
|
.Nm NLS
|
|
.Nd Native Language Support Overview
|
|
.Sh DESCRIPTION
|
|
Native Language Support (NLS) provides commands for a single
|
|
worldwide operating system base.
|
|
An internationalized system has no built-in assumptions or dependencies
|
|
on language-specific or cultural-specific conventions such as:
|
|
.Pp
|
|
.Bl -bullet -offset indent -compact
|
|
.It
|
|
Character classifications
|
|
.It
|
|
Character comparison rules
|
|
.It
|
|
Character collation order
|
|
.It
|
|
Numeric and monetary formatting
|
|
.It
|
|
Date and time formatting
|
|
.It
|
|
Message-text language
|
|
.It
|
|
Character sets
|
|
.El
|
|
.Pp
|
|
All information pertaining to cultural conventions and language is
|
|
obtained at program run time.
|
|
.Pp
|
|
.Dq Internationalization
|
|
(often abbreviated
|
|
.Dq i18n )
|
|
refers to the operation by which system software is developed to support
|
|
multiple cultural-specific and language-specific conventions.
|
|
This is a generalization process by which the system is untied from
|
|
calling only English strings or other English-specific conventions.
|
|
.Dq Localization
|
|
(often abbreviated
|
|
.Dq l10n )
|
|
refers to the operations by which the user environment is customized to
|
|
handle its input and output appropriate for specific language and cultural
|
|
conventions.
|
|
This is a specialization process, by which generic methods already
|
|
implemented in an internationalized system are used in specific ways.
|
|
The formal description of cultural conventions for some country, together
|
|
with all associated translations targeted to the native language, is
|
|
called the
|
|
.Dq locale .
|
|
.Pp
|
|
.Nx
|
|
provides extensive support to programmers and system developers to
|
|
enable internationalized software to be developed.
|
|
.Nx
|
|
also supplies a large variety of locales for system localization.
|
|
.Ss Localization of Information
|
|
All locale information is accessible to programs at run time so that
|
|
data is processed and displayed correctly for specific cultural
|
|
conventions and language.
|
|
.Pp
|
|
A locale is divided into categories.
|
|
A category is a group of language-specific and culture-specific conventions
|
|
as outlined in the list above.
|
|
ISO C specifies the following six standard categories supported by
|
|
.Nx :
|
|
.Pp
|
|
.Bl -tag -compact -width LC_MONETARYXX
|
|
.It LC_COLLATE
|
|
string-collation order information
|
|
.It LC_CTYPE
|
|
character classification, case conversion, and other character attributes
|
|
.It LC_MESSAGES
|
|
the format for affirmative and negative responses
|
|
.It LC_MONETARY
|
|
rules and symbols for formatting monetary numeric information
|
|
.It LC_NUMERIC
|
|
rules and symbols for formatting nonmonetary numeric information
|
|
.It LC_TIME
|
|
rules and symbols for formatting time and date information
|
|
.El
|
|
.Pp
|
|
Localization of the system is achieved by setting appropriate values
|
|
in environment variables to identify which locale should be used.
|
|
The environment variables have the same names as their respective
|
|
locale categories.
|
|
Additionally, the
|
|
.Ev LANG ,
|
|
.Ev LC_ALL ,
|
|
and
|
|
.Ev NLSPATH
|
|
environment variables are used.
|
|
The
|
|
.Ev NLSPATH
|
|
environment variable specifies a colon-separated list of directory names
|
|
where the message catalog files of the NLS database are located.
|
|
The
|
|
.Ev LC_ALL
|
|
and
|
|
.Ev LANG
|
|
environment variables also determine the current locale.
|
|
.Pp
|
|
The values of these environment variables contains a string format as:
|
|
.Pp
|
|
.Bd -literal
|
|
language[_territory][.codeset][@modifier]
|
|
.Ed
|
|
.Pp
|
|
Valid values for the language field come from the ISO639 standard which
|
|
defines two-character codes for many languages.
|
|
Some common language codes are:
|
|
.Pp
|
|
.nf
|
|
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
|
|
\fILanguage Name\fP \fICode\fP \fILanguage Family\fP
|
|
.ta \w'SERBO-CROATIAN'u+2n +\w'DE'u+5n +\w'OCEANIC/INDONESIAN'u+2nC
|
|
.sp 5p
|
|
ABKHAZIAN AB IBERO-CAUCASIAN
|
|
AFAN (OROMO) OM HAMITIC
|
|
AFAR AA HAMITIC
|
|
AFRIKAANS AF GERMANIC
|
|
ALBANIAN SQ INDO-EUROPEAN (OTHER)
|
|
AMHARIC AM SEMITIC
|
|
ARABIC AR SEMITIC
|
|
ARMENIAN HY INDO-EUROPEAN (OTHER)
|
|
ASSAMESE AS INDIAN
|
|
AYMARA AY AMERINDIAN
|
|
AZERBAIJANI AZ TURKIC/ALTAIC
|
|
BASHKIR BA TURKIC/ALTAIC
|
|
BASQUE EU BASQUE
|
|
BENGALI BN INDIAN
|
|
BHUTANI DZ ASIAN
|
|
BIHARI BH INDIAN
|
|
BISLAMA BI
|
|
BRETON BR CELTIC
|
|
BULGARIAN BG SLAVIC
|
|
BURMESE MY ASIAN
|
|
BYELORUSSIAN BE SLAVIC
|
|
CAMBODIAN KM ASIAN
|
|
CATALAN CA ROMANCE
|
|
CHINESE ZH ASIAN
|
|
CORSICAN CO ROMANCE
|
|
CROATIAN HR SLAVIC
|
|
CZECH CS SLAVIC
|
|
DANISH DA GERMANIC
|
|
DUTCH NL GERMANIC
|
|
ENGLISH EN GERMANIC
|
|
ESPERANTO EO INTERNATIONAL AUX.
|
|
ESTONIAN ET FINNO-UGRIC
|
|
FAROESE FO GERMANIC
|
|
FIJI FJ OCEANIC/INDONESIAN
|
|
FINNISH FI FINNO-UGRIC
|
|
FRENCH FR ROMANCE
|
|
FRISIAN FY GERMANIC
|
|
GALICIAN GL ROMANCE
|
|
GEORGIAN KA IBERO-CAUCASIAN
|
|
GERMAN DE GERMANIC
|
|
GREEK EL LATIN/GREEK
|
|
GREENLANDIC KL ESKIMO
|
|
GUARANI GN AMERINDIAN
|
|
GUJARATI GU INDIAN
|
|
HAUSA HA NEGRO-AFRICAN
|
|
HEBREW HE SEMITIC
|
|
HINDI HI INDIAN
|
|
HUNGARIAN HU FINNO-UGRIC
|
|
ICELANDIC IS GERMANIC
|
|
INDONESIAN ID OCEANIC/INDONESIAN
|
|
INTERLINGUA IA INTERNATIONAL AUX.
|
|
INTERLINGUE IE INTERNATIONAL AUX.
|
|
INUKTITUT IU
|
|
INUPIAK IK ESKIMO
|
|
IRISH GA CELTIC
|
|
ITALIAN IT ROMANCE
|
|
JAPANESE JA ASIAN
|
|
JAVANESE JV OCEANIC/INDONESIAN
|
|
KANNADA KN DRAVIDIAN
|
|
KASHMIRI KS INDIAN
|
|
KAZAKH KK TURKIC/ALTAIC
|
|
KINYARWANDA RW NEGRO-AFRICAN
|
|
KIRGHIZ KY TURKIC/ALTAIC
|
|
KURUNDI RN NEGRO-AFRICAN
|
|
KOREAN KO ASIAN
|
|
KURDISH KU IRANIAN
|
|
LAOTHIAN LO ASIAN
|
|
LATIN LA LATIN/GREEK
|
|
LATVIAN LV BALTIC
|
|
LINGALA LN NEGRO-AFRICAN
|
|
LITHUANIAN LT BALTIC
|
|
MACEDONIAN MK SLAVIC
|
|
MALAGASY MG OCEANIC/INDONESIAN
|
|
MALAY MS OCEANIC/INDONESIAN
|
|
MALAYALAM ML DRAVIDIAN
|
|
MALTESE MT SEMITIC
|
|
MAORI MI OCEANIC/INDONESIAN
|
|
MARATHI MR INDIAN
|
|
MOLDAVIAN MO ROMANCE
|
|
MONGOLIAN MN
|
|
NAURU NA
|
|
NEPALI NE INDIAN
|
|
NORWEGIAN NO GERMANIC
|
|
OCCITAN OC ROMANCE
|
|
ORIYA OR INDIAN
|
|
PASHTO PS IRANIAN
|
|
PERSIAN (farsi) FA IRANIAN
|
|
POLISH PL SLAVIC
|
|
PORTUGUESE PT ROMANCE
|
|
PUNJABI PA INDIAN
|
|
QUECHUA QU AMERINDIAN
|
|
RHAETO-ROMANCE RM ROMANCE
|
|
ROMANIAN RO ROMANCE
|
|
RUSSIAN RU SLAVIC
|
|
SAMOAN SM OCEANIC/INDONESIAN
|
|
SANGHO SG NEGRO-AFRICAN
|
|
SANSKRIT SA INDIAN
|
|
SCOTS GAELIC GD CELTIC
|
|
SERBIAN SR SLAVIC
|
|
SERBO-CROATIAN SH SLAVIC
|
|
SESOTHO ST NEGRO-AFRICAN
|
|
SETSWANA TN NEGRO-AFRICAN
|
|
SHONA SN NEGRO-AFRICAN
|
|
SINDHI SD INDIAN
|
|
SINGHALESE SI INDIAN
|
|
SISWATI SS NEGRO-AFRICAN
|
|
SLOVAK SK SLAVIC
|
|
SLOVENIAN SL SLAVIC
|
|
SOMALI SO HAMITIC
|
|
SPANISH ES ROMANCE
|
|
SUNDANESE SU OCEANIC/INDONESIAN
|
|
SWAHILI SW NEGRO-AFRICAN
|
|
SWEDISH SV GERMANIC
|
|
TAGALOG TL OCEANIC/INDONESIAN
|
|
TAJIK TG IRANIAN
|
|
TAMIL TA DRAVIDIAN
|
|
TATAR TT TURKIC/ALTAIC
|
|
TELUGU TE DRAVIDIAN
|
|
THAI TH ASIAN
|
|
TIBETAN BO ASIAN
|
|
TIGRINYA TI SEMITIC
|
|
TONGA TO OCEANIC/INDONESIAN
|
|
TSONGA TS NEGRO-AFRICAN
|
|
TURKISH TR TURKIC/ALTAIC
|
|
TURKMEN TK TURKIC/ALTAIC
|
|
TWI TW NEGRO-AFRICAN
|
|
UIGUR UG
|
|
UKRAINIAN UK SLAVIC
|
|
URDU UR INDIAN
|
|
UZBEK UZ TURKIC/ALTAIC
|
|
VIETNAMESE VI ASIAN
|
|
VOLAPUK VO INTERNATIONAL AUX.
|
|
WELSH CY CELTIC
|
|
WOLOF WO NEGRO-AFRICAN
|
|
XHOSA XH NEGRO-AFRICAN
|
|
YIDDISH YI GERMANIC
|
|
YORUBA YO NEGRO-AFRICAN
|
|
ZHUANG ZA
|
|
ZULU ZU NEGRO-AFRICAN
|
|
.ta
|
|
.fi
|
|
.Pp
|
|
For example, the locale for the Danish language spoken in Denmark
|
|
using the ISO8859-1 character set is da_DK.ISO8859-1.
|
|
The da stands for the Danish language and the DK stands for Denmark.
|
|
The short form of da_DK is sufficient to indicate this locale.
|
|
.Pp
|
|
The environment variable settings are queried by their priority level
|
|
in the following manner:
|
|
.Pp
|
|
.Bl -bullet
|
|
.It
|
|
If the
|
|
.Ev LC_ALL
|
|
environment variable is set, all six categories use the locale it
|
|
specifies.
|
|
.It
|
|
If the
|
|
.Ev LC_ALL
|
|
environment variable is not set, each individual category uses the
|
|
locale specified by its corresponding environment variable.
|
|
.It
|
|
If the
|
|
.Ev LC_ALL
|
|
environment variable is not set, and a value for a particular
|
|
.Ev LC_*
|
|
environment variable is not set, the value of the
|
|
.Ev LANG
|
|
environment variable specifies the default locale for all categories.
|
|
Only the
|
|
.Ev LANG
|
|
environment variable should be set in /etc/profile, since it makes it
|
|
most easy for the user to override the system default using the individual
|
|
.Ev LC_*
|
|
variables.
|
|
.It
|
|
If the
|
|
.Ev LC_ALL
|
|
environment variable is not set, a value for a particular
|
|
.Ev LC_*
|
|
environment variable is not set, and the value of the
|
|
.Ev LANG
|
|
environment variable is not set, the locale for that specific
|
|
category defaults to the C locale.
|
|
The C or POSIX locale assumes the 7-bit ASCII character set and defines
|
|
information for the six categories.
|
|
.El
|
|
.Ss Character Sets
|
|
A character is any symbol used for the organization, control, or
|
|
representation of data.
|
|
A group of such symbols used to describe a
|
|
particular language make up a character set.
|
|
It is the encoding values in a character set that provide
|
|
the interface between the system and its input and output devices.
|
|
.Pp
|
|
The following character sets are supported in
|
|
.Nx
|
|
.Bl -tag -width ISO8859_family
|
|
.It ISO8859 family
|
|
Industry-standard character sets are provided by means of the ISO8859
|
|
family of character sets, which provide a range of single-byte character set
|
|
support that includes Latin-1, Latin-2, Arabic, Cyrillic, Hebrew,
|
|
Greek, and Turkish.
|
|
The eucJP character set is the industry-standard character set used to support
|
|
the Japanese locale.
|
|
.It Unicode
|
|
A Unicode environment based on the UTF-8 character set is supported for all
|
|
supported language/territories.
|
|
UTF-8 provides character support for most of the major languages of the
|
|
world and can be used in environments where multiple languages must be
|
|
processed simultaneously.
|
|
.El
|
|
.Ss Font Sets
|
|
A font set contains the glyphs to be displayed on the screen for a
|
|
corresponding character in a character set.
|
|
A display must support a suitable font to display a character set.
|
|
If suitable fonts are available to the X server, then X clients can
|
|
include support for different character sets.
|
|
.Xr xterm 1
|
|
includes support for UTF-8 character sets.
|
|
.Xr xfd 1
|
|
is useful for displaying all the characters in an X font.
|
|
.Pp
|
|
The
|
|
.Nx
|
|
.Xr wscons 4
|
|
console provides support for loading fonts using the
|
|
.Xr wsfontload 8
|
|
utility.
|
|
Currently, only fonts for the ISO8859-1 family of character sets are
|
|
supported.
|
|
.Ss Internationalization for Programmers
|
|
To facilitate translations of messages into various languages and to
|
|
make the translated messages available to the program based on a
|
|
user's locale, it is necessary to keep messages separate from the
|
|
programs and provide them in the form of message catalogs that a
|
|
program can access at run time.
|
|
.Pp
|
|
Access to locale information is provided through the
|
|
.Xr setlocale 3
|
|
and
|
|
.Xr nl_langinfo 3
|
|
interfaces.
|
|
See their respective man pages for further information.
|
|
.Pp
|
|
Message source files containing application messages are created by
|
|
the programmer and converted to message catalogs.
|
|
These catalogs are used by the application to retrieve and display
|
|
messages, as needed.
|
|
.Pp
|
|
.Nx
|
|
supports two message catalog interfaces: the X/Open
|
|
.Xr catgets 3
|
|
interface and the Uniforum
|
|
.Xr gettext 3
|
|
interface.
|
|
The
|
|
.Xr catgets 3
|
|
interface has the advantage that it belongs to a standard which is
|
|
well supported.
|
|
Unfortunately the interface is complicated to use and
|
|
maintenance of the catalogs is difficult.
|
|
The implementation also doesn't support different character sets.
|
|
The
|
|
.Xr gettext 3
|
|
interface has not been standardized yet, however it is being supported
|
|
by an increasing number of systems.
|
|
It also provides many additional tools which make programming and
|
|
catalog maintenance much easier.
|
|
.Ss Support for Multibyte Characters and Wide Characters
|
|
Character sets with multibyte characters may be difficult to decode, or may
|
|
contain state (i.e., adjacent characters are dependent).
|
|
ISO C specifies a set of functions using 'wide characters' which can handle
|
|
multibyte characters properly.
|
|
A wide character is specified in ISO C
|
|
as being a fixed number of bits wide and is stateless.
|
|
.Pp
|
|
There are two types for wide characters:
|
|
.Em wchar_t
|
|
and
|
|
.Em wint_t .
|
|
.Em wchar_t
|
|
is a type which can contain one wide character and operates like 'char'
|
|
type does for one character.
|
|
.Em wint_t
|
|
can contain one wide character or WEOF (wide EOF).
|
|
.Pp
|
|
There are functions that operate on
|
|
.Em wchar_t ,
|
|
and substitute for functions operating on 'char'.
|
|
See
|
|
.Xr wmemchr 3
|
|
and
|
|
.Xr towlower 3
|
|
for details.
|
|
There are some additional functions that operate on
|
|
.Em wchar_t .
|
|
See
|
|
.Xr wctype 3
|
|
and
|
|
.Xr wctran 3
|
|
for details.
|
|
.Pp
|
|
Wide characters should be used for all I/O processing which may rely
|
|
on locale-specific strings.
|
|
The two primary issues requiring special use of wide characters are:
|
|
.Bl -bullet -offset indent
|
|
.It
|
|
All I/O is performed using multibyte characters.
|
|
Input data is converted into wide characters immediately after
|
|
reading and data for output is converted from wide characters to
|
|
multibyte characters immediately before writing.
|
|
Conversion is achieved using
|
|
.Xr mbstowcs 3 ,
|
|
.Xr mbsrtowcs 3 ,
|
|
.Xr wcstombs 3 ,
|
|
.Xr wcsrtombs 3 ,
|
|
.Xr mblen 3 ,
|
|
.Xr mbrlen 3 ,
|
|
and
|
|
.Xr mbsinit 3 .
|
|
.It
|
|
Wide characters are used directly for I/O, using
|
|
.Xr getwchar 3 ,
|
|
.Xr fgetwc 3 ,
|
|
.Xr getwc 3 ,
|
|
.Xr ungetwc 3 ,
|
|
.Xr fgetws 3 ,
|
|
.Xr putwchar 3 ,
|
|
.Xr fputwc 3 ,
|
|
.Xr putwc 3 ,
|
|
and
|
|
.Xr fputws 3 .
|
|
They are also used for formatted I/O functions for wide characters
|
|
such as
|
|
.Xr fwscanf 3 ,
|
|
.Xr wscanf 3 ,
|
|
.Xr swscanf 3 ,
|
|
.Xr fwprintf 3 ,
|
|
.Xr wprintf 3 ,
|
|
.Xr swprintf 3 ,
|
|
.Xr vfwprintf 3 ,
|
|
.Xr vwprintf 3 ,
|
|
and
|
|
.Xr vswprintf 3 ,
|
|
and wide character identifier of %lc, %C, %ls, %S for conventional
|
|
formatted I/O functions.
|
|
.El
|
|
.Sh SEE ALSO
|
|
.Xr gencat 1 ,
|
|
.Xr xfd 1 ,
|
|
.Xr xterm 1 ,
|
|
.Xr catgets 3 ,
|
|
.Xr gettext 3 ,
|
|
.Xr nl_langinfo 3 ,
|
|
.Xr setlocale 3 ,
|
|
.Xr wsfontload 8
|
|
.Sh BUGS
|
|
This man page is incomplete.
|