.\" $NetBSD: nls.7,v 1.15 2009/04/09 02:51:54 joerg Exp $
.\"
.\" Copyright (c) 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Gregory McGarry.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.Dd February 21, 2007
.Dt NLS 7
.Os
.Sh NAME
.Nm NLS
.Nd Native Language Support Overview
.Sh DESCRIPTION
Native Language Support (NLS) provides commands for a single
worldwide operating system base.
An internationalized system has no built-in assumptions or dependencies
on language-specific or culture-specific conventions such as:
.Pp
.Bl -bullet -offset indent -compact
.It
Character classifications
.It
Character comparison rules
.It
Character collation order
.It
Numeric and monetary formatting
.It
Date and time formatting
.It
Message-text language
.It
Character sets
.El
.Pp
All information pertaining to cultural conventions and language is
obtained at program run time.
.Pp
.Dq Internationalization
(often abbreviated
.Dq i18n )
refers to the operation by which system software is developed to support
multiple culture-specific and language-specific conventions.
This is a generalization process by which the system is freed from its
reliance on English strings and other English-specific conventions.
.Dq Localization
(often abbreviated
.Dq l10n )
refers to the operations by which the user environment is customized to
handle its input and output in a manner appropriate to a specific language
and its cultural conventions.
This is a specialization process, by which generic methods already
implemented in an internationalized system are used in specific ways.
The formal description of cultural conventions for some country, together
with all associated translations targeted to the native language, is
called the
.Dq locale .
.Pp
.Nx
provides extensive support to programmers and system developers to
enable internationalized software to be developed.
.Nx
also supplies a large variety of locales for system localization.
.Ss Localization of Information
All locale information is accessible to programs at run time so that
data is processed and displayed correctly for specific cultural
conventions and language.
.Pp
A locale is divided into categories.
A category is a group of language-specific and culture-specific conventions
as outlined in the list above.
ISO C and POSIX together specify the following six standard categories
supported by
.Nx :
.Pp
.Bl -tag -compact -width LC_MONETARYXX
.It Ev LC_COLLATE
string-collation order information
.It Ev LC_CTYPE
character classification, case conversion, and other character attributes
.It Ev LC_MESSAGES
the format for affirmative and negative responses
.It Ev LC_MONETARY
rules and symbols for formatting monetary numeric information
.It Ev LC_NUMERIC
rules and symbols for formatting nonmonetary numeric information
.It Ev LC_TIME
rules and symbols for formatting time and date information
.El
.Pp
Localization of the system is achieved by setting appropriate values
in environment variables to identify which locale should be used.
The environment variables have the same names as their respective
locale categories.
Additionally, the
.Ev LANG ,
.Ev LC_ALL ,
and
.Ev NLSPATH
environment variables are used.
The
.Ev NLSPATH
environment variable specifies a colon-separated list of directory names
where the message catalog files of the NLS database are located.
The
.Ev LC_ALL
and
.Ev LANG
environment variables also determine the current locale.
.Pp
The value of each of these environment variables is a string of the form:
.Pp
.Bd -literal
language[_territory][.codeset][@modifier]
.Ed
.Pp
Valid values for the language field come from the ISO 639 standard, which
defines two-character codes for many languages.
Some common language codes are:
.Pp
.Bl -column "PERSIAN (farsi)" "Sy Code" "OCEANIC/INDONESIAN"
.It Sy Language Name Ta Sy Code Ta Sy Language Family
.It ABKHAZIAN Ta AB Ta IBERO-CAUCASIAN
.It AFAN (OROMO) Ta OM Ta HAMITIC
.It AFAR Ta AA Ta HAMITIC
.It AFRIKAANS Ta AF Ta GERMANIC
.It ALBANIAN Ta SQ Ta INDO-EUROPEAN (OTHER)
.It AMHARIC Ta AM Ta SEMITIC
.It ARABIC Ta AR Ta SEMITIC
.It ARMENIAN Ta HY Ta INDO-EUROPEAN (OTHER)
.It ASSAMESE Ta AS Ta INDIAN
.It AYMARA Ta AY Ta AMERINDIAN
.It AZERBAIJANI Ta AZ Ta TURKIC/ALTAIC
.It BASHKIR Ta BA Ta TURKIC/ALTAIC
.It BASQUE Ta EU Ta BASQUE
.It BENGALI Ta BN Ta INDIAN
.It BHUTANI Ta DZ Ta ASIAN
.It BIHARI Ta BH Ta INDIAN
.It BISLAMA Ta BI Ta ""
.It BRETON Ta BR Ta CELTIC
.It BULGARIAN Ta BG Ta SLAVIC
.It BURMESE Ta MY Ta ASIAN
.It BYELORUSSIAN Ta BE Ta SLAVIC
.It CAMBODIAN Ta KM Ta ASIAN
.It CATALAN Ta CA Ta ROMANCE
.It CHINESE Ta ZH Ta ASIAN
.It CORSICAN Ta CO Ta ROMANCE
.It CROATIAN Ta HR Ta SLAVIC
.It CZECH Ta CS Ta SLAVIC
.It DANISH Ta DA Ta GERMANIC
.It DUTCH Ta NL Ta GERMANIC
.It ENGLISH Ta EN Ta GERMANIC
.It ESPERANTO Ta EO Ta INTERNATIONAL AUX.
.It ESTONIAN Ta ET Ta FINNO-UGRIC
.It FAROESE Ta FO Ta GERMANIC
.It FIJI Ta FJ Ta OCEANIC/INDONESIAN
.It FINNISH Ta FI Ta FINNO-UGRIC
.It FRENCH Ta FR Ta ROMANCE
.It FRISIAN Ta FY Ta GERMANIC
.It GALICIAN Ta GL Ta ROMANCE
.It GEORGIAN Ta KA Ta IBERO-CAUCASIAN
.It GERMAN Ta DE Ta GERMANIC
.It GREEK Ta EL Ta LATIN/GREEK
.It GREENLANDIC Ta KL Ta ESKIMO
.It GUARANI Ta GN Ta AMERINDIAN
.It GUJARATI Ta GU Ta INDIAN
.It HAUSA Ta HA Ta NEGRO-AFRICAN
.It HEBREW Ta HE Ta SEMITIC
.It HINDI Ta HI Ta INDIAN
.It HUNGARIAN Ta HU Ta FINNO-UGRIC
.It ICELANDIC Ta IS Ta GERMANIC
.It INDONESIAN Ta ID Ta OCEANIC/INDONESIAN
.It INTERLINGUA Ta IA Ta INTERNATIONAL AUX.
.It INTERLINGUE Ta IE Ta INTERNATIONAL AUX.
.It INUKTITUT Ta IU Ta ""
.It INUPIAK Ta IK Ta ESKIMO
.It IRISH Ta GA Ta CELTIC
.It ITALIAN Ta IT Ta ROMANCE
.It JAPANESE Ta JA Ta ASIAN
.It JAVANESE Ta JV Ta OCEANIC/INDONESIAN
.It KANNADA Ta KN Ta DRAVIDIAN
.It KASHMIRI Ta KS Ta INDIAN
.It KAZAKH Ta KK Ta TURKIC/ALTAIC
.It KINYARWANDA Ta RW Ta NEGRO-AFRICAN
.It KIRGHIZ Ta KY Ta TURKIC/ALTAIC
.It KURUNDI Ta RN Ta NEGRO-AFRICAN
.It KOREAN Ta KO Ta ASIAN
.It KURDISH Ta KU Ta IRANIAN
.It LAOTHIAN Ta LO Ta ASIAN
.It LATIN Ta LA Ta LATIN/GREEK
.It LATVIAN Ta LV Ta BALTIC
.It LINGALA Ta LN Ta NEGRO-AFRICAN
.It LITHUANIAN Ta LT Ta BALTIC
.It MACEDONIAN Ta MK Ta SLAVIC
.It MALAGASY Ta MG Ta OCEANIC/INDONESIAN
.It MALAY Ta MS Ta OCEANIC/INDONESIAN
.It MALAYALAM Ta ML Ta DRAVIDIAN
.It MALTESE Ta MT Ta SEMITIC
.It MAORI Ta MI Ta OCEANIC/INDONESIAN
.It MARATHI Ta MR Ta INDIAN
.It MOLDAVIAN Ta MO Ta ROMANCE
.It MONGOLIAN Ta MN Ta ""
.It NAURU Ta NA Ta ""
.It NEPALI Ta NE Ta INDIAN
.It NORWEGIAN Ta NO Ta GERMANIC
.It OCCITAN Ta OC Ta ROMANCE
.It ORIYA Ta OR Ta INDIAN
.It PASHTO Ta PS Ta IRANIAN
.It PERSIAN (farsi) Ta FA Ta IRANIAN
.It POLISH Ta PL Ta SLAVIC
.It PORTUGUESE Ta PT Ta ROMANCE
.It PUNJABI Ta PA Ta INDIAN
.It QUECHUA Ta QU Ta AMERINDIAN
.It RHAETO-ROMANCE Ta RM Ta ROMANCE
.It ROMANIAN Ta RO Ta ROMANCE
.It RUSSIAN Ta RU Ta SLAVIC
.It SAMOAN Ta SM Ta OCEANIC/INDONESIAN
.It SANGHO Ta SG Ta NEGRO-AFRICAN
.It SANSKRIT Ta SA Ta INDIAN
.It SCOTS GAELIC Ta GD Ta CELTIC
.It SERBIAN Ta SR Ta SLAVIC
.It SERBO-CROATIAN Ta SH Ta SLAVIC
.It SESOTHO Ta ST Ta NEGRO-AFRICAN
.It SETSWANA Ta TN Ta NEGRO-AFRICAN
.It SHONA Ta SN Ta NEGRO-AFRICAN
.It SINDHI Ta SD Ta INDIAN
.It SINGHALESE Ta SI Ta INDIAN
.It SISWATI Ta SS Ta NEGRO-AFRICAN
.It SLOVAK Ta SK Ta SLAVIC
.It SLOVENIAN Ta SL Ta SLAVIC
.It SOMALI Ta SO Ta HAMITIC
.It SPANISH Ta ES Ta ROMANCE
.It SUNDANESE Ta SU Ta OCEANIC/INDONESIAN
.It SWAHILI Ta SW Ta NEGRO-AFRICAN
.It SWEDISH Ta SV Ta GERMANIC
.It TAGALOG Ta TL Ta OCEANIC/INDONESIAN
.It TAJIK Ta TG Ta IRANIAN
.It TAMIL Ta TA Ta DRAVIDIAN
.It TATAR Ta TT Ta TURKIC/ALTAIC
.It TELUGU Ta TE Ta DRAVIDIAN
.It THAI Ta TH Ta ASIAN
.It TIBETAN Ta BO Ta ASIAN
.It TIGRINYA Ta TI Ta SEMITIC
.It TONGA Ta TO Ta OCEANIC/INDONESIAN
.It TSONGA Ta TS Ta NEGRO-AFRICAN
.It TURKISH Ta TR Ta TURKIC/ALTAIC
.It TURKMEN Ta TK Ta TURKIC/ALTAIC
.It TWI Ta TW Ta NEGRO-AFRICAN
.It UIGUR Ta UG Ta ""
.It UKRAINIAN Ta UK Ta SLAVIC
.It URDU Ta UR Ta INDIAN
.It UZBEK Ta UZ Ta TURKIC/ALTAIC
.It VIETNAMESE Ta VI Ta ASIAN
.It VOLAPUK Ta VO Ta INTERNATIONAL AUX.
.It WELSH Ta CY Ta CELTIC
.It WOLOF Ta WO Ta NEGRO-AFRICAN
.It XHOSA Ta XH Ta NEGRO-AFRICAN
.It YIDDISH Ta YI Ta GERMANIC
.It YORUBA Ta YO Ta NEGRO-AFRICAN
.It ZHUANG Ta ZA Ta ""
.It ZULU Ta ZU Ta NEGRO-AFRICAN
.El
.Pp
For example, the locale for the Danish language spoken in Denmark
using the ISO 8859-1 character set is da_DK.ISO8859-1.
The da stands for the Danish language and the DK stands for Denmark.
The short form of da_DK is sufficient to indicate this locale.
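.Pp
As an illustration, a locale name of the form described above could be
split into its fields with a few lines of C.
This is only a sketch: the
.Vt struct locale_parts
type and the
.Fn parse_locale_name
and
.Fn copy_field
functions are hypothetical names that are not part of any
.Nx
library, and the fixed field sizes are an arbitrary assumption.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the four fields of a locale name. */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
    char modifier[32];
};

/* Copy at most dstsize - 1 bytes of src into dst and terminate it. */
static void
copy_field(char *dst, size_t dstsize, const char *src, size_t len)
{
    if (len >= dstsize)
        len = dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = 0;
}

/*
 * Split "language[_territory][.codeset][@modifier]" into its fields.
 * Fields that are absent are left as empty strings.
 */
static void
parse_locale_name(const char *name, struct locale_parts *out)
{
    const char *p = name;
    size_t len;

    assert(name != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    len = strcspn(p, "_.@");
    copy_field(out->language, sizeof(out->language), p, len);
    p += len;

    if (*p == '_') {
        p++;
        len = strcspn(p, ".@");
        copy_field(out->territory, sizeof(out->territory), p, len);
        p += len;
    }
    if (*p == '.') {
        p++;
        len = strcspn(p, "@");
        copy_field(out->codeset, sizeof(out->codeset), p, len);
        p += len;
    }
    if (*p == '@') {
        p++;
        copy_field(out->modifier, sizeof(out->modifier), p, strlen(p));
    }
}
```
.Ed
.Pp
Given the name da_DK.ISO8859-1, this sketch stores da as the language,
DK as the territory and ISO8859-1 as the codeset, and leaves the modifier
empty.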
.Pp
The environment variable settings are queried by their priority level
in the following manner:
.Pp
.Bl -bullet
.It
If the
.Ev LC_ALL
environment variable is set, all six categories use the locale it
specifies.
.It
If the
.Ev LC_ALL
environment variable is not set, each individual category uses the
locale specified by its corresponding environment variable.
.It
If the
.Ev LC_ALL
environment variable is not set, and a value for a particular
.Ev LC_*
environment variable is not set, the value of the
.Ev LANG
environment variable specifies the default locale for all categories.
Only the
.Ev LANG
environment variable should be set in
.Pa /etc/profile ,
since this makes it easiest for the user to override the system default
using the individual
.Ev LC_*
variables.
.It
If the
.Ev LC_ALL
environment variable is not set, a value for a particular
.Ev LC_*
environment variable is not set, and the value of the
.Ev LANG
environment variable is not set, the locale for that specific
category defaults to the C locale.
The C or POSIX locale assumes the ASCII character set and defines
information for the six categories.
.El
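.Pp
The priority rules listed above can be sketched in C as follows.
This is an illustration of the lookup order only, not the code used by
.Xr setlocale 3 ;
the function names
.Fn is_set ,
.Fn effective_locale
and
.Fn effective_locale_from_env
are hypothetical, and treating an empty value like an unset variable is
an assumption of the sketch.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: an unset (NULL) or empty variable counts as "not set". */
static int
is_set(const char *value)
{
    return value != NULL && value[0] != 0;
}

/*
 * Return the locale name that applies to one category, given the values
 * of LC_ALL, of the category's own LC_* variable, and of LANG.
 */
static const char *
effective_locale(const char *lc_all, const char *lc_category,
    const char *lang)
{
    if (is_set(lc_all))
        return lc_all;          /* LC_ALL overrides everything */
    if (is_set(lc_category))
        return lc_category;     /* then the category's own variable */
    if (is_set(lang))
        return lang;            /* then the LANG default */
    return "C";                 /* nothing set: the C locale */
}

/* Apply the rules to the environment for a variable such as "LC_TIME". */
static const char *
effective_locale_from_env(const char *category_var)
{
    assert(category_var != NULL);
    return effective_locale(getenv("LC_ALL"), getenv(category_var),
        getenv("LANG"));
}
```
.Ed
.Pp
With only
.Ev LANG
set to da_DK, a call such as effective_locale(NULL, NULL, "da_DK") returns
da_DK for every category, while setting
.Ev LC_TIME
as well changes the result for that category alone.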
.Ss Character Sets
A character is any symbol used for the organization, control, or
representation of data.
A group of such symbols used to describe a
particular language makes up a character set.
It is the encoding values in a character set that provide
the interface between the system and its input and output devices.
.Pp
The following character sets are supported in
.Nx :
.Bl -tag -width ISO_8859_family
.It ASCII
The American Standard Code for Information Interchange (ASCII) standard
specifies 128 Roman characters and control codes, encoded in a 7-bit
character encoding scheme.
.It ISO 8859 family
Industry-standard character sets specified by the ISO/IEC 8859
standard.
The standard is divided into 15 numbered parts, with each
part covering a group of broadly similar scripts.
Examples include Western European, Central European, Arabic, Cyrillic,
Hebrew, Greek, and Turkish.
The character sets use an 8-bit character encoding scheme which is
compatible with the ASCII character set.
.It Unicode
The Unicode character set is the full set of known abstract characters of
all real-world scripts.
It can be used in environments where multiple
scripts must be processed simultaneously.
Unicode is compatible with ISO 8859-1 (Western European) and ASCII.
Many character encoding schemes are available for Unicode, including UTF-8,
UTF-16 and UTF-32.
These encoding schemes are multi-byte encodings.
The UTF-8 encoding scheme uses an 8-bit, variable-width encoding which is
compatible with ASCII.
The UTF-16 encoding scheme uses a 16-bit, variable-width encoding.
The UTF-32 encoding scheme uses a 32-bit, fixed-width encoding.
.El
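.Pp
To make the variable-width nature of UTF-8 and its compatibility with
ASCII concrete, the following C sketch encodes a single Unicode code point.
The function name
.Fn utf8_encode
is hypothetical; it is written for illustration and is not an interface
provided by
.Nx .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode one Unicode code point as UTF-8 into out, which must have room
 * for at least 4 bytes.  Returns the number of bytes written, or 0 if cp
 * cannot be encoded (a surrogate, or a value above U+10FFFF).
 */
static size_t
utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        /* ASCII: a single byte with the same value */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;   /* surrogates are reserved for UTF-16 */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```
.Ed
.Pp
An ASCII character such as U+0041 is stored unchanged in one byte, whereas
U+00E6, which ISO 8859-1 stores as the single byte 0xE6, becomes the two
bytes 0xC3 0xA6 in UTF-8.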
.Ss Font Sets
A font set contains the glyphs to be displayed on the screen for a
corresponding character in a character set.
A display must support a suitable font to display a character set.
If suitable fonts are available to the X server, then X clients can
include support for different character sets.
.Xr xterm 1
includes support for Unicode with UTF-8 encoding.
.Xr xfd 1
is useful for displaying all the characters in an X font.
.Pp
The
.Nx
.Xr wscons 4
console provides support for loading fonts using the
.Xr wsfontload 8
utility.
Currently, only fonts for the ISO8859-1 family of character sets are
supported.
.Ss Internationalization for Programmers
To facilitate translations of messages into various languages and to
make the translated messages available to the program based on a
user's locale, it is necessary to keep messages separate from the
programs and provide them in the form of message catalogs that a
program can access at run time.
.Pp
Access to locale information is provided through the
.Xr setlocale 3
and
.Xr nl_langinfo 3
interfaces.
See their respective man pages for further information.
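.Pp
A minimal sketch of how a program might use these two interfaces is shown
below.
The wrapper names
.Fn select_locale_codeset
and
.Fn current_radix
are hypothetical; only
.Xr setlocale 3
and
.Xr nl_langinfo 3
themselves are real interfaces.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Select a locale for all categories and return the name of its codeset.
 * An empty name ("") asks setlocale() to consult the environment.
 * Returns NULL if the requested locale is not available.
 */
static const char *
select_locale_codeset(const char *name)
{
    assert(name != NULL);
    if (setlocale(LC_ALL, name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}

/* Return the decimal point string of the current LC_NUMERIC locale. */
static const char *
current_radix(void)
{
    return nl_langinfo(RADIXCHAR);
}
```
.Ed
.Pp
A program would normally call select_locale_codeset("") once at startup so
that the user's environment decides the locale.
In the C locale the decimal point reported by the second function is a
period.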
.Pp
Message source files containing application messages are created by
the programmer and converted to message catalogs.
These catalogs are used by the application to retrieve and display
messages, as needed.
.Pp
.Nx
supports two message catalog interfaces: the X/Open
.Xr catgets 3
interface and the Uniforum
.Xr gettext 3
interface.
The
.Xr catgets 3
interface has the advantage that it belongs to a standard which is
well supported.
Unfortunately, the interface is complicated to use and
maintenance of the catalogs is difficult.
The implementation also does not support different character sets.
The
.Xr gettext 3
interface has not been standardized yet; however, it is supported
by an increasing number of systems.
It also provides many additional tools which make programming and
catalog maintenance much easier.
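.Pp
As a rough illustration of the
.Xr catgets 3
interface, the following C sketch looks up one message and falls back to a
built-in string when no catalog is available.
The function name
.Fn lookup_message
and the catalog name in the usage note are hypothetical; the sketch copies
the message into a caller-supplied buffer because the string returned by
.Xr catgets 3
should not be relied upon after the catalog has been closed.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <nl_types.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy message msg_id of set set_id from the named catalog into buf.
 * If the catalog cannot be opened or the message is missing, the
 * fallback string is copied instead.
 */
static void
lookup_message(const char *catalog, int set_id, int msg_id,
    const char *fallback, char *buf, size_t bufsize)
{
    const char *msg = fallback;
    nl_catd catd;

    assert(catalog != NULL && fallback != NULL);
    assert(buf != NULL && bufsize > 0);

    /* A name without a slash is searched for along NLSPATH. */
    catd = catopen(catalog, NL_CAT_LOCALE);
    if (catd != (nl_catd)-1)
        msg = catgets(catd, set_id, msg_id, fallback);

    snprintf(buf, bufsize, "%s", msg);

    if (catd != (nl_catd)-1)
        catclose(catd);
}
```
.Ed
.Pp
For example, lookup_message("myprog", 1, 1, "Hello", buf, sizeof(buf))
leaves the untranslated text Hello in buf when no catalog called myprog is
installed for the current locale.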
.Ss Support for Multi-byte Encodings
Some character sets with multi-byte encodings may be difficult to decode,
or may contain state (i.e., adjacent characters are dependent).
ISO C specifies a set of functions using
.Dq wide characters
which can handle multi-byte encodings properly.
The behaviour of these functions is affected
by the
.Ev LC_CTYPE
category of the current locale.
.Pp
A wide character is specified in ISO C
as being a fixed number of bits wide and is stateless.
There are two types for wide characters:
.Em wchar_t
and
.Em wint_t .
.Em wchar_t
is a type which can contain one wide character, much as the
.Em char
type can contain one character.
.Em wint_t
can contain one wide character or WEOF (wide EOF).
.Pp
There are functions that operate on
.Em wchar_t
and substitute for functions operating on
.Em char .
See
.Xr wmemchr 3
and
.Xr towlower 3
for details.
There are some additional functions that operate on
.Em wchar_t .
See
.Xr wctype 3
and
.Xr wctrans 3
for details.
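.Pp
The following C sketch shows wide-character classification and case
conversion functions standing in for their
.Em char
counterparts.
The function name
.Fn wcs_upcase
is hypothetical and chosen for this example only.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Convert the alphabetic characters of a wide string to upper case in
 * place and return how many characters were classified as alphabetic.
 * Classification follows the LC_CTYPE category of the current locale.
 */
static size_t
wcs_upcase(wchar_t *s)
{
    size_t alpha = 0;

    assert(s != NULL);
    for (; *s != 0; s++) {
        if (iswalpha((wint_t)*s)) {
            alpha++;
            *s = (wchar_t)towupper((wint_t)*s);
        }
    }
    return alpha;
}
```
.Ed
.Pp
In the C locale, applying this function to the wide string L"nls 7"
changes it to L"NLS 7" and returns 3.
In other locales additional characters may be classified as alphabetic.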
.Pp
Wide characters should be used for all I/O processing which may rely
on locale-specific strings.
The two primary issues requiring special use of wide characters are:
.Bl -bullet -offset indent
.It
All I/O is performed using multi-byte characters.
Input data is converted into wide characters immediately after
reading and data for output is converted from wide characters to
multi-byte encoding immediately before writing.
Conversion is controlled by the
.Xr mbstowcs 3 ,
.Xr mbsrtowcs 3 ,
.Xr wcstombs 3 ,
.Xr wcsrtombs 3 ,
.Xr mblen 3 ,
.Xr mbrlen 3 ,
and
.Xr mbsinit 3
functions.
.It
Wide characters are used directly for I/O, using
.Xr getwchar 3 ,
.Xr fgetwc 3 ,
.Xr getwc 3 ,
.Xr ungetwc 3 ,
.Xr fgetws 3 ,
.Xr putwchar 3 ,
.Xr fputwc 3 ,
.Xr putwc 3 ,
and
.Xr fputws 3 .
They are also used with the formatted I/O functions for wide characters,
such as
.Xr fwscanf 3 ,
.Xr wscanf 3 ,
.Xr swscanf 3 ,
.Xr fwprintf 3 ,
.Xr wprintf 3 ,
.Xr swprintf 3 ,
.Xr vfwprintf 3 ,
.Xr vwprintf 3 ,
and
.Xr vswprintf 3 ,
and with the wide-character conversion specifiers %lc, %C, %ls, and %S
of the conventional formatted I/O functions.
.El
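.Pp
The first approach in the list above, converting input to wide characters
right after reading it, might look like the following C sketch.
The function name
.Fn to_wide
is hypothetical; the conversion itself is done by
.Xr mbstowcs 3 .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Convert a multi-byte string to wide characters using the LC_CTYPE
 * category of the current locale.  At most max - 1 wide characters are
 * stored and dst is always terminated.  Returns the number of wide
 * characters stored, or (size_t)-1 if src holds an invalid sequence.
 */
static size_t
to_wide(wchar_t *dst, const char *src, size_t max)
{
    size_t n;

    assert(dst != NULL && src != NULL && max > 0);
    dst[0] = 0;
    n = mbstowcs(dst, src, max - 1);
    if (n == (size_t)-1)
        return n;
    dst[n] = 0;
    return n;
}
```
.Ed
.Pp
For plain ASCII input each byte becomes one wide character, so converting
the string hello yields 5 wide characters.
In a locale with a multi-byte codeset, the number of wide characters can be
smaller than the number of bytes read.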
.Sh SEE ALSO
.Xr gencat 1 ,
.Xr xfd 1 ,
.Xr xterm 1 ,
.Xr catgets 3 ,
.Xr gettext 3 ,
.Xr nl_langinfo 3 ,
.Xr setlocale 3 ,
.Xr wsfontload 8
.Sh BUGS
This man page is incomplete.