Update docs to explain that 7.1 locks down LC_COLLATE and LC_CTYPE at
initdb time. A few copy-editing cleanups, too.
parent 671f798cc9
commit 1073123baa
@@ -1,4 +1,4 @@
-<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.5 2000/12/22 21:51:57 petere Exp $ -->
+<!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.6 2001/01/19 04:47:50 tgl Exp $ -->
 
 <chapter id="charset">
 <title>Localization</>
@@ -54,7 +54,7 @@
 cultural preferences regarding alphabets, sorting, number
 formatting, etc. <productname>PostgreSQL</> uses the standard ISO
 C and POSIX-like locale facilities provided by the server operating
-system. For additional information refer the documentation of your
+system. For additional information refer to the documentation of your
 system.
 </para>
 
@@ -62,7 +62,7 @@
 <title>Overview</>
 
 <para>
-Locale support is not build into <productname>PostgreSQL</> by
+Locale support is not built into <productname>PostgreSQL</> by
 default; to enable it, supply the <option>--enable-locale</> option
 to the <filename>configure</> script:
 <informalexample>
@@ -95,7 +95,7 @@ export LANG=sv_SE
 
 <para>
 Occasionally it is useful to mix rules from several locales, e.g.,
-use U.S. rules but Spanish messages. To do that a set of
+use U.S. collation rules but Spanish messages. To do that a set of
 environment variables exist that override the default of
 <envar>LANG</> for a particular category:
 
@@ -141,14 +141,23 @@ export LANG=sv_SE
 </para>
 
 <para>
-Once you have chosen a set of localization rules this way you must
-keep them fixed for any particular database cluster. That means
-that the locales that were active when you ran <filename>initdb</>
-must be kept the same when you start the postmaster. Otherwise,
-the changed sort order can corrupt indexes or make your data
-disappear mysteriously. It is currently not possible to change the
-locales after database initialization or to use more than one set
-of locales for a given database cluster.
+Note that the locale behavior is determined by the environment
+variables seen by the server, not by the environment of any client.
+Therefore, be careful to set these variables before starting the
+postmaster.
+</para>
+
+<para>
+The <envar>LC_COLLATE</> and <envar>LC_CTYPE</> variables affect the
+sort order of indexes. Therefore, these values must be kept fixed
+for any particular database cluster, or indexes on text columns will
+become corrupt. <productname>Postgres</productname> enforces this
+by recording the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
+that are seen by <command>initdb</>. The server automatically adopts
+those two values when it is started; only the other <envar>LC_</>
+categories can be set from the environment at server startup.
+In short, only one collation order can be used in a database cluster,
+and it is chosen at <command>initdb</> time.
 </para>
 </sect2>
 
@@ -183,7 +192,10 @@ export LANG=sv_SE
 <para>
 The only severe drawback of using the locale support in
 <productname>PostgreSQL</> is its speed. So use locale only if you
-actually need it.
+actually need it. It should be noted in particular that selecting
+a non-C locale disables index optimizations for <literal>LIKE</> and
+<literal>~</> operators, which can make a huge difference in the
+speed of searches that use those operators.
 </para>
 </sect2>
 
@@ -261,7 +273,7 @@ perl: warning: Falling back to the standard locale ("C").
 
 <para>
 <acronym>MB</acronym> also fixes some problems concerning 8-bit single byte
-character sets including ISO8859. (I would not say all of problems
+character sets including ISO8859. (I would not say all problems
 have been fixed. I just confirmed that the regression test ran fine
 and a few French characters could be used with the patch. Please let
 me know if you find any problem while using 8-bit characters.)
@@ -271,7 +283,7 @@ perl: warning: Falling back to the standard locale ("C").
 <title>Enabling MB</title>
 
 <para>
-Run configure with a multibyte option:
+Run configure with the multibyte option:
 
 <programlisting>
 % ./configure --enable-multibyte[=<replaceable>encoding_system</replaceable>]
@@ -383,11 +395,11 @@ perl: warning: Falling back to the standard locale ("C").
 % initdb -E EUC_JP
 </programlisting>
 
-sets the default encoding to EUC_JP(Extended Unix Code for Japanese).
+sets the default encoding to EUC_JP (Extended Unix Code for Japanese).
 Note that you can use "--encoding" instead of "-E" if you prefer
 to type longer option strings.
 If no -E or --encoding option is given, the encoding
-specified at the compile time is used.
+specified at configure time is used.
 </para>
 
 <para>
@@ -397,8 +409,8 @@ perl: warning: Falling back to the standard locale ("C").
 % createdb -E EUC_KR korean
 </programlisting>
 
-will create a database named "korean" with EUC_KR encoding. The
-another way to accomplish this is to use a SQL command:
+will create a database named "korean" with EUC_KR encoding.
+Another way to accomplish this is to use a SQL command:
 
 <programlisting>
 CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
@@ -527,20 +539,11 @@ char *pg_encoding_to_char(int <replaceable>encoding_id</replaceable>)
 </para>
 </listitem>
 
-<listitem>
-<para>
-Using <envar>PGCLIENTENCODING</envar>.
-
-If an environment variable <envar>PGCLIENTENCODING</envar> is defined in the
-frontend, an automatic encoding translation is done by the backend.
-</para>
-</listitem>
-
 <listitem>
 <para>
 Using <command>SET CLIENT_ENCODING TO</command>.
 
-Setting the frontend side encoding can be done a SQL command:
+Setting the frontend side encoding can be done by this SQL command:
 
 <programlisting>
 SET CLIENT_ENCODING TO 'encoding';
@@ -552,7 +555,7 @@ SET CLIENT_ENCODING TO 'encoding';
 SET NAMES 'encoding';
 </programlisting>
 
-To query the current the frontend encoding:
+To query the current frontend encoding:
 
 <programlisting>
 SHOW CLIENT_ENCODING;
@@ -565,6 +568,17 @@ RESET CLIENT_ENCODING;
 </programlisting>
 </para>
 </listitem>
+
+<listitem>
+<para>
+Using <envar>PGCLIENTENCODING</envar>.
+
+If environment variable <envar>PGCLIENTENCODING</envar> is defined
+in the client's environment, that client encoding is automatically
+selected when a backend connection is made. (This can subsequently
+be overridden using any of the other methods mentioned above.)
+</para>
+</listitem>
 </itemizedlist>
 </para>
 </sect2>
@@ -588,7 +602,7 @@ RESET CLIENT_ENCODING;
 <para>
 Suppose you choose EUC_JP for the backend, LATIN1 for the frontend,
 then some Japanese characters could not be translated into LATIN1. In
-this case, a letter cannot be represented in the LATIN1 character set,
+this case, a letter that cannot be represented in the LATIN1 character set
 would be transformed as:
 
 <programlisting>
@@ -601,7 +615,7 @@ RESET CLIENT_ENCODING;
 <title>References</title>
 
 <para>
-These are good sources to start learning various kind of encoding
+These are good sources to start learning about various kinds of encoding
 systems.
 
 <itemizedlist>
@@ -724,8 +738,7 @@ Mar 1, 1998 PL1 released
 <para>
 <!--
 [Here is a good documentation explaining how to use WIN1250 on
-Windows/ODBC from Pavel Behal. Please note that Installation step 1)
-is not necceary in 6.5.1 - Tatsuo]
+Windows/ODBC from Pavel Behal]
 
 Version: 0.91 for PgSQL 6.5
 Author: Pavel Behal
@@ -815,20 +828,14 @@ Sorry for my Eglish and C code, I'm not native :-)
 <title>WIN1250 on Windows/ODBC</title>
 <step>
 <para>
-Change the three relevant files in the source directories.
-</para>
-</step>
-
-<step>
-<para>
-Compile <productname>Postgres</productname> with local enabled
+Compile <productname>Postgres</productname> with locale enabled
 and the multibyte encoding set to <literal>LATIN2</literal>.
 </para>
 </step>
 
 <step>
 <para>
-Set up your instalation. Do not forget to create locale
+Set up your installation. Do not forget to create locale
 variables in your profile (environment). For example (this may
 not be correct for <emphasis>your</emphasis> environment):
 
@@ -936,8 +943,8 @@ HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
 <para>
 The <filename>charset.conf</> file is always processed up to the
 end, so you can easily specify exceptions from the previous
-rules. In the src/data you will find charset.conf example and a few
-recoding tables.
+rules. In the <filename>src/data/</> directory you will find an
+example <filename>charset.conf</> and a few recoding tables.
 </para>
 
 <para>
@@ -945,7 +952,7 @@ HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
 set mapping there are obviously some restrictions as well. You
 cannot use different encodings on the same host at the same
 time. It is also inconvenient when you boot your client hosts into
-more operating systems. Nevertheless, when these restrictions are
+multiple operating systems. Nevertheless, when these restrictions are
 not limiting and you do not need multi-byte characters than it is a
 simple and effective solution.
 </para>
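
Editor's sketch, not part of the commit: the locale paragraphs touched in this diff say that per-category variables override the default taken from LANG. The shell function below illustrates that lookup order as POSIX defines it. The name `resolve_locale` is hypothetical, the locale names `en_US` and `es_ES` are illustrative, and the LC_ALL step comes from POSIX rather than from the text in this diff. It does not model the 7.1 behavior in which the server adopts the LC_COLLATE and LC_CTYPE values recorded by initdb.

```shell
#!/bin/sh
# Hypothetical helper (not a PostgreSQL tool): print the locale one category
# would resolve to under the POSIX precedence
#   LC_ALL  >  LC_<category>  >  LANG  >  "C"
# Values are passed as arguments so the running shell's own locale
# settings are left untouched.
resolve_locale() {
    lc_all=$1        # value of LC_ALL, or empty if unset
    lc_category=$2   # value of e.g. LC_COLLATE or LC_MESSAGES, or empty
    lang=$3          # value of LANG, or empty if unset
    if [ -n "$lc_all" ]; then
        printf '%s\n' "$lc_all"
    elif [ -n "$lc_category" ]; then
        printf '%s\n' "$lc_category"
    elif [ -n "$lang" ]; then
        printf '%s\n' "$lang"
    else
        printf '%s\n' "C"
    fi
}

# "U.S. collation rules but Spanish messages", as in the documentation text:
resolve_locale "" "" "en_US"        # LC_COLLATE unset: falls back to LANG
resolve_locale "" "es_ES" "en_US"   # LC_MESSAGES set: overrides LANG
```

Per the new documentation text, only categories other than LC_COLLATE and LC_CTYPE can still be changed this way at server startup once initdb has been run.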