PostgreSQL Charsets README
Josef Balatka, <balatka@email.cz>
Draft v0.1, Tue Jul 20 15:49:07 CEST 1999
This document is a brief overview of the national charset support
implemented in PostgreSQL ver. 6.5. It describes the relevant
compilation options and gives setup tips for particular use cases.
---------------------------------------------------------------------------
Table of Contents
1. Locale awareness
2. Single-byte charsets recoding
3. Multi-byte support/recoding
4. Credits
---------------------------------------------------------------------------
1. Locale awareness
The PostgreSQL server supports both locale-aware and non-locale-aware
(default) operational modes. You select the mode during the
configuration stage of the installation with the --enable-locale option.
If you don't use --enable-locale, the multi-language code will not be
compiled and PostgreSQL will behave as an ASCII-compliant application.
This mode is faster, but it is suitable only if you don't have to
handle national-specific chars.
With --enable-locale you will get a locale-aware server that uses the
LC_* environment variables to determine how to process national
specifics. In this case strcoll(3) and similar functions are used
internally, so speed is somewhat lower.
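To illustrate the difference between the two modes, here is a small
Python sketch (not PostgreSQL code) that sorts strings through the C
library's strcoll(3), which Python exposes as locale.strcoll. The
helper name sort_with_locale is made up for this example, and using the
"C" locale to stand in for the non-locale-aware mode is only an
approximation.

```python
import functools
import locale


def sort_with_locale(words, locale_name):
    """Sort words using strcoll(3) under the given LC_COLLATE locale."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        # Raises locale.Error if the locale is not installed on this system.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore the old setting


# In the "C" locale, strings compare by character code, so all
# uppercase ASCII letters sort before lowercase ones.
print(sort_with_locale(["b", "a", "B"], "C"))
```

Passing the name of a national locale instead of "C" (which one is
installed depends on your system) would order the same words by that
locale's collation rules, at the cost of the slower comparison.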
Note that --enable-locale is sufficient when all your clients use the
same single-byte encoding as the database server.
When your clients use an encoding different from the server's, then you
additionally need either the --enable-recode or --with-mb=<encoding>
option on the server side, or a client that does the recoding itself
(e.g. there is a PostgreSQL ODBC driver for Win32 that can handle
various Cyrillic encodings). The --with-mb=<encoding> option is
required for multi-byte charset support.
2. Single-byte charsets recoding
You can set up this feature with the --enable-recode option. The option
is described as 'enable Cyrillic recode support', which understates its
power: it can be used for *any* single-byte charset recoding.
This method uses the charset.conf file located in the $PGDATA directory.
It's a typical text configuration file in which spaces and newlines
separate items and records, and # starts a comment. Three keywords
with the following syntax are recognized:
BaseCharset <server_charset>
RecodeTable <from_charset> <to_charset> <file_name>
HostCharset <host_spec> <host_charset>
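As a rough illustration of this syntax, the following Python sketch
reads such a file into a dictionary. It is not the server's parser: it
assumes one record per line, and the function name parse_charset_conf
as well as the charset and file names in the sample (koi, win,
koi2win.tab) are hypothetical.

```python
def parse_charset_conf(text):
    """Parse charset.conf-style text into a dict (illustrative only)."""
    conf = {"base": None, "tables": [], "hosts": []}
    for line in text.splitlines():
        # Drop everything after '#', then split the record on whitespace.
        items = line.split("#", 1)[0].split()
        if not items:
            continue  # blank line or comment-only line
        keyword, args = items[0], items[1:]
        if keyword == "BaseCharset" and len(args) == 1:
            conf["base"] = args[0]
        elif keyword == "RecodeTable" and len(args) == 3:
            conf["tables"].append(tuple(args))  # (from, to, file_name)
        elif keyword == "HostCharset" and len(args) == 2:
            conf["hosts"].append(tuple(args))   # (host_spec, charset)
        else:
            raise ValueError("bad record: %r" % line)
    return conf


# Hypothetical sample; all names here are made up for the example.
SAMPLE = """
# server encoding
BaseCharset koi
RecodeTable koi win koi2win.tab
HostCharset 192.168.1.100/24 win   # Windows clients
"""

print(parse_charset_conf(SAMPLE))
```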
BaseCharset defines the encoding of the database server. Charset
names are used only for mapping inside charset.conf, so you can
freely choose typing-friendly names.
RecodeTable records specify the translation table between server and
client. The file name is relative to the $PGDATA directory. The table
file format is very simple: there are no keywords, and each line holds
a pair of decimal or hexadecimal (0x-prefixed) character values:
<char_value> <translated_char_value>
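The table format can be sketched in Python as follows. This is an
illustration, not the server's code: the function names are made up,
bytes that the table does not list are assumed to pass through
unchanged, and '#' comments are tolerated as an extra assumption. The
two sample pairs map the KOI8-R codes of the Cyrillic capital and small
letter A to their Windows-1251 codes.

```python
def parse_value(token):
    """Accept a decimal value or a 0x-prefixed hexadecimal value."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def parse_recode_table(text):
    """Build a 256-byte translation table from value pairs, one per line."""
    table = list(range(256))  # assumption: unlisted bytes map to themselves
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # assumption: allow '#' comments
        if not line:
            continue
        src, dst = line.split()
        table[parse_value(src)] = parse_value(dst)
    return bytes(table)


def recode(data, table):
    """Translate a byte string through the table."""
    return data.translate(table)


# One pair written in hexadecimal, one in decimal.
table = parse_recode_table("0xE1 0xC0\n193 224\n")
print(recode(b"\xe1\xc1A", table))
```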
HostCharset records define an IP address and its charset. You can use a
single IP address, an IP mask range starting from the given address or
an IP interval (e.g. 127.0.0.1, 192.168.1.100/24,
192.168.1.20-192.168.1.40).
The charset.conf is always processed up to the end, so you can easily
specify exceptions to earlier rules. In src/data you will find an
example charset.conf and a few recoding tables.
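The host matching described above, where a later record overrides an
earlier one, might be sketched in Python like this. The function names
are hypothetical, and the sketch assumes that a spec such as
192.168.1.100/24 behaves like a conventional network mask; the server's
exact interpretation of the mask range may differ.

```python
import ipaddress


def host_matches(spec, addr):
    """Return True if addr falls under a single IP, mask or interval spec."""
    ip = ipaddress.IPv4Address(addr)
    if "/" in spec:
        # Assumption: treat the spec as an ordinary network/prefix pair.
        return ip in ipaddress.ip_network(spec, strict=False)
    if "-" in spec:
        low, high = (ipaddress.IPv4Address(p) for p in spec.split("-", 1))
        return low <= ip <= high
    return ip == ipaddress.IPv4Address(spec)


def charset_for_host(host_records, addr, default=None):
    """Scan all records to the end; the last matching record wins."""
    chosen = default
    for spec, charset in host_records:
        if host_matches(spec, addr):
            chosen = charset
    return chosen


# Hypothetical records: a general rule first, then an exception to it.
records = [
    ("192.168.1.100/24", "win"),
    ("192.168.1.20-192.168.1.40", "koi"),
]
print(charset_for_host(records, "192.168.1.30"))
print(charset_for_host(records, "192.168.1.200"))
```

Because the scan never stops at the first match, the general rule can
come first and the exceptions can follow it.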
As this solution is based on mapping the client's IP address to a
charset, there are obviously some restrictions as well. You can't use
different encodings on the same host at the same time. It's also
inconvenient when you boot your client hosts into several operating
systems.
Nevertheless, when these restrictions are not limiting and you don't
need multi-byte chars, then it's a simple and effective solution.
3. Multi-byte support/recoding
This is a new generation of charset encoding support in PostgreSQL,
designed as a more comprehensive solution supporting both single-byte
and multi-byte chars.
You can set up this feature with the --with-mb=<encoding> option.
There is no IP mapping file; recoding is controlled through new SQL
statements. Recoding tables are included in the code. Many national
charsets are already supported and more will follow.
See doc/README.mb and doc/README.mb.jp for detailed instructions on how
to use the multibyte support. The file doc/README.locale contains
specific instructions on using the multibyte support with Cyrillic.
4. Credits
I'd like to thank the PostgreSQL development team and all contributors
for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and
Tatsuo Ishii for opening the door into the multi-language world.