I've sent 3 mails to pgsql-patches. There are two new files, one for the doc
and one for the src/data directory, plus one minor patch for doc/README.locale.
Please apply. Oleg.
parent c5d0a1bc42
commit 972124091d

doc/README.Charsets (new file, 113 lines)
@@ -0,0 +1,113 @@

PostgreSQL Charsets README
Josef Balatka, <balatka@email.cz>
Draft v0.1, Tue Jul 20 15:49:07 CEST 1999

This document is a brief overview of the national charset support
implemented in PostgreSQL 6.5. It describes the relevant compilation
options and gives setup tips for particular use cases.

---------------------------------------------------------------------------

Table of Contents

1. Locale awareness
2. Single-byte charsets recoding
3. Multi-byte support/recoding
4. Credits

---------------------------------------------------------------------------

1. Locale awareness

The PostgreSQL server supports both a locale-aware and a locale-unaware
(default) mode of operation. You choose the mode at the configuration
stage of the installation with the --enable-locale option.

If you don't use --enable-locale, the multi-language code is not
compiled in and PostgreSQL behaves as a plain ASCII application.
This mode is faster, but it is only suitable when you don't have to
handle national-specific characters.

With --enable-locale you get a locale-aware server that uses the LC_*
environment variables to determine how to handle national specifics.
In this case strcoll(3) and similar functions are used internally,
so the server is somewhat slower.
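
As an illustration only (this is not PostgreSQL's code), the C sketch
below contrasts the two modes: a locale-unaware build orders strings by
byte value with strcmp(3), while a locale-aware one calls strcoll(3),
whose result depends on the LC_COLLATE category. The helper names
sort_words and init_collation_from_env are hypothetical.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort(3) comparator for a locale-unaware build: plain byte order. */
static int cmp_bytes(const void *a, const void *b)
{
    return strcmp(*(char *const *) a, *(char *const *) b);
}

/* qsort(3) comparator for a locale-aware build: order set by LC_COLLATE. */
static int cmp_locale(const void *a, const void *b)
{
    return strcoll(*(char *const *) a, *(char *const *) b);
}

/*
 * Hypothetical startup step of a locale-aware build: take the collation
 * from LC_ALL / LC_COLLATE / LANG. Returns NULL if that locale is not
 * installed. Until setlocale() is called, a C program runs in the "C"
 * locale, where strcoll() orders strings like strcmp().
 */
const char *init_collation_from_env(void)
{
    return setlocale(LC_COLLATE, "");
}

/* Sort strings by byte value, or by the current locale's collation. */
void sort_words(char **words, size_t n, int locale_aware)
{
    qsort(words, n, sizeof *words, locale_aware ? cmp_locale : cmp_bytes);
}
```

In the "C" locale both comparators give the same order ("B", "a", "b").
In a national locale strcoll() typically groups upper- and lower-case
letters and places accented letters by the language's rules, which costs
extra work on every comparison.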

Note that --enable-locale alone is sufficient only when all your clients
use the same single-byte encoding as the database server.

When your clients use an encoding different from the server's, you
additionally need either the --enable-recode or the --with-mb=<encoding>
option on the server side, or a client that does the recoding itself
(e.g. there is a PostgreSQL ODBC driver for Win32 that supports various
Cyrillic encodings). The --with-mb=<encoding> option is required for
multi-byte charset support.


2. Single-byte charsets recoding

This feature is enabled with the --enable-recode option. The option is
described as 'enable Cyrillic recode support', which understates what it
can do: it can be used for recoding *any* single-byte charset.
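
Because every character of a single-byte charset is exactly one byte,
recoding reduces to a 256-entry lookup table. The sketch below is not
taken from the PostgreSQL source: the type and function names are
hypothetical, and it assumes that bytes not listed in a table pass
through unchanged.

```c
#include <stddef.h>

/* A single-byte recoding table: map[c] holds the translation of byte c. */
typedef struct {
    unsigned char map[256];
} RecodeTable;

/* ASSUMPTION: start from identity, so unlisted bytes pass through unchanged. */
void recode_init(RecodeTable *t)
{
    for (int c = 0; c < 256; c++)
        t->map[c] = (unsigned char) c;
}

/* Record that byte `from` is translated to byte `to`. */
void recode_set(RecodeTable *t, unsigned char from, unsigned char to)
{
    t->map[from] = to;
}

/* Translate a buffer in place: one byte in, one byte out. */
void recode_buffer(const RecodeTable *t, unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = t->map[buf[i]];
}
```

For example, recode_set(&t, 169, 138) maps the letter Š from ISO-8859-2
to WIN-1250, one of the pairs in the Czech table added by this commit.
Multi-byte encodings cannot be handled by a per-byte table like this.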

This method uses the charset.conf file located in the $PGDATA directory.
It is a typical text configuration file: spaces and newlines separate
items and records, and # starts a comment. Three keywords are
recognized, with the following syntax:

  BaseCharset <server_charset>
  RecodeTable <from_charset> <to_charset> <file_name>
  HostCharset <host_spec> <host_charset>
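
A hypothetical reader for the syntax above could cut off the # comment,
split the rest of the line into tokens and check the keyword together
with its number of arguments. This sketch assumes one record per line
and treats tabs like spaces; parse_conf_line and RecordType are
illustrative names, not PostgreSQL's.

```c
#include <string.h>

#define MAX_TOKENS 4   /* keyword plus at most three arguments */

/* Record types recognized in charset.conf (names are hypothetical). */
typedef enum { REC_NONE, REC_BASE, REC_RECODE, REC_HOST, REC_ERROR } RecordType;

/*
 * Split one line of charset.conf into tokens, in place, and classify it.
 * ASSUMPTIONS: one record per line; tabs count as spaces; keywords are
 * matched case-sensitively, exactly as spelled in the README.
 * tok[] receives pointers into `line`; *ntok receives their count.
 */
RecordType parse_conf_line(char *line, char *tok[MAX_TOKENS], int *ntok)
{
    char *hash = strchr(line, '#');
    if (hash != NULL)
        *hash = '\0';                       /* drop the comment */

    *ntok = 0;
    for (char *p = strtok(line, " \t\r\n"); p != NULL; p = strtok(NULL, " \t\r\n")) {
        if (*ntok == MAX_TOKENS)
            return REC_ERROR;               /* too many items for any keyword */
        tok[(*ntok)++] = p;
    }
    if (*ntok == 0)
        return REC_NONE;                    /* blank or comment-only line */

    /* Each keyword must carry the number of arguments shown in its syntax. */
    if (strcmp(tok[0], "BaseCharset") == 0 && *ntok == 2)
        return REC_BASE;
    if (strcmp(tok[0], "RecodeTable") == 0 && *ntok == 4)
        return REC_RECODE;
    if (strcmp(tok[0], "HostCharset") == 0 && *ntok == 3)
        return REC_HOST;
    return REC_ERROR;
}
```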

BaseCharset defines the encoding of the database server. Charset names
are only used for mapping inside charset.conf, so you can freely choose
names that are easy to type.

RecodeTable records specify the translation table between the server
and client charsets. The file name is relative to the $PGDATA directory.
The table file format is very simple: there are no keywords, and each
line holds a pair of character values in decimal or in hexadecimal
(0x prefixed):

  <char_value> <translated_char_value>
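
A line of such a table can be read with strtol(3). In this sketch (the
function names are hypothetical) only an explicit 0x prefix selects
hexadecimal, so a value like 010 is read as decimal 10 rather than octal.
Lines starting with # are skipped because the example table in src/data
uses them, and values outside 0..255 are rejected.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse one byte value in decimal or 0x-prefixed hex; advance *pos past it. */
static int parse_char_value(const char **pos, unsigned char *out)
{
    const char *s = *pos;
    char *end;

    while (isspace((unsigned char) *s))
        s++;
    /* Only the 0x prefix selects hex; "010" stays decimal 10, not octal. */
    int base = (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) ? 16 : 10;

    errno = 0;
    long v = strtol(s, &end, base);
    if (end == s || errno != 0 || v < 0 || v > 255)
        return -1;
    *out = (unsigned char) v;
    *pos = end;
    return 0;
}

/*
 * Parse a line "<char_value> <translated_char_value>".
 * Returns 1 for a mapping, 0 for a blank or '#' comment line, -1 on error.
 * ASSUMPTION: comments only as whole lines, no trailing comments.
 */
int parse_table_line(const char *line, unsigned char *from, unsigned char *to)
{
    const char *p = line;

    while (isspace((unsigned char) *p))
        p++;
    if (*p == '\0' || *p == '#')
        return 0;

    if (parse_char_value(&p, from) != 0 || parse_char_value(&p, to) != 0)
        return -1;

    while (isspace((unsigned char) *p))
        p++;
    return *p == '\0' ? 1 : -1;     /* reject trailing garbage */
}
```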

HostCharset records map client IP addresses to a charset. The host
specification can be a single IP address, an IP mask range starting from
the given address, or an IP interval (e.g. 127.0.0.1, 192.168.1.100/24,
192.168.1.20-192.168.1.40).
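
One way to handle the three host_spec forms is to reduce each of them to
an inclusive range of IPv4 addresses. The sketch assumes that the /bits
form has the usual network-mask meaning (all addresses sharing the first
bits bits, so 192.168.1.100/24 covers 192.168.1.0 to 192.168.1.255); the
README's wording "starting from the given address" may mean something
different in the actual implementation. All function names are
hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse a dotted-quad IPv4 address into a host-order 32-bit value. */
static int parse_ipv4(const char *s, uint32_t *out)
{
    unsigned a, b, c, d;
    int n = 0;
    if (sscanf(s, "%u.%u.%u.%u%n", &a, &b, &c, &d, &n) != 4 || s[n] != '\0'
        || a > 255 || b > 255 || c > 255 || d > 255)
        return -1;
    *out = (uint32_t) a << 24 | (uint32_t) b << 16 | (uint32_t) c << 8 | d;
    return 0;
}

/*
 * Convert a <host_spec> into an inclusive interval [*lo, *hi].
 * Forms: "a.b.c.d", "a.b.c.d/bits", "a.b.c.d-e.f.g.h".
 * ASSUMPTION: "/bits" is a network mask, not a range counted from the address.
 */
int parse_host_spec(const char *spec, uint32_t *lo, uint32_t *hi)
{
    char buf[64];
    if (strlen(spec) >= sizeof buf)
        return -1;
    strcpy(buf, spec);

    char *sep = strpbrk(buf, "/-");
    if (sep == NULL) {                      /* single address */
        if (parse_ipv4(buf, lo) != 0)
            return -1;
        *hi = *lo;
        return 0;
    }

    char kind = *sep;
    *sep = '\0';
    uint32_t base;
    if (parse_ipv4(buf, &base) != 0)
        return -1;

    if (kind == '-') {                      /* explicit interval */
        if (parse_ipv4(sep + 1, hi) != 0 || *hi < base)
            return -1;
        *lo = base;
        return 0;
    }

    unsigned bits;                          /* "/bits" mask */
    int n = 0;
    if (sscanf(sep + 1, "%u%n", &bits, &n) != 1 || sep[1 + n] != '\0' || bits > 32)
        return -1;
    uint32_t mask = bits == 0 ? 0 : UINT32_MAX << (32 - bits);
    *lo = base & mask;
    *hi = *lo | ~mask;
    return 0;
}

/* Does a client address fall inside the interval? */
int host_matches(uint32_t addr, uint32_t lo, uint32_t hi)
{
    return addr >= lo && addr <= hi;
}
```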

charset.conf is always processed to the end, so you can easily specify
exceptions to earlier rules in later records. In src/data you will find
an example charset.conf and a few recoding tables.
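
Processing the whole file means that the last matching HostCharset
record wins. Assuming the records have already been reduced to inclusive
address ranges, the lookup is a single pass over them. HostRule and
charset_for_host are hypothetical names, and falling back to a default
(such as the BaseCharset) when nothing matches is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* One HostCharset record, already reduced to an inclusive address range. */
typedef struct {
    uint32_t lo, hi;          /* host-order IPv4 addresses */
    const char *charset;      /* name used in the RecodeTable records */
} HostRule;

/*
 * Return the charset for a client address. Rules are scanned in file
 * order and the LAST matching one wins, so a later, narrower record can
 * make an exception to an earlier, broader one.
 * ASSUMPTION: `fallback` is returned when no rule matches.
 */
const char *charset_for_host(const HostRule *rules, size_t n, uint32_t addr,
                             const char *fallback)
{
    const char *result = fallback;
    for (size_t i = 0; i < n; i++)
        if (addr >= rules[i].lo && addr <= rules[i].hi)
            result = rules[i].charset;
    return result;
}
```

For example, a broad record for 192.168.1.0/24 followed by a narrower one
for 192.168.1.20-192.168.1.40 gives hosts .20 to .40 the second charset
and the rest of the subnet the first.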

As this solution is based on a mapping from client IP addresses to
charsets, it obviously has some restrictions as well. You can't use
different encodings on the same host at the same time. It is also
inconvenient when your client hosts boot into more than one operating
system. Nevertheless, when these restrictions are not a problem and you
don't need multi-byte chars, it is a simple and effective solution.


3. Multi-byte support/recoding

This is the new generation of charset encoding support in PostgreSQL,
designed as a more comprehensive solution that supports both single-byte
and multi-byte chars. This feature is enabled with the
--with-mb=<encoding> option.

There is no IP mapping file; recoding is controlled through new SQL
statements. The recoding tables are built into the code. Many national
charsets are already supported and more will follow.

See doc/README.mb and doc/README.mb.jp for detailed instructions on how
to use the multibyte support. doc/README.locale contains specific
instructions on using the multibyte support with Cyrillic.


4. Credits

I'd like to thank the PostgreSQL development team and all contributors
for creating PostgreSQL. Thanks to Oleg Bartunov, Oleg Broytmann and
Tatsuo Ishii for opening the door into the multi-language world.

doc/README.locale (modified)
@@ -1,5 +1,17 @@
-===========
-14 Apr 1999
-===========
+===========
+1999 Jul 21
+===========
+
+Josef Balatka, <balatka@email.cz> asked us not to remove RECODE and sent me
+a Czech ISO-8859-2 -> WIN-1250 translation table.
+RECODE is no longer Cyrillic-only and will stay in PostgreSQL.
+
+He also created some bits of documentation, mostly concerning RECODE -
+see README.Charsets.
+
+
+===========
+1999 Apr 14
+===========
 
 Tatsuo Ishii <t-ishii@sra.co.jp> updated Multibyte support extending it

src/data/isocz-wincz.tab (new file, 12 lines)
@@ -0,0 +1,12 @@
#
# Czech ISO-8859-2 -> WIN-1250 translation table
#
165 188
169 138
171 141
174 142
181 190
185 154
187 157
190 158