Assorted editing for collation documentation.
I made a pass over this to familiarize myself with the feature, and found some things that could be improved.
Commit a612b17120 (parent 4502c8e1c0)
@ -1128,8 +1128,8 @@
|
||||
<entry><type>oid</type></entry>
|
||||
<entry><literal><link linkend="catalog-pg-collation"><structname>pg_collation</structname></link>.oid</literal></entry>
|
||||
<entry>
|
||||
The defined collation of the column, zero if the column does
|
||||
not have a collatable type.
|
||||
The defined collation of the column, or zero if the column is
|
||||
not of a collatable datatype.
|
||||
</entry>
|
||||
</row>
|
||||
|
||||
@ -2088,7 +2088,7 @@
|
||||
The catalog <structname>pg_collation</structname> describes the
|
||||
available collations, which are essentially mappings from an SQL
|
||||
name to operating system locale categories.
|
||||
See <xref linkend="locale"> for more information.
|
||||
See <xref linkend="collation"> for more information.
|
||||
</para>
|
||||
|
||||
<table>
|
||||
@ -2132,38 +2132,48 @@
|
||||
<entry><structfield>collencoding</structfield></entry>
|
||||
<entry><type>int4</type></entry>
|
||||
<entry></entry>
|
||||
<entry>
|
||||
Encoding to which the collation is applicable. SQL-level
|
||||
commands such as <command>ALTER COLLATION</command> only
|
||||
operate on the collation belonging to the current database
|
||||
encoding. But this field is necessary because when this
|
||||
catalog is initialized, the encoding of future databases is not
|
||||
yet known. For practical purposes, collations that do not
|
||||
match the current database encoding should be considered
|
||||
invalid or invisible. It could be useful, however, to create
|
||||
collations whose encoding does not match the database encoding
|
||||
in template databases. This would currently have to be done
|
||||
manually.
|
||||
</entry>
|
||||
<entry>Encoding to which the collation is applicable</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry><structfield>collcollate</structfield></entry>
|
||||
<entry><type>name</type></entry>
|
||||
<entry></entry>
|
||||
<entry>LC_COLLATE for this collation object</entry>
|
||||
<entry><symbol>LC_COLLATE</> for this collation object</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
<entry><structfield>collctype</structfield></entry>
|
||||
<entry><type>name</type></entry>
|
||||
<entry></entry>
|
||||
<entry>LC_CTYPE for this collation object</entry>
|
||||
<entry><symbol>LC_CTYPE</> for this collation object</entry>
|
||||
</row>
|
||||
</tbody>
|
||||
</tgroup>
|
||||
</table>
|
||||
|
||||
<para>
|
||||
Note that the unique key on this catalog is (<structfield>collname</>,
|
||||
<structfield>collencoding</>, <structfield>collnamespace</>) not just
|
||||
(<structfield>collname</>, <structfield>collnamespace</>).
|
||||
<productname>PostgreSQL</productname> generally ignores all
|
||||
collations not belonging to the current database's encoding; therefore
|
||||
it is sufficient to use a qualified SQL name
|
||||
(<replaceable>schema</>.<replaceable>name</>) to identify a collation,
|
||||
even though this is not unique according to the catalog definition.
|
||||
The current database's encoding is automatically used as an additional
|
||||
lookup key. The reason for defining the catalog this way is that
|
||||
<application>initdb</> fills it in at cluster initialization time with
|
||||
entries for all locales available on the system, so it must be able to
|
||||
hold entries for all encodings that might ever be used in the cluster.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
In the <literal>template0</> database, it could be useful to create
|
||||
collations whose encoding does not match the database encoding,
|
||||
since they could match the encodings of databases later cloned from
|
||||
<literal>template0</>. This would currently have to be done manually.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
<sect1 id="catalog-pg-conversion">
|
||||
@ -6123,12 +6133,11 @@
|
||||
<entry><literal><link linkend="catalog-pg-collation"><structname>pg_collation</structname></link>.oid</literal></entry>
|
||||
<entry><para>
|
||||
<structfield>typcollation</structfield> specifies the collation
|
||||
of the type. If a type does not support collations, this will
|
||||
be zero, collation analysis at parse time is skipped, and
|
||||
the use of <literal>COLLATE</literal> clauses with the type is
|
||||
invalid. A base type that supports collations will have
|
||||
<symbol>DEFAULT_COLLATION_OID</symbol> here. A domain can have
|
||||
another collation OID, if one was defined for the domain.
|
||||
of the type. If the type does not support collations, this will
|
||||
be zero. A base type that supports collations will have
|
||||
<symbol>DEFAULT_COLLATION_OID</symbol> here. A domain over a
|
||||
collatable type can have some other collation OID, if one was defined
|
||||
for the domain.
|
||||
</para></entry>
|
||||
</row>
|
||||
|
||||
|
@ -15,6 +15,8 @@
|
||||
Using the locale features of the operating system to provide
|
||||
locale-specific collation order, number formatting, translated
|
||||
messages, and other aspects.
|
||||
This is covered in <xref linkend="locale"> and
|
||||
<xref linkend="collation">.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@ -23,6 +25,7 @@
|
||||
Providing a number of different character sets to support storing text
|
||||
in all kinds of languages, and providing character set translation
|
||||
between client and server.
|
||||
This is covered in <xref linkend="multibyte">.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
@ -138,9 +141,12 @@ initdb --locale=sv_SE
|
||||
fixed when the database is created. You can use different settings
|
||||
for different databases, but once a database is created, you cannot
|
||||
change them for that database anymore. <literal>LC_COLLATE</literal>
|
||||
and <literal>LC_CTYPE</literal> are these type of categories. They affect
|
||||
and <literal>LC_CTYPE</literal> are these categories. They affect
|
||||
the sort order of indexes, so they must be kept fixed, or indexes on
|
||||
text columns would become corrupt. The default values for these
|
||||
text columns would become corrupt.
|
||||
(But you can alleviate this restriction using collations, as discussed
|
||||
in <xref linkend="collation">.)
|
||||
The default values for these
|
||||
categories are determined when <command>initdb</command> is run, and
|
||||
those values are used when new databases are created, unless
|
||||
specified otherwise in the <command>CREATE DATABASE</command> command.
|
||||
@ -153,7 +159,7 @@ initdb --locale=sv_SE
|
||||
linkend="runtime-config-client-format"> for details). The values
|
||||
that are chosen by <command>initdb</command> are actually only written
|
||||
into the configuration file <filename>postgresql.conf</filename> to
|
||||
serve as defaults when the server is started. If you disable these
|
||||
serve as defaults when the server is started. If you remove these
|
||||
assignments from <filename>postgresql.conf</filename> then the
|
||||
server will inherit the settings from its execution environment.
|
||||
</para>
|
||||
@ -308,17 +314,17 @@ initdb --locale=sv_SE
|
||||
<title>Collation Support</title>
|
||||
|
||||
<para>
|
||||
The collation support allows specifying the sort order and certain
|
||||
other locale aspects of data per column or per operation at run
|
||||
time. This alleviates the problem that the
|
||||
The collation feature allows specifying the sort order and certain
|
||||
other locale aspects of data per-column, or even per-operation.
|
||||
This alleviates the restriction that the
|
||||
<symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings
|
||||
of a database cannot be changed after its creation.
|
||||
</para>
|
||||
|
||||
<note>
|
||||
<para>
|
||||
The collation support feature is currently only known to work on
|
||||
Linux/glibc and Mac OS X platforms.
|
||||
Collation support is currently only known to work on
|
||||
Linux (glibc) and Mac OS X platforms.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
@ -326,48 +332,51 @@ initdb --locale=sv_SE
|
||||
<title>Concepts</title>
|
||||
|
||||
<para>
|
||||
Conceptually, every datum of a collatable data type has a
|
||||
collation. (Collatable data types in the base system are
|
||||
Conceptually, every expression of a collatable data type has a
|
||||
collation. (The built-in collatable data types are
|
||||
<type>text</type>, <type>varchar</type>, and <type>char</type>.
|
||||
User-defined base types can also be marked collatable.) If the
|
||||
datum is a column reference, the collation of the datum is the
|
||||
defined collation of the column. If the datum is a constant, the
|
||||
expression is a column reference, the collation of the expression is the
|
||||
defined collation of the column. If the expression is a constant, the
|
||||
collation is the default collation of the data type of the
|
||||
constant. The collation of more complex expressions is derived
|
||||
from the input collations as described below.
|
||||
constant. The collation of a more complex expression is derived
|
||||
from the collations of its inputs, as described below.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The collation of a datum can also be the <quote>default</quote>
|
||||
collation, which reverts to the locale settings defined for the
|
||||
database. In some cases, a datum can also have no known
|
||||
The collation of an expression can be the <quote>default</quote>
|
||||
collation, which means the locale settings defined for the
|
||||
database. In some cases, an expression can also have no known
|
||||
collation. In such cases, ordering operations and other
|
||||
operations that need to know the collation will fail.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
When the database system has to perform an ordering or a
|
||||
comparison, it considers the collation of the input data. This
|
||||
happens in two situations: an <literal>ORDER BY</literal> clause
|
||||
and a function or operator call such as <literal>&lt;</literal>.
|
||||
The collation to apply for the performance of the <literal>ORDER
|
||||
BY</literal> clause is simply the collation of the sort key. The
|
||||
collation to apply for a function or operator call is derived from
|
||||
the arguments, as described below. Additionally, collations are
|
||||
taken into account by functions that convert between lower and
|
||||
upper case letters, that is, <function>lower</function>,
|
||||
<function>upper</function>, and <function>initcap</function>.
|
||||
comparison, it uses the collation of the input expression. This
|
||||
happens, for example, with <literal>ORDER BY</literal> clauses
|
||||
and function or operator calls such as <literal>&lt;</literal>.
|
||||
The collation to apply for an <literal>ORDER BY</literal> clause
|
||||
is simply the collation of the sort key. The collation to apply for a
|
||||
function or operator call is derived from the arguments, as described
|
||||
below. In addition to comparison operators, collations are taken into
|
||||
account by functions that convert between lower and upper case
|
||||
letters, such as <function>lower</>, <function>upper</>, and
|
||||
<function>initcap</>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For a function call, the collation that is derived from combining
|
||||
the argument collations is both used for performing any
|
||||
comparisons or ordering and for the collation of the function
|
||||
result, if the result type is collatable.
|
||||
For a function or operator call, the collation that is derived by
|
||||
examining the argument collations is used at run time for performing
|
||||
the specified operation. If the result of the function or operator
|
||||
call is of a collatable data type, the collation is also used at parse
|
||||
time as the defined collation of the function or operator expression,
|
||||
in case there is a surrounding expression that requires knowledge of
|
||||
its collation.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <firstterm>collation derivation</firstterm> of a datum can be
|
||||
The <firstterm>collation derivation</firstterm> of an expression can be
|
||||
implicit or explicit. This distinction affects how collations are
|
||||
combined when multiple different collations appear in an
|
||||
expression. An explicit collation derivation arises when a
|
||||
@ -379,9 +388,9 @@ initdb --locale=sv_SE
|
||||
<orderedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
If any input item has an explicit collation derivation, then
|
||||
all explicitly derived collations among the input items must be
|
||||
the same, otherwise an error is raised. If an explicitly
|
||||
If any input expression has an explicit collation derivation, then
|
||||
all explicitly derived collations among the input expressions must be
|
||||
the same, otherwise an error is raised. If any explicitly
|
||||
derived collation is present, that is the result of the
|
||||
collation combination.
|
||||
</para>
|
||||
@ -389,8 +398,8 @@ initdb --locale=sv_SE
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
Otherwise, all input items must have the same implicit
|
||||
collation derivation or the default collation. If an
|
||||
Otherwise, all input expressions must have the same implicit
|
||||
collation derivation or the default collation. If any
|
||||
implicitly derived collation is present, that is the result of
|
||||
the collation combination. Otherwise, the result is the
|
||||
default collation.
|
||||
@ -428,19 +437,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
|
||||
A collation is an SQL schema object that maps an SQL name to
|
||||
operating system locales. In particular, it maps to a combination
|
||||
of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>. (As
|
||||
the name would indicate, the main purpose of a collation is to set
|
||||
the name would suggest, the main purpose of a collation is to set
|
||||
<symbol>LC_COLLATE</symbol>, which controls the sort order. But
|
||||
it is rarely necessary in practice to have an
|
||||
<symbol>LC_CTYPE</symbol> setting that is different from
|
||||
<symbol>LC_COLLATE</symbol>, so it is more convenient to collect
|
||||
these under one concept than to create another infrastructure for
|
||||
setting <symbol>LC_CTYPE</symbol> per datum.) Also, a collation
|
||||
is tied to a character encoding. The same collation name may
|
||||
exist for different encodings.
|
||||
setting <symbol>LC_CTYPE</symbol> per expression.) Also, a collation
|
||||
is tied to a character set encoding (see <xref linkend="multibyte">).
|
||||
The same collation name may exist for different encodings.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
When a database system is initialized, <command>initdb</command>
|
||||
When a database cluster is initialized, <command>initdb</command>
|
||||
populates the system catalog <literal>pg_collation</literal> with
|
||||
collations based on all the locales it finds on the operating
|
||||
system at the time. For example, the operating system might
|
||||
@ -463,8 +472,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
|
||||
collation may be created using
|
||||
the <xref linkend="sql-createcollation"> command. That command
|
||||
can also be used to create a new collation from an existing
|
||||
collation, which can be useful to be able to use operating-system
|
||||
independent collation names in applications.
|
||||
collation, which can be useful to be able to use
|
||||
operating-system-independent collation names in applications.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Within any particular database, only collations that use that
|
||||
database's encoding are of interest. Other entries in
|
||||
<literal>pg_collation</literal> are ignored. Thus, a stripped collation
|
||||
name such as <literal>de_DE</literal> can be considered unique
|
||||
within a given database even though it would not be unique globally.
|
||||
Use of the stripped collation names is recommendable, since it will
|
||||
make one less thing you need to change if you decide to change to
|
||||
another database encoding.
|
||||
</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
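Illustration (not part of the commit): a minimal sketch of the collation derivation rules described in the Concepts text, assuming a PostgreSQL session. It uses only the "C" and "POSIX" collations, which are expected to exist on every platform; the literals are arbitrary, and the commented results are what the rules imply rather than captured output.

```sql
-- An explicit COLLATE on one argument takes precedence over the
-- implicit (default) collation of the other argument.
-- Under "C", strings compare by byte value, so 'a' (0x61) sorts after 'B' (0x42).
SELECT 'a' < 'B' COLLATE "C";          -- false

-- Two different explicit collations in one operator call conflict,
-- so this statement is rejected with a collation mismatch error.
SELECT 'a' COLLATE "C" < 'B' COLLATE "POSIX";

-- Case-conversion functions also take the derived collation into account.
SELECT upper('abc' COLLATE "C");       -- ABC
```

The second statement is the case covered by the first rule of the ordered list: all explicitly derived collations among the inputs must be the same.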
|
@ -21,7 +21,7 @@
|
||||
CREATE COLLATION <replaceable>name</replaceable> (
|
||||
[ LOCALE = <replaceable>locale</replaceable>, ]
|
||||
[ LC_COLLATE = <replaceable>lc_collate</replaceable>, ]
|
||||
[ LC_CTYPE = <replaceable>lc_ctype</replaceable>, ]
|
||||
[ LC_CTYPE = <replaceable>lc_ctype</replaceable> ]
|
||||
)
|
||||
CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_collation</replaceable>
|
||||
</synopsis>
|
||||
@ -32,7 +32,8 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll
|
||||
|
||||
<para>
|
||||
<command>CREATE COLLATION</command> defines a new collation using
|
||||
the specified operating system locales or from an existing collation.
|
||||
the specified operating system locale settings,
|
||||
or by copying an existing collation.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
@ -53,26 +54,14 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll
|
||||
<para>
|
||||
The name of the collation. The collation name can be
|
||||
schema-qualified. If it is not, the collation is defined in the
|
||||
current schema. The collation name must be unique within a
|
||||
current schema. The collation name must be unique within that
|
||||
schema. (The system catalogs can contain collations with the
|
||||
same name for other encodings, but these are not usable if the
|
||||
same name for other encodings, but these are ignored if the
|
||||
database encoding does not match.)
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><replaceable>existing_collation</replaceable></term>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
The name of an existing collation to copy. The new collation
|
||||
will have the same properties as the existing one, but they
|
||||
will become independent objects.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><replaceable>locale</replaceable></term>
|
||||
|
||||
@ -80,7 +69,7 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll
|
||||
<para>
|
||||
This is a shortcut for setting <symbol>LC_COLLATE</symbol>
|
||||
and <symbol>LC_CTYPE</symbol> at once. If you specify this,
|
||||
you cannot specify either of the other parameters.
|
||||
you cannot specify either of those parameters.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@ -112,6 +101,18 @@ CREATE COLLATION <replaceable>name</replaceable> FROM <replaceable>existing_coll
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><replaceable>existing_collation</replaceable></term>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
The name of an existing collation to copy. The new collation
|
||||
will have the same properties as the existing one, but they
|
||||
will become independent objects.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</refsect1>
|
||||
|
||||
@ -145,8 +146,8 @@ CREATE COLLATION french (LOCALE = 'fr_FR.utf8');
|
||||
<programlisting>
|
||||
CREATE COLLATION german FROM "de_DE";
|
||||
</programlisting>
|
||||
This can be convenient to be able to use operating-system
|
||||
independent collation names in applications.
|
||||
This can be convenient to be able to use operating-system-independent
|
||||
collation names in applications.
|
||||
</para>
|
||||
</refsect1>
|
||||
|
||||
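Illustration (not part of the commit): a hedged sketch of the CREATE COLLATION ... FROM form, assuming a PostgreSQL session. The name bytewise is hypothetical; "C" is used as the source because it does not depend on the operating system locales that happen to be installed.

```sql
-- Copy an existing collation under an application-chosen name
-- ("bytewise" is a hypothetical name picked for this sketch).
CREATE COLLATION bytewise FROM "C";

-- The copy behaves like its source: byte-wise comparison, so 'a' sorts after 'B'.
SELECT 'a' < 'B' COLLATE bytewise;     -- false

-- The copy is an independent object and can be dropped on its own.
DROP COLLATION IF EXISTS bytewise;
```

Applications that refer only to bytewise do not need to change if the underlying locale name differs between systems, which is the point made about operating-system-independent collation names.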
|
@ -94,7 +94,7 @@ DROP COLLATION german;
|
||||
<para>
|
||||
The <command>DROP COLLATION</command> command conforms to the
|
||||
<acronym>SQL</acronym> standard, apart from the <literal>IF
|
||||
EXISTS</> option, which is a <productname>PostgreSQL</> extension..
|
||||
EXISTS</> option, which is a <productname>PostgreSQL</> extension.
|
||||
</para>
|
||||
</refsect1>
|
||||
|
||||
|