Clarify that surrogate pairs are not encoded in UTF-8 directly
parent c5d94a34fb
commit 7cd082f907
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.154 2010/09/01 18:22:29 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.155 2010/09/07 18:54:09 petere Exp $ -->
 
 <chapter id="sql-syntax">
 <title>SQL Syntax</title>
@@ -236,12 +236,15 @@ U&"d!0061t!+000061" UESCAPE '!'
 
 <para>
 The Unicode escape syntax works only when the server encoding is
-UTF8. When other server encodings are used, only code points in
-the ASCII range (up to <literal>\007F</literal>) can be specified.
-Both the 4-digit and the 6-digit form can be used to specify
-UTF-16 surrogate pairs to compose characters with code points
-larger than U+FFFF (although the availability of
-the 6-digit form technically makes this unnecessary).
+<literal>UTF8</>. When other server encodings are used, only code
+points in the ASCII range (up to <literal>\007F</literal>) can be
+specified. Both the 4-digit and the 6-digit form can be used to
+specify UTF-16 surrogate pairs to compose characters with code
+points larger than U+FFFF, although the availability of the
+6-digit form technically makes this unnecessary. (When surrogate
+pairs are used when the server encoding is <literal>UTF8</>, they
+are first combined into a single code point that is then encoded
+in UTF-8.)
 </para>
 
 <para>
@@ -431,13 +434,15 @@ SELECT 'foo' 'bar';
 
 <para>
 The Unicode escape syntax works fully only when the server
-encoding is UTF-8. When other server encodings are used, only
-code points in the ASCII range (up to <literal>\u007F</>) can be
-specified. Both the 4-digit and the 8-digit form can be used to
-specify UTF-16 surrogate pairs to compose characters with code
-points larger than U+FFFF (although the
-availability of the 8-digit form technically makes this
-unnecessary).
+encoding is <literal>UTF8</>. When other server encodings are
+used, only code points in the ASCII range (up
+to <literal>\u007F</>) can be specified. Both the 4-digit and
+the 8-digit form can be used to specify UTF-16 surrogate pairs to
+compose characters with code points larger than U+FFFF, although
+the availability of the 8-digit form technically makes this
+unnecessary. (When surrogate pairs are used when the server
+encoding is <literal>UTF8</>, they are first combined into a
+single code point that is then encoded in UTF-8.)
 </para>
 
 <caution>
@@ -517,13 +522,15 @@ U&'d!0061t!+000061' UESCAPE '!'
 
 <para>
 The Unicode escape syntax works only when the server encoding is
-UTF8. When other server encodings are used, only code points in
-the ASCII range (up to <literal>\007F</literal>) can be
-specified.
-Both the 4-digit and the 6-digit form can be used to specify
-UTF-16 surrogate pairs to compose characters with code points
-larger than U+FFFF (although the availability
-of the 6-digit form technically makes this unnecessary).
+<literal>UTF8</>. When other server encodings are used, only
+code points in the ASCII range (up to <literal>\007F</literal>)
+can be specified. Both the 4-digit and the 6-digit form can be
+used to specify UTF-16 surrogate pairs to compose characters with
+code points larger than U+FFFF, although the availability of the
+6-digit form technically makes this unnecessary. (When surrogate
+pairs are used when the server encoding is <literal>UTF8</>, they
+are first combined into a single code point that is then encoded
+in UTF-8.)
 </para>
 
 <para>
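
The parenthetical added by this commit describes two steps: a UTF-16 surrogate pair is first combined into a single code point, and only that code point is encoded in UTF-8. The following is a minimal Python sketch of that arithmetic, offered as an illustration rather than as PostgreSQL's actual implementation. The function name `combine_surrogates` and the sample pair D83D/DE00 are assumptions chosen for the example and do not appear in the commit.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 high/low surrogate pair into one code point.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high (leading) surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low (trailing) surrogate")
    # Each surrogate carries 10 bits of the value above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Step 1: combine the pair into a single code point.
code_point = combine_surrogates(0xD83D, 0xDE00)

# Step 2: encode that single code point in UTF-8 (one 4-byte sequence).
utf8_bytes = chr(code_point).encode("utf-8")

# For contrast only: encoding each surrogate separately gives two 3-byte
# sequences, which is not valid UTF-8. Python requires the explicit
# "surrogatepass" error handler to produce them at all.
separate_bytes = (chr(0xD83D) + chr(0xDE00)).encode("utf-8", "surrogatepass")

print(hex(code_point))       # 0x1f600
print(utf8_bytes.hex())      # f09f9880
print(separate_bytes.hex())  # eda0bdedb880
```

The contrast at the end is the point of the commit title: the 6-byte result of encoding each surrogate on its own is what "encoded in UTF-8 directly" would mean, and the documentation states that this is not what happens.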