Clarify that surrogate pairs are not encoded in UTF-8 directly

2010-09-07 18:54:09 +00:00
commit 7cd082f907
parent c5d94a34fb
1 changed file with 28 additions and 21 deletions
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.154 2010/09/01 18:22:29 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.155 2010/09/07 18:54:09 petere Exp $ -->
 <chapter id="sql-syntax">
 <title>SQL Syntax</title>
@@ -236,12 +236,15 @@ U&amp;"d!0061t!+000061" UESCAPE '!'
   <para>
    The Unicode escape syntax works only when the server encoding is
-    UTF8.  When other server encodings are used, only code points in
+    <literal>UTF8</>.  When other server encodings are used, only code
-    the ASCII range (up to <literal>\007F</literal>) can be specified.
+    points in the ASCII range (up to <literal>\007F</literal>) can be
-    Both the 4-digit and the 6-digit form can be used to specify
+    specified.  Both the 4-digit and the 6-digit form can be used to
-    UTF-16 surrogate pairs to compose characters with code points
+    specify UTF-16 surrogate pairs to compose characters with code
-    larger than U+FFFF (although the availability of
+    points larger than U+FFFF, although the availability of the
-    the 6-digit form technically makes this unnecessary).
+    6-digit form technically makes this unnecessary.  (When surrogate
+    pairs are used when the server encoding is <literal>UTF8</>, they
+    are first combined into a single code point that is then encoded
+    in UTF-8.)
   </para>
   <para>
@@ -431,13 +434,15 @@ SELECT 'foo'      'bar';
    <para>
     The Unicode escape syntax works fully only when the server
-     encoding is UTF-8.  When other server encodings are used, only
+     encoding is <literal>UTF8</>.  When other server encodings are
-     code points in the ASCII range (up to <literal>\u007F</>) can be
+     used, only code points in the ASCII range (up
-     specified.  Both the 4-digit and the 8-digit form can be used to
+     to <literal>\u007F</>) can be specified.  Both the 4-digit and
-     specify UTF-16 surrogate pairs to compose characters with code
+     the 8-digit form can be used to specify UTF-16 surrogate pairs to
-     points larger than U+FFFF (although the
+     compose characters with code points larger than U+FFFF, although
-     availability of the 8-digit form technically makes this
+     the availability of the 8-digit form technically makes this
-     unnecessary).
+     unnecessary.  (When surrogate pairs are used when the server
+     encoding is <literal>UTF8</>, they are first combined into a
+     single code point that is then encoded in UTF-8.)
    </para>
    <caution>
@@ -517,13 +522,15 @@ U&amp;'d!0061t!+000061' UESCAPE '!'
    <para>
     The Unicode escape syntax works only when the server encoding is
-     UTF8.  When other server encodings are used, only code points in
+     <literal>UTF8</>.  When other server encodings are used, only
-     the ASCII range (up to <literal>\007F</literal>) can be
+     code points in the ASCII range (up to <literal>\007F</literal>)
-     specified.
+     can be specified.  Both the 4-digit and the 6-digit form can be
-     Both the 4-digit and the 6-digit form can be used to specify
+     used to specify UTF-16 surrogate pairs to compose characters with
-     UTF-16 surrogate pairs to compose characters with code points
+     code points larger than U+FFFF, although the availability of the
-     larger than U+FFFF (although the availability
+     6-digit form technically makes this unnecessary.  (When surrogate
-     of the 6-digit form technically makes this unnecessary).
+     pairs are used when the server encoding is <literal>UTF8</>, they
+     are first combined into a single code point that is then encoded
+     in UTF-8.)
    </para>
    <para>
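
The parenthetical added in all three hunks describes a two-step process: the surrogate pair is first combined into one code point, and only that code point is encoded in UTF-8. A minimal Python sketch of the arithmetic follows. It is an illustration under stated assumptions, not PostgreSQL's implementation; the function name `combine_surrogates` and the sample pair D83D/DE00 are chosen here for the example and do not appear in the patch.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high (leading) surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low (trailing) surrogate")
    # Each surrogate carries 10 bits of the code point's offset above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Step 1: combine the pair D83D/DE00 into one code point (U+1F600).
code_point = combine_surrogates(0xD83D, 0xDE00)

# Step 2: encode that single code point in UTF-8 (one 4-byte sequence).
utf8_bytes = chr(code_point).encode("utf-8")

print(f"U+{code_point:04X}", utf8_bytes.hex())
```

The result is a single four-byte UTF-8 sequence, rather than two three-byte sequences as would result from encoding each surrogate on its own.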