Put documentation on XML data type and functions in better positions. Add
some index terms.
2007-04-02 15:27:02 +00:00
commit 626b4416b9
parent b7d3a84539
3 changed files with 807 additions and 797 deletions
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.118 2007/03/26 01:41:57 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.119 2007/04/02 15:27:02 petere Exp $ -->

 <chapter Id="runtime-config">
  <title>Server Configuration</title>
@@ -3591,7 +3591,7 @@ SELECT * FROM parent WHERE key = 2400;
       <primary><varname>SET XML OPTION</></primary>
      </indexterm>
      <indexterm>
-       <primary><varname>XML option</></primary>
+       <primary>XML option</primary>
      </indexterm>
      <listitem>
       <para>
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.192 2007/04/02 03:49:36 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/datatype.sgml,v 1.193 2007/04/02 15:27:02 petere Exp $ -->

 <chapter id="datatype">
  <title id="datatype-title">Data Types</title>
@@ -3190,6 +3190,144 @@ SELECT * FROM test;

  </sect1>

+  <sect1 id="datatype-xml">
+   <title><acronym>XML</> Type</title>
+
+   <indexterm zone="datatype-xml">
+    <primary>XML</primary>
+   </indexterm>
+
+   <para>
+    The data type <type>xml</type> can be used to store XML data.  Its
+    advantage over storing XML data in a <type>text</type> field is that it
+    checks the input values for well-formedness, and there are support
+    functions to perform type-safe operations on it; see <xref
+    linkend="functions-xml">.
+   </para>
+
+   <para>
+    In particular, the <type>xml</type> type can store well-formed
+    <quote>documents</quote>, as defined by the XML standard, as well
+    as <quote>content</quote> fragments, which are defined by the
+    production <literal>XMLDecl? content</literal> in the XML
+    standard.  Roughly, this means that content fragments can have
+    more than one top-level element or character node.  The expression
+    <literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
+    can be used to evaluate whether a particular <type>xml</type>
+    value is a full document or only a content fragment.
+   </para>
+
+   <para>
+    To produce a value of type <type>xml</type> from character data,
+    use the function
+    <function>xmlparse</function>:<indexterm><primary>xmlparse</primary></indexterm>
+<synopsis>
+XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
+</synopsis>
+    Examples:
+<programlisting><![CDATA[
+XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
+XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
+]]></programlisting>
+    While this is the only way to convert character strings into XML
+    values according to the SQL standard, the PostgreSQL-specific
+    syntaxes:
+<programlisting><![CDATA[
+xml '<foo>bar</foo>'
+'<foo>bar</foo>'::xml
+]]></programlisting>
+    can also be used.
+   </para>
+
+   <para>
+    The <type>xml</type> type does not validate its input values
+    against a possibly included document type declaration
+    (DTD).<indexterm><primary>DTD</primary></indexterm>
+   </para>
+
+   <para>
+    The inverse operation, producing character string type values from
+    <type>xml</type>, uses the function
+    <function>xmlserialize</function>:<indexterm><primary>xmlserialize</primary></indexterm>
+<synopsis>
+XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
+</synopsis>
+    <replaceable>type</replaceable> can be one of
+    <type>character</type>, <type>character varying</type>, or
+    <type>text</type> (or an alias name for those).  Again, according
+    to the SQL standard, this is the only way to convert between type
+    <type>xml</type> and character types, but PostgreSQL also allows
+    you to simply cast the value.
+   </para>
+
+   <para>
+    When character string values are cast to or from type
+    <type>xml</type> without going through <type>XMLPARSE</type> or
+    <type>XMLSERIALIZE</type>, respectively, the choice of
+    <literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
+    determined by the <quote>XML option</quote>
+    <indexterm><primary>XML option</primary></indexterm>
+    session configuration parameter, which can be set using the
+    standard command
+<synopsis>
+SET XML OPTION { DOCUMENT | CONTENT };
+</synopsis>
+    or the more PostgreSQL-like syntax
+<synopsis>
+SET xmloption TO { DOCUMENT | CONTENT };
+</synopsis>
+    The default is <literal>CONTENT</literal>, so all forms of XML
+    data are allowed.
+   </para>
+
+   <para>
+    Care must be taken when dealing with multiple character encodings
+    on the client, server, and in the XML data passed through them.
+    When using the text mode to pass queries to the server and query
+    results to the client (which is the normal mode), PostgreSQL
+    converts all character data passed between the client and the
+    server and vice versa to the character encoding of the respective
+    end; see <xref linkend="multibyte">.  This includes string
+    representations of XML values, such as in the above examples.
+    This would ordinarily mean that encoding declarations contained in
+    XML data might become invalid as the character data is converted
+    to other encodings while travelling between client and server,
+    while the embedded encoding declaration is not changed.  To cope
+    with this behavior, an encoding declaration contained in a
+    character string presented for input to the <type>xml</type> type
+    is <emphasis>ignored</emphasis>, and the content is always assumed
+    to be in the current server encoding.  Consequently, for correct
+    processing, such character strings of XML data must be sent off
+    from the client in the current client encoding.  It is the
+    responsibility of the client to either convert the document to the
+    current client encoding before sending it off to the server or to
+    adjust the client encoding appropriately.  On output, values of
+    type <type>xml</type> will not have an encoding declaration, and
+    clients must assume that the data is in the current client
+    encoding.
+   </para>
+
+   <para>
+    When using the binary mode to pass query parameters to the server
+    and query results back to the client, no character set conversion
+    is performed, so the situation is different.  In this case, an
+    encoding declaration in the XML data will be observed, and if it
+    is absent, the data will be assumed to be in UTF-8 (as required by
+    the XML standard; note that PostgreSQL does not support UTF-16 at
+    all).  On output, data will have an encoding declaration
+    specifying the client encoding, unless the client encoding is
+    UTF-8, in which case it will be omitted.
+   </para>
+
+   <para>
+    Needless to say, processing XML data with PostgreSQL will be less
+    error-prone and more efficient if data encoding, client encoding,
+    and server encoding are the same.  Since XML data is internally
+    processed in UTF-8, computations will be most efficient if the
+    server encoding is also UTF-8.
+   </para>
+  </sect1>
+
  &array;

  &rowtypes;
@@ -3579,138 +3717,4 @@ SELECT * FROM pg_attribute

  </sect1>

-  <sect1 id="datatype-xml">
-   <title><acronym>XML</> Type</title>
-
-   <indexterm zone="datatype-xml">
-    <primary>XML</primary>
-   </indexterm>
-
-   <para>
-    The data type <type>xml</type> can be used to store XML data.  Its
-    advantage over storing XML data in a <type>text</type> field is that it
-    checks the input values for well-formedness, and there are support
-    functions to perform type-safe operations on it; see <xref
-    linkend="functions-xml">.
-   </para>
-
-   <para>
-    In particular, the <type>xml</type> type can store well-formed
-    <quote>documents</quote>, as defined by the XML standard, as well
-    as <quote>content</quote> fragments, which are defined by the
-    production <literal>XMLDecl? content</literal> in the XML
-    standard.  Roughly, this means that content fragments can have
-    more than one top-level element or character node.  The expression
-    <literal><replaceable>xmlvalue</replaceable> IS DOCUMENT</literal>
-    can be used to evaluate whether a particular <type>xml</type>
-    value is a full document or only a content fragment.
-   </para>
-
-   <para>
-    To produce a value of type <type>xml</type> from character data,
-    use the function <function>xmlparse</function>:
-<synopsis>
-XMLPARSE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>)
-</synopsis>
-    Examples:
-<programlisting><![CDATA[
-XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
-XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')
-]]></programlisting>
-    While this is the only way to convert character strings into XML
-    values according to the SQL standard, the PostgreSQL-specific
-    syntaxes:
-<programlisting><![CDATA[
-xml '<foo>bar</foo>'
-'<foo>bar</foo>'::xml
-]]></programlisting>
-    can also be used.
-   </para>
-
-   <para>
-    The <type>xml</type> type does not validate its input values
-    against a possibly included document type declaration (DTD).
-   </para>
-
-   <para>
-    The inverse operation, producing character string type values from
-    <type>xml</type>, uses the function
-    <function>xmlserialize</function>:
-<synopsis>
-XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable> AS <replaceable>type</replaceable> )
-</synopsis>
-    <replaceable>type</replaceable> can be one of
-    <type>character</type>, <type>character varying</type>, or
-    <type>text</type> (or an alias name for those).  Again, according
-    to the SQL standard, this is the only way to convert between type
-    <type>xml</type> and character types, but PostgreSQL also allows
-    you to simply cast the value.
-   </para>
-
-   <para>
-    When character string values are cast to or from type
-    <type>xml</type> without going through <type>XMLPARSE</type> or
-    <type>XMLSERIALIZE</type>, respectively, the choice of
-    <literal>DOCUMENT</literal> versus <literal>CONTENT</literal> is
-    determined by the <quote>XML option</quote> session configuration
-    parameter, which can be set using the standard command
-<synopsis>
-SET XML OPTION { DOCUMENT | CONTENT };
-</synopsis>
-    or the more PostgreSQL-like syntax
-<synopsis>
-SET xmloption TO { DOCUMENT | CONTENT };
-</synopsis>
-    The default is <literal>CONTENT</literal>, so all forms of XML
-    data are allowed.
-   </para>
-
-   <para>
-    Care must be taken when dealing with multiple character encodings
-    on the client, server, and in the XML data passed through them.
-    When using the text mode to pass queries to the server and query
-    results to the client (which is the normal mode), PostgreSQL
-    converts all character data passed between the client and the
-    server and vice versa to the character encoding of the respective
-    end; see <xref linkend="multibyte">.  This includes string
-    representations of XML values, such as in the above examples.
-    This would ordinarily mean that encoding declarations contained in
-    XML data might become invalid as the character data is converted
-    to other encodings while travelling between client and server,
-    while the embedded encoding declaration is not changed.  To cope
-    with this behavior, an encoding declaration contained in a
-    character string presented for input to the <type>xml</type> type
-    is <emphasis>ignored</emphasis>, and the content is always assumed
-    to be in the current server encoding.  Consequently, for correct
-    processing, such character strings of XML data must be sent off
-    from the client in the current client encoding.  It is the
-    responsibility of the client to either convert the document to the
-    current client encoding before sending it off to the server or to
-    adjust the client encoding appropriately.  On output, values of
-    type <type>xml</type> will not have an encoding declaration, and
-    clients must assume that the data is in the current client
-    encoding.
-   </para>
-
-   <para>
-    When using the binary mode to pass query parameters to the server
-    and query results back to the client, no character set conversion
-    is performed, so the situation is different.  In this case, an
-    encoding declaration in the XML data will be observed, and if it
-    is absent, the data will be assumed to be in UTF-8 (as required by
-    the XML standard; note that PostgreSQL does not support UTF-16 at
-    all).  On output, data will have an encoding declaration
-    specifying the client encoding, unless the client encoding is
-    UTF-8, in which case it will be omitted.
-   </para>
-
-   <para>
-    Needless to say, processing XML data with PostgreSQL will be less
-    error-prone and more efficient if data encoding, client encoding,
-    and server encoding are the same.  Since XML data is internally
-    processed in UTF-8, computations will be most efficient if the
-    server encoding is also UTF-8.
-   </para>
-  </sect1>
-
 </chapter>
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml