Editorial improvements.

2002-08-11 02:43:57 +00:00 · 2002-08-11 02:43:57 +00:00 · 014a86ac47
commit 014a86ac47
parent 74ce5c93c7
1 changed file with 38 additions and 50 deletions
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/cluster.sgml,v 1.18 2002/08/10 21:03:33 momjian Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/cluster.sgml,v 1.19 2002/08/11 02:43:57 tgl Exp $
 PostgreSQL documentation
 -->
@@ -73,19 +73,6 @@ CLUSTER
       </para>
      </listitem>
     </varlistentry>
-    <varlistentry>
-     <term><computeroutput>
-ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exist!
-      </computeroutput></term>
-     <listitem>
-      <para>
-	<comment>
-	 The specified relation was not shown in the error message,
-	 which contained a random string instead of the relation name.
-	</comment>
-      </para>
-     </listitem>
-    </varlistentry>
    </variablelist>
   </para>
  </refsect2>
@@ -101,7 +88,7 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
  <para>
   <command>CLUSTER</command> instructs <productname>PostgreSQL</productname> 
   to cluster the table specified
-   by <replaceable class="parameter">table</replaceable> approximately
+   by <replaceable class="parameter">table</replaceable>
   based on the index specified by
   <replaceable class="parameter">indexname</replaceable>. The index must
   already have been defined on 
@@ -110,11 +97,11 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
  <para>
   When a table is clustered, it is physically reordered
-   based on the index information. The clustering is static.
+   based on the index information. Clustering is a one-time operation:
-   In other words, as the table is updated, the changes are
+   when the table is subsequently updated, the changes are
-   not clustered. No attempt is made to keep new instances or
+   not clustered.  That is, no attempt is made to store new or
-   updated tuples clustered.  If one wishes, one can
+   updated tuples according to their index order.  If one wishes, one can
-   re-cluster manually by issuing the command again.
+   periodically re-cluster by issuing the command again.
  </para>
  <refsect2 id="R2-SQL-CLUSTER-3">
@@ -146,18 +133,34 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
   </para>
   <para>
-    There are two ways to cluster data. The first is with the
+    During the cluster operation, a temporary copy of the table is created
-    <command>CLUSTER</command> command, which reorders the original table with
+    that contains the table data in the index order.  Temporary copies of
    each index on the table are created as well.  Therefore, you need free
    space on disk at least equal to the sum of the table size and the index
    sizes.
   </para>
   <para>
    CLUSTER preserves GRANT, inheritance, index, foreign key, and other
    ancillary information about the table.
   </para>
   <para>
    Because the optimizer records statistics about the ordering of tables, it
    is advisable to run <command>ANALYZE</command> on the newly clustered
    table.  Otherwise, the optimizer may make poor choices of query plans.
   </para>
   <para>
    There is another way to cluster data. The
    <command>CLUSTER</command> command reorders the original table using
    the ordering of the index you specify. This can be slow
    on large tables because the rows are fetched from the heap
    in index order, and if the heap table is unordered, the
    entries are on random pages, so there is one disk page
-    retrieved for every row moved. <productname>PostgreSQL</productname> has a cache,
+    retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache,
-    but the majority of a big table will not fit in the cache.
+    but the majority of a big table will not fit in the cache.)
-   </para>
+    The other way to cluster a table is to use
   <para> 
    Another way to cluster data is to use
    <programlisting>
 SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <replaceable class="parameter">newtable</replaceable>
@@ -165,30 +168,15 @@ SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <repla
    </programlisting>
    which uses the <productname>PostgreSQL</productname> sorting code in 
-    the ORDER BY clause to match the index, and which is much faster for
+    the ORDER BY clause to create the desired order; this is usually much
    faster than an indexscan for
    unordered data. You then drop the old table, use
    <command>ALTER TABLE...RENAME</command>
    to rename <replaceable class="parameter">newtable</replaceable> to the old name, and
-    recreate the table's indexes. The only problem is that <acronym>OID</acronym>s
+    recreate the table's indexes. However, this approach does not preserve
-    will not be preserved. From then on, <command>CLUSTER</command> should be
+    OIDs, constraints, foreign key relationships, granted privileges, and
-    fast because most of the heap data has already been
+    other ancillary properties of the table --- all such items must be
-    ordered, and the existing index is used.
+    manually recreated.
   </para>
   <para>
    During the cluster operation, a temporal table is created that contains
    the table in the index order. Due to this, you need to have free space
    on disk at least the size of the table itself, or the biggest index if
    you have one that is larger than the table.
   </para>
   <para>
    CLUSTER preserves GRANT, inheritance index, and foreign key information.
   </para>
   <para>
    Because the optimizer records the cluster status of tables, it is 
    advised to run <command>ANALYZE</command> on the newly clustered table.
   </para>
  </refsect2>
@@ -199,7 +187,7 @@ SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <repla
   Usage
  </title>
  <para>
-   Cluster the employees relation on the basis of its salary attribute:
+   Cluster the employees relation on the basis of its ID attribute:
  </para>
  <programlisting>
 CLUSTER emp_ind ON emp;