Editorial improvements.
parent 74ce5c93c7
commit 014a86ac47
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/cluster.sgml,v 1.18 2002/08/10 21:03:33 momjian Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/cluster.sgml,v 1.19 2002/08/11 02:43:57 tgl Exp $
 PostgreSQL documentation
 -->

@@ -73,19 +73,6 @@ CLUSTER
 </para>
 </listitem>
 </varlistentry>
-<varlistentry>
-<term><computeroutput>
-ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exist!
-</computeroutput></term>
-<listitem>
-<para>
-<comment>
-The specified relation was not shown in the error message,
-which contained a random string instead of the relation name.
-</comment>
-</para>
-</listitem>
-</varlistentry>
 </variablelist>
 </para>
 </refsect2>
@@ -101,7 +88,7 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
 <para>
 <command>CLUSTER</command> instructs <productname>PostgreSQL</productname>
 to cluster the table specified
-by <replaceable class="parameter">table</replaceable> approximately
+by <replaceable class="parameter">table</replaceable>
 based on the index specified by
 <replaceable class="parameter">indexname</replaceable>. The index must
 already have been defined on
@@ -110,11 +97,11 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis

 <para>
 When a table is clustered, it is physically reordered
-based on the index information. The clustering is static.
-In other words, as the table is updated, the changes are
-not clustered. No attempt is made to keep new instances or
-updated tuples clustered. If one wishes, one can
-re-cluster manually by issuing the command again.
+based on the index information. Clustering is a one-time operation:
+when the table is subsequently updated, the changes are
+not clustered. That is, no attempt is made to store new or
+updated tuples according to their index order. If one wishes, one can
+periodically re-cluster by issuing the command again.
 </para>

 <refsect2 id="R2-SQL-CLUSTER-3">
@@ -146,18 +133,34 @@ ERROR: Relation <replaceable class="PARAMETER">table</replaceable> does not exis
 </para>

 <para>
-There are two ways to cluster data. The first is with the
-<command>CLUSTER</command> command, which reorders the original table with
+During the cluster operation, a temporary copy of the table is created
+that contains the table data in the index order. Temporary copies of
+each index on the table are created as well. Therefore, you need free
+space on disk at least equal to the sum of the table size and the index
+sizes.
+</para>
+
+<para>
+CLUSTER preserves GRANT, inheritance, index, foreign key, and other
+ancillary information about the table.
+</para>
+
+<para>
+Because the optimizer records statistics about the ordering of tables, it
+is advisable to run <command>ANALYZE</command> on the newly clustered
+table. Otherwise, the optimizer may make poor choices of query plans.
+</para>
+
+<para>
+There is another way to cluster data. The
+<command>CLUSTER</command> command reorders the original table using
 the ordering of the index you specify. This can be slow
 on large tables because the rows are fetched from the heap
 in index order, and if the heap table is unordered, the
 entries are on random pages, so there is one disk page
-retrieved for every row moved. <productname>PostgreSQL</productname> has a cache,
-but the majority of a big table will not fit in the cache.
-</para>
-
-<para>
-Another way to cluster data is to use
+retrieved for every row moved. (<productname>PostgreSQL</productname> has a cache,
+but the majority of a big table will not fit in the cache.)
+The other way to cluster a table is to use

 <programlisting>
 SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <replaceable class="parameter">newtable</replaceable>
@@ -165,30 +168,15 @@ SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <repla
 </programlisting>

 which uses the <productname>PostgreSQL</productname> sorting code in
-the ORDER BY clause to match the index, and which is much faster for
+the ORDER BY clause to create the desired order; this is usually much
+faster than an indexscan for
 unordered data. You then drop the old table, use
 <command>ALTER TABLE...RENAME</command>
 to rename <replaceable class="parameter">newtable</replaceable> to the old name, and
-recreate the table's indexes. The only problem is that <acronym>OID</acronym>s
-will not be preserved. From then on, <command>CLUSTER</command> should be
-fast because most of the heap data has already been
-ordered, and the existing index is used.
-</para>
-
-<para>
-During the cluster operation, a temporal table is created that contains
-the table in the index order. Due to this, you need to have free space
-on disk at least the size of the table itself, or the biggest index if
-you have one that is larger than the table.
-</para>
-
-<para>
-CLUSTER preserves GRANT, inheritance index, and foreign key information.
-</para>
-
-<para>
-Because the optimizer records the cluster status of tables, it is
-advised to run <command>ANALYZE</command> on the newly clustered table.
+recreate the table's indexes. However, this approach does not preserve
+OIDs, constraints, foreign key relationships, granted privileges, and
+other ancillary properties of the table --- all such items must be
+manually recreated.
 </para>

 </refsect2>
@@ -199,7 +187,7 @@ SELECT <replaceable class="parameter">columnlist</replaceable> INTO TABLE <repla
 Usage
 </title>
 <para>
-Cluster the employees relation on the basis of its salary attribute:
+Cluster the employees relation on the basis of its ID attribute:
 </para>
 <programlisting>
 CLUSTER emp_ind ON emp;