From 014a86ac479272868c677b979b6049dedbf3bb33 Mon Sep 17 00:00:00 2001
From: Tom Lane
Date: Sun, 11 Aug 2002 02:43:57 +0000
Subject: [PATCH] Editorial improvements.

---
 doc/src/sgml/ref/cluster.sgml | 88 +++++++++++++++--------------------
 1 file changed, 38 insertions(+), 50 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 73c188392f..ff489fe4c9 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -1,5 +1,5 @@
@@ -73,19 +73,6 @@ CLUSTER
-
-
-ERROR: Relation table does not exist!
-
-
-
-
- The specified relation was not shown in the error message,
- which contained a random string instead of the relation name.
-
-
-
-
@@ -101,7 +88,7 @@ ERROR: Relation table does not exis
 CLUSTER instructs PostgreSQL to cluster the table specified
- by table approximately
+ by table
 based on the index specified by indexname. The index must already have been defined on
@@ -110,11 +97,11 @@ ERROR: Relation table does not exis
 When a table is clustered, it is physically reordered
- based on the index information. The clustering is static.
- In other words, as the table is updated, the changes are
- not clustered. No attempt is made to keep new instances or
- updated tuples clustered. If one wishes, one can
- re-cluster manually by issuing the command again.
+ based on the index information. Clustering is a one-time operation:
+ when the table is subsequently updated, the changes are
+ not clustered. That is, no attempt is made to store new or
+ updated tuples according to their index order. If one wishes, one can
+ periodically re-cluster by issuing the command again.
@@ -146,18 +133,34 @@ ERROR: Relation table does not exis
- There are two ways to cluster data. The first is with the
- CLUSTER command, which reorders the original table with
+ During the cluster operation, a temporary copy of the table is created
+ that contains the table data in the index order. Temporary copies of
+ each index on the table are created as well. Therefore, you need free
+ space on disk at least equal to the sum of the table size and the index
+ sizes.
+
+
+
+ CLUSTER preserves GRANT, inheritance, index, foreign key, and other
+ ancillary information about the table.
+
+
+
+ Because the optimizer records statistics about the ordering of tables, it
+ is advisable to run ANALYZE on the newly clustered
+ table. Otherwise, the optimizer may make poor choices of query plans.
+
+
+
+ There is another way to cluster data. The
+ CLUSTER command reorders the original table using
 the ordering of the index you specify. This can be slow on large tables because the rows are fetched from the heap in index order, and if the heap table is unordered, the entries are on random pages, so there is one disk page
- retrieved for every row moved. PostgreSQL has a cache,
- but the majority of a big table will not fit in the cache.
-
-
-
- Another way to cluster data is to use
+ retrieved for every row moved. (PostgreSQL has a cache,
+ but the majority of a big table will not fit in the cache.)
+ The other way to cluster a table is to use
 SELECT columnlist INTO TABLE newtable
@@ -165,30 +168,15 @@ SELECT columnlist INTO TABLE
 which uses the PostgreSQL sorting code in
- the ORDER BY clause to match the index, and which is much faster for
+ the ORDER BY clause to create the desired order; this is usually much
+ faster than an indexscan for
 unordered data. You then drop the old table, use ALTER TABLE...RENAME to rename newtable to the old name, and
- recreate the table's indexes. The only problem is that OIDs
- will not be preserved. From then on, CLUSTER should be
- fast because most of the heap data has already been
- ordered, and the existing index is used.
-
-
-
- During the cluster operation, a temporal table is created that contains
- the table in the index order. Due to this, you need to have free space
- on disk at least the size of the table itself, or the biggest index if
- you have one that is larger than the table.
-
-
-
- CLUSTER preserves GRANT, inheritance index, and foreign key information.
-
-
-
- Because the optimizer records the cluster status of tables, it is
- advised to run ANALYZE on the newly clustered table.
+ recreate the table's indexes. However, this approach does not preserve
+ OIDs, constraints, foreign key relationships, granted privileges, and
+ other ancillary properties of the table --- all such items must be
+ manually recreated.
@@ -199,7 +187,7 @@ SELECT columnlist INTO TABLE
- Cluster the employees relation on the basis of its salary attribute:
+ Cluster the employees relation on the basis of its ID attribute:
 CLUSTER emp_ind ON emp;