<!-- $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.64 2007/03/29 00:15:36 tgl Exp $ -->

<chapter id="performance-tips">
 <title>Performance Tips</title>

 <indexterm zone="performance-tips">
  <primary>performance</primary>
 </indexterm>

 <para>
  Query performance can be affected by many things. Some of these can
  be manipulated by the user, while others are fundamental to the underlying
  design of the system. This chapter provides some hints about understanding
  and tuning <productname>PostgreSQL</productname> performance.
 </para>

 <sect1 id="using-explain">
  <title>Using <command>EXPLAIN</command></title>

  <indexterm zone="using-explain">
   <primary>EXPLAIN</primary>
  </indexterm>

  <indexterm zone="using-explain">
   <primary>query plan</primary>
  </indexterm>

  <para>
   <productname>PostgreSQL</productname> devises a <firstterm>query
   plan</firstterm> for each query it is given. Choosing the right
   plan to match the query structure and the properties of the data
   is absolutely critical for good performance, so the system includes
   a complex <firstterm>planner</> that tries to select good plans.
   You can use the
   <xref linkend="sql-explain" endterm="sql-explain-title"> command
   to see what query plan the planner creates for any query.
   Plan-reading is an art that deserves an extensive tutorial, which
   this is not; but here is some basic information.
  </para>

  <para>
   The structure of a query plan is a tree of <firstterm>plan nodes</>.
   Nodes at the bottom level are table scan nodes: they return raw rows
   from a table. There are different types of scan nodes for different
   table access methods: sequential scans, index scans, and bitmap index
   scans. If the query requires joining, aggregation, sorting, or other
   operations on the raw rows, then there will be additional nodes
   <quote>atop</> the scan nodes to perform these operations. Again,
   there is usually more than one possible way to do these operations,
   so different node types can appear here too. The output
   of <command>EXPLAIN</command> has one line for each node in the plan
   tree, showing the basic node type plus the cost estimates that the planner
   made for the execution of that plan node. The first line (topmost node)
   has the estimated total execution cost for the plan; it is this number
   that the planner seeks to minimize.
  </para>

  <para>
   Here is a trivial example, just to show what the output looks like.
   <footnote>
    <para>
     Examples in this section are drawn from the regression test database
     after doing a <command>VACUUM ANALYZE</>, using 8.2 development sources.
     You should be able to get similar results if you try the examples yourself,
     but your estimated costs and row counts will probably vary slightly
     because <command>ANALYZE</>'s statistics are random samples rather
     than being exact.
    </para>
   </footnote>

<programlisting>
EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
</programlisting>
  </para>

  <para>
   The numbers that are quoted by <command>EXPLAIN</command> are:

   <itemizedlist>
    <listitem>
     <para>
      Estimated start-up cost (Time expended before output scan can start,
      e.g., time to do the sorting in a sort node.)
     </para>
    </listitem>

    <listitem>
     <para>
      Estimated total cost (If all rows were to be retrieved, though they might
      not be: for example, a query with a <literal>LIMIT</> clause will stop
      short of paying the total cost of the <literal>Limit</> plan node's
      input node.)
     </para>
    </listitem>

    <listitem>
     <para>
      Estimated number of rows output by this plan node (Again, only if
      executed to completion.)
     </para>
    </listitem>

    <listitem>
     <para>
      Estimated average width (in bytes) of rows output by this plan
      node
     </para>
    </listitem>
   </itemizedlist>
  </para>

  <para>
   The costs are measured in arbitrary units determined by the planner's
   cost parameters (see <xref linkend="runtime-config-query-constants">).
   Traditional practice is to measure the costs in units of disk page
   fetches; that is, <xref linkend="guc-seq-page-cost"> is conventionally
   set to <literal>1.0</> and the other cost parameters are set relative
   to that. The examples in this section are run with the default cost
   parameters.
  </para>

  <para>
   It's important to note that the cost of an upper-level node includes
   the cost of all its child nodes. It's also important to realize that
   the cost only reflects things that the planner cares about.
   In particular, the cost does not consider the time spent transmitting
   result rows to the client, which could be an important
   factor in the true elapsed time; but the planner ignores it because
   it cannot change it by altering the plan. (Every correct plan will
   output the same row set, we trust.)
  </para>

  <para>
   Rows output is a little tricky because it is <emphasis>not</emphasis> the
   number of rows processed or scanned by the plan node. It is usually less,
   reflecting the estimated selectivity of any <literal>WHERE</>-clause
   conditions that are being applied at the node. Ideally the top-level
   rows estimate will approximate the number of rows actually returned,
   updated, or deleted by the query.
  </para>

  <para>
   Returning to our example:

<programlisting>
EXPLAIN SELECT * FROM tenk1;

                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
</programlisting>
  </para>

  <para>
   This is about as straightforward as it gets. If you do:

<programlisting>
SELECT relpages, reltuples FROM pg_class WHERE relname = 'tenk1';
</programlisting>

   you will find out that <classname>tenk1</classname> has 358 disk
   pages and 10000 rows. So the cost is estimated at 358 page
   reads, costing <xref linkend="guc-seq-page-cost"> apiece (1.0 by
   default), plus 10000 * <xref linkend="guc-cpu-tuple-cost"> (0.01 by
   default). That works out to (358 * 1.0) + (10000 * 0.01) = 458,
   which is exactly the estimated total cost shown above.
  </para>

  <para>
   Now let's modify the query to add a <literal>WHERE</> condition:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 7000;

                         QUERY PLAN
------------------------------------------------------------
 Seq Scan on tenk1  (cost=0.00..483.00 rows=7033 width=244)
   Filter: (unique1 < 7000)
</programlisting>

   Notice that the <command>EXPLAIN</> output shows the <literal>WHERE</>
   clause being applied as a <quote>filter</> condition; this means that
   the plan node checks the condition for each row it scans, and outputs
   only the ones that pass the condition.
   The estimate of output rows has gone down because of the <literal>WHERE</>
   clause.
   However, the scan will still have to visit all 10000 rows, so the cost
   hasn't decreased; in fact it has gone up a bit to reflect the extra CPU
   time spent checking the <literal>WHERE</> condition.
  </para>

  <para>
   The actual number of rows this query would select is 7000, but the rows
   estimate is only approximate. If you try to duplicate this experiment,
   you will probably get a slightly different estimate; moreover, it will
   change after each <command>ANALYZE</command> command, because the
   statistics produced by <command>ANALYZE</command> are taken from a
   randomized sample of the table.
  </para>

  <para>
   Now, let's make the condition more restrictive:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=2.37..232.35 rows=106 width=244)
   Recheck Cond: (unique1 < 100)
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.37 rows=106 width=0)
         Index Cond: (unique1 < 100)
</programlisting>

   Here the planner has decided to use a two-step plan: the bottom plan
   node visits an index to find the locations of rows matching the index
   condition, and then the upper plan node actually fetches those rows
   from the table itself. Fetching the rows separately is much more
   expensive than sequentially reading them, but because not all the pages
   of the table have to be visited, this is still cheaper than a sequential
   scan. (The reason for using two levels of plan is that the upper plan
   node sorts the row locations identified by the index into physical order
   before reading them, so as to minimize the costs of the separate fetches.
   The <quote>bitmap</> mentioned in the node names is the mechanism that
   does the sorting.)
  </para>

  <para>
   If the <literal>WHERE</> condition is selective enough, the planner might
   switch to a <quote>simple</> index scan plan:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 3;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.00..10.00 rows=2 width=244)
   Index Cond: (unique1 < 3)
</programlisting>

   In this case the table rows are fetched in index order, which makes them
   even more expensive to read, but there are so few that the extra cost
   of sorting the row locations is not worth it. You'll most often see
   this plan type for queries that fetch just a single row, and for queries
   that request an <literal>ORDER BY</> condition that matches the index
   order.
  </para>

  <para>
   Add another condition to the <literal>WHERE</> clause:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 3 AND stringu1 = 'xxx';

                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using tenk1_unique1 on tenk1  (cost=0.00..10.01 rows=1 width=244)
   Index Cond: (unique1 < 3)
   Filter: (stringu1 = 'xxx'::name)
</programlisting>

   The added condition <literal>stringu1 = 'xxx'</literal> reduces the
   output-rows estimate, but not the cost, because we still have to visit the
   same set of rows. Notice that the <literal>stringu1</> clause
   cannot be applied as an index condition (since this index is only on
   the <literal>unique1</> column). Instead it is applied as a filter on
   the rows retrieved by the index. Thus the cost has actually gone up
   a little bit to reflect this extra checking.
  </para>

  <para>
   If there are indexes on several columns used in <literal>WHERE</>, the
   planner might choose to use an AND or OR combination of the indexes:

<programlisting>
EXPLAIN SELECT * FROM tenk1 WHERE unique1 < 100 AND unique2 > 9000;

                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=11.27..49.11 rows=11 width=244)
   Recheck Cond: ((unique1 < 100) AND (unique2 > 9000))
   ->  BitmapAnd  (cost=11.27..11.27 rows=11 width=0)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.37 rows=106 width=0)
               Index Cond: (unique1 < 100)
         ->  Bitmap Index Scan on tenk1_unique2  (cost=0.00..8.65 rows=1042 width=0)
               Index Cond: (unique2 > 9000)
</programlisting>

   But this requires visiting both indexes, so it's not necessarily a win
   compared to using just one index and treating the other condition as
   a filter. If you vary the ranges involved you'll see the plan change
   accordingly.
  </para>

  <para>
   Let's try joining two tables, using the columns we have been discussing:

<programlisting>
EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop  (cost=2.37..553.11 rows=106 width=488)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=2.37..232.35 rows=106 width=244)
         Recheck Cond: (unique1 < 100)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.37 rows=106 width=0)
               Index Cond: (unique1 < 100)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..3.01 rows=1 width=244)
         Index Cond: ("outer".unique2 = t2.unique2)
</programlisting>
  </para>

  <para>
   In this nested-loop join, the outer scan is the same bitmap index scan we
   saw earlier, and so its cost and row count are the same, because we are
   applying the <literal>WHERE</> clause <literal>unique1 < 100</literal>
   at that node.
   The <literal>t1.unique2 = t2.unique2</literal> clause is not relevant yet,
   so it doesn't affect the row count of the outer scan. For the inner scan, the
   <literal>unique2</> value of the current outer-scan row is plugged into
   the inner index scan to produce an index condition like
   <literal>t2.unique2 = <replaceable>constant</replaceable></literal>.
   So we get the same inner-scan plan and costs that we'd get from, say,
   <literal>EXPLAIN SELECT * FROM tenk2 WHERE unique2 = 42</literal>. The
   costs of the loop node are then set on the basis of the cost of the outer
   scan, plus one repetition of the inner scan for each outer row (106 * 3.01,
   here), plus a little CPU time for join processing.
  </para>
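
  <para>
   Working out that arithmetic: 232.35 for the outer scan, plus
   106 * 3.01 = 319.06 for the repeated inner scans, comes to 551.41;
   the remaining 1.70 of the 553.11 total is the CPU charge for the
   join processing.
  </para>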

  <para>
   In this example the join's output row count is the same as the product
   of the two scans' row counts, but that's not true in all cases, because
   there can be <literal>WHERE</> clauses that mention both tables
   and so can only be applied at the join point, not to either input scan.
   For example, if we added
   <literal>WHERE ... AND t1.hundred < t2.hundred</literal>,
   that would decrease the output row count of the join node, but not change
   either input scan.
  </para>

  <para>
   One way to look at variant plans is to force the planner to disregard
   whatever strategy it thought was the winner, using the enable/disable
   flags described in <xref linkend="runtime-config-query-enable">.
   (This is a crude tool, but useful. See
   also <xref linkend="explicit-joins">.)

<programlisting>
SET enable_nestloop = off;
EXPLAIN SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Hash Join  (cost=232.61..741.67 rows=106 width=488)
   Hash Cond: ("outer".unique2 = "inner".unique2)
   ->  Seq Scan on tenk2 t2  (cost=0.00..458.00 rows=10000 width=244)
   ->  Hash  (cost=232.35..232.35 rows=106 width=244)
         ->  Bitmap Heap Scan on tenk1 t1  (cost=2.37..232.35 rows=106 width=244)
               Recheck Cond: (unique1 < 100)
               ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.37 rows=106 width=0)
                     Index Cond: (unique1 < 100)
</programlisting>

   This plan proposes to extract the 100 interesting rows of <classname>tenk1</classname>
   using that same old index scan, stash them into an in-memory hash table,
   and then do a sequential scan of <classname>tenk2</classname>, probing into the hash table
   for possible matches of <literal>t1.unique2 = t2.unique2</literal> at each <classname>tenk2</classname> row.
   The cost to read <classname>tenk1</classname> and set up the hash table is entirely start-up
   cost for the hash join, since we won't get any rows out until we can
   start reading <classname>tenk2</classname>. The total time estimate for the join also
   includes a hefty charge for the CPU time to probe the hash table
   10000 times. Note, however, that we are <emphasis>not</emphasis> charging 10000 times 232.35;
   the hash table setup is only done once in this plan type.
  </para>

  <para>
   It is possible to check on the accuracy of the planner's estimated costs
   by using <command>EXPLAIN ANALYZE</>. This command actually executes the query,
   and then displays the true run time accumulated within each plan node
   along with the same estimated costs that a plain <command>EXPLAIN</command> shows.
   For example, we might get a result like this:

<screen>
EXPLAIN ANALYZE SELECT * FROM tenk1 t1, tenk2 t2 WHERE t1.unique1 < 100 AND t1.unique2 = t2.unique2;

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2.37..553.11 rows=106 width=488) (actual time=1.392..12.700 rows=100 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=2.37..232.35 rows=106 width=244) (actual time=0.878..2.367 rows=100 loops=1)
         Recheck Cond: (unique1 < 100)
         ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.37 rows=106 width=0) (actual time=0.546..0.546 rows=100 loops=1)
               Index Cond: (unique1 < 100)
   ->  Index Scan using tenk2_unique2 on tenk2 t2  (cost=0.00..3.01 rows=1 width=244) (actual time=0.067..0.078 rows=1 loops=100)
         Index Cond: ("outer".unique2 = t2.unique2)
 Total runtime: 14.452 ms
</screen>

   Note that the <quote>actual time</quote> values are in milliseconds of
   real time, whereas the <quote>cost</quote> estimates are expressed in
   arbitrary units; so they are unlikely to match up.
   The thing to pay attention to is whether the ratios of actual time and
   estimated costs are consistent.
  </para>

  <para>
   In some query plans, it is possible for a subplan node to be executed more
   than once. For example, the inner index scan is executed once per outer
   row in the above nested-loop plan. In such cases, the
   <quote>loops</quote> value reports the
   total number of executions of the node, and the actual time and rows
   values shown are averages per-execution. This is done to make the numbers
   comparable with the way that the cost estimates are shown. Multiply by
   the <quote>loops</quote> value to get the total time actually spent in
   the node.
  </para>
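
  <para>
   In the example above, for instance, the inner index scan shows an actual
   time of 0.078 ms per execution with <quote>loops</quote> = 100, so the
   executor spent about 100 * 0.078 = 7.8 ms in that node altogether.
  </para>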

  <para>
   The <literal>Total runtime</literal> shown by <command>EXPLAIN
   ANALYZE</command> includes executor start-up and shut-down time, as well
   as time spent processing the result rows. It does not include parsing,
   rewriting, or planning time. For a <command>SELECT</> query, the total
   run time will normally be just a little larger than the total time
   reported for the top-level plan node. For <command>INSERT</>,
   <command>UPDATE</>, and <command>DELETE</> commands, the total run time
   might be considerably larger, because it includes the time spent processing
   the result rows. In these commands, the time for the top plan node
   essentially is the time spent computing the new rows and/or locating the
   old ones, but it doesn't include the time spent applying the changes.
   Time spent firing triggers, if any, is also outside the top plan node,
   and is shown separately for each trigger.
  </para>

  <para>
   It is worth noting that <command>EXPLAIN</> results should not be extrapolated
   to situations other than the one you are actually testing; for example,
   results on a toy-sized table cannot be assumed to apply to large tables.
   The planner's cost estimates are not linear and so it might choose
   a different plan for a larger or smaller table. An extreme example
   is that on a table that only occupies one disk page, you'll nearly
   always get a sequential scan plan whether indexes are available or not.
   The planner realizes that it's going to take one disk page read to
   process the table in any case, so there's no value in expending additional
   page reads to look at an index.
  </para>
 </sect1>

 <sect1 id="planner-stats">
  <title>Statistics Used by the Planner</title>

  <indexterm zone="planner-stats">
   <primary>statistics</primary>
   <secondary>of the planner</secondary>
  </indexterm>

  <para>
   As we saw in the previous section, the query planner needs to estimate
   the number of rows retrieved by a query in order to make good choices
   of query plans. This section provides a quick look at the statistics
   that the system uses for these estimates.
  </para>

  <para>
   One component of the statistics is the total number of entries in
   each table and index, as well as the number of disk blocks occupied
   by each table and index. This information is kept in the table
   <link linkend="catalog-pg-class"><structname>pg_class</structname></link>, in
   the columns <structfield>reltuples</structfield> and
   <structfield>relpages</structfield>. We can look at it with
   queries similar to this one:

<screen>
SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 'tenk1%';

       relname        | relkind | reltuples | relpages
----------------------+---------+-----------+----------
 tenk1                | r       |     10000 |      358
 tenk1_hundred        | i       |     10000 |       30
 tenk1_thous_tenthous | i       |     10000 |       30
 tenk1_unique1        | i       |     10000 |       30
 tenk1_unique2        | i       |     10000 |       30
(5 rows)
</screen>

   Here we can see that <structname>tenk1</structname> contains 10000
   rows, as do its indexes, but the indexes are (unsurprisingly) much
   smaller than the table.
  </para>

  <para>
   For efficiency reasons, <structfield>reltuples</structfield>
   and <structfield>relpages</structfield> are not updated on-the-fly,
   and so they usually contain somewhat out-of-date values.
   They are updated by <command>VACUUM</>, <command>ANALYZE</>, and a
   few DDL commands such as <command>CREATE INDEX</>. A stand-alone
   <command>ANALYZE</>, that is one not part of <command>VACUUM</>,
   generates an approximate <structfield>reltuples</structfield> value
   since it does not read every row of the table. The planner
   will scale the values it finds in <structname>pg_class</structname>
   to match the current physical table size, thus obtaining a closer
   approximation.
  </para>

  <indexterm>
   <primary>pg_statistic</primary>
  </indexterm>

  <para>
   Most queries retrieve only a fraction of the rows in a table, due
   to having <literal>WHERE</> clauses that restrict the rows to be
   examined. The planner thus needs to make an estimate of the
   <firstterm>selectivity</> of <literal>WHERE</> clauses, that is,
   the fraction of rows that match each condition in the
   <literal>WHERE</> clause. The information used for this task is
   stored in the <link linkend="catalog-pg-statistic"><structname>pg_statistic</structname></link>
   system catalog. Entries in <structname>pg_statistic</structname>
   are updated by the <command>ANALYZE</> and <command>VACUUM
   ANALYZE</> commands, and are always approximate even when freshly
   updated.
  </para>

  <indexterm>
   <primary>pg_stats</primary>
  </indexterm>

  <para>
   Rather than look at <structname>pg_statistic</structname> directly,
   it's better to look at its view <structname>pg_stats</structname>
   when examining the statistics manually. <structname>pg_stats</structname>
   is designed to be more easily readable. Furthermore,
   <structname>pg_stats</structname> is readable by all, whereas
   <structname>pg_statistic</structname> is only readable by a superuser.
   (This prevents unprivileged users from learning something about
   the contents of other people's tables from the statistics. The
   <structname>pg_stats</structname> view is restricted to show only
   rows about tables that the current user can read.)
   For example, we might do:

<screen>
SELECT attname, n_distinct, most_common_vals FROM pg_stats WHERE tablename = 'road';

 attname | n_distinct |                                most_common_vals
---------+------------+----------------------------------------------------------------------------------------------------------------------------------
 name    | -0.467008  | {"I- 580 Ramp","I- 880 Ramp","Sp Railroad ","I- 580 ","I- 680 Ramp","I- 80 Ramp","14th St  ","5th St  ","Mission Blvd","I- 880 "}
 thepath | 20         | {"[(-122.089,37.71),(-122.0886,37.711)]"}
(2 rows)
</screen>
  </para>

  <para>
   <structname>pg_stats</structname> is described in detail in
   <xref linkend="view-pg-stats">.
  </para>

  <para>
   The amount of information stored in <structname>pg_statistic</structname>,
   in particular the maximum number of entries in the
   <structfield>most_common_vals</> and <structfield>histogram_bounds</>
   arrays for each column, can be set on a
   column-by-column basis using the <command>ALTER TABLE SET STATISTICS</>
   command, or globally by setting the
   <xref linkend="guc-default-statistics-target"> configuration variable.
   The default limit is presently 10 entries. Raising the limit
   might allow more accurate planner estimates to be made, particularly for
   columns with irregular data distributions, at the price of consuming
   more space in <structname>pg_statistic</structname> and slightly more
   time to compute the estimates. Conversely, a lower limit might be
   appropriate for columns with simple data distributions.
  </para>
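
  <para>
   For example, to raise the per-column limit to 100 for the
   <literal>unique1</> column used earlier (the target value here is
   just illustrative), you could do:

<programlisting>
ALTER TABLE tenk1 ALTER COLUMN unique1 SET STATISTICS 100;
ANALYZE tenk1;
</programlisting>

   The new target takes effect the next time the column's statistics
   are gathered, hence the <command>ANALYZE</>.
  </para>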
 </sect1>

 <sect1 id="explicit-joins">
  <title>Controlling the Planner with Explicit <literal>JOIN</> Clauses</title>

  <indexterm zone="explicit-joins">
   <primary>join</primary>
   <secondary>controlling the order</secondary>
  </indexterm>

  <para>
   It is possible
   to control the query planner to some extent by using the explicit <literal>JOIN</>
   syntax. To see why this matters, we first need some background.
  </para>

  <para>
   In a simple join query, such as:
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
</programlisting>
   the planner is free to join the given tables in any order. For
   example, it could generate a query plan that joins A to B, using
   the <literal>WHERE</> condition <literal>a.id = b.id</>, and then
   joins C to this joined table, using the other <literal>WHERE</>
   condition. Or it could join B to C and then join A to that result.
   Or it could join A to C and then join them with B — but that
   would be inefficient, since the full Cartesian product of A and C
   would have to be formed, there being no applicable condition in the
   <literal>WHERE</> clause to allow optimization of the join. (All
   joins in the <productname>PostgreSQL</productname> executor happen
   between two input tables, so it's necessary to build up the result
   in one or another of these fashions.) The important point is that
   these different join possibilities give semantically equivalent
   results but might have hugely different execution costs. Therefore,
   the planner will explore all of them to try to find the most
   efficient query plan.
  </para>

  <para>
   When a query only involves two or three tables, there aren't many join
   orders to worry about. But the number of possible join orders grows
   exponentially as the number of tables expands. Beyond ten or so input
   tables it's no longer practical to do an exhaustive search of all the
   possibilities, and even for six or seven tables planning might take an
   annoyingly long time. When there are too many input tables, the
   <productname>PostgreSQL</productname> planner will switch from exhaustive
   search to a <firstterm>genetic</firstterm> probabilistic search
   through a limited number of possibilities. (The switch-over threshold is
   set by the <xref linkend="guc-geqo-threshold"> run-time
   parameter.)
   The genetic search takes less time, but it won't
   necessarily find the best possible plan.
  </para>

  <para>
   When the query involves outer joins, the planner has less freedom
   than it does for plain (inner) joins. For example, consider:
<programlisting>
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
   Although this query's restrictions are superficially similar to the
   previous example, the semantics are different because a row must be
   emitted for each row of A that has no matching row in the join of B and C.
   Therefore the planner has no choice of join order here: it must join
   B to C and then join A to that result. Accordingly, this query takes
   less time to plan than the previous query. In other cases, the planner
   might be able to determine that more than one join order is safe.
   For example, given:
<programlisting>
SELECT * FROM a LEFT JOIN b ON (a.bid = b.id) LEFT JOIN c ON (a.cid = c.id);
</programlisting>
   it is valid to join A to either B or C first. Currently, only
   <literal>FULL JOIN</> completely constrains the join order. Most
   practical cases involving <literal>LEFT JOIN</> or <literal>RIGHT JOIN</>
   can be rearranged to some extent.
  </para>

  <para>
   Explicit inner join syntax (<literal>INNER JOIN</>, <literal>CROSS
   JOIN</>, or unadorned <literal>JOIN</>) is semantically the same as
   listing the input relations in <literal>FROM</>, so it does not
   constrain the join order.
  </para>

  <para>
   Even though most kinds of <literal>JOIN</> don't completely constrain
   the join order, it is possible to instruct the
   <productname>PostgreSQL</productname> query planner to treat all
   <literal>JOIN</> clauses as constraining the join order anyway.
   For example, these three queries are logically equivalent:
<programlisting>
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>
   But if we tell the planner to honor the <literal>JOIN</> order,
   the second and third take less time to plan than the first. This effect
   is not worth worrying about for only three tables, but it can be a
   lifesaver with many tables.
  </para>

  <para>
   To force the planner to follow the join order laid out by explicit
   <literal>JOIN</>s,
   set the <xref linkend="guc-join-collapse-limit"> run-time parameter to 1.
   (Other possible values are discussed below.)
  </para>
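
  <para>
   For instance (a sketch, reusing the placeholder tables from the
   examples above):

<programlisting>
SET join_collapse_limit = 1;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
</programlisting>

   With this setting in effect, the planner will join B to C first and
   then join A to that result, exactly as the <literal>JOIN</> nesting
   specifies.
  </para>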

  <para>
   You do not need to constrain the join order completely in order to
   cut search time, because it's OK to use <literal>JOIN</> operators
   within items of a plain <literal>FROM</> list. For example, consider:
<programlisting>
SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
</programlisting>
   With <varname>join_collapse_limit</> = 1, this
   forces the planner to join A to B before joining them to other tables,
   but doesn't constrain its choices otherwise. In this example, the
   number of possible join orders is reduced by a factor of 5.
  </para>

  <para>
   Constraining the planner's search in this way is a useful technique
   both for reducing planning time and for directing the planner to a
   good query plan. If the planner chooses a bad join order by default,
   you can force it to choose a better order via <literal>JOIN</> syntax
   — assuming that you know of a better order, that is. Experimentation
   is recommended.
  </para>

  <para>
   A closely related issue that affects planning time is collapsing of
   subqueries into their parent query. For example, consider:
<programlisting>
SELECT *
FROM x, y,
    (SELECT * FROM a, b, c WHERE something) AS ss
WHERE somethingelse;
</programlisting>
   This situation might arise from use of a view that contains a join;
   the view's <literal>SELECT</> rule will be inserted in place of the view
   reference, yielding a query much like the above. Normally, the planner
   will try to collapse the subquery into the parent, yielding:
<programlisting>
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
</programlisting>
   This usually results in a better plan than planning the subquery
   separately. (For example, the outer <literal>WHERE</> conditions might be such that
   joining X to A first eliminates many rows of A, thus avoiding the need to
   form the full logical output of the subquery.) But at the same time,
   we have increased the planning time; here, we have a five-way join
   problem replacing two separate three-way join problems. Because of the
   exponential growth of the number of possibilities, this makes a big
   difference. The planner tries to avoid getting stuck in huge join search
   problems by not collapsing a subquery if more than <varname>from_collapse_limit</>
   <literal>FROM</> items would result in the parent
   query. You can trade off planning time against quality of plan by
   adjusting this run-time parameter up or down.
  </para>

  <para>
   <xref linkend="guc-from-collapse-limit"> and <xref
   linkend="guc-join-collapse-limit">
   are similarly named because they do almost the same thing: one controls
   when the planner will <quote>flatten out</> subselects, and the
   other controls when it will flatten out explicit joins. Typically
   you would either set <varname>join_collapse_limit</> equal to
   <varname>from_collapse_limit</> (so that explicit joins and subselects
   act similarly) or set <varname>join_collapse_limit</> to 1 (if you want
   to control join order with explicit joins). But you might set them
   differently if you are trying to fine-tune the trade-off between planning
   time and run time.
  </para>
 </sect1>

 <sect1 id="populate">
  <title>Populating a Database</title>

  <para>
   One might need to insert a large amount of data when first populating
   a database. This section contains some suggestions on how to make
   this process as efficient as possible.
  </para>

  <sect2 id="disable-autocommit">
   <title>Disable Autocommit</title>

   <indexterm>
    <primary>autocommit</primary>
    <secondary>bulk-loading data</secondary>
   </indexterm>

   <para>
    Turn off autocommit and just do one commit at the end. (In plain
    SQL, this means issuing <command>BEGIN</command> at the start and
    <command>COMMIT</command> at the end. Some client libraries might
    do this behind your back, in which case you need to make sure the
    library does it when you want it done.) If you allow each
    insertion to be committed separately,
    <productname>PostgreSQL</productname> is doing a lot of work for
    each row that is added. An additional benefit of doing all
    insertions in one transaction is that if the insertion of one row
    were to fail then the insertion of all rows inserted up to that
    point would be rolled back, so you won't be stuck with partially
    loaded data.
   </para>
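
   <para>
    In plain SQL the pattern looks like this (a sketch;
    <literal>mytable</> is a placeholder):

<programlisting>
BEGIN;
INSERT INTO mytable VALUES (1, 'one');
INSERT INTO mytable VALUES (2, 'two');
-- ... many more insertions ...
COMMIT;
</programlisting>
   </para>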
  </sect2>

  <sect2 id="populate-copy-from">
   <title>Use <command>COPY</command></title>

   <para>
    Use <xref linkend="sql-copy" endterm="sql-copy-title"> to load
    all the rows in one command, instead of using a series of
    <command>INSERT</command> commands. The <command>COPY</command>
    command is optimized for loading large numbers of rows; it is less
    flexible than <command>INSERT</command>, but incurs significantly
    less overhead for large data loads. Since <command>COPY</command>
    is a single command, there is no need to disable autocommit if you
    use this method to populate a table.
   </para>
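
   <para>
    For example (a sketch; the table name and file path are placeholders),
    to load rows from a server-side data file:

<programlisting>
COPY mytable FROM '/path/to/mytable.dat';
</programlisting>

    To load a file that lives on the client machine instead, you can use
    <application>psql</>'s <command>\copy</> meta-command, which accepts
    similar syntax.
   </para>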

   <para>
    If you cannot use <command>COPY</command>, it might help to use <xref
    linkend="sql-prepare" endterm="sql-prepare-title"> to create a
    prepared <command>INSERT</command> statement, and then use
    <command>EXECUTE</command> as many times as required. This avoids
    some of the overhead of repeatedly parsing and planning
    <command>INSERT</command>. Different interfaces provide this facility
    in different ways; look for <quote>prepared statements</> in the interface
    documentation.
   </para>
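
   <para>
    In plain SQL, that approach looks roughly like this (the statement
    name, table, and values are placeholders):

<programlisting>
PREPARE bulk_insert (integer, text) AS
    INSERT INTO mytable VALUES ($1, $2);
EXECUTE bulk_insert(1, 'one');
EXECUTE bulk_insert(2, 'two');
</programlisting>
   </para>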

   <para>
    Note that loading a large number of rows using
    <command>COPY</command> is almost always faster than using
    <command>INSERT</command>, even if <command>PREPARE</> is used and
    multiple insertions are batched into a single transaction.
   </para>

   <para>
    <command>COPY</command> is fastest when used within the same
    transaction as an earlier <command>CREATE TABLE</command> or
    <command>TRUNCATE</command> command. In such cases no WAL
    needs to be written, because in case of an error, the files
    containing the newly loaded data will be removed anyway.
    However, this consideration does not apply when
    <xref linkend="guc-archive-command"> is set, as all commands
    must write WAL in that case.
   </para>
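
   <para>
    To take advantage of this, wrap the truncation (or table creation)
    and the load in one transaction, along these lines (a sketch with
    placeholder names):

<programlisting>
BEGIN;
TRUNCATE mytable;
COPY mytable FROM '/path/to/mytable.dat';
COMMIT;
</programlisting>
   </para>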

  </sect2>

  <sect2 id="populate-rm-indexes">
   <title>Remove Indexes</title>

   <para>
    If you are loading a freshly created table, the fastest way is to
    create the table, bulk load the table's data using
    <command>COPY</command>, then create any indexes needed for the
    table. Creating an index on pre-existing data is quicker than
    updating it incrementally as each row is loaded.
   </para>

   <para>
    If you are adding large amounts of data to an existing table,
    it might be a win to drop the index,
    load the table, and then recreate the index. Of course, the
    database performance for other users might be adversely affected
    during the time that the index is missing. One should also think
    twice before dropping unique indexes, since the error checking
    afforded by the unique constraint will be lost while the index is
    missing.
   </para>
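
   <para>
    A sketch of the drop-and-recreate cycle, with placeholder names:

<programlisting>
DROP INDEX mytable_col_idx;
-- ... bulk load the data here ...
CREATE INDEX mytable_col_idx ON mytable (col);
</programlisting>
   </para>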
  </sect2>

  <sect2 id="populate-rm-fkeys">
   <title>Remove Foreign Key Constraints</title>

   <para>
    Just as with indexes, a foreign key constraint can be checked
    <quote>in bulk</> more efficiently than row-by-row. So it might be
    useful to drop foreign key constraints, load data, and re-create
    the constraints. Again, there is a trade-off between data load
    speed and loss of error checking while the constraint is missing.
   </para>
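
   <para>
    For example (a sketch; all names here are placeholders):

<programlisting>
ALTER TABLE orders DROP CONSTRAINT orders_customer_id_fkey;
-- ... bulk load the data here ...
ALTER TABLE orders ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (id);
</programlisting>

    Re-adding the constraint validates all the loaded rows in one pass.
   </para>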
  </sect2>

  <sect2 id="populate-work-mem">
   <title>Increase <varname>maintenance_work_mem</varname></title>

   <para>
    Temporarily increasing the <xref linkend="guc-maintenance-work-mem">
    configuration variable when loading large amounts of data can
    lead to improved performance. This will help to speed up <command>CREATE
    INDEX</> commands and <command>ALTER TABLE ADD FOREIGN KEY</> commands.
    It won't do much for <command>COPY</> itself, so this advice is
    only useful when you are using one or both of the above techniques.
   </para>
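
   <para>
    Since <varname>maintenance_work_mem</varname> can be changed per
    session, you can raise it just for the load; the value below is
    only illustrative:

<programlisting>
SET maintenance_work_mem = '512MB';
</programlisting>
   </para>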
  </sect2>

  <sect2 id="populate-checkpoint-segments">
   <title>Increase <varname>checkpoint_segments</varname></title>

   <para>
    Temporarily increasing the <xref
    linkend="guc-checkpoint-segments"> configuration variable can also
    make large data loads faster. This is because loading a large
    amount of data into <productname>PostgreSQL</productname> will
    cause checkpoints to occur more often than the normal checkpoint
    frequency (specified by the <varname>checkpoint_timeout</varname>
    configuration variable). Whenever a checkpoint occurs, all dirty
    pages must be flushed to disk. By increasing
    <varname>checkpoint_segments</varname> temporarily during bulk
    data loads, the number of checkpoints that are required can be
    reduced.
   </para>
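
   <para>
    Unlike <varname>maintenance_work_mem</varname>, this parameter cannot
    be changed with <command>SET</> in an ordinary session; a sketch of
    the usual approach is to raise it in <filename>postgresql.conf</>
    (the value below is only illustrative) and then reload the server
    configuration:

<programlisting>
checkpoint_segments = 30
</programlisting>
   </para>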
  </sect2>

  <sect2 id="populate-pitr">
   <title>Turn off <varname>archive_command</varname></title>

   <para>
    When loading large amounts of data into an installation that uses
    WAL archiving, you might want to disable archiving (unset the
    <xref linkend="guc-archive-command"> configuration variable)
    while loading. It might be
    faster to take a new base backup after the load has completed
    than to process a large amount of incremental WAL data.
   </para>
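
   <para>
    A sketch of how that is typically done: set the parameter to an
    empty string in <filename>postgresql.conf</> and reload the server
    configuration before the load, then restore the old value and
    reload again afterwards:

<programlisting>
archive_command = ''
</programlisting>
   </para>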

   <para>
    Aside from avoiding the time for the archiver to process the WAL data,
    doing this will actually make certain commands faster, because they
    are designed not to write WAL at all if <varname>archive_command</varname>
    is unset. (They can guarantee crash safety more cheaply by doing an
    <function>fsync</> at the end than by writing WAL.)
    This applies to the following commands:
    <itemizedlist>
     <listitem>
      <para>
       <command>CREATE TABLE AS SELECT</command>
      </para>
     </listitem>
     <listitem>
      <para>
       <command>CREATE INDEX</command> (and variants such as
       <command>ALTER TABLE ADD PRIMARY KEY</command>)
      </para>
     </listitem>
     <listitem>
      <para>
       <command>ALTER TABLE SET TABLESPACE</command>
      </para>
     </listitem>
     <listitem>
      <para>
       <command>CLUSTER</command>
      </para>
     </listitem>
     <listitem>
      <para>
       <command>COPY FROM</command>, when the target table has been
       created or truncated earlier in the same transaction
      </para>
     </listitem>
    </itemizedlist>
   </para>
  </sect2>

  <sect2 id="populate-analyze">
   <title>Run <command>ANALYZE</command> Afterwards</title>

   <para>
    Whenever you have significantly altered the distribution of data
    within a table, running <xref linkend="sql-analyze"
    endterm="sql-analyze-title"> is strongly recommended. This
    includes bulk loading large amounts of data into the table. Running
    <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
    ensures that the planner has up-to-date statistics about the
    table. With no statistics or obsolete statistics, the planner might
    make poor decisions during query planning, leading to poor
    performance on any tables with inaccurate or nonexistent
    statistics.
   </para>
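
   <para>
    For example (with a placeholder table name):

<programlisting>
VACUUM ANALYZE mytable;
</programlisting>
   </para>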
  </sect2>

  <sect2 id="populate-pg-dump">
   <title>Some Notes About <application>pg_dump</></title>

   <para>
    Dump scripts generated by <application>pg_dump</> automatically apply
    several, but not all, of the above guidelines. To reload a
    <application>pg_dump</> dump as quickly as possible, you need to
    do a few extra things manually. (Note that these points apply while
    <emphasis>restoring</> a dump, not while <emphasis>creating</> it.
    The same points apply when using <application>pg_restore</> to load
    from a <application>pg_dump</> archive file.)
   </para>

   <para>
    By default, <application>pg_dump</> uses <command>COPY</>, and when
    it is generating a complete schema-and-data dump, it is careful to
    load data before creating indexes and foreign keys. So in this case
    several guidelines are handled automatically. What is left
    for you to do is to:
    <itemizedlist>
     <listitem>
      <para>
       Set appropriate (i.e., larger than normal) values for
       <varname>maintenance_work_mem</varname> and
       <varname>checkpoint_segments</varname>.
      </para>
     </listitem>
     <listitem>
      <para>
       If using WAL archiving, consider disabling it during the restore.
       To do that, unset <varname>archive_command</varname> before loading the
       dump script, and afterwards restore <varname>archive_command</varname>
       and take a fresh base backup.
      </para>
     </listitem>
     <listitem>
      <para>
       Consider whether the whole dump should be restored as a single
       transaction. To do that, pass the <option>-1</> or
       <option>--single-transaction</> command-line option to
       <application>psql</> or <application>pg_restore</>. When using this
       mode, even the smallest of errors will roll back the entire restore,
       possibly discarding many hours of processing. Depending on how
       interrelated the data is, that might seem preferable to manual cleanup,
       or not. <command>COPY</> commands will run fastest if you use a single
       transaction and have WAL archiving turned off. (See the example
       commands after this list.)
      </para>
     </listitem>
     <listitem>
      <para>
       Run <command>ANALYZE</> afterwards.
      </para>
     </listitem>
    </itemizedlist>
   </para>
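
   <para>
    For instance (a sketch; the database and file names are placeholders),
    restoring a script dump or an archive-format dump in a single
    transaction:

<programlisting>
psql --single-transaction -f dump.sql mydb
pg_restore --single-transaction -d mydb dump.tar
</programlisting>
   </para>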

   <para>
    A data-only dump will still use <command>COPY</>, but it does not
    drop or recreate indexes, and it does not normally touch foreign
    keys.

    <footnote>
     <para>
      You can get the effect of disabling foreign keys by using
      the <option>--disable-triggers</> option — but realize that
      that eliminates, rather than just postponing, foreign key
      validation, and so it is possible to insert bad data if you use it.
     </para>
    </footnote>

    So when loading a data-only dump, it is up to you to drop and recreate
    indexes and foreign keys if you wish to use those techniques.
    It's still useful to increase <varname>checkpoint_segments</varname>
    while loading the data, but don't bother increasing
    <varname>maintenance_work_mem</varname>; rather, you'd do that while
    manually recreating indexes and foreign keys afterwards.
    And don't forget to <command>ANALYZE</> when you're done.
   </para>
  </sect2>
 </sect1>

</chapter>
|