Writeup from Tom Lane on how costs are estimated.
This commit is contained in:
parent
99281cf881
commit
ccad6d685a
236
doc/src/sgml/indexcost.sgml
Normal file
@ -0,0 +1,236 @@
<chapter>
 <title>Index Cost Estimation Functions</title>

 <note>
  <title>Author</title>

  <para>
   Written by <ulink url="mailto:tgl@sss.pgh.pa.us">Tom Lane</ulink>
   on 2000-01-24.
  </para>
 </note>

<!--
I have written the attached bit of doco about the new index cost
estimator procedure definition, but I am not sure where to put it.
There isn't (AFAICT) any existing documentation about how to make
a new kind of index, which would be the proper place for it.
May I impose on you to find/make a place for this and mark it up
properly?

Also, doc/src/graphics/catalogs.ag needs to be updated, but I have
no idea how. (The amopselect and amopnpages fields of pg_amop
are gone; pg_am has a new field amcostestimate.)

regards, tom lane
-->

 <para>
  Every index access method must provide a cost estimation function for
  use by the planner/optimizer. The procedure OID of this function is
  given in the <literal>amcostestimate</literal> field of the access
  method's <literal>pg_am</literal> entry.

  <note>
   <para>
    Prior to Postgres 7.0, a different scheme was used for registering
    index-specific cost estimation functions.
   </para>
  </note>
 </para>

 <para>
  The amcostestimate function is given a list of WHERE clauses that have
  been determined to be usable with the index. It must return estimates
  of the cost of accessing the index and the selectivity of the WHERE
  clauses (that is, the fraction of main-table tuples that will be
  retrieved during the index scan). For simple cases, nearly all the
  work of the cost estimator can be done by calling standard routines
  in the optimizer; the point of having an amcostestimate function is
  to allow index access methods to provide index-type-specific knowledge,
  in case it is possible to improve on the standard estimates.
 </para>

 <para>
  Each amcostestimate function must have the signature:

  <programlisting>
void
amcostestimate (Query *root,
                RelOptInfo *rel,
                IndexOptInfo *index,
                List *indexQuals,
                Cost *indexAccessCost,
                Selectivity *indexSelectivity);
  </programlisting>

  The first four parameters are inputs:

  <variablelist>
   <varlistentry>
    <term>root</term>
    <listitem>
     <para>
      The query being processed.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>rel</term>
    <listitem>
     <para>
      The relation the index is on.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>index</term>
    <listitem>
     <para>
      The index itself.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>indexQuals</term>
    <listitem>
     <para>
      List of index qual clauses (implicitly ANDed);
      a NIL list indicates no qualifiers are available.
     </para>
    </listitem>
   </varlistentry>
  </variablelist>
 </para>

 <para>
  The last two parameters are pass-by-reference outputs:

  <variablelist>
   <varlistentry>
    <term>*indexAccessCost</term>
    <listitem>
     <para>
      Set to the cost of index processing.
     </para>
    </listitem>
   </varlistentry>

   <varlistentry>
    <term>*indexSelectivity</term>
    <listitem>
     <para>
      Set to the index selectivity.
     </para>
    </listitem>
   </varlistentry>
  </variablelist>
 </para>

 <para>
  Note that cost estimate functions must be written in C, not in SQL or
  any available procedural language, because they must access internal
  data structures of the planner/optimizer.
 </para>

 <para>
  The indexAccessCost should be computed in the units used by
  src/backend/optimizer/path/costsize.c: a disk block fetch has cost 1.0,
  and the cost of processing one index tuple should usually be taken as
  cpu_index_page_weight (which is a user-adjustable optimizer parameter).
  The access cost should include all disk and CPU costs associated with
  scanning the index itself, but NOT the cost of retrieving or processing
  the main-table tuples that are identified by the index.
 </para>

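 <para>
  As a rough illustration of these units, the following C sketch adds up
  the cost of scanning an index from a page count and a tuple count. The
  function name <literal>index_scan_cost</literal> and the weight value
  0.01 mentioned in the comment are assumptions made for this example,
  not names or values taken from the optimizer.
 </para>
 <programlisting><![CDATA[
```c
#include <assert.h>

/*
 * Hypothetical helper: cost of scanning the index itself.
 * Each index page fetched costs 1.0; each index tuple processed costs
 * cpu_weight (standing in for cpu_index_page_weight).
 * The cost of fetching main-table tuples is deliberately NOT included.
 */
double
index_scan_cost(double num_index_pages,
                double num_index_tuples,
                double cpu_weight)
{
    assert(num_index_pages >= 0.0);
    assert(num_index_tuples >= 0.0);
    assert(cpu_weight >= 0.0);

    /* e.g. 10 pages, 1000 tuples, weight 0.01: 10 * 1.0 + 0.01 * 1000 */
    return num_index_pages * 1.0 + cpu_weight * num_index_tuples;
}
```
 ]]></programlisting>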
 <para>
  The indexSelectivity should be set to the estimated fraction of the main
  table tuples that will be retrieved during the index scan. In the case
  of a lossy index, this will typically be higher than the fraction of
  tuples that actually pass the given qual conditions.
 </para>

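 <para>
  To make the lossy case concrete, here is a small C sketch under one
  simple assumed model: every tuple that passes the quals is retrieved,
  and so is a fixed fraction of the tuples that do not pass. The function
  <literal>lossy_index_selectivity</literal> and its
  <literal>false_match_rate</literal> parameter are hypothetical; a real
  access method would derive its estimate from its own knowledge of the
  index.
 </para>
 <programlisting><![CDATA[
```c
#include <assert.h>

/*
 * Hypothetical model of a lossy index: all tuples passing the quals are
 * retrieved, plus a fraction (false_match_rate) of the tuples that do
 * not pass. The result never drops below qual_selectivity and is
 * clamped so that it never exceeds 1.0.
 */
double
lossy_index_selectivity(double qual_selectivity, double false_match_rate)
{
    double s;

    assert(qual_selectivity >= 0.0 && qual_selectivity <= 1.0);
    assert(false_match_rate >= 0.0 && false_match_rate <= 1.0);

    s = qual_selectivity + false_match_rate * (1.0 - qual_selectivity);
    if (s > 1.0)
        s = 1.0;
    return s;
}
```
 ]]></programlisting>
 <para>
  With a qual selectivity of 0.10 and an assumed false match rate of 0.05,
  this model gives 0.10 + 0.05 * 0.90 = 0.145, which is higher than the
  fraction of tuples that actually satisfy the quals.
 </para>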
 <procedure>
  <title>Cost Estimation</title>
  <para>
   A typical cost estimator will proceed as follows:
  </para>

  <step>
   <para>
    Estimate and return the fraction of main-table tuples that will be visited
    based on the given qual conditions. In the absence of any index-type-specific
    knowledge, use the standard optimizer function clauselist_selec():

    <programlisting>
*indexSelectivity = clauselist_selec(root, indexQuals);
    </programlisting>
   </para>
  </step>

  <step>
   <para>
    Estimate the number of index tuples that will be visited during the
    scan. For many index types this is the same as indexSelectivity times
    the number of tuples in the index, but it might be more. (Note that the
    index's size in pages and tuples is available from the IndexOptInfo struct.)
   </para>
  </step>

  <step>
   <para>
    Estimate the number of index pages that will be retrieved during the scan.
    This might be just indexSelectivity times the index's size in pages.
   </para>
  </step>

  <step>
   <para>
    Compute the index access cost as

    <programlisting>
*indexAccessCost = numIndexPages + cpu_index_page_weight * numIndexTuples;
    </programlisting>
   </para>
  </step>
 </procedure>

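 <para>
  The four steps above might be put together as in the following sketch.
  It is written to compile outside the backend, so several things in it
  are assumptions made for illustration: the typedefs and the
  <literal>IndexOptInfo</literal> struct are simplified stand-ins for the
  real planner types, <literal>stub_clauselist_selec</literal> is a
  hypothetical placeholder for the optimizer's clauselist_selec(), and the
  value 0.01 for cpu_index_page_weight is not claimed to be the default.
 </para>
 <programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for planner types (assumptions, not the real ones) */
typedef double Cost;
typedef double Selectivity;
typedef struct Query Query;             /* opaque in this sketch */
typedef struct RelOptInfo RelOptInfo;   /* opaque in this sketch */
typedef struct List List;               /* opaque; NULL plays the role of NIL */
typedef struct IndexOptInfo
{
    double pages;   /* index size in pages */
    double tuples;  /* index size in tuples */
} IndexOptInfo;

/* Illustrative value only */
static double cpu_index_page_weight = 0.01;

/* Hypothetical placeholder for clauselist_selec(): fixed answers */
static Selectivity
stub_clauselist_selec(Query *root, List *indexQuals)
{
    (void) root;
    return (indexQuals == NULL) ? 1.0 : 0.05;
}

void
sketch_costestimate(Query *root,
                    RelOptInfo *rel,
                    IndexOptInfo *index,
                    List *indexQuals,
                    Cost *indexAccessCost,
                    Selectivity *indexSelectivity)
{
    double numIndexTuples;
    double numIndexPages;

    assert(index != NULL);
    assert(indexAccessCost != NULL && indexSelectivity != NULL);
    (void) rel;     /* unused in this generic sketch */

    /* Step 1: fraction of main-table tuples visited */
    *indexSelectivity = stub_clauselist_selec(root, indexQuals);

    /* Step 2: index tuples visited (generic assumption: proportional) */
    numIndexTuples = *indexSelectivity * index->tuples;

    /* Step 3: index pages retrieved (same proportional assumption) */
    numIndexPages = *indexSelectivity * index->pages;

    /* Step 4: disk cost plus CPU cost, in costsize.c units */
    *indexAccessCost = numIndexPages + cpu_index_page_weight * numIndexTuples;
}
```
 ]]></programlisting>
 <para>
  A real estimator would call the standard routine instead of the stub and
  would adjust the tuple and page estimates wherever index-type-specific
  knowledge allows a better guess than simple proportionality.
 </para>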
 <para>
  Examples of cost estimator functions can be found in
  <filename>src/backend/utils/adt/selfuncs.c</filename>.
 </para>

 <para>
  By convention, the <literal>pg_proc</literal> entry for an
  <literal>amcostestimate</literal> function should show

  <programlisting>
prorettype = 0
pronargs = 6
proargtypes = 0 0 0 0 0 0
  </programlisting>

  We use zero ("opaque") for all the arguments since none of them have types
  that are known in pg_type.
 </para>
</chapter>

<!-- Keep this comment at the end of the file
Local variables:
mode:sgml
sgml-omittag:nil
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-default-dtd-file:"./reference.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:("/usr/lib/sgml/CATALOG")
sgml-local-ecat-files:nil
End:
-->