Write some real documentation about the index access method API.
This commit is contained in:
parent
67ff8009cf
commit
c6521b1b93
@ -1,6 +1,6 @@
|
||||
<!--
|
||||
Documentation of the system catalogs, directed toward PostgreSQL developers
|
||||
$PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.95 2005/01/05 23:42:03 tgl Exp $
|
||||
$PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.96 2005/02/13 03:04:15 tgl Exp $
|
||||
-->
|
||||
|
||||
<chapter id="catalogs">
|
||||
@ -289,9 +289,10 @@
|
||||
</indexterm>
|
||||
|
||||
<para>
|
||||
The catalog <structname>pg_am</structname> stores information about index access
|
||||
methods. There is one row for each index access method supported by
|
||||
the system.
|
||||
The catalog <structname>pg_am</structname> stores information about index
|
||||
access methods. There is one row for each index access method supported by
|
||||
the system. The contents of this catalog are discussed in detail in
|
||||
<xref linkend="indexam">.
|
||||
</para>
|
||||
|
||||
<table>
|
||||
@ -453,20 +454,6 @@
|
||||
</tgroup>
|
||||
</table>
|
||||
|
||||
<para>
|
||||
An index access method that supports multiple columns (has
|
||||
<structfield>amcanmulticol</structfield> true) <emphasis>must</>
|
||||
support indexing null values in columns after the first, because the planner
|
||||
will assume the index can be used for queries on just the first
|
||||
column(s). For example, consider an index on (a,b) and a query with
|
||||
<literal>WHERE a = 4</literal>. The system will assume the index can be used to scan for
|
||||
rows with <literal>a = 4</literal>, which is wrong if the index omits rows where <literal>b</> is null.
|
||||
It is, however, OK to omit rows where the first indexed column is null.
|
||||
(GiST currently does so.)
|
||||
<structfield>amindexnulls</structfield> should be set true only if the
|
||||
index access method indexes all rows, including arbitrary combinations of null values.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
|
||||
|
@ -1,4 +1,4 @@
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.41 2005/01/10 00:04:38 tgl Exp $ -->
|
||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/filelist.sgml,v 1.42 2005/02/13 03:04:15 tgl Exp $ -->
|
||||
|
||||
<!entity history SYSTEM "history.sgml">
|
||||
<!entity info SYSTEM "info.sgml">
|
||||
@ -77,7 +77,7 @@
|
||||
<!entity catalogs SYSTEM "catalogs.sgml">
|
||||
<!entity geqo SYSTEM "geqo.sgml">
|
||||
<!entity gist SYSTEM "gist.sgml">
|
||||
<!entity indexcost SYSTEM "indexcost.sgml">
|
||||
<!entity indexam SYSTEM "indexam.sgml">
|
||||
<!entity nls SYSTEM "nls.sgml">
|
||||
<!entity plhandler SYSTEM "plhandler.sgml">
|
||||
<!entity protocol SYSTEM "protocol.sgml">
|
||||
|
837
doc/src/sgml/indexam.sgml
Normal file
837
doc/src/sgml/indexam.sgml
Normal file
@ -0,0 +1,837 @@
|
||||
<!--
|
||||
$PostgreSQL: pgsql/doc/src/sgml/indexam.sgml,v 2.1 2005/02/13 03:04:15 tgl Exp $
|
||||
-->
|
||||
|
||||
<chapter id="indexam">
|
||||
<title>Index Access Method Interface Definition</title>
|
||||
|
||||
<para>
|
||||
This chapter defines the interface between the core
|
||||
<productname>PostgreSQL</productname> system and <firstterm>index access
|
||||
methods</>, which manage individual index types. The core system
|
||||
knows nothing about indexes beyond what is specified here, so it is
|
||||
possible to develop entirely new index types by writing add-on code.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
All indexes in <productname>PostgreSQL</productname> are what are known
|
||||
technically as <firstterm>secondary indexes</>; that is, the index is
|
||||
physically separate from the table file that it describes. Each index
|
||||
is stored as its own physical <firstterm>relation</> and so is described
|
||||
by an entry in the <structname>pg_class</> catalog. The contents of an
|
||||
index are entirely under the control of its index access method. In
|
||||
practice, all index access methods divide indexes into standard-size
|
||||
pages so that they can use the regular storage manager and buffer manager
|
||||
to access the index contents. (All the existing index access methods
|
||||
furthermore use the standard page layout described in <xref
|
||||
linkend="storage-page-layout">, and they all use the same format for index
|
||||
tuple headers; but these decisions are not forced on an access method.)
|
||||
</para>
|
||||
|
||||
<para>
|
||||
An index is effectively a mapping from some data key values to
|
||||
<firstterm>tuple identifiers</>, or <acronym>TIDs</>, of row versions
|
||||
(tuples) in the index's parent table. A TID consists of a
|
||||
block number and an item number within that block (see <xref
|
||||
linkend="storage-page-layout">). This is sufficient
|
||||
information to fetch a particular row version from the table.
|
||||
Indexes are not directly aware that under MVCC, there may be multiple
|
||||
extant versions of the same logical row; to an index, each tuple is
|
||||
an independent object that needs its own index entry. Thus, an
|
||||
update of a row always creates all-new index entries for the row, even if
|
||||
the key values did not change. Index entries for dead tuples are
|
||||
reclaimed (by vacuuming) when the dead tuples themselves are reclaimed.
|
||||
</para>
|
||||
|
||||
<sect1 id="index-catalog">
|
||||
<title>Catalog Entries for Indexes</title>
|
||||
|
||||
<para>
|
||||
Each index access method is described by a row in the
|
||||
<structname>pg_am</structname> system catalog (see
|
||||
<xref linkend="catalog-pg-am">). The principal contents of a
|
||||
<structname>pg_am</structname> row are references to
|
||||
<link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>
|
||||
entries that identify the index access
|
||||
functions supplied by the access method. The APIs for these functions
|
||||
are defined later in this chapter. In addition, the
|
||||
<structname>pg_am</structname> row specifies a few fixed properties of
|
||||
the access method, such as whether it can support multi-column indexes.
|
||||
There is not currently any special support
|
||||
for creating or deleting <structname>pg_am</structname> entries;
|
||||
anyone able to write a new access method is expected to be competent
|
||||
to insert an appropriate row for themselves.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
To be useful, an index access method must also have one or more
|
||||
<firstterm>operator classes</> defined in
|
||||
<link linkend="catalog-pg-opclass"><structname>pg_opclass</structname></link>,
|
||||
<link linkend="catalog-pg-amop"><structname>pg_amop</structname></link>, and
|
||||
<link linkend="catalog-pg-amproc"><structname>pg_amproc</structname></link>.
|
||||
These entries allow the planner
|
||||
to determine what kinds of query qualifications can be used with
|
||||
indexes of this access method. Operator classes are described
|
||||
in <xref linkend="xindex">, which is prerequisite material for reading
|
||||
this chapter.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
An individual index is defined by a
|
||||
<link linkend="catalog-pg-class"><structname>pg_class</structname></link>
|
||||
entry that describes it as a physical relation, plus a
|
||||
<link linkend="catalog-pg-index"><structname>pg_index</structname></link>
|
||||
entry that shows the logical content of the index — that is, the set
|
||||
of index columns it has and the semantics of those columns, as captured by
|
||||
the associated operator classes. The index columns (key values) can be
|
||||
either simple columns of the underlying table or expressions over the table
|
||||
rows. The index access method normally has no interest in where the index
|
||||
key values come from (it is always handed precomputed key values) but it
|
||||
will be very interested in the operator class information in
|
||||
<structname>pg_index</structname>. Both of these catalog entries can be
|
||||
accessed as part of the <structname>Relation</> data structure that is
|
||||
passed to all operations on the index.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Some of the flag columns of <structname>pg_am</structname> have nonobvious
|
||||
implications. The requirements of <structfield>amcanunique</structfield>
|
||||
are discussed in <xref linkend="index-unique-checks">, and those of
|
||||
<structfield>amconcurrent</structfield> in <xref linkend="index-locking">.
|
||||
The <structfield>amcanmulticol</structfield> flag asserts that the
|
||||
access method supports multi-column indexes, while
|
||||
<structfield>amindexnulls</structfield> asserts that index entries are
|
||||
created for NULL key values. Since most indexable operators are
|
||||
strict and hence cannot return TRUE for NULL inputs,
|
||||
it is at first sight attractive to not store index entries for NULLs:
|
||||
they could never be returned by an index scan anyway. However, this
|
||||
argument fails for a full-table index scan (one with no scan keys);
|
||||
such a scan should include null rows. In practice this means that
|
||||
indexes that support ordered scans (have <structfield>amorderstrategy</>
|
||||
nonzero) must index nulls, since the planner might decide to use such a
|
||||
scan as a substitute for sorting. Another restriction is that an index
|
||||
access method that supports multiple index columns <emphasis>must</>
|
||||
support indexing null values in columns after the first, because the planner
|
||||
will assume the index can be used for queries on just the first
|
||||
column(s). For example, consider an index on (a,b) and a query with
|
||||
<literal>WHERE a = 4</literal>. The system will assume the index can be
|
||||
used to scan for rows with <literal>a = 4</literal>, which is wrong if the
|
||||
index omits rows where <literal>b</> is null.
|
||||
It is, however, OK to omit rows where the first indexed column is null.
|
||||
(GiST currently does so.) Thus,
|
||||
<structfield>amindexnulls</structfield> should be set true only if the
|
||||
index access method indexes all rows, including arbitrary combinations of
|
||||
null values.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="index-functions">
|
||||
<title>Index Access Method Functions</title>
|
||||
|
||||
<para>
|
||||
The index construction and maintenance functions that an index access
|
||||
method must provide are:
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
void
|
||||
ambuild (Relation heapRelation,
|
||||
Relation indexRelation,
|
||||
IndexInfo *indexInfo);
|
||||
</programlisting>
|
||||
Build a new index. The index relation has been physically created,
|
||||
but is empty. It must be filled in with whatever fixed data the
|
||||
access method requires, plus entries for all tuples already existing
|
||||
in the table. Ordinarily the <function>ambuild</> function will call
|
||||
<function>IndexBuildHeapScan()</> to scan the table for existing tuples
|
||||
and compute the keys that need to be inserted into the index.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
InsertIndexResult
|
||||
aminsert (Relation indexRelation,
|
||||
Datum *datums,
|
||||
char *nulls,
|
||||
ItemPointer heap_tid,
|
||||
Relation heapRelation,
|
||||
bool check_uniqueness);
|
||||
</programlisting>
|
||||
Insert a new tuple into an existing index. The <literal>datums</> and
|
||||
<literal>nulls</> arrays give the key values to be indexed, and
|
||||
<literal>heap_tid</> is the TID to be indexed.
|
||||
If the access method supports unique indexes (its
|
||||
<structname>pg_am</>.<structfield>amcanunique</> flag is true) then
|
||||
<literal>check_uniqueness</> may be true, in which case the access method
|
||||
must verify that there is no conflicting row; this is the only situation in
|
||||
which the access method normally needs the <literal>heapRelation</>
|
||||
parameter. See <xref linkend="index-unique-checks"> for details.
|
||||
The result is a struct that must be pfree'd by the caller. (The result
|
||||
struct is really quite useless and should be removed...)
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
IndexBulkDeleteResult *
|
||||
ambulkdelete (Relation indexRelation,
|
||||
IndexBulkDeleteCallback callback,
|
||||
void *callback_state);
|
||||
</programlisting>
|
||||
Delete tuple(s) from the index. This is a <quote>bulk delete</> operation
|
||||
that is intended to be implemented by scanning the whole index and checking
|
||||
each entry to see if it should be deleted.
|
||||
The passed-in <literal>callback</> function may be called, in the style
|
||||
<literal>callback(<replaceable>TID</>, callback_state) returns bool</literal>,
|
||||
to determine whether any particular index entry, as identified by its
|
||||
referenced TID, is to be deleted. Must return either NULL or a palloc'd
|
||||
struct containing statistics about the effects of the deletion operation.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
IndexBulkDeleteResult *
|
||||
amvacuumcleanup (Relation indexRelation,
|
||||
IndexVacuumCleanupInfo *info,
|
||||
IndexBulkDeleteResult *stats);
|
||||
</programlisting>
|
||||
Clean up after a <command>VACUUM</command> operation (one or more
|
||||
<function>ambulkdelete</> calls). An index access method does not have
|
||||
to provide this function (if so, the entry in <structname>pg_am</> must
|
||||
be zero). If it is provided, it is typically used for bulk cleanup
|
||||
such as reclaiming empty index pages. <literal>info</>
|
||||
provides some additional arguments such as a message level for statistical
|
||||
reports, and <literal>stats</> is whatever the last
|
||||
<function>ambulkdelete</> call returned. <function>amvacuumcleanup</>
|
||||
may replace or modify this struct before returning it. If the result
|
||||
is not NULL it must be a palloc'd struct. The statistics it contains
|
||||
will be reported by <command>VACUUM</> if <literal>VERBOSE</> is given.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The purpose of an index, of course, is to support scans for tuples matching
|
||||
an indexable <literal>WHERE</> condition, often called a
|
||||
<firstterm>qualifier</> or <firstterm>scan key</>. The semantics of
|
||||
index scanning are described more fully in <xref linkend="index-scanning">,
|
||||
below. The scan-related functions that an index access method must provide
|
||||
are:
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
IndexScanDesc
|
||||
ambeginscan (Relation indexRelation,
|
||||
int nkeys,
|
||||
ScanKey key);
|
||||
</programlisting>
|
||||
Begin a new scan. The <literal>key</> array (of length <literal>nkeys</>)
|
||||
describes the scan key(s) for the index scan. The result must be a
|
||||
palloc'd struct. For implementation reasons the index access method
|
||||
<emphasis>must</> create this struct by calling
|
||||
<function>RelationGetIndexScan()</>. In most cases
|
||||
<function>ambeginscan</> itself does little beyond making that call;
|
||||
the interesting parts of indexscan startup are in <function>amrescan</>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
boolean
|
||||
amgettuple (IndexScanDesc scan,
|
||||
ScanDirection direction);
|
||||
</programlisting>
|
||||
Fetch the next tuple in the given scan, moving in the given
|
||||
direction (forward or backward in the index). Returns TRUE if a tuple was
|
||||
obtained, FALSE if no matching tuples remain. In the TRUE case the tuple
|
||||
TID is stored into the <literal>scan</> structure. Note that
|
||||
<quote>success</> means only that the index contains an entry that matches
|
||||
the scan keys, not that the tuple necessarily still exists in the heap or
|
||||
will pass the caller's snapshot test.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
void
|
||||
amrescan (IndexScanDesc scan,
|
||||
ScanKey key);
|
||||
</programlisting>
|
||||
Restart the given scan, possibly with new scan keys (to continue using
|
||||
the old keys, NULL is passed for <literal>key</>). Note that it is not
|
||||
possible for the number of keys to be changed. In practice the restart
|
||||
feature is used when a new outer tuple is selected by a nestloop join
|
||||
and so a new key comparison value is needed, but the scan key structure
|
||||
remains the same. This function is also called by
|
||||
<function>RelationGetIndexScan()</>, so it is used for initial setup
|
||||
of an indexscan as well as rescanning.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
void
|
||||
amendscan (IndexScanDesc scan);
|
||||
</programlisting>
|
||||
End a scan and release resources. The <literal>scan</> struct itself
|
||||
should not be freed, but any locks or pins taken internally by the
|
||||
access method must be released.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
void
|
||||
ammarkpos (IndexScanDesc scan);
|
||||
</programlisting>
|
||||
Mark current scan position. The access method need only support one
|
||||
remembered scan position per scan.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
void
|
||||
amrestrpos (IndexScanDesc scan);
|
||||
</programlisting>
|
||||
Restore the scan to the most recently marked position.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<programlisting>
|
||||
void
|
||||
amcostestimate (Query *root,
|
||||
RelOptInfo *rel,
|
||||
IndexOptInfo *index,
|
||||
List *indexQuals,
|
||||
Cost *indexStartupCost,
|
||||
Cost *indexTotalCost,
|
||||
Selectivity *indexSelectivity,
|
||||
double *indexCorrelation);
|
||||
</programlisting>
|
||||
Estimate the costs of an index scan. This function is described fully
|
||||
in <xref linkend="index-cost-estimation">, below.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
By convention, the <literal>pg_proc</literal> entry for any index
|
||||
access method function should show the correct number of arguments,
|
||||
but declare them all as type <type>internal</> (since most of the arguments
|
||||
have types that are not known to SQL, and we don't want users calling
|
||||
the functions directly anyway). The return type is declared as
|
||||
<type>void</>, <type>internal</>, or <type>boolean</> as appropriate.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="index-scanning">
|
||||
<title>Index Scanning</title>
|
||||
|
||||
<para>
|
||||
In an index scan, the index access method is responsible for regurgitating
|
||||
the TIDs of all the tuples it has been told about that match the
|
||||
<firstterm>scan keys</>. The access method is <emphasis>not</> involved in
|
||||
actually fetching those tuples from the index's parent table, nor in
|
||||
determining whether they pass the scan's time qualification test or other
|
||||
conditions.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
A scan key is the internal representation of a <literal>WHERE</> clause of
|
||||
the form <replaceable>index_key</> <replaceable>operator</>
|
||||
<replaceable>constant</>, where the index key is one of the columns of the
|
||||
index and the operator is one of the members of the operator class
|
||||
associated with that index column. An index scan has zero or more scan
|
||||
keys, which are implicitly ANDed — the returned tuples are expected
|
||||
to satisfy all the indicated conditions.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The operator class may indicate that the index is <firstterm>lossy</> for a
|
||||
particular operator; this implies that the index scan will return all the
|
||||
entries that pass the scan key, plus possibly additional entries that do
|
||||
not. The core system's indexscan machinery will then apply that operator
|
||||
again to the heap tuple to verify whether or not it really should be
|
||||
selected. For non-lossy operators, the index scan must return exactly the
|
||||
set of matching entries, as there is no recheck.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Note that it is entirely up to the access method to ensure that it
|
||||
correctly finds all and only the entries passing all the given scan keys.
|
||||
Also, the core system will simply hand off all the <literal>WHERE</>
|
||||
clauses that match the index keys and operator classes, without any
|
||||
semantic analysis to determine whether they are redundant or
|
||||
contradictory. As an example, given
|
||||
<literal>WHERE x > 4 AND x > 14</> where <literal>x</> is a b-tree
|
||||
indexed column, it is left to the b-tree <function>amrescan</> function
|
||||
to realize that the first scan key is redundant and can be discarded.
|
||||
The extent of preprocessing needed during <function>amrescan</> will
|
||||
depend on the extent to which the index access method needs to reduce
|
||||
the scan keys to a <quote>normalized</> form.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <function>amgettuple</> function has a <literal>direction</> argument,
|
||||
which can be either <literal>ForwardScanDirection</> (the normal case)
|
||||
or <literal>BackwardScanDirection</>. If the first call after
|
||||
<function>amrescan</> specifies <literal>BackwardScanDirection</>, then the
|
||||
set of matching index entries is to be scanned back-to-front rather than in
|
||||
the normal front-to-back direction, so <function>amgettuple</> must return
|
||||
the last matching tuple in the index, rather than the first one as it
|
||||
normally would. (This will only occur for access
|
||||
methods that advertise they support ordered scans by setting
|
||||
<structname>pg_am</>.<structfield>amorderstrategy</> nonzero.) After the
|
||||
first call, <function>amgettuple</> must be prepared to advance the scan in
|
||||
either direction from the most recently returned entry.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The access method must support <quote>marking</> a position in a scan
|
||||
and later returning to the marked position. The same position may be
|
||||
restored multiple times. However, only one position need be remembered
|
||||
per scan; a new <function>ammarkpos</> call overrides the previously
|
||||
marked position.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Both the scan position and the mark position (if any) must be maintained
|
||||
consistently in the face of concurrent insertions or deletions in the
|
||||
index. It is OK if a freshly-inserted entry is not returned by a scan that
|
||||
would have found the entry if it had existed when the scan started, or for
|
||||
the scan to return such an entry upon rescanning or backing
|
||||
up even though it had not been returned the first time through. Similarly,
|
||||
a concurrent delete may or may not be reflected in the results of a scan.
|
||||
What is important is that insertions or deletions not cause the scan to
|
||||
miss or multiply return entries that were not themselves being inserted or
|
||||
deleted. (For an index type that does not set
|
||||
<structname>pg_am</>.<structfield>amconcurrent</>, it is sufficient to
|
||||
handle these cases for insertions or deletions performed by the same
|
||||
backend that's doing the scan. But when <structfield>amconcurrent</> is
|
||||
true, insertions or deletions from other backends must be handled as well.)
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="index-locking">
|
||||
<title>Index Locking Considerations</title>
|
||||
|
||||
<para>
|
||||
An index access method can choose whether it supports concurrent updates
|
||||
of the index by multiple processes. If the method's
|
||||
<structname>pg_am</>.<structfield>amconcurrent</> flag is true, then
|
||||
the core <productname>PostgreSQL</productname> system obtains
|
||||
<literal>AccessShareLock</> on the index during an index scan, and
|
||||
<literal>RowExclusiveLock</> when updating the index. Since these lock
|
||||
types do not conflict, the access method is responsible for handling any
|
||||
fine-grained locking it may need. An exclusive lock on the index as a whole
|
||||
will be taken only during index creation, destruction, or
|
||||
<literal>REINDEX</>. When <structfield>amconcurrent</> is false,
|
||||
<productname>PostgreSQL</productname> still obtains
|
||||
<literal>AccessShareLock</> during index scans, but it obtains
|
||||
<literal>AccessExclusiveLock</> during any update. This ensures that
|
||||
updaters have sole use of the index. Note that this implicitly assumes
|
||||
that index scans are read-only; an access method that might modify the
|
||||
index during a scan will still have to do its own locking to handle the
|
||||
case of concurrent scans.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Recall that a backend's own locks never conflict; therefore, even a
|
||||
non-concurrent index type must be prepared to handle the case where
|
||||
a backend is inserting or deleting entries in an index that it is itself
|
||||
scanning. (This is of course necessary to support an <command>UPDATE</>
|
||||
that uses the index to find the rows to be updated.)
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Building an index type that supports concurrent updates usually requires
|
||||
extensive and subtle analysis of the required behavior. For the b-tree
|
||||
and hash index types, you can read about the design decisions involved in
|
||||
<filename>src/backend/access/nbtree/README</> and
|
||||
<filename>src/backend/access/hash/README</>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Aside from the index's own internal consistency requirements, concurrent
|
||||
updates create issues about consistency between the parent table (the
|
||||
<firstterm>heap</>) and the index. Because
|
||||
<productname>PostgreSQL</productname> separates accesses
|
||||
and updates of the heap from those of the index, there are windows in
|
||||
which the index may be inconsistent with the heap. We handle this problem
|
||||
with the following rules:
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
A new heap entry is made before making its index entries. (Therefore
|
||||
a concurrent index scan is likely to fail to see the heap entry.
|
||||
This is okay because the index reader would be uninterested in an
|
||||
uncommitted row anyway. But see <xref linkend="index-unique-checks">.)
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
When a heap entry is to be deleted (by <command>VACUUM</>), all its
|
||||
index entries must be removed first.
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
For concurrent index types, an indexscan must maintain a pin
|
||||
on the index page holding the item last returned by
|
||||
<function>amgettuple</>, and <function>ambulkdelete</> cannot delete
|
||||
entries from pages that are pinned by other backends. The need
|
||||
for this rule is explained below.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
If an index is concurrent then it is possible for an index reader to
|
||||
see an index entry just before it is removed by <command>VACUUM</>, and
|
||||
then to arrive at the corresponding heap entry after that was removed by
|
||||
<command>VACUUM</>. (With a nonconcurrent index, this is not possible
|
||||
because of the conflicting index-level locks that will be taken out.)
|
||||
This creates no serious problems if that item
|
||||
number is still unused when the reader reaches it, since an empty
|
||||
item slot will be ignored by <function>heap_fetch()</>. But what if a
|
||||
third backend has already re-used the item slot for something else?
|
||||
When using an MVCC-compliant snapshot, there is no problem because
|
||||
the new occupant of the slot is certain to be too new to pass the
|
||||
snapshot test. However, with a non-MVCC-compliant snapshot (such as
|
||||
<literal>SnapshotNow</>), it would be possible to accept and return
|
||||
a row that does not in fact match the scan keys. We could defend
|
||||
against this scenario by requiring the scan keys to be rechecked
|
||||
against the heap row in all cases, but that is too expensive. Instead,
|
||||
we use a pin on an index page as a proxy to indicate that the reader
|
||||
may still be <quote>in flight</> from the index entry to the matching
|
||||
heap entry. Making <function>ambulkdelete</> block on such a pin ensures
|
||||
that <command>VACUUM</> cannot delete the heap entry before the reader
|
||||
is done with it. This solution costs little in runtime, and adds blocking
|
||||
overhead only in the rare cases where there actually is a conflict.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
This solution requires that index scans be <quote>synchronous</>: we have
|
||||
to fetch each heap tuple immediately after scanning the corresponding index
|
||||
entry. This is expensive for a number of reasons. An
|
||||
<quote>asynchronous</> scan in which we collect many TIDs from the index,
|
||||
and only visit the heap tuples sometime later, requires much less index
|
||||
locking overhead and may allow a more efficient heap access pattern.
|
||||
Per the above analysis, we must use the synchronous approach for
|
||||
non-MVCC-compliant snapshots, but an asynchronous scan would be safe
|
||||
for a query using an MVCC snapshot. This possibility is not exploited
|
||||
as of <productname>PostgreSQL</productname> 8.0, but it is likely to be
|
||||
investigated soon.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="index-unique-checks">
|
||||
<title>Index Uniqueness Checks</title>
|
||||
|
||||
<para>
|
||||
<productname>PostgreSQL</productname> enforces SQL uniqueness constraints
|
||||
using <firstterm>unique indexes</>, which are indexes that disallow
|
||||
multiple entries with identical keys. An access method that supports this
|
||||
feature sets <structname>pg_am</>.<structfield>amcanunique</> true.
|
||||
(At present, only b-tree supports it.)
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Because of MVCC, it is always necessary to allow duplicate entries to
|
||||
exist physically in an index: the entries might refer to successive
|
||||
versions of a single logical row. The behavior we actually want to
|
||||
enforce is that no MVCC snapshot could include two rows with equal
|
||||
index keys. This breaks down into the following cases that must be
|
||||
checked when inserting a new row into a unique index:
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
If a conflicting valid row has been deleted by the current transaction,
|
||||
it's okay. (In particular, since an UPDATE always deletes the old row
|
||||
version before inserting the new version, this will allow an UPDATE on
|
||||
a row without changing the key.)
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
If a conflicting row has been inserted by an as-yet-uncommitted
|
||||
transaction, the would-be inserter must wait to see if that transaction
|
||||
commits. If it rolls back then there is no conflict. If it commits
|
||||
without deleting the conflicting row again, there is a uniqueness
|
||||
violation. (In practice we just wait for the other transaction to
|
||||
end and then redo the visibility check in toto.)
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
Similarly, if a conflicting valid row has been deleted by an
|
||||
as-yet-uncommitted transaction, the would-be inserter must wait
|
||||
for that transaction to commit or abort, and then repeat the test.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
We require the index access method to apply these tests itself, which
|
||||
means that it must reach into the heap to check the commit status of
|
||||
any row that is shown to have a duplicate key according to the index
|
||||
contents. This is without a doubt ugly and non-modular, but it saves
|
||||
redundant work: if we did a separate probe then the index lookup for
|
||||
a conflicting row would be essentially repeated while finding the place to
|
||||
insert the new row's index entry. What's more, there is no obvious way
|
||||
to avoid race conditions unless the conflict check is an integral part
|
||||
of insertion of the new index entry.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The main limitation of this scheme is that it has no convenient way
|
||||
to support deferred uniqueness checks.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="index-cost-estimation">
|
||||
<title>Index Cost Estimation Functions</title>
|
||||
|
||||
<para>
|
||||
The amcostestimate function is given a list of WHERE clauses that have
|
||||
been determined to be usable with the index. It must return estimates
|
||||
of the cost of accessing the index and the selectivity of the WHERE
|
||||
clauses (that is, the fraction of parent-table rows that will be
|
||||
retrieved during the index scan). For simple cases, nearly all the
|
||||
work of the cost estimator can be done by calling standard routines
|
||||
in the optimizer; the point of having an amcostestimate function is
|
||||
to allow index access methods to provide index-type-specific knowledge,
|
||||
in case it is possible to improve on the standard estimates.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Each amcostestimate function must have the signature:
|
||||
|
||||
<programlisting>
|
||||
void
|
||||
amcostestimate (Query *root,
|
||||
RelOptInfo *rel,
|
||||
IndexOptInfo *index,
|
||||
List *indexQuals,
|
||||
Cost *indexStartupCost,
|
||||
Cost *indexTotalCost,
|
||||
Selectivity *indexSelectivity,
|
||||
double *indexCorrelation);
|
||||
</programlisting>
|
||||
|
||||
The first four parameters are inputs:
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>root</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The query being processed.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>rel</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The relation the index is on.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>index</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The index itself.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>indexQuals</term>
|
||||
<listitem>
|
||||
<para>
|
||||
List of index qual clauses (implicitly ANDed);
|
||||
a NIL list indicates no qualifiers are available.
|
||||
Note that the list contains expression trees, not ScanKeys.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The last four parameters are pass-by-reference outputs:
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>*indexStartupCost</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Set to cost of index start-up processing
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>*indexTotalCost</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Set to total cost of index processing
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>*indexSelectivity</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Set to index selectivity
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>*indexCorrelation</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Set to correlation coefficient between index scan order and
|
||||
underlying table's order
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Note that cost estimate functions must be written in C, not in SQL or
|
||||
any available procedural language, because they must access internal
|
||||
data structures of the planner/optimizer.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The index access costs should be computed in the units used by
|
||||
<filename>src/backend/optimizer/path/costsize.c</filename>: a sequential disk block fetch
|
||||
has cost 1.0, a nonsequential fetch has cost random_page_cost, and
|
||||
the cost of processing one index row should usually be taken as
|
||||
cpu_index_tuple_cost (which is a user-adjustable optimizer parameter).
|
||||
In addition, an appropriate multiple of cpu_operator_cost should be charged
|
||||
for any comparison operators invoked during index processing (especially
|
||||
evaluation of the indexQuals themselves).
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The access costs should include all disk and CPU costs associated with
|
||||
scanning the index itself, but NOT the costs of retrieving or processing
|
||||
the parent-table rows that are identified by the index.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
|
||||
before we can begin to fetch the first row. For most indexes this can
|
||||
be taken as zero, but an index type with a high start-up cost might want
|
||||
to set it nonzero.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The indexSelectivity should be set to the estimated fraction of the parent
|
||||
table rows that will be retrieved during the index scan. In the case
|
||||
of a lossy index, this will typically be higher than the fraction of
|
||||
rows that actually pass the given qual conditions.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The indexCorrelation should be set to the correlation (ranging between
|
||||
-1.0 and 1.0) between the index order and the table order. This is used
|
||||
to adjust the estimate for the cost of fetching rows from the parent
|
||||
table.
|
||||
</para>
|
||||
|
||||
<procedure>
|
||||
<title>Cost Estimation</title>
|
||||
<para>
|
||||
A typical cost estimator will proceed as follows:
|
||||
</para>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Estimate and return the fraction of parent-table rows that will be visited
|
||||
based on the given qual conditions. In the absence of any index-type-specific
|
||||
knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:
|
||||
|
||||
<programlisting>
|
||||
*indexSelectivity = clauselist_selectivity(root, indexQuals,
|
||||
rel->relid, JOIN_INNER);
|
||||
</programlisting>
|
||||
</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Estimate the number of index rows that will be visited during the
|
||||
scan. For many index types this is the same as indexSelectivity times
|
||||
the number of rows in the index, but it might be more. (Note that the
|
||||
index's size in pages and rows is available from the IndexOptInfo struct.)
|
||||
</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Estimate the number of index pages that will be retrieved during the scan.
|
||||
This might be just indexSelectivity times the index's size in pages.
|
||||
</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Compute the index access cost. A generic estimator might do this:
|
||||
|
||||
<programlisting>
|
||||
/*
|
||||
* Our generic assumption is that the index pages will be read
|
||||
* sequentially, so they have cost 1.0 each, not random_page_cost.
|
||||
* Also, we charge for evaluation of the indexquals at each index row.
|
||||
* All the costs are assumed to be paid incrementally during the scan.
|
||||
*/
|
||||
cost_qual_eval(&index_qual_cost, indexQuals);
|
||||
*indexStartupCost = index_qual_cost.startup;
|
||||
*indexTotalCost = numIndexPages +
|
||||
(cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
|
||||
</programlisting>
|
||||
</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Estimate the index correlation. For a simple ordered index on a single
|
||||
field, this can be retrieved from pg_statistic. If the correlation
|
||||
is not known, the conservative estimate is zero (no correlation).
|
||||
</para>
|
||||
</step>
|
||||
</procedure>
|
||||
|
||||
<para>
|
||||
Examples of cost estimator functions can be found in
|
||||
<filename>src/backend/utils/adt/selfuncs.c</filename>.
|
||||
</para>
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<!-- Keep this comment at the end of the file
|
||||
Local variables:
|
||||
mode:sgml
|
||||
sgml-omittag:nil
|
||||
sgml-shorttag:t
|
||||
sgml-minimize-attributes:nil
|
||||
sgml-always-quote-attributes:t
|
||||
sgml-indent-step:1
|
||||
sgml-indent-data:t
|
||||
sgml-parent-document:nil
|
||||
sgml-default-dtd-file:"./reference.ced"
|
||||
sgml-exposed-tags:nil
|
||||
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
||||
sgml-local-ecat-files:nil
|
||||
End:
|
||||
-->
|
@ -1,285 +0,0 @@
|
||||
<!--
|
||||
$PostgreSQL: pgsql/doc/src/sgml/indexcost.sgml,v 2.19 2005/01/22 22:06:17 momjian Exp $
|
||||
-->
|
||||
|
||||
<chapter id="indexcost">
|
||||
<title>Index Cost Estimation Functions</title>
|
||||
|
||||
<note>
|
||||
<title>Author</title>
|
||||
|
||||
<para>
|
||||
Written by Tom Lane (<email>tgl@sss.pgh.pa.us</email>) on 2000-01-24
|
||||
</para>
|
||||
</note>
|
||||
|
||||
<note>
|
||||
<para>
|
||||
This must eventually become part of a much larger chapter about
|
||||
writing new index access methods.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
<para>
|
||||
Every index access method must provide a cost estimation function for
|
||||
use by the planner/optimizer. The procedure OID of this function is
|
||||
given in the <literal>amcostestimate</literal> field of the access
|
||||
method's <literal>pg_am</literal> entry.
|
||||
|
||||
<note>
|
||||
<para>
|
||||
Prior to <productname>PostgreSQL</productname> 7.0, a different
|
||||
scheme was used for registering
|
||||
index-specific cost estimation functions.
|
||||
</para>
|
||||
</note>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The amcostestimate function is given a list of WHERE clauses that have
|
||||
been determined to be usable with the index. It must return estimates
|
||||
of the cost of accessing the index and the selectivity of the WHERE
|
||||
clauses (that is, the fraction of main-table rows that will be
|
||||
retrieved during the index scan). For simple cases, nearly all the
|
||||
work of the cost estimator can be done by calling standard routines
|
||||
in the optimizer; the point of having an amcostestimate function is
|
||||
to allow index access methods to provide index-type-specific knowledge,
|
||||
in case it is possible to improve on the standard estimates.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Each amcostestimate function must have the signature:
|
||||
|
||||
<programlisting>
|
||||
void
|
||||
amcostestimate (Query *root,
|
||||
RelOptInfo *rel,
|
||||
IndexOptInfo *index,
|
||||
List *indexQuals,
|
||||
Cost *indexStartupCost,
|
||||
Cost *indexTotalCost,
|
||||
Selectivity *indexSelectivity,
|
||||
double *indexCorrelation);
|
||||
</programlisting>
|
||||
|
||||
The first four parameters are inputs:
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>root</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The query being processed.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>rel</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The relation the index is on.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>index</term>
|
||||
<listitem>
|
||||
<para>
|
||||
The index itself.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>indexQuals</term>
|
||||
<listitem>
|
||||
<para>
|
||||
List of index qual clauses (implicitly ANDed);
|
||||
a NIL list indicates no qualifiers are available.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The last four parameters are pass-by-reference outputs:
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term>*indexStartupCost</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Set to cost of index start-up processing
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>*indexTotalCost</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Set to total cost of index processing
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>*indexSelectivity</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Set to index selectivity
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>*indexCorrelation</term>
|
||||
<listitem>
|
||||
<para>
|
||||
Set to correlation coefficient between index scan order and
|
||||
underlying table's order
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Note that cost estimate functions must be written in C, not in SQL or
|
||||
any available procedural language, because they must access internal
|
||||
data structures of the planner/optimizer.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The index access costs should be computed in the units used by
|
||||
<filename>src/backend/optimizer/path/costsize.c</filename>: a sequential disk block fetch
|
||||
has cost 1.0, a nonsequential fetch has cost random_page_cost, and
|
||||
the cost of processing one index row should usually be taken as
|
||||
cpu_index_tuple_cost (which is a user-adjustable optimizer parameter).
|
||||
In addition, an appropriate multiple of cpu_operator_cost should be charged
|
||||
for any comparison operators invoked during index processing (especially
|
||||
evaluation of the indexQuals themselves).
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The access costs should include all disk and CPU costs associated with
|
||||
scanning the index itself, but NOT the costs of retrieving or processing
|
||||
the main-table rows that are identified by the index.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <quote>start-up cost</quote> is the part of the total scan cost that must be expended
|
||||
before we can begin to fetch the first row. For most indexes this can
|
||||
be taken as zero, but an index type with a high start-up cost might want
|
||||
to set it nonzero.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The indexSelectivity should be set to the estimated fraction of the main
|
||||
table rows that will be retrieved during the index scan. In the case
|
||||
of a lossy index, this will typically be higher than the fraction of
|
||||
rows that actually pass the given qual conditions.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The indexCorrelation should be set to the correlation (ranging between
|
||||
-1.0 and 1.0) between the index order and the table order. This is used
|
||||
to adjust the estimate for the cost of fetching rows from the main
|
||||
table.
|
||||
</para>
|
||||
|
||||
<procedure>
|
||||
<title>Cost Estimation</title>
|
||||
<para>
|
||||
A typical cost estimator will proceed as follows:
|
||||
</para>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Estimate and return the fraction of main-table rows that will be visited
|
||||
based on the given qual conditions. In the absence of any index-type-specific
|
||||
knowledge, use the standard optimizer function <function>clauselist_selectivity()</function>:
|
||||
|
||||
<programlisting>
|
||||
*indexSelectivity = clauselist_selectivity(root, indexQuals,
|
||||
rel->relid, JOIN_INNER);
|
||||
</programlisting>
|
||||
</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Estimate the number of index rows that will be visited during the
|
||||
scan. For many index types this is the same as indexSelectivity times
|
||||
the number of rows in the index, but it might be more. (Note that the
|
||||
index's size in pages and rows is available from the IndexOptInfo struct.)
|
||||
</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Estimate the number of index pages that will be retrieved during the scan.
|
||||
This might be just indexSelectivity times the index's size in pages.
|
||||
</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Compute the index access cost. A generic estimator might do this:
|
||||
|
||||
<programlisting>
|
||||
/*
|
||||
* Our generic assumption is that the index pages will be read
|
||||
* sequentially, so they have cost 1.0 each, not random_page_cost.
|
||||
* Also, we charge for evaluation of the indexquals at each index row.
|
||||
* All the costs are assumed to be paid incrementally during the scan.
|
||||
*/
|
||||
cost_qual_eval(&index_qual_cost, indexQuals);
|
||||
*indexStartupCost = index_qual_cost.startup;
|
||||
*indexTotalCost = numIndexPages +
|
||||
(cpu_index_tuple_cost + index_qual_cost.per_tuple) * numIndexTuples;
|
||||
</programlisting>
|
||||
</para>
|
||||
</step>
|
||||
|
||||
<step>
|
||||
<para>
|
||||
Estimate the index correlation. For a simple ordered index on a single
|
||||
field, this can be retrieved from pg_statistic. If the correlation
|
||||
is not known, the conservative estimate is zero (no correlation).
|
||||
</para>
|
||||
</step>
|
||||
</procedure>
|
||||
|
||||
<para>
|
||||
Examples of cost estimator functions can be found in
|
||||
<filename>src/backend/utils/adt/selfuncs.c</filename>.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
By convention, the <literal>pg_proc</literal> entry for an
|
||||
<literal>amcostestimate</literal> function should show
|
||||
eight arguments all declared as <type>internal</> (since none of them have
|
||||
types that are known to SQL), and the return type is <type>void</>.
|
||||
</para>
|
||||
</chapter>
|
||||
|
||||
<!-- Keep this comment at the end of the file
|
||||
Local variables:
|
||||
mode:sgml
|
||||
sgml-omittag:nil
|
||||
sgml-shorttag:t
|
||||
sgml-minimize-attributes:nil
|
||||
sgml-always-quote-attributes:t
|
||||
sgml-indent-step:1
|
||||
sgml-indent-data:t
|
||||
sgml-parent-document:nil
|
||||
sgml-default-dtd-file:"./reference.ced"
|
||||
sgml-exposed-tags:nil
|
||||
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
||||
sgml-local-ecat-files:nil
|
||||
End:
|
||||
-->
|
@ -1,5 +1,5 @@
|
||||
<!--
|
||||
$PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.73 2005/01/10 00:04:38 tgl Exp $
|
||||
$PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.74 2005/02/13 03:04:15 tgl Exp $
|
||||
-->
|
||||
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
|
||||
@ -235,7 +235,7 @@ $PostgreSQL: pgsql/doc/src/sgml/postgres.sgml,v 1.73 2005/01/10 00:04:38 tgl Exp
|
||||
&nls;
|
||||
&plhandler;
|
||||
&geqo;
|
||||
&indexcost;
|
||||
&indexam;
|
||||
&gist;
|
||||
&storage;
|
||||
&bki;
|
||||
|
@ -1,5 +1,5 @@
|
||||
<!--
|
||||
$PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.38 2005/01/23 00:30:18 momjian Exp $
|
||||
$PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.39 2005/02/13 03:04:15 tgl Exp $
|
||||
-->
|
||||
|
||||
<sect1 id="xindex">
|
||||
@ -43,7 +43,7 @@ $PostgreSQL: pgsql/doc/src/sgml/xindex.sgml,v 1.38 2005/01/23 00:30:18 momjian E
|
||||
described in <classname>pg_am</classname>. It is possible to add a
|
||||
new index method by defining the required interface routines and
|
||||
then creating a row in <classname>pg_am</classname> — but that is
|
||||
far beyond the scope of this chapter.
|
||||
beyond the scope of this chapter (see <xref linkend="indexam">).
|
||||
</para>
|
||||
|
||||
<para>
|
||||
@ -514,7 +514,7 @@ CREATE OPERATOR < (
|
||||
<listitem>
|
||||
<para>
|
||||
Although <productname>PostgreSQL</productname> can cope with
|
||||
functions having the same name as long as they have different
|
||||
functions having the same SQL name as long as they have different
|
||||
argument data types, C can only cope with one global function
|
||||
having a given name. So we shouldn't name the C function
|
||||
something simple like <filename>abs_eq</filename>. Usually it's
|
||||
@ -525,14 +525,12 @@ CREATE OPERATOR < (
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
We could have made the <productname>PostgreSQL</productname> name
|
||||
We could have made the SQL name
|
||||
of the function <filename>abs_eq</filename>, relying on
|
||||
<productname>PostgreSQL</productname> to distinguish it by
|
||||
argument data types from any other
|
||||
<productname>PostgreSQL</productname> function of the same name.
|
||||
argument data types from any other SQL function of the same name.
|
||||
To keep the example simple, we make the function have the same
|
||||
names at the C level and <productname>PostgreSQL</productname>
|
||||
level.
|
||||
names at the C level and SQL level.
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
Loading…
x
Reference in New Issue
Block a user