Provide a bit more high-level documentation for the GEQO planner.
Per request from Luca Ferrari.
parent 7abe764f17
commit ddb93cac24
@ -1,4 +1,4 @@
|
|||||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/arch-dev.sgml,v 2.29 2007/01/31 20:56:16 momjian Exp $ -->
|
<!-- $PostgreSQL: pgsql/doc/src/sgml/arch-dev.sgml,v 2.30 2007/07/21 04:02:41 tgl Exp $ -->
|
||||||
|
|
||||||
<chapter id="overview">
|
<chapter id="overview">
|
||||||
<title>Overview of PostgreSQL Internals</title>
|
<title>Overview of PostgreSQL Internals</title>
|
||||||
@ -345,9 +345,10 @@
|
|||||||
can be executed would take an excessive amount of time and memory
|
can be executed would take an excessive amount of time and memory
|
||||||
space. In particular, this occurs when executing queries
|
space. In particular, this occurs when executing queries
|
||||||
involving large numbers of join operations. In order to determine
|
involving large numbers of join operations. In order to determine
|
||||||
a reasonable (not optimal) query plan in a reasonable amount of
|
a reasonable (not necessarily optimal) query plan in a reasonable amount
|
||||||
time, <productname>PostgreSQL</productname> uses a <xref
|
of time, <productname>PostgreSQL</productname> uses a <xref
|
||||||
linkend="geqo" endterm="geqo-title">.
|
linkend="geqo" endterm="geqo-title"> when the number of joins
|
||||||
|
exceeds a threshold (see <xref linkend="guc-geqo-threshold">).
|
||||||
</para>
|
</para>
|
||||||
</note>
|
</note>
|
||||||
|
|
||||||
@ -380,20 +381,17 @@
|
|||||||
the index's <firstterm>operator class</>, another plan is created using
|
the index's <firstterm>operator class</>, another plan is created using
|
||||||
the B-tree index to scan the relation. If there are further indexes
|
the B-tree index to scan the relation. If there are further indexes
|
||||||
present and the restrictions in the query happen to match a key of an
|
present and the restrictions in the query happen to match a key of an
|
||||||
index further plans will be considered.
|
index, further plans will be considered. Index scan plans are also
|
||||||
|
generated for indexes that have a sort ordering that can match the
|
||||||
|
query's <literal>ORDER BY</> clause (if any), or a sort ordering that
|
||||||
|
might be useful for merge joining (see below).
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
After all feasible plans have been found for scanning single relations,
|
If the query requires joining two or more relations,
|
||||||
plans for joining relations are created. The planner/optimizer
|
plans for joining relations are considered
|
||||||
preferentially considers joins between any two relations for which there
|
after all feasible plans have been found for scanning single relations.
|
||||||
exists a corresponding join clause in the <literal>WHERE</literal> qualification (i.e. for
|
The three available join strategies are:
|
||||||
which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
|
|
||||||
exists). Join pairs with no join clause are considered only when there
|
|
||||||
is no other choice, that is, a particular relation has no available
|
|
||||||
join clauses to any other relation. All possible plans are generated for
|
|
||||||
every join pair considered
|
|
||||||
by the planner/optimizer. The three possible join strategies are:
|
|
||||||
|
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem>
|
<listitem>
|
||||||
@ -439,6 +437,26 @@
|
|||||||
cheapest one.
|
cheapest one.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
If the query uses fewer than <xref linkend="guc-geqo-threshold">
|
||||||
|
relations, a near-exhaustive search is conducted to find the best
|
||||||
|
join sequence. The planner preferentially considers joins between any
|
||||||
|
two relations for which there exists a corresponding join clause in the
|
||||||
|
<literal>WHERE</literal> qualification (i.e. for
|
||||||
|
which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
|
||||||
|
exists). Join pairs with no join clause are considered only when there
|
||||||
|
is no other choice, that is, a particular relation has no available
|
||||||
|
join clauses to any other relation. All possible plans are generated for
|
||||||
|
every join pair considered by the planner, and the one that is
|
||||||
|
(estimated to be) the cheapest is chosen.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
When <varname>geqo_threshold</varname> is exceeded, the join
|
||||||
|
sequences considered are determined by heuristics, as described
|
||||||
|
in <xref linkend="geqo">. Otherwise the process is the same.
|
||||||
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
The finished plan tree consists of sequential or index scans of
|
The finished plan tree consists of sequential or index scans of
|
||||||
the base relations, plus nested-loop, merge, or hash join nodes as
|
the base relations, plus nested-loop, merge, or hash join nodes as
|
||||||
|
@ -1,4 +1,4 @@
|
|||||||
<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.39 2007/02/16 03:50:29 momjian Exp $ -->
|
<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.40 2007/07/21 04:02:41 tgl Exp $ -->
|
||||||
|
|
||||||
<chapter id="geqo">
|
<chapter id="geqo">
|
||||||
<chapterinfo>
|
<chapterinfo>
|
||||||
@ -186,11 +186,6 @@
|
|||||||
<productname>PostgreSQL</productname> optimizer.
|
<productname>PostgreSQL</productname> optimizer.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>
|
|
||||||
Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
|
|
||||||
algorithm.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
Specific characteristics of the <acronym>GEQO</acronym>
|
Specific characteristics of the <acronym>GEQO</acronym>
|
||||||
implementation in <productname>PostgreSQL</productname>
|
implementation in <productname>PostgreSQL</productname>
|
||||||
@ -224,6 +219,11 @@
|
|||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's
|
||||||
|
Genitor algorithm.
|
||||||
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
The <acronym>GEQO</acronym> module allows
|
The <acronym>GEQO</acronym> module allows
|
||||||
the <productname>PostgreSQL</productname> query optimizer to
|
the <productname>PostgreSQL</productname> query optimizer to
|
||||||
@ -231,6 +231,42 @@
|
|||||||
non-exhaustive search.
|
non-exhaustive search.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
|
<sect2>
|
||||||
|
<title>Generating Possible Plans with <acronym>GEQO</acronym></title>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
The <acronym>GEQO</acronym> planning process uses the standard planner
|
||||||
|
code to generate plans for scans of individual relations. Then join
|
||||||
|
plans are developed using the genetic approach. As shown above, each
|
||||||
|
candidate join plan is represented by a sequence in which to join
|
||||||
|
the base relations. In the initial stage, the <acronym>GEQO</acronym>
|
||||||
|
code simply generates some possible join sequences at random. For each
|
||||||
|
join sequence considered, the standard planner code is invoked to
|
||||||
|
estimate the cost of performing the query using that join sequence.
|
||||||
|
(For each step of the join sequence, all three possible join strategies
|
||||||
|
are considered; and all the initially-determined relation scan plans
|
||||||
|
are available. The estimated cost is the cheapest of these
|
||||||
|
possibilities.) Join sequences with lower estimated cost are considered
|
||||||
|
<quote>more fit</> than those with higher cost. The genetic algorithm
|
||||||
|
discards the least fit candidates. Then new candidates are generated
|
||||||
|
by combining genes of more-fit candidates &mdash; that is, by using
|
||||||
|
randomly-chosen portions of known low-cost join sequences to create
|
||||||
|
new sequences for consideration. This process is repeated until a
|
||||||
|
preset number of join sequences have been considered; then the best
|
||||||
|
one found at any time during the search is used to generate the finished
|
||||||
|
plan.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
This process is inherently nondeterministic, because of the randomized
|
||||||
|
choices made during both the initial population selection and subsequent
|
||||||
|
<quote>mutation</> of the best candidates. Hence different plans may
|
||||||
|
be selected from one run to the next, resulting in varying run time
|
||||||
|
and varying output row order.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
</sect2>
|
||||||
|
|
||||||
<sect2 id="geqo-future">
|
<sect2 id="geqo-future">
|
||||||
<title>Future Implementation Tasks for
|
<title>Future Implementation Tasks for
|
||||||
<productname>PostgreSQL</> <acronym>GEQO</acronym></title>
|
<productname>PostgreSQL</> <acronym>GEQO</acronym></title>
|
||||||
@ -257,6 +293,16 @@
|
|||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
|
<para>
|
||||||
|
In the current implementation, the fitness of each candidate join
|
||||||
|
sequence is estimated by running the standard planner's join selection
|
||||||
|
and cost estimation code from scratch. To the extent that different
|
||||||
|
candidates use similar sub-sequences of joins, a great deal of work
|
||||||
|
will be repeated. This could be made significantly faster by retaining
|
||||||
|
cost estimates for sub-joins. The problem is to avoid expending
|
||||||
|
unreasonable amounts of memory on retaining that state.
|
||||||
|
</para>
|
||||||
|
|
||||||
<para>
|
<para>
|
||||||
At a more basic level, it is not clear that solving query optimization
|
At a more basic level, it is not clear that solving query optimization
|
||||||
with a GA algorithm designed for TSP is appropriate. In the TSP case,
|
with a GA algorithm designed for TSP is appropriate. In the TSP case,
|
||||||
|
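Note on the new "Generating Possible Plans with GEQO" paragraph: the loop it describes (random initial join sequences, cost-based fitness, discarding the least fit candidate, recombining fitter ones, keeping the best sequence seen) can be illustrated with a small toy model. The sketch below is not PostgreSQL code and does not claim to match its recombination operator or cost model. Every name in it (toy_cost, crossover, genetic_join_search, the row counts and selectivities) is a hypothetical stand-in chosen for illustration.

```python
import random


def toy_cost(order, rows, selectivity):
    """Toy cost of joining relations left to right in the given order.

    Hypothetical model: cost is the sum of intermediate result sizes.
    This stands in for the standard planner's cost estimation.
    """
    size = rows[order[0]]
    joined = [order[0]]
    total = 0.0
    for rel in order[1:]:
        sel = 1.0
        for prev in joined:
            # A pair with no join clause has selectivity 1.0 (cross product).
            sel *= selectivity.get(frozenset((prev, rel)), 1.0)
        size = size * rows[rel] * sel
        total += size
        joined.append(rel)
    return total


def crossover(a, b, rng):
    """Keep a random slice of parent a in place; fill the rest in b's order."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[i:j]
    rest = [r for r in b if r not in middle]
    return rest[:i] + middle + rest[i:]


def genetic_join_search(rows, selectivity, pool_size=8, generations=50, seed=0):
    """Search join sequences with a steady-state genetic loop (toy version)."""
    rng = random.Random(seed)
    rels = sorted(rows)

    def cost(order):
        return toy_cost(order, rows, selectivity)

    # Initial stage: some possible join sequences generated at random.
    pool = [rng.sample(rels, len(rels)) for _ in range(pool_size)]
    pool.sort(key=cost)
    for _ in range(generations):
        # Combine portions of two of the fitter (lower-cost) candidates.
        a, b = rng.sample(pool[: max(2, pool_size // 2)], 2)
        # Discard the least fit candidate and replace it with the child.
        pool[-1] = crossover(a, b, rng)
        pool.sort(key=cost)
    # Only the worst slot is ever overwritten, so pool[0] is the best
    # sequence found at any time during the search.
    return pool[0], cost(pool[0])


if __name__ == "__main__":
    rows = {"a": 1000, "b": 10, "c": 100, "d": 5}
    selectivity = {
        frozenset(("a", "b")): 0.01,
        frozenset(("b", "c")): 0.1,
        frozenset(("c", "d")): 0.05,
    }
    best, best_cost = genetic_join_search(rows, selectivity)
    print(best, best_cost)
```

The fixed seed makes this toy repeatable. Seeding from a varying source would show the run-to-run plan variation that the second new paragraph warns about.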