Provide a bit more high-level documentation for the GEQO planner.
Per request from Luca Ferrari.
parent 7abe764f17
commit ddb93cac24
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/arch-dev.sgml,v 2.29 2007/01/31 20:56:16 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/arch-dev.sgml,v 2.30 2007/07/21 04:02:41 tgl Exp $ -->
 
 <chapter id="overview">
 <title>Overview of PostgreSQL Internals</title>
@@ -345,9 +345,10 @@
 can be executed would take an excessive amount of time and memory
 space. In particular, this occurs when executing queries
 involving large numbers of join operations. In order to determine
-a reasonable (not optimal) query plan in a reasonable amount of
-time, <productname>PostgreSQL</productname> uses a <xref
-linkend="geqo" endterm="geqo-title">.
+a reasonable (not necessarily optimal) query plan in a reasonable amount
+of time, <productname>PostgreSQL</productname> uses a <xref
+linkend="geqo" endterm="geqo-title"> when the number of joins
+exceeds a threshold (see <xref linkend="guc-geqo-threshold">).
 </para>
 </note>
 
@@ -380,20 +381,17 @@
 the index's <firstterm>operator class</>, another plan is created using
 the B-tree index to scan the relation. If there are further indexes
 present and the restrictions in the query happen to match a key of an
-index further plans will be considered.
+index, further plans will be considered. Index scan plans are also
+generated for indexes that have a sort ordering that can match the
+query's <literal>ORDER BY</> clause (if any), or a sort ordering that
+might be useful for merge joining (see below).
 </para>
 
 <para>
-After all feasible plans have been found for scanning single relations,
-plans for joining relations are created. The planner/optimizer
-preferentially considers joins between any two relations for which there
-exist a corresponding join clause in the <literal>WHERE</literal> qualification (i.e. for
-which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
-exists). Join pairs with no join clause are considered only when there
-is no other choice, that is, a particular relation has no available
-join clauses to any other relation. All possible plans are generated for
-every join pair considered
-by the planner/optimizer. The three possible join strategies are:
+If the query requires joining two or more relations,
+plans for joining relations are considered
+after all feasible plans have been found for scanning single relations.
+The three available join strategies are:
 
 <itemizedlist>
 <listitem>
@@ -439,6 +437,26 @@
 cheapest one.
 </para>
 
+<para>
+If the query uses fewer than <xref linkend="guc-geqo-threshold">
+relations, a near-exhaustive search is conducted to find the best
+join sequence. The planner preferentially considers joins between any
+two relations for which there exists a corresponding join clause in the
+<literal>WHERE</literal> qualification (i.e. for
+which a restriction like <literal>where rel1.attr1=rel2.attr2</literal>
+exists). Join pairs with no join clause are considered only when there
+is no other choice, that is, a particular relation has no available
+join clauses to any other relation. All possible plans are generated for
+every join pair considered by the planner, and the one that is
+(estimated to be) the cheapest is chosen.
+</para>
+
+<para>
+When <varname>geqo_threshold</varname> is exceeded, the join
+sequences considered are determined by heuristics, as described
+in <xref linkend="geqo">. Otherwise the process is the same.
+</para>
+
 <para>
 The finished plan tree consists of sequential or index scans of
 the base relations, plus nested-loop, merge, or hash join nodes as
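The paragraphs added in the hunk above describe two things: a threshold that selects between a near-exhaustive join search and GEQO, and a preference for join pairs that have a join clause. The following is a minimal Python sketch of that idea, not the planner's actual code. Everything in it is an assumption made for illustration: the function names, the default threshold value, the cost model (sum of intermediate result sizes with a fixed selectivity), and the restriction to left-deep join orders. It also assumes the genetic search starts at the threshold itself, which matches "fewer than" in the first paragraph; the second paragraph's "exceeded" could be read slightly differently.

```python
from itertools import permutations


def choose_search(n_relations, geqo_threshold=12):
    """Pick the join search strategy; the default threshold is an assumption."""
    # Assumption: queries with fewer relations than the threshold are
    # searched near-exhaustively, everything else goes to the genetic search.
    return "exhaustive" if n_relations < geqo_threshold else "geqo"


def connected(order, clauses):
    """True if every relation after the first has a join clause to an earlier one."""
    for i in range(1, len(order)):
        if not any(frozenset((order[i], prev)) in clauses for prev in order[:i]):
            return False
    return True


def toy_cost(order, rows, clauses, selectivity=0.01):
    """Hypothetical cost model: total size of all intermediate join results."""
    size = float(rows[order[0]])
    total = 0.0
    joined = [order[0]]
    for rel in order[1:]:
        size *= rows[rel]
        # Each applicable join clause shrinks the result by a fixed factor.
        for prev in joined:
            if frozenset((rel, prev)) in clauses:
                size *= selectivity
        total += size
        joined.append(rel)
    return total


def exhaustive_best(rows, clauses):
    """Try every left-deep order; use clauseless joins only if unavoidable."""
    orders = list(permutations(rows))
    preferred = [o for o in orders if connected(o, clauses)] or orders
    return min(preferred, key=lambda o: toy_cost(o, rows, clauses))
```

Calling `exhaustive_best({"a": 1000, "b": 10, "c": 100}, {frozenset("ab"), frozenset("bc")})` returns an order that joins the small relations first and never pairs `a` with `c` directly, since no clause links them. The real planner also considers bushy plans and three join strategies per step, which this sketch leaves out.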
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.39 2007/02/16 03:50:29 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/geqo.sgml,v 1.40 2007/07/21 04:02:41 tgl Exp $ -->
 
 <chapter id="geqo">
 <chapterinfo>
@@ -186,11 +186,6 @@
 <productname>PostgreSQL</productname> optimizer.
 </para>
 
-<para>
-Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's Genitor
-algorithm.
-</para>
-
 <para>
 Specific characteristics of the <acronym>GEQO</acronym>
 implementation in <productname>PostgreSQL</productname>
@@ -224,6 +219,11 @@
 </itemizedlist>
 </para>
 
+<para>
+Parts of the <acronym>GEQO</acronym> module are adapted from D. Whitley's
+Genitor algorithm.
+</para>
+
 <para>
 The <acronym>GEQO</acronym> module allows
 the <productname>PostgreSQL</productname> query optimizer to
@@ -231,6 +231,42 @@
 non-exhaustive search.
 </para>
 
+<sect2>
+<title>Generating Possible Plans with <acronym>GEQO</acronym></title>
+
+<para>
+The <acronym>GEQO</acronym> planning process uses the standard planner
+code to generate plans for scans of individual relations. Then join
+plans are developed using the genetic approach. As shown above, each
+candidate join plan is represented by a sequence in which to join
+the base relations. In the initial stage, the <acronym>GEQO</acronym>
+code simply generates some possible join sequences at random. For each
+join sequence considered, the standard planner code is invoked to
+estimate the cost of performing the query using that join sequence.
+(For each step of the join sequence, all three possible join strategies
+are considered; and all the initially-determined relation scan plans
+are available. The estimated cost is the cheapest of these
+possibilities.) Join sequences with lower estimated cost are considered
+<quote>more fit</> than those with higher cost. The genetic algorithm
+discards the least fit candidates. Then new candidates are generated
+by combining genes of more-fit candidates &mdash; that is, by using
+randomly-chosen portions of known low-cost join sequences to create
+new sequences for consideration. This process is repeated until a
+preset number of join sequences have been considered; then the best
+one found at any time during the search is used to generate the finished
+plan.
+</para>
+
+<para>
+This process is inherently nondeterministic, because of the randomized
+choices made during both the initial population selection and subsequent
+<quote>mutation</> of the best candidates. Hence different plans may
+be selected from one run to the next, resulting in varying run time
+and varying output row order.
+</para>
+
+</sect2>
+
 <sect2 id="geqo-future">
 <title>Future Implementation Tasks for
 <productname>PostgreSQL</> <acronym>GEQO</acronym></title>
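The new section in the hunk above describes a loop: generate random join sequences, score each by estimated cost, discard the least fit, and recombine portions of fitter sequences. A rough Python sketch of that loop follows. It is an illustration under assumptions, not the GEQO implementation. The names `order_crossover` and `genetic_join_search` are hypothetical, the `cost` callback stands in for the standard planner's cost estimate, and the recombination step is a plain order crossover chosen for brevity rather than whatever operator the GEQO module actually uses.

```python
import random


def order_crossover(p1, p2, rng):
    """Keep a random slice of p1 in place and fill the rest in p2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n + 1), 2))  # i < j, so the slice is non-empty
    kept = p1[i:j]
    rest = [gene for gene in p2 if gene not in kept]
    return rest[:i] + kept + rest[i:]


def genetic_join_search(relations, cost, pool_size=20, generations=200, seed=0):
    """Return the cheapest join sequence found (pool_size must be at least 2)."""
    rng = random.Random(seed)
    # Initial stage: join sequences generated at random.
    pool = [rng.sample(relations, len(relations)) for _ in range(pool_size)]
    pool.sort(key=cost)  # fittest (cheapest) first
    for _ in range(generations):
        # Parents come from the fitter half of the pool.
        p1, p2 = rng.sample(pool[: max(2, pool_size // 2)], 2)
        child = order_crossover(p1, p2, rng)
        # The least fit candidate is discarded in favour of the child,
        # so the best sequence seen so far is never lost.
        pool[-1] = child
        pool.sort(key=cost)
    return pool[0]
```

With a fixed `seed` the sketch is repeatable; seeding from the clock instead would reproduce the run-to-run variation in chosen plans that the second added paragraph warns about.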
@@ -257,6 +293,16 @@
 </itemizedlist>
 </para>
 
+<para>
+In the current implementation, the fitness of each candidate join
+sequence is estimated by running the standard planner's join selection
+and cost estimation code from scratch. To the extent that different
+candidates use similar sub-sequences of joins, a great deal of work
+will be repeated. This could be made significantly faster by retaining
+cost estimates for sub-joins. The problem is to avoid expending
+unreasonable amounts of memory on retaining that state.
+</para>
+
 <para>
 At a more basic level, it is not clear that solving query optimization
 with a GA algorithm designed for TSP is appropriate. In the TSP case,
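The paragraph added in the hunk above suggests retaining cost estimates for sub-joins while keeping memory use bounded. One way to picture that trade-off is a size-limited cache keyed on the join prefix, sketched below in Python. This is only an illustration of the idea, not a proposal for the C code: `ROWS`, the fixed selectivity, the cost formula, and the cache size are all made-up assumptions, and only left-to-right prefixes are shared.

```python
from functools import lru_cache

# Hypothetical row counts for four base relations.
ROWS = {"a": 1000, "b": 10, "c": 100, "d": 50}


@lru_cache(maxsize=1024)  # the bound caps how much state is retained
def prefix_cost(prefix):
    """Return (rows, cost) for joining the relations in `prefix` left to right."""
    if len(prefix) == 1:
        return float(ROWS[prefix[0]]), 0.0
    # The shorter prefix is looked up in the cache instead of recomputed.
    rows, cost = prefix_cost(prefix[:-1])
    rows = rows * ROWS[prefix[-1]] * 0.01  # assumed fixed join selectivity
    return rows, cost + rows


def sequence_cost(seq):
    """Estimated cost of a whole join sequence, reusing cached prefixes."""
    return prefix_cost(tuple(seq))[1]
```

Scoring `"abcd"` and then `"abdc"` evaluates the shared prefix `("a", "b")` once; `prefix_cost.cache_info()` shows the hit. Least-recently-used eviction is just one possible policy for keeping the retained state small.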