Information moved to doc/src/sgml/geqo.sgml.
This commit is contained in:
parent
be8300b18f
commit
596731dcf8
160
doc/README.GEQO
160
doc/README.GEQO
@ -1,160 +0,0 @@
|
|||||||
|
|
||||||
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
|
|
||||||
Genetic Query Optimization in Database Systems
|
|
||||||
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*
|
|
||||||
|
|
||||||
Martin Utesch
|
|
||||||
|
|
||||||
<utesch@aut.tu-freiberg.de>
|
|
||||||
|
|
||||||
Institute of Automatic Control
|
|
||||||
University of Mining and Technology
|
|
||||||
Freiberg, Germany
|
|
||||||
|
|
||||||
02/10/1997
|
|
||||||
|
|
||||||
|
|
||||||
1.) Query Handling as a Complex Optimization Problem
|
|
||||||
====================================================
|
|
||||||
|
|
||||||
Among all relational operators the most difficult one to process and
|
|
||||||
optimize is the JOIN. The number of alternative plans to answer a query
|
|
||||||
grows exponentially with the number of JOINs included in it. Further
|
|
||||||
optimization effort is caused by the support of a variety of *JOIN
|
|
||||||
methods* (e.g., nested loop, index scan, merge join in Postgres) to
|
|
||||||
process individual JOINs and a diversity of *indices* (e.g., r-tree,
|
|
||||||
b-tree, hash in Postgres) as access paths for relations.
|
|
||||||
|
|
||||||
The current Postgres optimizer implementation performs a *near-
|
|
||||||
exhaustive search* over the space of alternative strategies. This query
|
|
||||||
optimization technique is inadequate to support database application
|
|
||||||
domains that evolve the need for extensive queries, such as artifcial
|
|
||||||
intelligence.
|
|
||||||
|
|
||||||
The Institute of Automatic Control at the University of Mining and
|
|
||||||
Technology Freiberg, Germany encountered the described problems as its
|
|
||||||
folks wanted to take the Postgres DBMS as the backend for a decision
|
|
||||||
support knowledge based system for the maintenance of an electrical
|
|
||||||
power grid. The DBMS needed to handle large JOIN queries for the
|
|
||||||
inference machine of the knowledge based system.
|
|
||||||
|
|
||||||
Performance difficulties within exploring the space of possible query
|
|
||||||
plans arose the demand for a new optimization technique being developed.
|
|
||||||
|
|
||||||
In the following we propose the implementation of a *Genetic
|
|
||||||
Algorithm* as an option for the database query optimization problem.
|
|
||||||
|
|
||||||
|
|
||||||
2.) Genetic Algorithms (GA)
|
|
||||||
===========================
|
|
||||||
|
|
||||||
The GA is a heuristic optimization method which operates through
|
|
||||||
determined, randomized search. The set of possible solutions for the
|
|
||||||
optimization problem is considered as a *population* of *individuals*.
|
|
||||||
The degree of adaption of an individual to its environment is specified
|
|
||||||
by its *fitness*.
|
|
||||||
|
|
||||||
The coordinates of an individual in the search space are represented
|
|
||||||
by *chromosomes*, in essence a set of character strings. A *gene* is a
|
|
||||||
subsection of a chromosome which encodes the value of a single parameter
|
|
||||||
being optimized. Typical encodings for a gene could be *binary* or
|
|
||||||
*integer*.
|
|
||||||
|
|
||||||
Through simulation of the evolutionary operations *recombination*,
|
|
||||||
*mutation*, and *selection* new generations of search points are found
|
|
||||||
that show a higher average fitness than their ancestors.
|
|
||||||
|
|
||||||
According to the "comp.ai.genetic" FAQ it cannot be stressed too
|
|
||||||
strongly that a GA is not a pure random search for a solution to a
|
|
||||||
problem. A GA uses stochastic processes, but the result is distinctly
|
|
||||||
non-random (better than random).
|
|
||||||
|
|
||||||
Structured Diagram of a GA:
|
|
||||||
---------------------------
|
|
||||||
|
|
||||||
P(t) generation of ancestors at a time t
|
|
||||||
P''(t) generation of descendants at a time t
|
|
||||||
|
|
||||||
+=========================================+
|
|
||||||
|>>>>>>>>>>> Algorithm GA <<<<<<<<<<<<<<|
|
|
||||||
+=========================================+
|
|
||||||
| INITIALIZE t := 0 |
|
|
||||||
+=========================================+
|
|
||||||
| INITIALIZE P(t) |
|
|
||||||
+=========================================+
|
|
||||||
| evalute FITNESS of P(t) |
|
|
||||||
+=========================================+
|
|
||||||
| while not STOPPING CRITERION do |
|
|
||||||
| +-------------------------------------+
|
|
||||||
| | P'(t) := RECOMBINATION{P(t)} |
|
|
||||||
| +-------------------------------------+
|
|
||||||
| | P''(t) := MUTATION{P'(t)} |
|
|
||||||
| +-------------------------------------+
|
|
||||||
| | P(t+1) := SELECTION{P''(t) + P(t)} |
|
|
||||||
| +-------------------------------------+
|
|
||||||
| | evalute FITNESS of P''(t) |
|
|
||||||
| +-------------------------------------+
|
|
||||||
| | t := t + 1 |
|
|
||||||
+===+=====================================+
|
|
||||||
|
|
||||||
|
|
||||||
3.) Genetic Query Optimization (GEQO) in PostgreSQL
|
|
||||||
===================================================
|
|
||||||
|
|
||||||
The GEQO module is intended for the solution of the query
|
|
||||||
optimization problem similar to a traveling salesman problem (TSP).
|
|
||||||
Possible query plans are encoded as integer strings. Each string
|
|
||||||
represents the JOIN order from one relation of the query to the next.
|
|
||||||
E. g., the query tree /\
|
|
||||||
/\ 2
|
|
||||||
/\ 3
|
|
||||||
4 1 is encoded by the integer string '4-1-3-2',
|
|
||||||
which means, first join relation '4' and '1', then '3', and
|
|
||||||
then '2', where 1, 2, 3, 4 are relids in PostgreSQL.
|
|
||||||
|
|
||||||
Parts of the GEQO module are adapted from D. Whitley's Genitor
|
|
||||||
algorithm.
|
|
||||||
|
|
||||||
Specific characteristics of the GEQO implementation in PostgreSQL
|
|
||||||
are:
|
|
||||||
|
|
||||||
o usage of a *steady state* GA (replacement of the least fit
|
|
||||||
individuals in a population, not whole-generational replacement)
|
|
||||||
allows fast convergence towards improved query plans. This is
|
|
||||||
essential for query handling with reasonable time;
|
|
||||||
|
|
||||||
o usage of *edge recombination crossover* which is especially suited
|
|
||||||
to keep edge losses low for the solution of the TSP by means of a GA;
|
|
||||||
|
|
||||||
o mutation as genetic operator is deprecated so that no repair
|
|
||||||
mechanisms are needed to generate legal TSP tours.
|
|
||||||
|
|
||||||
The GEQO module gives the following benefits to the PostgreSQL DBMS
|
|
||||||
compared to the Postgres query optimizer implementation:
|
|
||||||
|
|
||||||
o handling of large JOIN queries through non-exhaustive search;
|
|
||||||
|
|
||||||
o improved cost size approximation of query plans since no longer
|
|
||||||
plan merging is needed (the GEQO module evaluates the cost for a
|
|
||||||
query plan as an individual).
|
|
||||||
|
|
||||||
|
|
||||||
References
|
|
||||||
==========
|
|
||||||
|
|
||||||
J. Heitk"otter, D. Beasley:
|
|
||||||
---------------------------
|
|
||||||
"The Hitch-Hicker's Guide to Evolutionary Computation",
|
|
||||||
FAQ in 'comp.ai.genetic',
|
|
||||||
'ftp://ftp.Germany.EU.net/pub/research/softcomp/EC/Welcome.html'
|
|
||||||
|
|
||||||
Z. Fong:
|
|
||||||
--------
|
|
||||||
"The Design and Implementation of the Postgres Query Optimizer",
|
|
||||||
file 'planner/Report.ps' in the 'postgres-papers' distribution
|
|
||||||
|
|
||||||
R. Elmasri, S. Navathe:
|
|
||||||
-----------------------
|
|
||||||
"Fundamentals of Database Systems",
|
|
||||||
The Benjamin/Cummings Pub., Inc.
|
|
||||||
|
|
@ -1,59 +0,0 @@
|
|||||||
|
|
||||||
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
|
|
||||||
* Things left to done for the PostgreSQL *
|
|
||||||
= Genetic Query Optimization (GEQO) =
|
|
||||||
* module implementation *
|
|
||||||
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
|
|
||||||
* Martin Utesch * Institute of Automatic Control *
|
|
||||||
= = University of Mining and Technology =
|
|
||||||
* utesch@aut.tu-freiberg.de * Freiberg, Germany *
|
|
||||||
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
|
|
||||||
|
|
||||||
|
|
||||||
1.) Basic Improvements
|
|
||||||
===============================================================
|
|
||||||
|
|
||||||
a) improve freeing of memory when query is already processed:
|
|
||||||
-------------------------------------------------------------
|
|
||||||
with large JOIN queries the computing time spent for the genetic query
|
|
||||||
optimization seems to be a mere *fraction* of the time Postgres
|
|
||||||
needs for freeing memory via routine 'MemoryContextFree',
|
|
||||||
file 'backend/utils/mmgr/mcxt.c';
|
|
||||||
debugging showed that it get stucked in a loop of routine
|
|
||||||
'OrderedElemPop', file 'backend/utils/mmgr/oset.c';
|
|
||||||
the same problems arise with long queries when using the normal
|
|
||||||
Postgres query optimization algorithm;
|
|
||||||
|
|
||||||
b) improve genetic algorithm parameter settings:
|
|
||||||
------------------------------------------------
|
|
||||||
file 'backend/optimizer/geqo/geqo_params.c', routines
|
|
||||||
'gimme_pool_size' and 'gimme_number_generations';
|
|
||||||
we have to find a compromise for the parameter settings
|
|
||||||
to satisfy two competing demands:
|
|
||||||
1. optimality of the query plan
|
|
||||||
2. computing time
|
|
||||||
|
|
||||||
c) find better solution for integer overflow:
|
|
||||||
---------------------------------------------
|
|
||||||
file 'backend/optimizer/geqo/geqo_eval.c', routine
|
|
||||||
'geqo_joinrel_size';
|
|
||||||
the present hack for MAXINT overflow is to set the Postgres integer
|
|
||||||
value of 'rel->size' to its logarithm;
|
|
||||||
modifications of 'struct Rel' in 'backend/nodes/relation.h' will
|
|
||||||
surely have severe impacts on the whole PostgreSQL implementation.
|
|
||||||
|
|
||||||
d) find solution for exhausted memory:
|
|
||||||
--------------------------------------
|
|
||||||
that may occur with more than 10 relations involved in a query,
|
|
||||||
file 'backend/optimizer/geqo/geqo_eval.c', routine
|
|
||||||
'gimme_tree' which is recursively called;
|
|
||||||
maybe I forgot something to be freed correctly, but I dunno what;
|
|
||||||
of course the 'rel' data structure of the JOIN keeps growing and
|
|
||||||
growing the more relations are packed into it;
|
|
||||||
suggestions are welcome :-(
|
|
||||||
|
|
||||||
|
|
||||||
2.) Further Improvements
|
|
||||||
===============================================================
|
|
||||||
Enable bushy query tree processing within PostgreSQL;
|
|
||||||
that may improve the quality of query plans.
|
|
Loading…
x
Reference in New Issue
Block a user