
<HTML>
<HEAD>
<TITLE>How PostgreSQL Processes a Query</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#FF0000" VLINK="#A00000" ALINK="#0000FF">
<H1 ALIGN=CENTER>
How PostgreSQL Processes a Query
</H1>
<H2 ALIGN=CENTER>
by Bruce Momjian
</H2>
<P>
<CENTER>
<EM><BIG>
Click on an item to see more detail or look at the full
<A HREF="backend_dirs.html">index.</A>
</BIG></EM>
<BR>
<BR>
<IMG src="flow.jpg" usemap="#flowmap" alt="flowchart">
</CENTER>
<MAP name="flowmap">
<AREA COORDS="290,10,450,50" HREF="backend_dirs.html#main">
<AREA COORDS="550,10,710,50" HREF="backend_dirs.html#bootstrap">
<AREA COORDS="290,90,450,130" HREF="backend_dirs.html#postmaster">
<AREA COORDS="550,90,710,130" HREF="backend_dirs.html#libpq">
<AREA COORDS="290,170,450,210" HREF="backend_dirs.html#tcop">
<AREA COORDS="550,170,710,210" HREF="backend_dirs.html#tcop">
<AREA COORDS="290,270,450,310" HREF="backend_dirs.html#parser">
<AREA COORDS="290,350,450,390" HREF="backend_dirs.html#tcop">
<AREA COORDS="290,430,450,470" HREF="backend_dirs.html#optimizer">
<AREA COORDS="290,510,450,550" HREF="backend_dirs.html#optimizer/plan">
<AREA COORDS="290,570,450,630" HREF="backend_dirs.html#executor">
<AREA COORDS="550,350,710,390" HREF="backend_dirs.html#commands">
<AREA COORDS="10,330,170,370" HREF="backend_dirs.html#access">
<AREA COORDS="10,390,170,430" HREF="backend_dirs.html#catalog">
<AREA COORDS="10,450,170,490" HREF="backend_dirs.html#utils">
<AREA COORDS="10,510,170,550" HREF="backend_dirs.html#nodes">
<AREA COORDS="10,570,170,610" HREF="backend_dirs.html#storage">
</MAP>
<BR>
<P>

<HR>
<P>

A query comes to the backend via data packets arriving through TCP/IP or
Unix Domain sockets. It is loaded into a string and passed to the
<A HREF="../../backend/parser">parser,</A> where the lexical scanner,
<A HREF="../../backend/parser/scan.l">scan.l,</A> breaks the query up
into tokens (words). The parser uses <A
HREF="../../backend/parser/gram.y">gram.y</A> and the tokens to identify
the query type and load the proper query-specific structure, like <A
HREF="../../include/nodes/parsenodes.h">CreateStmt</A> or <A
HREF="../../include/nodes/parsenodes.h">SelectStmt.</A><P>

The query is then identified as a <I>Utility</I> query or a more complex
query. A <I>Utility</I> query is processed by a query-specific function
in <A HREF="../../backend/commands">commands.</A> A complex query, like
<I>SELECT, UPDATE,</I> and <I>DELETE,</I> requires much more handling.<P>
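
The split between the two paths can be pictured with a minimal C sketch.
Every name in it (StmtKind, QueryPath, classify_statement(), path_name())
is hypothetical and chosen for illustration; the backend itself works on
parse node structures rather than on an enum like this one.<P>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical statement tags; the real parser produces node structures
 * such as CreateStmt or SelectStmt, not this enum. */
typedef enum
{
    STMT_CREATE,
    STMT_VACUUM,
    STMT_SELECT,
    STMT_UPDATE,
    STMT_DELETE
} StmtKind;

/* The two processing paths described in the text. */
typedef enum
{
    PATH_UTILITY,
    PATH_COMPLEX
} QueryPath;

/* Decide which path a parsed statement takes. */
static QueryPath
classify_statement(StmtKind kind)
{
    switch (kind)
    {
        case STMT_SELECT:
        case STMT_UPDATE:
        case STMT_DELETE:
            /* continues through rewrite, optimizer and executor */
            return PATH_COMPLEX;
        default:
            /* handled directly by a query-specific function */
            return PATH_UTILITY;
    }
}

/* Printable name of a path, for logging or debugging. */
static const char *
path_name(QueryPath path)
{
    assert(path == PATH_UTILITY || path == PATH_COMPLEX);
    return (path == PATH_COMPLEX) ? "complex" : "utility";
}
```

Calling classify_statement(STMT_SELECT) in this sketch yields PATH_COMPLEX,
while STMT_CREATE falls through to PATH_UTILITY.<P>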
The parser takes a complex query and creates a
<A HREF="../../include/nodes/parsenodes.h">Query</A> structure that
contains all the elements used by complex queries. Query.qual holds the
<I>WHERE</I> clause qualification, which is filled in by <A
HREF="../../backend/parser/parse_clause.c">transformWhereClause().</A>
Each table referenced in the query is represented by a <A
HREF="../../include/nodes/parsenodes.h">RangeTableEntry,</A> and they
are linked together to form the <I>range table</I> of the query, which
is generated by <A
HREF="../../backend/parser/parse_clause.c">makeRangeTable().</A>
Query.rtable holds the query's range table.<P>

Certain queries, like <I>SELECT,</I> return columns of data. Other
queries, like <I>INSERT</I> and <I>UPDATE,</I> specify the columns
modified by the query. These column references are converted to <A
HREF="../../include/nodes/primnodes.h">Resdom</A> entries, which are
placed in <A HREF="../../include/nodes/parsenodes.h">target list
entries,</A> and linked together to make up the <I>target list</I> of
the query. The target list is stored in Query.targetList, which is
generated by <A
HREF="../../backend/parser/parse_target.c">transformTargetList().</A><P>

Other query elements, like aggregates (<I>SUM()</I>), <I>GROUP BY,</I>
and <I>ORDER BY,</I> are also stored in their own Query fields.<P>

The next step is for the Query to be modified by any <I>VIEWS</I> or
<I>RULES</I> that may apply to the query. This is performed by the <A
HREF="../../backend/rewrite">rewrite</A> system.<P>

The <A HREF="../../backend/optimizer">optimizer</A> takes the Query
structure and generates an optimal <A
HREF="../../include/nodes/plannodes.h">Plan,</A> which contains the
operations to be performed to execute the query. The <A
HREF="../../backend/optimizer/path">path</A> module determines the best
table join order and join type of each table in the RangeTable, using
Query.qual (the <I>WHERE</I> clause) to consider optimal index usage.<P>

The Plan is then passed to the <A
HREF="../../backend/executor">executor</A> for execution, and the result
is returned to the client. The Plan is actually a set of nodes, arranged in
a tree structure with a top-level node and various sub-nodes as
children.<P>
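
The tree shape can be pictured with a small C sketch. The PlanNode type,
the PlanKind values, and both helper functions are hypothetical
simplifications made up for this example; they do not reproduce the Plan
structure declared in plannodes.h.<P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds, for illustration only. */
typedef enum
{
    NODE_SEQSCAN,
    NODE_SORT,
    NODE_JOIN
} PlanKind;

/* A simplified plan node: each node has at most two children. */
typedef struct PlanNode
{
    PlanKind         kind;
    struct PlanNode *left;      /* first child subplan, or NULL */
    struct PlanNode *right;     /* second child subplan, or NULL */
} PlanNode;

/* Count the nodes in the tree rooted at 'node'. */
static int
plan_count_nodes(const PlanNode *node)
{
    if (node == NULL)
        return 0;
    return 1 + plan_count_nodes(node->left) + plan_count_nodes(node->right);
}

/* Depth of the tree: 0 for an empty tree, 1 for a single node. */
static int
plan_depth(const PlanNode *node)
{
    int left_depth;
    int right_depth;

    if (node == NULL)
        return 0;
    left_depth = plan_depth(node->left);
    right_depth = plan_depth(node->right);
    return 1 + (left_depth > right_depth ? left_depth : right_depth);
}
```

A sort node placed on top of a join of two sequential scans gives a tree of
four nodes and depth three in this sketch. A recursive walk like these
helpers reaches every sub-node starting from the top-level node.<P>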
There are many other modules that support this basic functionality. They
can be accessed by clicking on the flowchart.<P>

<HR><P>

Another area of interest is the shared memory area, which contains data
accessible to all backends. It holds recently used table data/index
blocks, locks, backend information, and lookup tables for these
structures:

<UL>
<LI>ShmemIndex - lookup of shared memory addresses using structure names
<LI><A HREF="../../include/storage/buf_internals.h">Buffer
Descriptor</A> - control header for buffer cache block
<LI><A HREF="../../include/storage/buf_internals.h">Buffer Block</A> -
data/index buffer cache block
<LI>Shared Buffer Lookup Table - lookup of buffer cache block addresses
using table name and block number (<A
HREF="../../include/storage/buf_internals.h">BufferTag</A>)
<LI>MultiLevelLockTable (ctl) - control structure for each locking
method. Currently, only multi-level locking is used (<A
HREF="../../include/storage/lock.h">LOCKMETHODCTL</A>).
<LI>MultiLevelLockTable (lock hash) - the <A
HREF="../../include/storage/lock.h">LOCK</A> structure, looked up using
relation and database object ids (<A
HREF="../../include/storage/lock.h">LOCKTAG</A>). The lock table
structure contains the lock modes (read/write or shared/exclusive) and a
circular linked list of backends (<A
HREF="../../include/storage/proc.h">PROC</A> structure pointers) waiting
on the lock.
<LI>MultiLevelLockTable (xid hash) - lookup of LOCK structure address
using transaction id and LOCK address. It is used to quickly check if the
current transaction already has any locks on a table, rather than having
to search through all the held locks. It also stores the modes
(read/write) of the locks held by the current transaction. The returned
<A HREF="../../include/storage/lock.h">XIDLookupEnt</A> structure also
contains a pointer to the backend's PROC.lockQueue.
<LI><A HREF="../../include/storage/proc.h">Proc Header</A> - information
about each backend, including locks held/waiting, indexed by process id
</UL>

Each data structure is created by calling <A
HREF="../../backend/storage/ipc/shmem.c">ShmemInitStruct(),</A> and the
lookups are created by <A
HREF="../../backend/storage/ipc/shmem.c">ShmemInitHash().</A><P>
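
The idea of finding a structure by name, creating it only on first
use, can be sketched in plain C. This is a toy analogue under loud
assumptions: toy_init_struct() and the ToyIndexEntry table are hypothetical
names, the memory comes from calloc() in a single process rather than from
a shared segment, and the lookup is a linear scan rather than a hash
table. It does not reproduce the interface of shmem.c.<P>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ENTRIES 16
#define TOY_NAME_LEN    32

/* One entry in the toy index: a structure name and its address. */
typedef struct
{
    char  name[TOY_NAME_LEN];
    void *addr;
} ToyIndexEntry;

static ToyIndexEntry toy_index[TOY_MAX_ENTRIES];
static int           toy_index_used = 0;

/*
 * Return the structure registered under 'name', allocating it on first use.
 * *found is set to 1 if the structure already existed, 0 otherwise.
 * Returns NULL if the name is too long, the index is full, or the
 * allocation fails.
 */
static void *
toy_init_struct(const char *name, size_t size, int *found)
{
    int   i;
    void *addr;

    /* First look for an existing entry with this name. */
    for (i = 0; i < toy_index_used; i++)
    {
        if (strcmp(toy_index[i].name, name) == 0)
        {
            *found = 1;
            return toy_index[i].addr;
        }
    }

    *found = 0;
    if (toy_index_used >= TOY_MAX_ENTRIES || strlen(name) >= TOY_NAME_LEN)
        return NULL;

    /* Not registered yet: allocate zeroed memory and record it. */
    addr = calloc(1, size);
    if (addr == NULL)
        return NULL;

    strcpy(toy_index[toy_index_used].name, name);
    toy_index[toy_index_used].addr = addr;
    toy_index_used++;
    return addr;
}
```

In this sketch the first caller to ask for a given name creates the
structure and sees found set to 0; later callers using the same name get
the same address back with found set to 1.<P>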
<HR SIZE="2" NOSHADE>
<SMALL>
<ADDRESS>
Maintainer: Bruce Momjian (<A
HREF="mailto:maillist@candle.pha.pa.us">maillist@candle.pha.pa.us</A>)<BR>
Last updated: Tue Dec 9 17:56:08 EST 1997
</ADDRESS>
</SMALL>
</BODY>
</HTML>