postgres/doc/TODO.detail/subquery

From vadim@krs.ru Fri Aug  6 00:02:02 1999
Received: from sunpine.krs.ru (SunPine.krs.ru [195.161.16.37])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA22890
	for <maillist@candle.pha.pa.us>; Fri, 6 Aug 1999 00:02:00 -0400 (EDT)
Received: from krs.ru (dune.krs.ru [195.161.16.38])
	by sunpine.krs.ru (8.8.8/8.8.8) with ESMTP id MAA23302;
	Fri, 6 Aug 1999 12:01:59 +0800 (KRSS)
Sender: root@sunpine.krs.ru
Message-ID: <37AA5E35.66C03F2E@krs.ru>
Date: Fri, 06 Aug 1999 12:01:57 +0800
From: Vadim Mikheev <vadim@krs.ru>
Organization: OJSC Rostelecom (Krasnoyarsk)
X-Mailer: Mozilla 4.5 [en] (X11; I; FreeBSD 3.0-RELEASE i386)
X-Accept-Language: ru, en
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: Tom Lane <tgl@sss.pgh.pa.us>, pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Idea for speeding up uncorrelated subqueries
References: <199908060331.XAA22277@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO

Bruce Momjian wrote:
> 
> Isn't it something that takes only a few hours to implement?  We can't
> keep telling people to use EXISTS, especially because most SQL people
> think correlated queries are slower than non-correlated ones.  Can we
> just on-the-fly rewrite the query to use exists?

This seems easy to implement. Before calling union_planner() in
subselect.c:_make_subplan(), we could check whether the subquery has
aggregates and, if it has none, rewrite it (change
slink->subLinkType from IN to EXISTS and add quals).

Without caching implemented, IN-->EXISTS rewriting always
makes sense.

After caching is implemented, we should probably call union_planner()
for both the original and the modified subquery and compare the
costs/sizes of the EXISTS and IN_with_caching plans. We might even
defer the decision about which plan to use until the parent query is
planned and we know for how many parent rows the subplan will be executed.

Vadim
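
A minimal sketch of the IN-->EXISTS rewrite discussed in this message, using
Python's sqlite3 module as a stand-in engine. The tables a and b and their
columns are hypothetical, and nothing here reflects how _make_subplan()
would actually do the rewrite; it only illustrates that, for a subquery
without aggregates, the two forms select the same rows in a WHERE clause.

```python
import sqlite3

# Hypothetical tables, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table a (x integer);
    create table b (y integer);
    insert into a values (1), (2), (3), (4);
    insert into b values (2), (4), (4), (6);
""")

# Uncorrelated form, as users tend to write it.
in_form = conn.execute(
    "select x from a where x in (select y from b) order by x"
).fetchall()

# Rewritten form: the IN comparison becomes an added qual inside EXISTS.
exists_form = conn.execute(
    "select x from a"
    " where exists (select 1 from b where b.y = a.x)"
    " order by x"
).fetchall()

print(in_form, exists_form)
```

The sketch covers only positive IN; NOT IN behaves differently when the
subquery can return NULLs, so it is not claimed to rewrite the same way.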

From tgl@sss.pgh.pa.us Fri Aug  6 00:15:23 1999
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
	by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id AAA23058
	for <maillist@candle.pha.pa.us>; Fri, 6 Aug 1999 00:15:22 -0400 (EDT)
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
	by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id AAA06786;
	Fri, 6 Aug 1999 00:14:50 -0400 (EDT)
To: Bruce Momjian <maillist@candle.pha.pa.us>
cc: Vadim Mikheev <vadim@krs.ru>, pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Idea for speeding up uncorrelated subqueries 
In-reply-to: Your message of Thu, 5 Aug 1999 23:31:01 -0400 (EDT) 
             <199908060331.XAA22277@candle.pha.pa.us> 
Date: Fri, 06 Aug 1999 00:14:50 -0400
Message-ID: <6783.933912890@sss.pgh.pa.us>
From: Tom Lane <tgl@sss.pgh.pa.us>
Status: RO

Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Isn't it something that takes only a few hours to implement?  We can't
> keep telling people to use EXISTS, especially because most SQL people
> think correlated queries are slower than non-correlated ones.  Can we
> just on-the-fly rewrite the query to use exists?

I was just about to suggest exactly that.  The "IN (subselect)"
notation seems to be a lot more intuitive --- at least, people
keep coming up with it --- so why not rewrite it to the EXISTS
form, if we can handle that more efficiently?

			regards, tom lane

From aixssd!darrenk@abs.net Thu Dec  5 10:30:53 1996
Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for <maillist@candle.pha.pa.us>; Thu, 5 Dec 1996 10:30:43 -0500 (EST)
Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST)
Received: by aixssd (AIX 3.2/UCB 5.64/4.03)
          id AA36963; Thu, 5 Dec 1996 10:10:24 -0500
Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
          id AA34942; Thu, 5 Dec 1996 10:07:56 -0500
Date: Thu, 5 Dec 1996 10:07:56 -0500
From: aixssd!darrenk@abs.net (Darren King)
Message-Id: <9612051507.AA34942@ceodev>
To: maillist@candle.pha.pa.us
Subject: Subselect info.
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Md5: jaWdPH2KYtdr7ESzqcOp5g==
Status: OR

> Any of them deal with implementing subselects?

There's a white paper at www.sybase.com that might
help a little.  It's just a copy of a presentation
given by the optimizer guru there.  Nothing code-wise,
but he gives a few ways of flattening them with temp
tables, etc...

Darren 

From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109
	for <maillist@candle.pha.pa.us>; Thu, 21 Aug 1997 23:42:43 -0400 (EDT)
Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD)
Sender: root@www.krasnet.ru
Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
Date: Fri, 22 Aug 1997 12:04:31 +0800
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: subselects
References: <199708220219.WAA23745@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> Considering the complexity of the primary/secondary changes you are
> making, I believe subselects will be easier than that.

I'm not making changes for P/F keys - just thinking...
Yes, I think that implementing referential integrity is
more complex work.

As for subselects:

in plannodes.h

typedef struct Plan {
...
    struct Plan         *lefttree;
    struct Plan         *righttree;
} Plan;

/* ----------------
 *  these are are defined to avoid confusion problems with "left"
                                   ^^^^^^^^^^^^^^^^^^
 *  and "right" and "inner" and "outer".  The convention is that   
 *  the "left" plan is the "outer" plan and the "right" plan is
 *  the inner plan, but these make the code more readable.
 * ----------------
 */
#define innerPlan(node)         (((Plan *)(node))->righttree)
#define outerPlan(node)         (((Plan *)(node))->lefttree)

First thought is to avoid any confusion by re-defining

#define rightPlan(node)         (((Plan *)(node))->righttree)
#define leftPlan(node)          (((Plan *)(node))->lefttree)

and changing all occurrences of 'outer' & 'inner' in the code
to 'left' & 'right' ones:

this will allow us to use 'outer' & 'inner' for subselects
later, without confusion. My hope is that we can change the Executor
very easily by adding outer/inner plans/TupleSlots to
EState, CommonState, JoinState, etc and by doing node
processing in the right order.

Subselects are mostly a Planner problem.

Unfortunately, I haven't time at the moment: CHECK/DEFAULT...

Vadim

From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354
	for <maillist@candle.pha.pa.us>; Fri, 22 Aug 1997 00:00:51 -0400 (EDT)
Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD)
Sender: root@www.krasnet.ru
Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su>
Date: Fri, 22 Aug 1997 12:22:37 +0800
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: subselects
References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Vadim B. Mikheev wrote:
> 
> this will allow to use 'outer' & 'inner' things for subselects
> later, without confusion. My hope is that we may change Executor

Or maybe use 'high' & 'low' for subselects (to avoid confusion
with outer joins).

> very easy by adding outer/inner plans/TupleSlots to
> EState, CommonState, JoinState, etc and by doing node
> processing in right order.
             ^^^^^^^^^^^^^^
Rule is easy:
1. Uncorrelated subselect - do 'low' plan node first
2. Correlated             - do left/right first

- just some flag in structures.
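
One possible reading of this rule, as a small Python sketch. The PlanNode
class, the `low` child, and the `correlated` flag are all hypothetical names
for illustration; they are not the Plan struct from plannodes.h, and the
real Executor would of course do more than list node names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlanNode:
    name: str
    left: Optional["PlanNode"] = None
    right: Optional["PlanNode"] = None
    low: Optional["PlanNode"] = None   # hypothetical subselect plan
    correlated: bool = False           # the "flag in structures"


def processing_order(node: Optional[PlanNode]) -> List[str]:
    """Return node names in the order their processing would start."""
    if node is None:
        return []
    joins = processing_order(node.left) + processing_order(node.right)
    sub = processing_order(node.low)
    # Rule 1: uncorrelated subselect -> 'low' plan first.
    # Rule 2: correlated subselect   -> left/right first.
    children = joins + sub if node.correlated else sub + joins
    return children + [node.name]


uncorrelated = PlanNode("top", left=PlanNode("scan_a"), low=PlanNode("sub"))
correlated = PlanNode("top", left=PlanNode("scan_a"), low=PlanNode("sub"),
                      correlated=True)
print(processing_order(uncorrelated))
print(processing_order(correlated))
```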

Vadim

From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682
	for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 17:02:28 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:57:54 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726
	for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199710302150.QAA07726@candle.pha.pa.us>
Subject: [HACKERS] subselects
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

The only thing I have to add to what I had written earlier is that I
think it is best to have these subqueries executed as early in query
execution as possible.

Every piece of the backend: parser, optimizer, executor, is designed to
work on a single query.  The earlier we can split up the queries, the
better those pieces will work at doing their job.  You want to be able
to use the parser and optimizer on each part of the query separately, if
you can.


Forwarded message:
> I have done some thinking about subselects.  There are basically two
> issues:
> 
> 	Does the query return one row or several rows?  This can be
> 	determined by seeing if the user uses equals or 'IN' to join the
> 	subquery. 
> 
> 	Is the query correlated, meaning "Does the subquery reference
> 	values from the outer query?"
> 
> (We already have the third type of subquery, the INSERT...SELECT query.)
> 
> So we have these four combinations:
> 
> 	1) one row, no correlation
> 	2) multiple rows, no correlation
> 	3) one row, correlated
> 	4) multiple rows, correlated
> 
> 
> With #1, we can execute the subquery, get the value, replace the
> subquery with the constant returned from the subquery, and execute the
> outer query.
> 
> With #2, we can execute the subquery and put the result into a temporary
> table.  We then rewrite the outer query to access the temporary table
> and replace the subquery with the column name from the temporary table. 
> We probably put an index on the temp. table, which has only one
> column, because a subquery can only return one column.  We remove the
> temp. table after query execution.
> 
> With #3 and #4, we potentially need to execute the subquery for every
> row returned by the outer query.  Performance would be horrible for
> anything but the smallest query.  Another way to handle this is to
> execute the subquery WITHOUT using any of the outer-query columns to
> restrict the WHERE clause, and add those columns used to join the outer
> variables into the target list of the subquery.  So for query:
> 
> 	select t1.name
> 	from tab t1
> 	where t1.age = (select max(t2.age)
> 		        from tab2 t2
> 		        where t2.name = t1.name)
> 
> Execute the subquery and put it in a temporary table:
> 
> 	select t2.name, max(t2.age) as age
> 	into table temp999
> 	from tab2 t2
> 	group by t2.name
> 
> 	create index i_temp999 on temp999 (name)
> 
> Then re-write the outer query:
> 
> 	select t1.name
> 	from tab t1, temp999
> 	where t1.age = temp999.age and
> 	      t1.name = temp999.name
> 
> The only problem here is that the subselect is running for all entries
> in tab2, even if the outer query is only going to need a few rows. 
> Whether to execute the subquery each time or to create a temp.
> table is often difficult to determine.  Even some non-correlated
> subqueries are better executed for each row rather than pre-executing the
> entire subquery, especially if the outer query returns few rows.
> 
> One requirement to handle these issues is better column statistics,
> which I am working on.
> 
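
The temp-table approach for the correlated case can be tried out with
Python's sqlite3 module, as a rough sketch. It follows the prose of the
forwarded message: the subquery runs without the outer-query restriction,
and the join column goes into its target list. The sample rows, the
`group by`, and the `age` alias are assumptions of this sketch, and an
ordinary table stands in for the temp. table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tab (name text, age integer);
    create table tab2 (name text, age integer);
    insert into tab values ('ann', 30), ('bob', 25), ('cat', 40);
    insert into tab2 values ('ann', 30), ('ann', 20), ('bob', 35),
                            ('cat', 40), ('cat', 10);
""")

# Correlated form: the subquery refers to t1.name from the outer query.
correlated = conn.execute("""
    select t1.name
    from tab t1
    where t1.age = (select max(t2.age)
                    from tab2 t2
                    where t2.name = t1.name)
    order by t1.name
""").fetchall()

# Pre-executed form: run the subquery once for all names, index it, join.
conn.executescript("""
    create table temp999 as
        select t2.name, max(t2.age) as age
        from tab2 t2
        group by t2.name;
    create index i_temp999 on temp999 (name);
""")
rewritten = conn.execute("""
    select t1.name
    from tab t1, temp999
    where t1.age = temp999.age and
          t1.name = temp999.name
    order by t1.name
""").fetchall()

# Remove the table after query execution, as the message suggests.
conn.execute("drop table temp999")
print(correlated, rewritten)
```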


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643
	for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:30:56 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:06:08 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for <hackers@postgreSQL.org>; Fri, 31 Oct 1997 22:00:53 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566;
	Fri, 31 Oct 1997 21:37:06 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199711010237.VAA14566@candle.pha.pa.us>
Subject: Re: [HACKERS] subselects
To: maillist@candle.pha.pa.us (Bruce Momjian)
Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

One more issue I thought of.  You can have multiple subselects in a
single query, and subselects can have their own subselects.

This makes it particularly important that we define a system that is
always able to process the subselect BEFORE the upper select.  This will
allow us to handle all these cases without limitations.

> 
> The only thing I have to add to what I had written earlier is that I
> think it is best to have these subqueries executed as early in query
> execution as possible.
> 
> Every piece of the backend: parser, optimizer, executor, is designed to
> work on a single query.  The earlier we can split up the queries, the
> better those pieces will work at doing their job.  You want to be able
> to use the parser and optimizer on each part of the query separately, if
> you can.
> 


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From hannu@trust.ee Sun Nov  2 10:33:33 1997
Received: from sid.trust.ee (sid.trust.ee [194.204.23.180])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619
	for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 10:32:04 -0500 (EST)
Received: from sid.trust.ee (wink.trust.ee [194.204.23.184])
	by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233;
	Sun, 2 Nov 1997 17:30:11 +0200
Message-ID: <345C9BFD.986C68AA@sid.trust.ee>
Date: Sun, 02 Nov 1997 17:27:57 +0200
From: Hannu Krosing <hannu@trust.ee>
X-Mailer: Mozilla 4.02 [en] (Win95; I)
MIME-Version: 1.0
To: hackers-digest@postgresql.org
CC: maillist@candle.pha.pa.us
Subject: Re: [HACKERS] subselects
References: <199711010401.XAA09216@hub.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
> From: Bruce Momjian <maillist@candle.pha.pa.us>
> Subject: Re: [HACKERS] subselects
>
> One more issue I thought of.  You can have multiple subselects in a
> single query, and subselects can have their own subselects.
>
> This makes it particularly important that we define a system that always
> is able to process the subselect BEFORE the upper select.  This will
> allow us to handle all these cases without limitations.

This would severely limit what subselects can be used for, as you can't
use any of the fields in the upper select in a search criterion for the
subselect.  For example, you can't do

update parts p1
set parts.current_id = (
    select new_id
    from parts p2
    where p1.old_id = p2.new_id);

or

select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
from parts p1;

There may of course be ways to rewrite these queries (which the optimiser should do
if it can), but IMHO these kinds of subselects should still be allowed.
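
As an illustration of such a rewrite for the second query, here is a small
sketch using Python's sqlite3 module. The sample rows are made up, and the
derived table in the FROM clause is just one way to express the rewrite in
SQLite; it is not a claim about what the optimiser would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table parts (id integer, price integer);
    insert into parts values (1, 10), (1, 5), (2, 7);
""")

# Correlated subselect in the target list: it needs p1.id from the
# upper select, so it cannot be run before the upper select starts.
correlated = conn.execute("""
    select id, price,
           (select sum(price) from parts p2 where p1.id = p2.id) as totalprice
    from parts p1
    order by id, price
""").fetchall()

# One possible rewrite: aggregate once per id, then join.
rewritten = conn.execute("""
    select p1.id, p1.price, t.totalprice
    from parts p1,
         (select id, sum(price) as totalprice from parts group by id) t
    where p1.id = t.id
    order by p1.id, p1.price
""").fetchall()

print(correlated)
print(rewritten)
```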

> > The only thing I have to add to what I had written earlier is that I
> > think it is best to have these subqueries executed as early in query
> > execution as possible.
> >
> > Every piece of the backend: parser, optimizer, executor, is designed to
> > work on a single query.  The earlier we can split up the queries, the
> > better those pieces will work at doing their job.  You want to be able
> > to use the parser and optimizer on each part of the query separately, if
> > you can.
> >
>

Hannu


From vadim@sable.krasnoyarsk.su Sun Nov  2 21:30:59 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831
	for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:30:57 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:20:13 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS)
Sender: root@www.krasnet.ru
Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su>
Date: Mon, 03 Nov 1997 09:22:38 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects
References: <199711021848.NAA08319@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > One more issue I thought of.  You can have multiple subselects in a
> > > single query, and subselects can have their own subselects.
> > >
> > > This makes it particularly important that we define a system that always
> > > is able to process the subselect BEFORE the upper select.  This will
> > > > allow us to handle all these cases without limitations.
> >
> > This would severely limit what subselects can be used for, as you can't
> > use any of the fields in the upper select in a search criterion for the
> > subselect.  For example, you can't do
> >
> > update parts p1
> > set parts.current_id = (
> >     select new_id
> >     from parts p2
> >     where p1.old_id = p2.new_id);
> >
> > or
> >
> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
> > from parts p1;
> >
> > there may be of course ways to rewrite these queries (which the optimiser should do
> > if it can) but IMHO, these kinds of subselects should still be allowed
> 
> I hadn't even gotten to this point yet, but it is a good thing to keep
> in mind.
> 
> In these cases, as in correlated subqueries in the where clause, we will
> create a temporary table, and add the proper join fields and tables to
> the clauses.  Our version of UPDATE accepts a FROM section, and we will
> certainly use this for this purpose.

We can't replace a subselect with a join if there is an aggregate
in the subselect.

Actually, I don't see any problems if we are going to process subselects
like sql-funcs: non-correlated subselects can be emulated by
funcs without args; for correlated subselects the parser (analyze.c)
has to change all upper query references to $1, $2,...

Vadim

From vadim@sable.krasnoyarsk.su Mon Nov  3 06:07:12 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433
	for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 06:07:03 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS)
Sender: root@www.krasnet.ru
Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su>
Date: Mon, 03 Nov 1997 18:09:43 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselects
References: <199711030316.WAA15401@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> >
> > > In these cases, as in correlated subqueries in the where clause, we will
> > > create a temporary table, and add the proper join fields and tables to
> > > the clauses.  Our version of UPDATE accepts a FROM section, and we will
> > > certainly use this for this purpose.
> >
> > We can't replace subselect with join if there is aggregate
> > in subselect.
> 
> I got lost here.  Why can't we handle aggregates?

Sorry, I missed the use of temp tables. Sybase uses joins (without
temp tables) for non-correlated subqueries:

    A noncorrelated subquery can be evaluated as if it were an independent query.
    Conceptually, the results of the subquery are substituted in the main statement, or
    outer query. This is not how SQL Server actually processes statements with
    subqueries. Noncorrelated subqueries can be alternatively stated as joins and
    are processed as joins by SQL Server. 

but this is not possible if there are aggregates in the subquery.

> 
> My idea was this.  This is a non-correlated subquery.
...
No problems with it...

> 
> Here is a correlated example:
> 
>         select *
>         from table_a
>         where table_a.col_a in (select table_b.col_b
>                         from table_b
>                         where table_b.col_b = table_a.col_c)
> 
> rewrite as:
> 
>         select distinct table_b.col_b, table_a.col_c -- the distinct is needed
>         into table_sub
>         from table_a, table_b

First, could we add 'where table_b.col_b = table_a.col_c' here?
Just to avoid Cartesian results? I hope we can.

Note that for query

        select *
        from table_a
        where table_a.col_a in (select table_b.col_b * table_a.col_c
                        from table_b)

it's better to do

	select distinct table_a.col_a
	into table table_sub
	from table_b, table_a
        where table_a.col_a = table_b.col_b * table_a.col_c

once again - to avoid Cartesians.
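
A quick way to try this rewrite is Python's sqlite3 module. The sketch below
uses made-up rows and makes one assumption beyond the query above: it keeps
table_a.col_c in table_sub and joins on it as well, because two table_a rows
can share col_a while differing in col_c, and then only one of them may
satisfy the original IN condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_a (col_a integer, col_c integer);
    create table table_b (col_b integer);
    insert into table_a values (6, 2), (6, 5), (4, 2), (7, 1);
    insert into table_b values (3), (2);
""")

original = conn.execute("""
    select col_a, col_c
    from table_a
    where table_a.col_a in (select table_b.col_b * table_a.col_c
                            from table_b)
    order by col_a, col_c
""").fetchall()

# The restricted join avoids materializing the full Cartesian product.
conn.execute("""
    create temp table table_sub as
    select distinct table_a.col_a, table_a.col_c
    from table_b, table_a
    where table_a.col_a = table_b.col_b * table_a.col_c
""")

rewritten = conn.execute("""
    select table_a.col_a, table_a.col_c
    from table_a, table_sub
    where table_a.col_a = table_sub.col_a
      and table_a.col_c = table_sub.col_c
    order by table_a.col_a, table_a.col_c
""").fetchall()

# Matching on col_a alone lets the (6, 5) row through by mistake.
col_a_only = conn.execute("""
    select col_a, col_c
    from table_a
    where col_a in (select col_a from table_sub)
    order by col_a, col_c
""").fetchall()

print(original, rewritten, col_a_only)
```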

But what could we do for

        select *
        from table_a
        where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
                        from table_b)
???
	select max(table_b.col_b * table_a.col_c), table_a.col_a
	into table table_sub
	from table_b, table_a
        group by table_a.col_a

first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
For tables big and small with 100 000 and 1000 tuples 

select max(x*y), x from big, small group by x

"ate" all free 140M in my file system after 20 minutes (just for
sorting - nothing more) and was killed...

select x from big where x = cor(x);
(cor(int4) is 'select max($1*y) from small') takes 20 minutes -
this is bad too.
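
The function form can be imitated at small scale with Python's sqlite3
module. In this sketch the Python function cor() stands in for the SQL
function, with `?` playing the role of $1; the tiny tables and the call
counter are assumptions for illustration and say nothing about the timings
above. It shows that the function form gives the same rows as the correlated
subselect while running the inner query once per outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table big (x integer);
    create table small (y integer);
    insert into big values (-2), (-1), (0), (1), (2), (3);
    insert into small values (0), (1);
""")

stats = {"calls": 0}


def cor(x):
    """Stand-in for cor(int4): select max($1*y) from small."""
    stats["calls"] += 1
    return conn.execute("select max(? * y) from small", (x,)).fetchone()[0]


# select x from big where x = cor(x), evaluated row by row.
outer_rows = [x for (x,) in conn.execute("select x from big order by x").fetchall()]
function_form = [x for x in outer_rows if x == cor(x)]

# The same condition written as a correlated subselect.
correlated_form = [x for (x,) in conn.execute("""
    select x from big
    where x = (select max(big.x * small.y) from small)
    order by x
""").fetchall()]

print(function_form, correlated_form, stats["calls"])
```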

> >
> > Actually, I don't see any problems if we going to process subselect
> > like sql-funcs: non-correlated subselects can be emulated by
> > funcs without args, for correlated subselects parser (analyze.c)
> > has to change all upper query references to $1, $2,...
> 
> Yes, logically, they are SQL functions, but aren't we going to see
> terrible performance in such circumstances.  My experience is that when
  ^^^^^^^^^^^^^^^^^^^^
You're right.

> people are given subselects, they start to do huge jobs with them.
> 
> In fact, the final solution may be to have both methods available, and
> switch between them depending on the size of the query sets.  Each
> method has its advantages.  The function example lets the outside query
> be executed, and only calls the subquery when needed.
> 
> For large tables where the subselect is small and is the entire WHERE
> restriction, the SQL function gets called much too often.  A simple join
> of the subquery result and the large table would be much better.  This
> method also allows for sort/merge join of the subquery results, and
> index use.

...keep thinking...

Vadim

From owner-pgsql-hackers@hub.org Mon Nov  3 11:01:01 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633
	for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 11:00:59 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 10:49:42 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 10:31:23 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262;
	Mon, 3 Nov 1997 10:25:34 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199711031525.KAA02262@candle.pha.pa.us>
Subject: Re: [HACKERS] subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> Sorry, I missed the use of temp tables. Sybase uses joins (without
> temp tables) for non-correlated subqueries:
> 
>     A noncorrelated subquery can be evaluated as if it were an independent query.
>     Conceptually, the results of the subquery are substituted in the main statement, or
>     outer query. This is not how SQL Server actually processes statements with
>     subqueries. Noncorrelated subqueries can be alternatively stated as joins and
>     are processed as joins by SQL Server. 
> 
> but this is not possible if there are aggregates in subquery.
> 
> > 
> > My idea was this.  This is a non-correlated subquery.
> ...
> No problems with it...
> 
> > 
> > Here is a correlated example:
> > 
> >         select *
> >         from table_a
> >         where table_a.col_a in (select table_b.col_b
> >                         from table_b
> >                         where table_b.col_b = table_a.col_c)
> > 
> > rewrite as:
> > 
> >         select distinct table_b.col_b, table_a.col_c -- the distinct is needed
> >         into table_sub
> >         from table_a, table_b
> 
> First, could we add 'where table_b.col_b = table_a.col_c' here ?
> Just to avoid Cartesian results ? I hope we can.

Yes, of course.  I forgot that line here.  We can also be fancy and move
some of the outer where restrictions on table_a into the subquery.

I think the classic subquery for this would be if someone wanted all
customer names that had invoices in the past month:

select custname
from customer
where custid in (select order.custid
		 from order
		 where order.date >= "09/01/97" and
		       order.date <= "09/30/97")

In this case, the subquery can use an index on 'date' to quickly
evaluate the query, and the resulting temp table can quickly be joined
to the customer table.  If we used SQL functions, every customer would
have an order query evaluated for it, and there may be no multi-column
index on customer and date, or even if there is, this could be many
query executions.
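
A sketch of this customer example with Python's sqlite3 module, under a few
assumptions: the table is called `orders` and the column `orderdate` to stay
clear of reserved words, dates are ISO strings so they compare correctly as
text, and the rows and the temp table name `sub_result` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (custid integer, custname text);
    create table orders (custid integer, orderdate text);
    create index i_orders_date on orders (orderdate);
    insert into customer values (1, 'Acme'), (2, 'Bolt'), (3, 'Core');
    insert into orders values (1, '1997-09-05'), (1, '1997-09-20'),
                              (3, '1997-08-15'), (2, '1997-09-30');
""")

# The subquery as the user writes it.
in_form = conn.execute("""
    select custname
    from customer
    where custid in (select orders.custid
                     from orders
                     where orders.orderdate >= '1997-09-01' and
                           orders.orderdate <= '1997-09-30')
    order by custname
""").fetchall()

# Evaluate the subquery once into a temp table, then join to customer.
conn.execute("""
    create temp table sub_result as
    select distinct orders.custid
    from orders
    where orders.orderdate >= '1997-09-01' and
          orders.orderdate <= '1997-09-30'
""")
join_form = conn.execute("""
    select custname
    from customer, sub_result
    where customer.custid = sub_result.custid
    order by custname
""").fetchall()

print(in_form, join_form)
```

The `distinct` matters here: customer 1 has two orders in the month and
would otherwise appear twice in the joined result.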


> 
> Note that for query
> 
>         select *
>         from table_a
>         where table_a.col_a in (select table_b.col_b * table_a.col_c
>                         from table_b)
> 
> it's better to do
> 
> 	select distinct table_a.col_a
> 	into table table_sub
> 	from table_b, table_a
>         where table_a.col_a = table_b.col_b * table_a.col_c

Yes, I had not thought of cases where they are doing correlated column
arithmetic, but it looks like this would work.

> 
> once again - to avoid Cartesians.
> 
> But what could we do for
> 
>         select *
>         from table_a
>         where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
>                         from table_b)

OK, who wrote this horrible query. :-)

Without a join of table_b and table_a, even an SQL function would die on
this.  You have to take the current value table_a.col_c, and multiply by
every value of table_b.col_b to get the maximum.

Trying to do a temp table on this is certainly going to be a cartesian
product, but using an SQL function is also going to be a cartesian
product, except that the product is generated in small pieces instead of
in one big query.  The SQL function example may eventually complete, but
it will take forever to do so in cases where the temp table would bomb.

I can recommend some SQL books for anyone who sends in a bug report on
this query. :-)


> ???
> 	select max(table_b.col_b * table_a.col_c), table_a.col_a
> 	into table table_sub
> 	from table_b, table_a
>         group by table_a.col_a
> 
> first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
> For tables big and small with 100 000 and 1000 tuples 
> 
> select max(x*y), x from big, small group by x
> 
> "ate" all free 140M in my file system after 20 minutes (just for
> sorting - nothing more) and was killed...
> 
> select x from big where x = cor(x);
> (cor(int4) is 'select max($1*y) from small') takes 20 minutes -
> this is bad too.

Again, my feeling is that in cases where the temp table would bomb, the
SQL function will be so slow that neither will be acceptable.

> 
> > >
> > > Actually, I don't see any problems if we going to process subselect
> > > like sql-funcs: non-correlated subselects can be emulated by
> > > funcs without args, for correlated subselects parser (analyze.c)
> > > has to change all upper query references to $1, $2,...
> > 
> > Yes, logically, they are SQL functions, but aren't we going to see
> > terrible performance in such circumstances.  My experience is that when
>   ^^^^^^^^^^^^^^^^^^^^
> You're right.
> 
> > people are given subselects, they start to do huge jobs with them.
> > 
> > In fact, the final solution may be to have both methods available, and
> > switch between them depending on the size of the query sets.  Each
> > method has its advantages.  The function example lets the outside query
> > be executed, and only calls the subquery when needed.
> > 
> > For large tables where the subselect is small and is the entire WHERE
> > restriction, the SQL function gets called much too often.  A simple join
> > of the subquery result and the large table would be much better.  This
> > method also allows for sort/merge join of the subquery results, and
> > index use.
> 
> ...keep thinking...
> 
> Vadim
> 


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239
	for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 00:09:11 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for <hackers@postgreSQL.org>; Wed, 19 Nov 1997 23:58:16 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103
	for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199711200457.XAA03103@candle.pha.pa.us>
Subject: [HACKERS] subselect
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

I am going to overhaul all the /parser files, and I may give subselects
a try while I am in there.  This is where it is going to have to be done.

Two things I think I need are:

	temp tables that go away at the end of a statement, so if the
query elog's out, the temp file gets destroyed

	how do I implement "not in":

		select * from a where x not in (select y from b)

Using <> is not going to work because that returns multiple copies of a,
one for every one that doesn't equal.  It is like we need not equals,
but don't return multiple rows.
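
A minimal sketch of the problem, assuming two tiny tables a(x) and b(y) held in an in-memory SQLite database (SQLite is only a stand-in here so the queries can be run; it says nothing about how the backend would implement this). The join on <> yields one row per non-matching pair and even keeps x = 2, which NOT IN must reject:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (4);
""")

# Naive join on <>: one output row for every (a, b) pair that differs.
joined = [r[0] for r in conn.execute(
    "SELECT a.x FROM a, b WHERE a.x <> b.y ORDER BY a.x")]

# NOT IN: each row of a appears at most once, and only if no b.y equals it.
not_in = [r[0] for r in conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT y FROM b) ORDER BY x")]

print(joined)   # [1, 1, 2, 3, 3]
print(not_in)   # [1, 3]
```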

Any ideas?

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019
	for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 10:00:56 -0500 (EST)
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 09:52:55 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754;
	Thu, 20 Nov 1997 06:27:21 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <3473D849.16F67A2A@alumni.caltech.edu>
Date: Thu, 20 Nov 1997 06:27:21 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgresql.org>
Subject: Re: [HACKERS] subselect
References: <199711200457.XAA03103@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> I am going to overhaul all the /parser files

??

> , and I may give subselects
> a try while I am in there.  This is where it is going to have to be done.

A first cut at the subselect syntax is already in gram.y. I'm sure that the
e-mail you had sent which collected several items regarding subselects
covers some of this topic. I've been thinking about subselects also, and
had thought that there must be some existing mechanisms in the backend
which can be used to help implement subselects. It seems to me that UNION
might be a good thing to implement first, because it has a fairly
well-defined set of behaviors:

  select a union select b;

chooses elements from a and from b and then sorts/uniques the result.

  select a union all select b;

chooses elements from a, sorts/uniques, and then adds all elements from b.

  select a union select b union all select c;

evaluates left to right, and first evaluates a union b, sorts/uniques, and
then evaluates

  (result) union all select c;
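
The left-to-right grouping can be checked with a small sketch. It assumes SQLite as a stand-in engine (its compound SELECTs also group from left to right) and uses constant selects instead of real tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# (1 UNION 1) collapses to a single row, then UNION ALL appends another.
left_first = conn.execute(
    "SELECT 1 UNION SELECT 1 UNION ALL SELECT 1").fetchall()

# (1 UNION ALL 1) keeps two rows, then the trailing UNION removes duplicates.
all_first = conn.execute(
    "SELECT 1 UNION ALL SELECT 1 UNION SELECT 1").fetchall()

print(left_first)  # [(1,), (1,)]
print(all_first)   # [(1,)]
```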

There are several types of subselects. Examples of some are:

1) select a.f from a union select b.f from b order by 1;
Needs temporary table(s), optional sort/unique, final order by.

2) select a.f from a where a.f in (select b.f from b);
Needs temporary table(s). "in" can be first implemented by count(*) > 0 but
it would perform better to have the backend return after the first
match.

3) select a.f from a where exists (select b.f from b where b.f = a.f);
Need to do the select and do a subselect on _each_ of the returned values?
Again could use count(*) to help implement.
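
As a sketch of why count(*) can stand in for IN and EXISTS, the three forms below return the same rows on a small sample. The tables and data are made up for illustration, and SQLite is used only so the queries can be run:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (f INTEGER);
    CREATE TABLE b (f INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (2), (3);
""")

def rows(sql):
    """Run a query and return the first column of every row as a list."""
    return [r[0] for r in conn.execute(sql)]

via_in = rows(
    "SELECT a.f FROM a WHERE a.f IN (SELECT b.f FROM b) ORDER BY a.f")
via_exists = rows(
    "SELECT a.f FROM a WHERE EXISTS "
    "(SELECT b.f FROM b WHERE b.f = a.f) ORDER BY a.f")
# count(*) scans every match; EXISTS is free to stop at the first one.
via_count = rows(
    "SELECT a.f FROM a WHERE "
    "(SELECT count(*) FROM b WHERE b.f = a.f) > 0 ORDER BY a.f")

print(via_in, via_exists, via_count)
```

Note that the duplicate 2 in b does not duplicate the output row, unlike a plain join.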

This brings up the point that perhaps the backend needs a row-counting
atomic operation and count(*) could be re-implemented using that. At the
moment count(*) is transformed to a select of OID columns and does not
quite work on table joins.

I would think that outer joins could use some of these support routines
also.

                                                       - Tom

> Two things I think I need are:
>
>         temp tables that go away at the end of a statement, so if the
> query elog's out, the temp file gets destroyed
>
>         how do I implement "not in":
>
>                 select * from a where x not in (select y from b)
>
> Using <> is not going to work because that returns multiple copies of a,
> one for every one that doesn't equal.  It is like we need not equals,
> but don't return multiple rows.
>
> Any ideas?
>
> --
> Bruce Momjian
> maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311
	for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 00:49:01 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 00:44:57 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605
	for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712220545.AAA11605@candle.pha.pa.us>
Subject: [HACKERS] subselects
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

OK, a few questions:

	Should we use sortmerge, so we can use our psort as temp tables,
or do we use hashunique?

	How do we pass the query to the optimizer?  How do we represent
the range table for each, and the links between them in correlated
subqueries?

I have to think about this.  Comments are welcome.
-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608
	for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 02:01:25 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 01:37:29 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 01:30:15 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354
	for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712220605.BAA17354@candle.pha.pa.us>
Subject: [HACKERS] subselects (fwd)
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

Forwarded message:
> OK, a few questions:
> 
> 	Should we use sortmerge, so we can use our psort as temp tables,
> or do we use hashunique?
> 
> 	How do we pass the query to the optimizer?  How do we represent
> the range table for each, and the links between them in correlated
> subqueries?
> 
> I have to think about this.  Comments are welcome.

One more thing.  I guess I am seeing subselects as a different thing
than temp tables.  I can see people wanting to put indexes on their temp
tables, so I think they will need more system catalog support.  For
subselects, I think we can just stuff them into psort, perhaps, and do
the unique as we unload them.

Seems like a natural to me.


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876
	for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:00:57 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042;
	Tue, 23 Dec 1997 16:08:56 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su>
Date: Tue, 23 Dec 1997 16:08:56 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects (fwd)
References: <199712220605.BAA17354@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> Forwarded message:
> > OK, a few questions:
> >
> >       Should we use sortmerge, so we can use our psort as temp tables,
> > or do we use hashunique?
> >
> >       How do we pass the query to the optimizer?  How do we represent
> > the range table for each, and the links between them in correlated
> > subqueries?
> >
> > I have to think about this.  Comments are welcome.
> 
> One more thing.  I guess I am seeing subselects as a different thing
> than temp tables.  I can see people wanting to put indexes on their temp
> tables, so I think they will need more system catalog support.  For
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What's the difference between temp tables and temp indices ?
Both of them are handled via catalog cache...

Vadim

From vadim@sable.krasnoyarsk.su Sat Jan  3 04:01:00 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565
	for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 04:00:58 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 03:47:07 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017;
	Sat, 3 Jan 1998 16:08:55 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su>
Date: Sat, 03 Jan 1998 16:08:51 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>,
        "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Subject: Re: subselects
References: <199712290516.AAA12579@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> With UNIONs done, how are things going with you on subselects?  UNIONs
> are much easier than subselects.
> 
> I am stumped on how to record the subselect query information in the
> parser and stuff.

   So am I. We definitely need an EXISTS node and maybe an IN one.
Also, we have to support the ANY and ALL modifiers of comparison operators
(it would be nice to support ANY and ALL for all operators returning
bool: >, =, ..., like, ~ and so on). Note that IN is the same as
= ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types,
so we could avoid an IN node, but I'm not sure that I like such an
assumption: postgres is an OO-like system allowing operators to be overridden,
so '=' can, in theory, mean not EQUAL but something else (someday
we could allow the "meaning" of an operator to be specified in CREATE
OPERATOR) - in short, I would like an IN node.
   Also, I would suggest nodes for ANY and ALL.
   (I need a few days to think more about how to record this stuff...)

> 
> Please let me know what I can do to help, if anything.

Thanks. As I remember, Tom also wished to work here. Tom ?

Bye,
   Vadim

P.S. I'll be "on-line" Jan 5.

From owner-pgsql-hackers@hub.org Mon Jan  5 07:30:51 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 07:30:49 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 07:20:57 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278;
	Mon, 5 Jan 1998 19:36:06 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su>
Date: Mon, 05 Jan 1998 19:35:59 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselect
References: <199801050516.AAA28005@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Bruce Momjian wrote:
> 
> I was thinking about subselects, and how to attach the two queries.
> 
> What if the subquery makes a range table entry in the outer query, and
> the query is set up like the UNION queries where we put the scans in a
> row, but in this case we put them over/under each other.
> 
> And we push a temp table into the catalog cache that represents the
> result of the subquery, then we could join to it in the outer query as
> though it was a real table.
> 
> Also, can't we do the correlated subqueries by adding the proper
> target/output columns to the subquery, and have the outer query
> reference those columns in the subquery range table entry.

Yes, this is a way to handle subqueries: by joining to a temp table.
After getting the plan we could change the temp table access path to a
Material node. On the other hand, it could be useful to let the optimizer
know about the cost of temp table creation (have to think more about it)...
Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
is one example of this - joining by <> will give us invalid results.
Setting a special NOT EQUAL flag is not enough: the subquery plan must
always be the inner one in this case. The same goes for handling the ALL
modifier. Note that we generally can't use aggregates here: we can't add
MAX to the subquery in the case of > ALL (subquery), because > ALL should
return FALSE if the subquery returns NULL(s), but aggregates don't take
NULLs into account.

> 
> Maybe I can write up a sample of this?  Vadim, would this help?  Is this
> the point we are stuck at?

Personally, I was stuck by holidays -:)
Now I can spend ~ 8 hours ~ each day for development...

Vadim


From owner-pgsql-hackers@hub.org Mon Jan  5 10:45:30 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 10:45:28 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 10:31:06 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375;
	Mon, 5 Jan 1998 10:28:48 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801051528.KAA10375@candle.pha.pa.us>
Subject: Re: [HACKERS] subselect
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> Yes, this is a way to handle subqueries by joining to temp table.
> After getting plan we could change temp table access path to
> node material. On the other hand, it could be useful to let optimizer
> know about cost of temp table creation (have to think more about it)...
> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
> is one example of this - joining by <> will give us invalid results.
> Setting special NOT EQUAL flag is not enough: subquery plan must be
> always inner one in this case. The same for handling ALL modifier.
> Note, that we generally can't use aggregates here: we can't add MAX to 
> subquery in the case of > ALL (subquery), because > ALL should return FALSE
> if subquery returns NULL(s) but aggregates don't take NULLs into account.

OK, here are my ideas.  First, I think you have to handle subselects in
the outer node because a subquery could have its own subquery.  Also, we
now have a field in Aggreg to allow us to 'usenulls'.

OK, here it is.  I recommend we pass the outer and subquery through
the parser and optimizer separately.

We parse the subquery first.  If the subquery is not correlated, it
should parse fine.  If it is correlated, any columns we find in the
subquery that are not already in the FROM list, we add the table to the
subquery FROM list, and add the referenced column to the target list of
the subquery.

When we are finished parsing the subquery, we create a catalog cache
entry for it called 'sub1' and make its fields match the target
list of the subquery.

In the outer query, we add 'sub1' to its FROM list, and change
the subquery reference to point to the new range table.  We also add
WHERE clauses to do any correlated joins.

Here is a simple example:

	select *
	from taba
	where col1 = (select col2
		      from tabb)

This is not correlated, and the subquery parses easily.  We create a
'sub1' catalog cache entry, and add 'sub1' to the outer query FROM
clause.  We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'.

Here is a more complex correlated subquery:

	select *
	from taba
	where col1 = (select col2
		      from tabb
		      where taba.col3 = tabb.col4)

Here we must add 'taba' to the subquery's FROM list, and add col3 to the
target list of the subquery.  After we parse the subquery, add 'sub1' to
the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
The optimizer will do the correlation for us.
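
The rewrite can be sketched as follows, assuming made-up contents for taba and tabb and using SQLite only so both forms can be run side by side. 'sub1' is written as a derived table in the FROM list rather than a catalog cache entry, and the alias t is introduced purely for the illustration. The sample data is chosen so that every taba row matches at most one tabb row; with several matches the join form would return extra copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE taba (col1 INTEGER, col3 INTEGER);
    CREATE TABLE tabb (col2 INTEGER, col4 INTEGER);
    INSERT INTO taba VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO tabb VALUES (10, 1), (99, 2), (30, 3);
""")

# Original form: the subquery refers to the outer row through taba.col3.
correlated = conn.execute("""
    SELECT col1, col3 FROM taba
    WHERE col1 = (SELECT col2 FROM tabb WHERE taba.col3 = tabb.col4)
    ORDER BY col1
""").fetchall()

# Rewritten form: taba is added to the subquery's FROM list, col3 to its
# target list, and the outer query joins to the result as 'sub1'.
rewritten = conn.execute("""
    SELECT taba.col1, taba.col3
    FROM taba,
         (SELECT tabb.col2 AS col2, t.col3 AS col3
          FROM tabb, taba AS t
          WHERE t.col3 = tabb.col4) AS sub1
    WHERE taba.col1 = sub1.col2 AND taba.col3 = sub1.col3
    ORDER BY taba.col1
""").fetchall()

print(correlated)  # [(10, 1), (30, 3)]
print(rewritten)   # [(10, 1), (30, 3)]
```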

In the optimizer, we can parse the subquery first, then the outer query,
and then replace all 'sub1' references in the outer query to use the
subquery plan.

I realize merging the two plans and doing IN and NOT IN is the
real challenge, but I hoped this would give us a start.

What do you think?

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Mon Jan  5 15:02:46 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 15:02:44 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 14:28:43 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904;
	Tue, 6 Jan 1998 02:56:00 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 02:55:57 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801051528.KAA10375@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > always inner one in this case. The same for handling ALL modifier.
> > Note, that we generaly can't use aggregates here: we can't add MAX to
> > subquery in the case of > ALL (subquery), because of > ALL should return FALSE
> > if subquery returns NULL(s) but aggregates don't take NULLs into account.
> 
> OK, here are my ideas.  First, I think you have to handle subselects in
> the outer node because a subquery could have its own subquery.  Also, we

I hope that this does not matter: if the results of a subquery (with/without
sub-subqueries) go into a temp table, then this table will be re-scanned for
each outer tuple.

> now have a field in Aggreg to allow us to 'usenulls'.
                                             ^^^^^^^^
 This can't help:

vac=> select * from x;
y
-
1
2
3
 <<< this is NULL
(4 rows)

vac=> select max(y) from x;
max
---
  3

==> we can't replace 

select * from A where A.a > ALL (select y from x);
                                 ^^^^^^^^^^^^^^^
           (NULL will be returned and so A.a > ALL is FALSE - this is what 
            Sybase does, is it right ?)
with

select * from A where A.a > (select max(y) from x);
                             ^^^^^^^^^^^^^^^^^^^^
simply because we lose knowledge about NULLs here.
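
A small sketch of the difference, assuming the four-row table x from above. max() is run through SQLite as a stand-in engine; > ALL is emulated by a hypothetical helper gt_all() using three-valued logic, where None stands for "unknown". Whether the result is reported as FALSE or as unknown, the row does not qualify, while the max() form lets it through:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (y INTEGER);
    INSERT INTO x VALUES (1), (2), (3), (NULL);
""")

ys = [r[0] for r in conn.execute("SELECT y FROM x")]
max_y = conn.execute("SELECT max(y) FROM x").fetchone()[0]  # NULL is skipped

def gt_all(value, candidates):
    """Three-valued 'value > ALL (candidates)': True, False or None (unknown).

    Assumes value itself is not NULL.
    """
    unknown = False
    for c in candidates:
        if c is None:
            unknown = True      # comparison with NULL is unknown
        elif not value > c:
            return False        # one definite failure settles it
    return None if unknown else True

a = 5
via_max = a > max_y         # True: the NULL was lost by max()
via_all = gt_all(a, ys)     # None: unknown, so the row must not qualify
print(max_y, via_max, via_all)  # 3 True None
```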

Also, I would like to handle ANY and ALL modifiers for all bool
operators, either built-in or user-defined, for all data types -
isn't PostgreSQL an OO-like RDBMS? -:)

> OK, here it is.  I recommend we pass the outer and subquery through
> the parser and optimizer separately.

I don't like this. I would like to get the parse-tree from the parser for
the entire query and let the optimizer (on the upper level) decide how to
rewrite the parse-tree, what plans to produce, and how these plans should be
merged. Note that I don't object to your methods below, only to where
the handling of this is placed. I don't understand why we should add a
new part to the system which will do the optimizer's work (parse-tree -->
execution plan) and deal with optimizer nodes. Imho, the upper optimizer
level is a nice place to do this.

> 
> We parse the subquery first.  If the subquery is not correlated, it
> should parse fine.  If it is correlated, any columns we find in the
> subquery that are not already in the FROM list, we add the table to the
> subquery FROM list, and add the referenced column to the target list of
> the subquery.
> 
> When we are finished parsing the subquery, we create a catalog cache
> entry for it called 'sub1' and make its fields match the target
> list of the subquery.
> 
> In the outer query, we add 'sub1' to its target list, and change
> the subquery reference to point to the new range table.  We also add
> WHERE clauses to do any correlated joins.
...
> Here is a more complex correlated subquery:
> 
>         select *
>         from taba
>         where col1 = (select col2
>                       from tabb
>                       where taba.col3 = tabb.col4)
> 
> Here we must add 'taba' to the subquery's FROM list, and add col3 to the
> target list of the subquery.  After we parse the subquery, add 'sub1' to
> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
> The optimizer will do the correlation for us.
> 
> In the optimizer, we can parse the subquery first, then the outer query,
> and then replace all 'sub1' references in the outer query to use the
> subquery plan.
> 
> I realize merging the two plans and doing IN and NOT IN is the
            ^^^^^^^^^^^^^^^^^^^^^
This is very easy to do! As I already said, we just have to replace the sub1
access path (SeqScan of sub1) with a SeqScan of a Material node holding the
subquery plan.

> real challenge, but I hoped this would give us a start.

A decision about how to record subquery stuff in the parse-tree
would be a very good start -:)

BTW, note that for _expression_ subqueries (which are introduced without
IN, EXISTS, ALL, ANY - this follows Sybase's naming) - as in your examples -
we have to check that the subquery returns a single tuple...

Vadim

From owner-pgsql-hackers@hub.org Mon Jan  5 20:31:03 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 20:31:01 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 19:56:05 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST)
Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for <hackers@postgresql.org>; Mon, 5 Jan 1998 19:02:30 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
	by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337
	for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 17:31:04 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675;
	Mon, 5 Jan 1998 17:16:40 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801052216.RAA02675@candle.pha.pa.us>
Subject: Re: [HACKERS] subselect
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> > I am confused.  Do you want one flat query and want to pass the whole
> > thing into the optimizer?  That brings up some questions:
> 
> No. I just want to follow Tom's way: I would like to see new
> SubSelect node as shortened version of struct Query (or use
> Query structure for each subquery - no matter for me), some 
> subquery-related stuff added to Query (and SubSelect) to help
> optimizer to start, and see

OK, so you want the subquery to actually be INSIDE the outer query
expression.  Do they share a common range table?  If they don't, we
could very easily just fly through when processing the WHERE clause, and
start a new query using a new query structure for the subquery.  Believe
me, you don't want a separate SubQuery-type, just re-use Query for it. 
It allows you to call all the normal query stuff with a consistent
structure.

The parser will need to know it is in a subquery, so it can add the
proper target columns to the subquery, or are you going to do that in
the optimizer?  You can do it in the optimizer, and join the range table
references there too.

> 
> typedef struct A_Expr
> {
>     NodeTag     type;
>     int         oper;           /* type of operation
>                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>             IN, NOT IN, ANY, ALL, EXISTS here,
> 
>     char       *opname;         /* name of operator/function */
>     Node       *lexpr;          /* left argument */
>     Node       *rexpr;          /* right argument */
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>             and SubSelect (Query) here (as possible case).
> 
> One thought to follow this way: RULEs (and so - VIEWs) are handled by using
> Query - how else can we implement VIEWs on selects with subqueries ?

Views are stored as nodeout structures, and are merged into the query's
from list, target list, and where clause.  I am working out
readfunc,outfunc now to make sure they are up-to-date with all the
current fields.

> 
> BTW, is
> 
> select * from A where (select TRUE from B);
> 
> valid syntax ?

I don't think so.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Mon Jan  5 17:01:54 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:01:47 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063;
	Tue, 6 Jan 1998 05:18:13 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 05:18:11 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801052051.PAA29341@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > OK, here it is.  I recommend we pass the outer and subquery through
> > > the parser and optimizer separately.
> >
> > I don't like this. I would like to get parse-tree from parser for
> > entire query and let optimizer (on upper level) decide how to rewrite
> > parse-tree and what plans to produce and how these plans should be
> > merged. Note, that I don't object your methods below, but only where
> > to place handling of this. I don't understand why should we add
> > new part to the system which will do optimizer' work (parse-tree -->
> > execution plan) and deal with optimizer nodes. Imho, upper optimizer
> > level is nice place to do this.
> 
> I am confused.  Do you want one flat query and want to pass the whole
> thing into the optimizer?  That brings up some questions:

No. I just want to follow Tom's way: I would like to see a new
SubSelect node as a shortened version of struct Query (or use the
Query structure for each subquery - it doesn't matter to me), some
subquery-related stuff added to Query (and SubSelect) to help the
optimizer get started, and see

typedef struct A_Expr
{
    NodeTag     type;
    int         oper;           /* type of operation
                                 * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            IN, NOT IN, ANY, ALL, EXISTS here,

    char       *opname;         /* name of operator/function */
    Node       *lexpr;          /* left argument */
    Node       *rexpr;          /* right argument */
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            and SubSelect (Query) here (as possible case).

One thought in favor of this approach: RULEs (and so VIEWs) are handled by
using Query - how else can we implement VIEWs on selects with subqueries?

BTW, is

select * from A where (select TRUE from B);

valid syntax ?

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan  5 18:00:57 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:55 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:22:21 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094;
	Tue, 6 Jan 1998 05:49:02 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 05:48:58 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Goran Thyni <goran@bildbasen.se>
CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Goran Thyni wrote:
> 
> Vadim,
> 
>    Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
>    is one example of this - joining by <> will give us invalid results.
> 
> What is you approach towards this problem?

Actually, this is a problem of the ALL modifier (NOT IN is <> ALL),
so we need not just a NOT EQUAL flag but an ALL node
with a modified operator.
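
A minimal sketch of why a plain join on <> is not NOT IN, assuming
integer columns and ignoring NULLs. The function names are hypothetical
and are not part of the Postgres source:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if x <> ALL (sub): x differs from every subquery value.
 * An empty subquery result makes the test true. */
bool
not_equal_all(int x, const int *sub, size_t nsub)
{
    for (size_t i = 0; i < nsub; i++)
        if (x == sub[i])
            return false;
    return true;
}

/* Rows produced by a join on <>: one output row per
 * (outer, sub) pair whose values differ. */
size_t
count_join_not_equal(const int *outer, size_t nouter,
                     const int *sub, size_t nsub)
{
    size_t rows = 0;

    for (size_t i = 0; i < nouter; i++)
        for (size_t j = 0; j < nsub; j++)
            if (outer[i] != sub[j])
                rows++;
    return rows;
}

/* Rows produced by NOT IN evaluated as <> ALL: at most one
 * output row per outer row. */
size_t
count_not_in(const int *outer, size_t nouter,
             const int *sub, size_t nsub)
{
    size_t rows = 0;

    for (size_t i = 0; i < nouter; i++)
        if (not_equal_all(outer[i], sub, nsub))
            rows++;
    return rows;
}
```

With outer values {1, 2, 3} and subquery values {2, 3}, the join on <>
returns four rows (including rows for 2 and 3), while <> ALL returns only
the row for 1.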

After that, one way is to put the subquery into the inner plan of a join
node, to be sure that for each outer tuple all corresponding subquery
tuples are tested with the modified operator (this will require either
changing the code of all join nodes or adding a new plan type - we'll see),
and another way is ... suggested by you:

> I got an idea that one could reverse the order,
> that is execute the outer first into a temptable
> and delete from that according to the result of the
> subquery and then return it.
> Probably this is too raw and slow. ;-)

This will be faster in some cases (when the subquery returns many results
and there are "not so many" results from the outer query) - thanks for the idea!
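
A rough sketch of the reversed-order idea for NOT IN, assuming the outer
result fits in an in-memory array standing in for the temp table and
that values are plain integers without NULLs. The function name is
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Delete from the "temp table" every outer row that matches some
 * subquery row; the surviving rows are compacted to the front.
 * Returns the number of rows kept. */
size_t
delete_matching_rows(int *temp, size_t ntemp,
                     const int *sub, size_t nsub)
{
    size_t kept = 0;

    for (size_t i = 0; i < ntemp; i++)
    {
        bool matched = false;

        for (size_t j = 0; j < nsub; j++)
        {
            if (temp[i] == sub[j])
            {
                matched = true;
                break;
            }
        }
        if (!matched)
            temp[kept++] = temp[i];
    }
    return kept;
}
```

The nested loop is only for illustration; a real implementation would
presumably scan the subquery result once rather than once per outer row.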

> 
>    Personally, I was stuck by holydays -:)
>    Now I can spend ~ 8 hours ~ each day for development...
> 
> Oh, isn't it christmas eve right now in Russia?

For historical reasons, New Year is a mu-u-u-uch more popular
holiday in Russia -:)

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan  5 18:00:59 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:57 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:42:15 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129;
	Tue, 6 Jan 1998 06:10:05 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 06:09:56 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801052216.RAA02675@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > I am confused.  Do you want one flat query and want to pass the whole
> > > thing into the optimizer?  That brings up some questions:
> >
> > No. I just want to follow Tom's way: I would like to see new
> > SubSelect node as shortened version of struct Query (or use
> > Query structure for each subquery - no matter for me), some
> > subquery-related stuff added to Query (and SubSelect) to help
> > optimizer to start, and see
> 
> OK, so you want the subquery to actually be INSIDE the outer query
> expression.  Do they share a common range table?  If they don't, we
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
No.

> could very easily just fly through when processing the WHERE clause, and
> start a new query using a new query structure for the subquery.  Believe
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
... and filling in some subquery-related stuff in the upper query structure -
I still don't know exactly what this could be -:)

> me, you don't want a separate SubQuery-type, just re-use Query for it.
> It allows you to call all the normal query stuff with a consistent
> structure.

No objections.

> 
> The parser will need to know it is in a subquery, so it can add the
> proper target columns to the subquery, or are you going to do that in

I don't think we need that, but a list of correlation clauses
could be a good thing - after all, the parser has to check all column
references...

> the optimizer.  You can do it in the optimizer, and join the range table
> references there too.

Yes.

> > typedef struct A_Expr
> > {
> >     NodeTag     type;
> >     int         oper;           /* type of operation
> >                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >             IN, NOT IN, ANY, ALL, EXISTS here,
> >
> >     char       *opname;         /* name of operator/function */
> >     Node       *lexpr;          /* left argument */
> >     Node       *rexpr;          /* right argument */
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >             and SubSelect (Query) here (as possible case).
> >
> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using
> > Query - how else can we implement VIEWs on selects with subqueries ?
> 
> Views are stored as nodeout structures, and are merged into the query's
> from list, target list, and where clause.  I am working out
> readfunc,outfunc now to make sure they are up-to-date with all the
> current fields.

Nice! This stuff has been out of date for too long.

> > BTW, is
> >
> > select * from A where (select TRUE from B);
> >
> > valid syntax ?
> 
> I don't think so.

And so *rexpr can be of Query type only when oper is one of OP, IN, NOT IN,
ANY, ALL, EXISTS - good.

(Time to sleep -:)

Vadim

From owner-pgsql-hackers@hub.org Thu Jan  8 23:10:50 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707
	for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:10:48 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:08:49 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for <hackers@postgreSQL.org>; Thu, 8 Jan 1998 23:00:50 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243;
	Thu, 8 Jan 1998 22:55:03 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801090355.WAA09243@candle.pha.pa.us>
Subject: [HACKERS] subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST)
Cc: hackers@postgreSQL.org (PostgreSQL-development)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Vadim, I know you are still thinking about subselects, but I have some
more clarification that may help.

We have to add phantom range table entries to correlated subselects so
they will pass the parser.  We might as well add those fields to the
target list of the subquery at the same time:

	select *
	from taba
	where col1 = (select col2
		      from tabb
		      where taba.col3 = tabb.col4)

becomes:

	select *
	from taba
	where col1 = (select col2, tabb.col4 <---
		      from tabb, taba  <---
		      where taba.col3 = tabb.col4)

We add a field to TargetEntry and RangeTblEntry to mark the fact that it
was entered as a correlation entry:

	bool	isCorrelated;

Second, we need to hook the subselect to the main query.  I recommend we
add two fields to Query for this:

	Query *parentQuery;
	List *subqueries;

The parentQuery pointer is used to resolve field names in the correlated
subquery.

	select *
	from taba
	where col1 = (select col2, tabb.col4 <---
		      from tabb, taba  <---
		      where taba.col3 = tabb.col4)

In the query above, the subquery can be easily parsed, and we add the
subquery to the parent's subqueries list.

In the parent query, to parse the WHERE clause, we create a new operator
type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
right side is an index to a slot in the subqueries List.

We can then do the rest in the upper optimizer.
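
A minimal sketch of this linkage, assuming a fixed-size array in place of
the List and a hypothetical SubLinkExpr node for the new operator type.
None of these names beyond parentQuery and subqueries come from the
proposal itself:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUBQUERIES 8        /* illustration only; the proposal uses a List */

typedef struct Query Query;
struct Query
{
    const char *label;          /* hypothetical, for debugging only */
    Query      *parentQuery;    /* enclosing query, NULL at the top */
    Query      *subqueries[MAX_SUBQUERIES];
    int         nsubqueries;
};

/* Hypothetical operator node: the left side is a column name, the
 * right side is a slot index into the parent's subqueries. */
typedef struct SubLinkExpr
{
    const char *opname;         /* e.g. "IN", "NOT_IN", "ALL" */
    const char *leftVar;
    int         subqueryIndex;
} SubLinkExpr;

/* Hook sub under parent; returns the slot index, or -1 if full. */
int
attach_subquery(Query *parent, Query *sub)
{
    if (parent->nsubqueries >= MAX_SUBQUERIES)
        return -1;
    sub->parentQuery = parent;
    parent->subqueries[parent->nsubqueries] = sub;
    return parent->nsubqueries++;
}

/* Follow the slot index stored in the operator node. */
Query *
resolve_subquery(const Query *parent, const SubLinkExpr *expr)
{
    if (expr->subqueryIndex < 0 || expr->subqueryIndex >= parent->nsubqueries)
        return NULL;
    return parent->subqueries[expr->subqueryIndex];
}
```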

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Fri Jan  9 10:01:01 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305
	for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 10:00:59 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 09:52:17 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623;
	Fri, 9 Jan 1998 22:10:25 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su>
Date: Fri, 09 Jan 1998 22:10:06 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgresql.org>
Subject: Re: subselects
References: <199801090355.WAA09243@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> Vadim, I know you are still thinking about subselects, but I have some
> more clarification that may help.
> 
> We have to add phantom range table entries to correlated subselects so
> they will pass the parser.  We might as well add those fields to the
> target list of the subquery at the same time:
> 
>         select *
>         from taba
>         where col1 = (select col2
>                       from tabb
>                       where taba.col3 = tabb.col4)
> 
> becomes:
> 
>         select *
>         from taba
>         where col1 = (select col2, tabb.col4 <---
>                       from tabb, taba  <---
>                       where taba.col3 = tabb.col4)
> 
> We add a field to TargetEntry and RangeTblEntry to mark the fact that it
> was entered as a correlation entry:
> 
>         bool    isCorrelated;

No, I don't like adding anything in the parser. Example:

        select *
        from tabA
        where col1 = (select col2
                      from tabB
                      where tabA.col3 = tabB.col4
                      and exists (select * 
                                  from tabC 
                                  where tabB.colX = tabC.colX and
                                        tabC.colY = tabA.col2)
                     )

Here a column of tabA is referenced in the sub-subselect
(is this allowed by the standard?). In this case it's better
not to add tabA to the 1st subselect but to add tabA to the second one
and change tabA.col3 in the 1st to reference col3 in the 2nd subquery's temp table -
this gives us a 2-table join in the 1st subquery instead of a 3-table join.
(And I'm still not sure that using temp tables is the best that can be
done in all cases...)

Instead of using isCorrelated in TE & RTE we can add 

Index varlevel;

to the Var node to record which (sub)query this Var comes from
(i.e. which range table to use to find the Var's relation via varno). The topmost query
will have varlevel = 0, all its (direct) children - varlevel = 1 and so on.
                        ^^^                         ^^^^^^^^^^^^
(I don't see problems with distinguishing Vars of different children
on the same level...)
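
A small sketch of how varlevel could be used to find the owning query,
assuming each Query carries a parent pointer and its own level number.
The struct layouts and the function name are hypothetical
simplifications, not the real node definitions:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;

typedef struct Query Query;
struct Query
{
    Query      *parentQuery;    /* NULL for the topmost query */
    Index       queryLevel;     /* 0 for the topmost query */
};

typedef struct Var
{
    Index       varlevel;       /* level of the query owning the range table */
    Index       varno;          /* index into that query's range table */
} Var;

/* Walk up from the query where the Var appears to the query whose
 * level equals var->varlevel.  Returns NULL if the Var claims a level
 * deeper than the query it appears in. */
const Query *
query_for_var(const Query *current, const Var *var)
{
    const Query *q = current;

    while (q != NULL && q->queryLevel > var->varlevel)
        q = q->parentQuery;
    if (q == NULL || q->queryLevel != var->varlevel)
        return NULL;
    return q;
}
```

In the tabA/tabB/tabC example, a Var for tabA.col2 inside the
sub-subselect would carry varlevel = 0 and resolve to the topmost query
after two steps up the parent chain.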

> 
> Second, we need to hook the subselect to the main query.  I recommend we
> add two fields to Query for this:
> 
>         Query *parentQuery;
>         List *subqueries;

Agreed. And maybe Index queryLevel.

> In the parent query, to parse the WHERE clause, we create a new operator
> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
                                               ^^^^^^^^^^^^^^^^^^
No. We have to handle (a,b,c) OP (select x, y, z ...) and
'_a_constant_' OP (select ...) - I don't know whether the latter is in the
standard, but Sybase has it.

Well,

typedef enum OpType
{
    OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR

+ OP_EXISTS, OP_ALL, OP_ANY

} OpType;

typedef struct Expr
{
    NodeTag     type;
    Oid         typeOid;        /* oid of the type of this expr */
    OpType      opType;         /* type of the op */
    Node       *oper;           /* could be Oper or Func */
    List       *args;           /* list of argument nodes */
} Expr;

OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
           List, following your suggestion)

OP_ALL, OP_ANY:

oper is a List of Oper nodes. We need a list because the data types of
a, b, c (above) can differ, so the Oper nodes will differ too.

lfirst(args) is a List of expression nodes (Const, Var, Func ?, a + b ?) -
the left side of the subquery's operator.
lsecond(args) is the SubSelect.

Note that there are no OP_IN, OP_NOTIN in the OpTypes for Expr. We need
IN and NOTIN in A_Expr (the parser node), but the parser has to transform
both of them into the corresponding ANY and ALL. At the moment we can do:

IN --> = ANY, NOT IN --> <> ALL

but this will be a "known bug": it breaks the OO nature of Postgres, because
operators can be overridden and '=' can mean  s o m e t h i n g (not equality).
Example: the box data type. For boxes, = means equality of _areas_ and =~
means that the boxes are the same ==> =~ ANY should be used for IN.
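
A sketch of the extended enum and the interim IN / NOT IN mapping
described above. The ParserSubOp enum, the SubOpMapping struct and the
function are hypothetical names introduced only for illustration, and
the mapping hard-codes '=' and '<>', which is exactly the "known bug"
for types such as box:

```c
#include <assert.h>
#include <string.h>

/* OpType with the three proposed subquery members appended. */
typedef enum OpType
{
    OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR,
    OP_EXISTS, OP_ALL, OP_ANY
} OpType;

/* Parser-level forms that exist only in A_Expr (hypothetical names). */
typedef enum ParserSubOp
{
    SUBOP_IN, SUBOP_NOT_IN
} ParserSubOp;

typedef struct SubOpMapping
{
    OpType      opType;         /* OP_ANY or OP_ALL */
    const char *opname;         /* operator applied to each subquery row */
} SubOpMapping;

/* IN --> = ANY, NOT IN --> <> ALL */
SubOpMapping
map_parser_subop(ParserSubOp op)
{
    SubOpMapping m;

    if (op == SUBOP_IN)
    {
        m.opType = OP_ANY;
        m.opname = "=";
    }
    else
    {
        m.opType = OP_ALL;
        m.opname = "<>";
    }
    return m;
}
```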

> right side is an index to a slot in the subqueries List.

Vadim

From owner-pgsql-hackers@hub.org Fri Jan  9 17:44:04 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779
	for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 17:44:01 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for <hackers@postgresql.org>; Fri, 9 Jan 1998 17:31:24 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282;
	Fri, 9 Jan 1998 17:31:41 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801092231.RAA24282@candle.pha.pa.us>
Subject: [HACKERS] Re: subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Bruce Momjian wrote:
> > 
> > Vadim, I know you are still thinking about subselects, but I have some
> > more clarification that may help.
> > 
> > We have to add phantom range table entries to correlated subselects so
> > they will pass the parser.  We might as well add those fields to the
> > target list of the subquery at the same time:
> > 
> >         select *
> >         from taba
> >         where col1 = (select col2
> >                       from tabb
> >                       where taba.col3 = tabb.col4)
> > 
> > becomes:
> > 
> >         select *
> >         from taba
> >         where col1 = (select col2, tabb.col4 <---
> >                       from tabb, taba  <---
> >                       where taba.col3 = tabb.col4)
> > 
> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
> > was entered as a correlation entry:
> > 
> >         bool    isCorrelated;
> 
> No, I don't like to add anything in parser. Example:
> 
>         select *
>         from tabA
>         where col1 = (select col2
>                       from tabB
>                       where tabA.col3 = tabB.col4
>                       and exists (select * 
>                                   from tabC 
>                                   where tabB.colX = tabC.colX and
>                                         tabC.colY = tabA.col2)
>                      )
> 
> : a column of tabA is referenced in sub-subselect 

This is a strange case that I don't think we need to handle in our first
implementation.

> (is it allowable by standards ?) - in this case it's better 
> to don't add tabA to 1st subselect but add tabA to second one
> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> this gives us 2-tables join in 1st subquery instead of 3-tables join.
> (And I'm still not sure that using temp tables is best of what can be 
> done in all cases...)

I don't see any use for temp tables in subselects anymore.  After having
implemented UNIONS, I now see how much can be done in the upper
optimizer.  I see you just putting the subquery PLAN into the proper
place in the plan tree, with some proper JOIN nodes for IN, NOT IN.

> 
> Instead of using isCorrelated in TE & RTE we can add 
> 
> Index varlevel;

OK.  Sounds good.

> 
> to Var node to reflect (sub)query from where this Var is come
> (where is range table to find var's relation using varno). Upmost query
> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
>                         ^^^                          ^^^^^^^^^^^^
> (I don't see problems with distinguishing Vars of different children
> on the same level...)
> 
> > 
> > Second, we need to hook the subselect to the main query.  I recommend we
> > add two fields to Query for this:
> > 
> >         Query *parentQuery;
> >         List *subqueries;
> 
> Agreed. And maybe Index queryLevel.

Sure.  If it helps.

> 
> > In the parent query, to parse the WHERE clause, we create a new operator
> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
>                                                ^^^^^^^^^^^^^^^^^^
> No. We have to handle (a,b,c) OP (select x, y, z ...) and 
> '_a_constant_' OP (select ...) - I don't know is last in standards,
> Sybase has this.

I have never seen this in my eight years of SQL.  Perhaps we can leave
this for later, maybe much later.

> 
> Well,
> 
> typedef enum OpType
> {
>     OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
> 
> + OP_EXISTS, OP_ALL, OP_ANY
> 
> } OpType;
> 
> typedef struct Expr
> {
>     NodeTag     type;
>     Oid         typeOid;        /* oid of the type of this expr */
>     OpType      opType;         /* type of the op */
>     Node       *oper;           /* could be Oper or Func */
>     List       *args;           /* list of argument nodes */
> } Expr;
> 
> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
>            List, following your suggestion)
> 
> OP_ALL, OP_ANY:
> 
> oper is List of Oper nodes. We need in list because of data types of
> a, b, c (above) can be different and so Oper nodes will be different too.
> 
> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
> left side of subquery' operator.
> lsecond(args) is SubSelect.
> 
> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> by parser into corresponding ANY and ALL. At the moment we can do:
> 
> IN --> = ANY, NOT IN --> <> ALL
> 
> but this will be "known bug": this breaks OO-nature of Postgres, because of
> operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
> Example: box data type. For boxes, = means equality of _areas_ and =~
> means that boxes are the same ==> =~ ANY should be used for IN.

That is interesting, to use =~ for ANY.

Yes, but how many operators take a SUBQUERY as an operand?  This is a
special case to me.

I think I see where you are trying to go.  You want subselects to behave
like any other operator, with a subselect type, and you do all the
subselect handling in the optimizer, with special Nodes and actions.

I think this may be just too much of a leap.  We have such clean query
logic for single queries, I can't imagine having an operator that has a
Query operand, and trying to get everything to properly handle it. 
UNIONS were very easy to implement as a List off of Query, with some
foreach()'s in rewrite and the high optimizer.

Subselects are SQL standard, and are never going to be over-ridden by a
user.  Same with UNION.  They want UNION, they get UNION.  They want
Subselect, we are going to spin through the Query structure and give
them what they want.

The complexities of subselects, correlated queries, range tables
and so on are so bizarre that trying to get it all to work inside the type
system could be a huge project.

> 
> > right side is an index to a slot in the subqueries List.

I guess the question is what can we have by February 1?

I have been reading some postings, and it seems to me that subselects
are the litmus test for many evaluators when deciding if a database
engine is full-featured.

Sorry to be so straightforward, but I want to keep hashing this around
until we get a conclusion, so coding can start.

My suggestions have been, I believe, trying to get subselects working
with the fullest functionality by adding the least amount of code, and
keeping the logic clean.

Have you checked out the UNION code?  It is very small, but it works.  I
think it could make a good sample for subselects.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:00:43 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684;
	Sun, 11 Jan 1998 00:19:10 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
Date: Sun, 11 Jan 1998 00:19:08 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgresql.org, "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Subject: Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > No, I don't like to add anything in parser. Example:
> >
> >         select *
> >         from tabA
> >         where col1 = (select col2
> >                       from tabB
> >                       where tabA.col3 = tabB.col4
> >                       and exists (select *
> >                                   from tabC
> >                                   where tabB.colX = tabC.colX and
> >                                         tabC.colY = tabA.col2)
> >                      )
> >
> > : a column of tabA is referenced in sub-subselect
> 
> This is a strange case that I don't think we need to handle in our first
> implementation.

I don't know whether this is a strange case or not :)
But I would like to know whether it is allowed by the standard - can someone
comment on this?
And I don't see any problems with handling it...

> 
> > (is it allowable by standards ?) - in this case it's better
> > to don't add tabA to 1st subselect but add tabA to second one
> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> > this gives us 2-tables join in 1st subquery instead of 3-tables join.
> > (And I'm still not sure that using temp tables is best of what can be
> > done in all cases...)
> 
> I don't see any use for temp tables in subselects anymore.  After having
> implemented UNIONS, I now see how much can be done in the upper
> optimizer.  I see you just putting the subquery PLAN into the proper
> place in the plan tree, with some proper JOIN nodes for IN, NOT IN.

By temp tables I meant the tables created by a Material node for the
subquery plan. This is one of two ways: run the subquery once for all
possible upper plan tuples and then just join the result table with the
upper query. The other way is to re-run the subquery for each upper query
tuple, without a temp table but maybe with some caching of results.
Actually, there is a special case, when the subquery can alternatively be
formulated as a join, but this is just a special case.
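
The second way (re-run the subquery per upper tuple, with caching) could be
sketched roughly as follows. This is only an illustration under stated
assumptions, not backend code: CachedSubplan, cached_subplan_eval() and
count_matching() are made-up names, the correlation value is assumed to be a
single int, and the cache is a small direct-mapped table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 16

/* Hypothetical: runs the subquery for one value taken from the upper tuple. */
typedef int (*SubqueryFn)(int outer_value, void *state);

typedef struct {
    bool valid;
    int  key;       /* upper-query value this result was computed for */
    int  result;
} CacheSlot;

typedef struct {
    SubqueryFn    run;
    void         *state;
    CacheSlot     slots[CACHE_SLOTS];
    unsigned long executions;   /* how many times the subquery really ran */
} CachedSubplan;

void cached_subplan_init(CachedSubplan *sp, SubqueryFn run, void *state)
{
    assert(sp != NULL && run != NULL);
    sp->run = run;
    sp->state = state;
    sp->executions = 0;
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        sp->slots[i].valid = false;
}

/* Re-run the subquery only when this upper value has no cached result. */
int cached_subplan_eval(CachedSubplan *sp, int outer_value)
{
    CacheSlot *slot = &sp->slots[(unsigned int) outer_value % CACHE_SLOTS];

    if (slot->valid && slot->key == outer_value)
        return slot->result;

    slot->result = sp->run(outer_value, sp->state);
    slot->key = outer_value;
    slot->valid = true;
    sp->executions++;
    return slot->result;
}

/* Toy "inner table" and a toy correlated subquery:
 * select count(*) from inner where inner.x = outer_value */
typedef struct {
    const int *rows;
    size_t     nrows;
} IntTable;

int count_matching(int outer_value, void *state)
{
    const IntTable *t = state;
    int n = 0;

    for (size_t i = 0; i < t->nrows; i++)
        if (t->rows[i] == outer_value)
            n++;
    return n;
}
```

With many repeated upper values the subquery runs once per distinct value
(barring slot collisions); with all-distinct values the cache buys nothing,
which is where materializing once and joining may win.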

> > > In the parent query, to parse the WHERE clause, we create a new operator
> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
> >                                                ^^^^^^^^^^^^^^^^^^
> > No. We have to handle (a,b,c) OP (select x, y, z ...) and
> > '_a_constant_' OP (select ...) - I don't know is last in standards,
> > Sybase has this.
> 
> I have never seen this in my eight years of SQL.  Perhaps we can leave
> this for later, maybe much later.

Are you talking about (a, b, c) or about 'a_constant'?
Again, can someone comment on whether they are in the standard or not?
Tom?
If yes, then please add parser support for them now...

> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> > by parser into corresponding ANY and ALL. At the moment we can do:
> >
> > IN --> = ANY, NOT IN --> <> ALL
> >
> > but this will be "known bug": this breaks OO-nature of Postgres, because of
> > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
> > Example: box data type. For boxes, = means equality of _areas_ and =~
> > means that boxes are the same ==> =~ ANY should be used for IN.
> 
> That is interesting, to use =~ for ANY.
> 
> Yes, but how many operators take a SUBQUERY as an operand.  This is a
> special case to me.
> 
> I think I see where you are trying to go.  You want subselects to behave
> like any other operator, with a subselect type, and you do all the
> subselect handling in the optimizer, with special Nodes and actions.
> 
> I think this may be just too much of a leap.  We have such clean query
> logic for single queries, I can't imagine having an operator that has a
> Query operand, and trying to get everything to properly handle it.
> UNIONS were very easy to implement as a List off of Query, with some
> foreach()'s in rewrite and the high optimizer.
> 
> Subselects are SQL standard, and are never going to be over-ridden by a
> user.  Same with UNION.  They want UNION, they get UNION.  They want
> Subselect, we are going to spin through the Query structure and give
> them what they want.
> 
> The complexities of subselects and correlated queries and range tables
> and stuff is so bizarre that trying to get it to work inside the type
> system could be a huge project.

PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
derived from the Berkeley Postgres database management system. While
PostgreSQL retains the powerful object-relational data model, rich data types and
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
easy extensibility of Postgres, it replaces the PostQuel query language with an
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extended subset of SQL.
^^^^^^^^^^^^^^^^^^^^^^

Should we tell users that subselects will work for standard data types only?
I don't see why a subquery can't be used with the ~, ~*, @@, ... operators, do you?
Is there a difference between handling = ANY and ~ ANY? I don't see any.
Currently we can't get IN working properly for boxes (and maybe for others too)
and I don't want to try to resolve these problems now, but I hope that someday
we'll be able to do this. For the moment, just convert IN into = ANY and
NOT IN into <> ALL in the parser.
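
To make the mapping concrete, here is a minimal sketch of ANY and ALL over a
pluggable operator. Everything in it is an assumption for illustration: the
subquery result is taken to be a plain int array, BinaryOp stands in for an
operator's function, NULL handling is ignored, and none of the names exist
in the backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the function behind an operator. */
typedef bool (*BinaryOp)(int left, int right);

bool int_eq(int a, int b) { return a == b; }
bool int_ne(int a, int b) { return a != b; }

/* left OP ANY (rows): true if OP holds for at least one row. */
bool eval_any(BinaryOp op, int left, const int *rows, size_t nrows)
{
    assert(op != NULL);
    for (size_t i = 0; i < nrows; i++)
        if (op(left, rows[i]))
            return true;
    return false;
}

/* left OP ALL (rows): true if OP holds for every row (true for no rows). */
bool eval_all(BinaryOp op, int left, const int *rows, size_t nrows)
{
    assert(op != NULL);
    for (size_t i = 0; i < nrows; i++)
        if (!op(left, rows[i]))
            return false;
    return true;
}

/* IN --> = ANY */
bool in_subselect(int left, const int *rows, size_t nrows)
{
    return eval_any(int_eq, left, rows, nrows);
}

/* NOT IN --> <> ALL */
bool not_in_subselect(int left, const int *rows, size_t nrows)
{
    return eval_all(int_ne, left, rows, nrows);
}
```

Nothing in eval_any()/eval_all() depends on which operator is passed in, so
~ ANY would be handled the same way as = ANY. Note also that <> ANY is not
NOT IN: for rows (1, 2, 3), 2 <> ANY is true although 2 is in the list.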

(BTW, do you know how DISTINCT is implemented? It doesn't use = but
uses the type_out funcs and strcmp()... DISTINCT is a standard SQL thing...)

> >
> > > right side is an index to a slot in the subqueries List.
> 
> I guess the question is what can we have by February 1?
> 
> I have been reading some postings, and it seems to me that subselects
> are the litmus test for many evaluators when deciding if a database
> engine is full-featured.
> 
> Sorry to be so straightforward, but I want to keep hashing this around
> until we get a conclusion, so coding can start.
> 
> My suggestions have been, I believe, trying to get subselects working
> with the fullest functionality by adding the least amount of code, and
> keeping the logic clean.
> 
> Have you checked out the UNION code?  It is very small, but it works.  I
> think it could make a good sample for subselects.

There is a big difference between subqueries and the queries in a UNION -
there are no dependencies between UNION queries.

OK, open issues:

1. Is using an upper query's vars at all subquery levels in the standard?
2. Is (a, b, c) OP (subselect) in the standard?
3. What types of expressions (Var, Const, ...) are allowed on the left
   side of an operator with a subquery on the right?
4. What types of operators should we support (=, >, ..., like, ~, ...)?
   (My vote is for all boolean operators.)

And did we reach consensus on the representation of subqueries in Query,
Expr and Var?
I would like to have something done in the parser around Jan 17 to get
subqueries working by Feb 1. I vote for supporting all the standard
things (1. - 3.) in the parser right now; if there is no time
to implement something like (a, b, c) then the optimizer will call
elog(WARN) (oh, sorry, elog(ERROR)).

Vadim

From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:31:01 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:22:30 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725;
	Sun, 11 Jan 1998 00:41:22 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
Date: Sun, 11 Jan 1998 00:41:19 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects
References: <199712220545.AAA11605@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> OK, a few questions:
> 
>         Should we use sortmerge, so we can use our psort as temp tables,
> or do we use hashunique?
> 
>         How do we pass the query to the optimizer?  How do we represent
> the range table for each, and the links between them in correlated
> subqueries?

My suggestion is to just use varlevel in Var and not put the upper query's
relations into the subquery range table.
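
A rough sketch of what resolving such a Var could look like. The structs are
hypothetical simplifications, not the real node definitions: each query
level is assumed to expose one current tuple as an int array, and varno is
left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one nesting level of a query with its current tuple. */
typedef struct QueryLevel {
    struct QueryLevel *parent;       /* enclosing query, NULL at the top */
    const int         *current_row;  /* attributes of the current tuple */
    size_t             natts;
} QueryLevel;

/* Hypothetical, simplified Var: only the fields this sketch needs. */
typedef struct {
    int varlevel;   /* 0 = this query, 1 = its parent, 2 = grandparent ... */
    int varattno;   /* 1-based attribute number within that level's tuple */
} Var;

/* Walk up varlevel levels, then fetch the attribute from that tuple. */
int var_fetch(const QueryLevel *level, const Var *var)
{
    assert(level != NULL && var != NULL && var->varlevel >= 0);

    for (int up = 0; up < var->varlevel; up++) {
        level = level->parent;
        assert(level != NULL);      /* varlevel must not exceed the nesting */
    }

    assert(var->varattno >= 1 && (size_t) var->varattno <= level->natts);
    return level->current_row[var->varattno - 1];
}
```

The subquery's own range table stays untouched; a reference to an upper
relation is told apart from a local one only by its nonzero varlevel.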

Vadim

From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:00:58 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:40:02 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741;
	Sun, 11 Jan 1998 00:58:56 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su>
Date: Sun, 11 Jan 1998 00:58:52 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>,
        PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects
References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Vadim B. Mikheev wrote:
> 
> Bruce Momjian wrote:
> >
> > OK, a few questions:
> >
> >         Should we use sortmerge, so we can use our psort as temp tables,
> > or do we use hashunique?
> >
> >         How do we pass the query to the optimizer?  How do we represent
> > the range table for each, and the links between them in correlated
> > subqueries?
> 
> My suggestion is just use varlevel in Var and don't put upper query'
> relations into subquery range table.

Hmm... Sorry, it seems that I replied to a very old message - forget it.

Vadim

From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:30:56 -0500 (EST)
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:05:09 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623;
	Sat, 10 Jan 1998 18:01:03 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu>
Date: Sat, 10 Jan 1998 18:01:03 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
Subject: Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> > > by parser into corresponding ANY and ALL. At the moment we can do:
> > >
> > > IN --> = ANY, NOT IN --> <> ALL
> > >
> > > but this will be "known bug": this breaks OO-nature of Postgres, because of
> > > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
> > > Example: box data type. For boxes, = means equality of _areas_ and =~
> > > means that boxes are the same ==> =~ ANY should be used for IN.
> >
> > That is interesting, to use =~ for ANY.

If I understand the discussion, I would think it is fine to make an assumption about
which operator is used to implement a subselect expression. If someone remaps an
operator to mean something different, then they will get a different result (or a
nonsensical one) from a subselect.

I'd be happy to remap existing operators to fit into a convention which would work
with subselects (especially if I got to help choose :).

> > Subselects are SQL standard, and are never going to be over-ridden by a
> > user.  Same with UNION.  They want UNION, they get UNION.  They want
> > Subselect, we are going to spin through the Query structure and give
> > them what they want.
>
> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
> derived from the Berkeley Postgres database management system. While
> PostgreSQL retains the powerful object-relational data model, rich data types and
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> easy extensibility of Postgres, it replaces the PostQuel query language with an
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> extended subset of SQL.
> ^^^^^^^^^^^^^^^^^^^^^^
>
> Should we say users that subselect will work for standard data types only ?
> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
> Is there difference between handling = ANY and ~ ANY ? I don't see any.
> Currently we can't get IN working properly for boxes (and may be for others too)
> and I don't like to try to resolve these problems now, but hope that someday
> we'll be able to do this. At the moment - just convert IN into = ANY and
> NOT IN into <> ALL in parser.
>
> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)

?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted
list? That would give more consistent behavior...

> > I have been reading some postings, and it seems to me that subselects
> > are the litmus test for many evaluators when deciding if a database
> > engine is full-featured.
> >
> > Sorry to be so straightforward, but I want to keep hashing this around
> > until we get a conclusion, so coding can start.
> >
> > My suggestions have been, I believe, trying to get subselects working
> > with the fullest functionality by adding the least amount of code, and
> > keeping the logic clean.
> >
> > Have you checked out the UNION code?  It is very small, but it works.  I
> > think it could make a good sample for subselects.
>
> There is big difference between subqueries and queries in UNION -
> there are not dependences between UNION queries.
>
> Ok, opened issues:
>
> 1. Is using upper query' vars in all subquery levels in standard ?

I'm not certain. Let me know if you do not get an answer from someone else and I will
research it.

> 2. Is (a, b, c) OP (subselect) in standard ?

Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where
the parens are allowed to be omitted from a one element list.

> 3. What types of expressions (Var, Const, ...) are allowed on the left
>    side of operator with subquery on the right ?

I think most expressions are allowed. The "constant OP (subselect)" case you were
asking about is just a simplified case since "(a, b, constant) OP (subselect)" where
a and b are column references should be allowed. Of course, our optimizer could
perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first
example "EXISTS (subselect where x = constant)".

> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
>    (My vote for all boolean operators).

Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is
important to get an initial implementation for v6.3 which covers a little, some, or
all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then
we will have the benefit of feedback from others in practical applications which
always uncovers new things to consider.

> And - did we get consensus on presentation subqueries stuff in Query,
> Expr and Var ?
> I would like to have something done in parser near Jan 17 to get
> subqueries working by Feb 1. I vote for support of all standard
> things (1. - 3.) in parser right now - if there will be no time
> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh,
> sorry, - elog(ERROR)).

Great. I'd like to help with the remaining parser issues; at the moment "row_expr"
does the right thing with expression comparisons but just parses and then ignores
subselect expressions. Let me know what structures you want passed back and I'll put
them in, or if you prefer put in the first one and I'll go through and clean up and
add the rest.

                                                  - Tom


From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 15:00:56 -0500 (EST)
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 14:35:19 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002;
	Sat, 10 Jan 1998 19:31:30 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu>
Date: Sat, 10 Jan 1998 19:31:29 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
Subject: Re: [HACKERS] Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> Are you saying about (a, b, c) or about 'a_constant' ?
> Again, can someone comment on are they in standards or not ?
> Tom ?
> If yes then please add parser' support for them now...

As I mentioned a few minutes ago in my last message, I parse the row descriptors and
the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)") I currently
ignore the result. I didn't want to pass things back as lists until something in the
backend was ready to receive them.

If it is OK, I'll go ahead and start passing back a list of expressions when a row
descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node
being a list rather than an atomic node.

Also, I can start passing back the subselect expression as the rexpr; right now the
parser calls elog() and quits.

btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
If lists are handled farther back, this routine should move to there also and the
parser will just pass the lists. Note that some assumptions have to be made about the
meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
to disallow those cases or to look for specific appearance of the operator to guess
the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
it has "<>" or "!" then build as "or"s).
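
As a rough illustration of that guess, a sketch outside the parser might look
like the following. None of this is the actual makeRowExpr() code:
row_combiner_for() and row_compare() are made-up names, the row elements are
assumed to be ints, and the element operator is passed in as a plain C
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { ROW_AND, ROW_OR, ROW_UNKNOWN } RowCombine;

/* Guess how to combine per-element results from the operator's spelling.
 * "<>" and "!" must be tested first, since "<>" also contains "<". */
RowCombine row_combiner_for(const char *opname)
{
    assert(opname != NULL);

    if (strstr(opname, "<>") != NULL || strchr(opname, '!') != NULL)
        return ROW_OR;
    if (strpbrk(opname, "<=>") != NULL)
        return ROW_AND;
    return ROW_UNKNOWN;     /* e.g. "~": disallow or handle separately */
}

/* Hypothetical stand-in for the function behind "a OP c". */
typedef bool (*BinaryOp)(int left, int right);

bool int_eq(int a, int b) { return a == b; }
bool int_ne(int a, int b) { return a != b; }

/* Evaluate "(l1, ..., ln) OP (r1, ..., rn)" element by element. */
bool row_compare(RowCombine how, BinaryOp op,
                 const int *l, const int *r, size_t n)
{
    assert(op != NULL && how != ROW_UNKNOWN);

    for (size_t i = 0; i < n; i++) {
        bool hit = op(l[i], r[i]);

        if (how == ROW_AND && !hit)
            return false;   /* one failing element sinks an "and" chain */
        if (how == ROW_OR && hit)
            return true;    /* one matching element satisfies an "or" chain */
    }
    return how == ROW_AND;
}
```

So "(1,2) = (1,3)" is false because the second pair differs, while
"(1,2) <> (1,3)" is true for the same reason.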

Let me know what you want...

                                                       - Tom


From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:01:51 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797;
	Sun, 11 Jan 1998 05:58:01 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu>
Date: Sun, 11 Jan 1998 05:58:01 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
Subject: Re: [HACKERS] Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702"
Status: OR

This is a multi-part message in MIME format.
--------------D8B38A0D1F78A10C0023F702
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here are context diffs of gram.y and keywords.c; sorry about sending the full files.
These start sending lists of arguments toward the backend from the parser to
implement row descriptors and subselects.

They should apply OK even over Bruce's recent changes...

                                             - Tom

--------------D8B38A0D1F78A10C0023F702
Content-Type: text/plain; charset=us-ascii; name="gram.y.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="gram.y.patch"

*** ../src/backend/parser/gram.y.orig	Sat Jan 10 05:44:36 1998
--- ../src/backend/parser/gram.y	Sat Jan 10 19:29:37 1998
***************
*** 195,200 ****
--- 195,201 ----
  				having_clause
  %type <list>	row_descriptor, row_list
  %type <node>	row_expr
+ %type <str>		RowOp, row_opt
  %type <list>	OptCreateAs, CreateAsList
  %type <node>	CreateAsElement
  %type <value>	NumConst
***************
*** 242,248 ****
   */
  
  /* Keywords (in SQL92 reserved words) */
! %token	ACTION, ADD, ALL, ALTER, AND, AS, ASC,
  		BEGIN_TRANS, BETWEEN, BOTH, BY,
  		CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, 
  		CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, 
--- 243,249 ----
   */
  
  /* Keywords (in SQL92 reserved words) */
! %token	ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC,
  		BEGIN_TRANS, BETWEEN, BOTH, BY,
  		CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, 
  		CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, 
***************
*** 258,264 ****
  		ON, OPTION, OR, ORDER, OUTER_P,
  		PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
  		REFERENCES, REVOKE, RIGHT, ROLLBACK,
! 		SECOND_P, SELECT, SET, SUBSTRING,
  		TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
  		UNION, UNIQUE, UPDATE, USING,
  		VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
--- 259,265 ----
  		ON, OPTION, OR, ORDER, OUTER_P,
  		PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
  		REFERENCES, REVOKE, RIGHT, ROLLBACK,
! 		SECOND_P, SELECT, SET, SOME, SUBSTRING,
  		TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
  		UNION, UNIQUE, UPDATE, USING,
  		VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
***************
*** 2853,2866 ****
  /* Expressions using row descriptors
   * Define row_descriptor to allow yacc to break the reduce/reduce conflict
   *  with singleton expressions.
   */
  row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
  				{
! 					$$ = NULL;
  				}
  		| '(' row_descriptor ')' NOT IN '(' SubSelect ')'
  				{
! 					$$ = NULL;
  				}
  		| '(' row_descriptor ')' '=' '(' row_descriptor ')'
  				{
--- 2854,2878 ----
  /* Expressions using row descriptors
   * Define row_descriptor to allow yacc to break the reduce/reduce conflict
   *  with singleton expressions.
+  *
+  * Note that "SOME" is the same as "ANY" in syntax.
+  * - thomas 1998-01-10
   */
  row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
  				{
! 					$$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6);
  				}
  		| '(' row_descriptor ')' NOT IN '(' SubSelect ')'
  				{
! 					$$ = makeA_Expr(OP, "<>any", (Node *)$2, (Node *)$7);
! 				}
! 		| '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')'
! 				{
! 					char *opr;
! 					opr = palloc(strlen($4)+strlen($5)+1);
! 					strcpy(opr, $4);
! 					strcat(opr, $5);
! 					$$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7);
  				}
  		| '(' row_descriptor ')' '=' '(' row_descriptor ')'
  				{
***************
*** 2880,2885 ****
--- 2892,2907 ----
  				}
  		;
  
+ RowOp:  '='						{ $$ = "="; }
+ 		| '<'					{ $$ = "<"; }
+ 		| '>'					{ $$ = ">"; }
+ 		;
+ 
+ row_opt:  ALL					{ $$ = "all"; }
+ 		| ANY					{ $$ = "any"; }
+ 		| SOME					{ $$ = "any"; }
+ 		;
+ 
  row_descriptor:  row_list ',' a_expr
  				{
  					$$ = lappend($1, $3);
***************
*** 3432,3441 ****
  		;
  
  in_expr:  SubSelect
! 				{
! 					elog(ERROR,"IN (SUBSELECT) not yet implemented");
! 					$$ = $1;
! 				}
  		| in_expr_nodes
  				{	$$ = $1; }
  		;
--- 3454,3460 ----
  		;
  
  in_expr:  SubSelect
! 				{	$$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); }
  		| in_expr_nodes
  				{	$$ = $1; }
  		;
***************
*** 3449,3458 ****
  		;
  
  not_in_expr:  SubSelect
! 				{
! 					elog(ERROR,"NOT IN (SUBSELECT) not yet implemented");
! 					$$ = $1;
! 				}
  		| not_in_expr_nodes
  				{	$$ = $1; }
  		;
--- 3468,3474 ----
  		;
  
  not_in_expr:  SubSelect
! 				{	$$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); }
  		| not_in_expr_nodes
  				{	$$ = $1; }
  		;

--------------D8B38A0D1F78A10C0023F702
Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="keywords.c.patch"

*** ../src/backend/parser/keywords.c.orig	Mon Jan  5 07:51:33 1998
--- ../src/backend/parser/keywords.c	Sat Jan 10 19:22:07 1998
***************
*** 39,44 ****
--- 39,45 ----
  	{"alter", ALTER},
  	{"analyze", ANALYZE},
  	{"and", AND},
+ 	{"any", ANY},
  	{"append", APPEND},
  	{"archive", ARCHIVE},
  	{"as", AS},
***************
*** 178,183 ****
--- 179,185 ----
  	{"set", SET},
  	{"setof", SETOF},
  	{"show", SHOW},
+ 	{"some", SOME},
  	{"stdin", STDIN},
  	{"stdout", STDOUT},
  	{"substring", SUBSTRING},

--------------D8B38A0D1F78A10C0023F702--


From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:31:10 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:10:48 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for <hackers@postgresql.org>; Sun, 11 Jan 1998 01:01:05 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801;
	Sun, 11 Jan 1998 00:59:23 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801110559.AAA11801@candle.pha.pa.us>
Subject: [HACKERS] Re: subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST)
Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu
In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> I would like to have something done in parser near Jan 17 to get
> subqueries working by Feb 1. I vote for support of all standard
> things (1. - 3.) in parser right now - if there will be no time
> to implement something like (a, b, c) then optimizer will call
> elog(WARN) (oh, sorry, - elog(ERROR)).

First, let me say I am glad we are still on schedule for Feb 1.  I was
panicking because I thought we wouldn't make it in time.


> > > (is it allowable by standards ?) - in this case it's better
> > > to don't add tabA to 1st subselect but add tabA to second one
> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
> > > (And I'm still not sure that using temp tables is best of what can be
> > > done in all cases...)
> > 
> > I don't see any use for temp tables in subselects anymore.  After having
> > implemented UNIONS, I now see how much can be done in the upper
> > optimizer.  I see you just putting the subquery PLAN into the proper
> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
> 
> When saying about temp tables, I meant tables created by node Material
> for subquery plan. This is one of two ways - run subquery once for all
> possible upper plan tuples and then just join result table with upper
> query. Another way is re-run subquery for each upper query tuple,
> without temp table but may be with caching results by some ways.
> Actually, there is special case - when subquery can be alternatively 
> formulated as joins, - but this is just special case.

This is interesting.  It really only applies to correlated subqueries,
and it may certainly help sometimes to evaluate the subquery only for
the values that are actually going to come from the upper query, rather
than for all possible values.  Perhaps we can use the 'cost' value of
each query to decide how to handle this.

> 
> > > > In the parent query, to parse the WHERE clause, we create a new operator
> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
> > >                                                ^^^^^^^^^^^^^^^^^^
> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
> > > '_a_constant_' OP (select ...) - I don't know is last in standards,
> > > Sybase has this.
> > 
> > I have never seen this in my eight years of SQL.  Perhaps we can leave
> > this for later, maybe much later.
> 
> Are you saying about (a, b, c) or about 'a_constant' ?
> Again, can someone comment on are they in standards or not ?
> Tom ?
> If yes then please add parser' support for them now...

OK, Thomas says it is, so we will put in as much code as we can to handle
it.

> Should we say users that subselect will work for standard data types only ?
> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
> Is there difference between handling = ANY and ~ ANY ? I don't see any.
> Currently we can't get IN working properly for boxes (and may be for others too)
> and I don't like to try to resolve these problems now, but hope that someday
> we'll be able to do this. At the moment - just convert IN into = ANY and
> NOT IN into <> ALL in parser.

OK.

> 
> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)

I did not know that either.

> There is big difference between subqueries and queries in UNION - 
> there are not dependences between UNION queries.

Yes, I know UNIONS are trivial compared to subselects.

> 
> Ok, opened issues:
> 
> 1. Is using upper query' vars in all subquery levels in standard ?
> 2. Is (a, b, c) OP (subselect) in standard ?
> 3. What types of expressions (Var, Const, ...) are allowed on the left
>    side of operator with subquery on the right ?
> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
>    (My vote for all boolean operators).
> 
> And - did we get consensus on presentation subqueries stuff in Query,
> Expr and Var ?

OK, here are my concrete ideas on changes and structures.

I think we all agreed that Query needs new fields:

        Query *parentQuery;
        List *subqueries;

Maybe query level too, but I don't think so (see later ideas on Var).

We need a new Node structure, call it Sublink:

	int 	linkType;	/* IN, NOTIN, ANY, EXISTS, OPERATOR... */
	Oid	operator;	/* subquery must return single row */
	List	*lefthand;	/* parent stuff */
	Node 	*subquery;	/* represents nodes from parser */
	Index	Subindex;	/* filled in to index Query->subqueries */

Of course, the names are just suggestions.  Every time we run through
the parsenodes of a query to create a Query* structure, when we do the
WHERE clause, if we come upon one of these Sublink nodes (created in the
parser), we move the supplied Query* in Sublink->subquery to a local
List variable, and we set Sublink->Subindex to the index of the new
query, i.e. 1 if it is the first subquery we found, 2 if it is the
second, etc.

After we have created the parent Query structure, we run through our
local List variable of subquery parsenodes we created above, and add
Query* entries to Query->subqueries.  In each subquery Query*, we set
the parentQuery pointer.
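
The bookkeeping described above could be sketched roughly as follows.
This is an illustration in Python, not backend code: the class and
function names (Query, Sublink, collect_sublinks) and the flat list
standing in for the WHERE clause are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Query:
    name: str                                  # label used only in this example
    parent_query: Optional["Query"] = None     # stands in for Query *parentQuery
    subqueries: List["Query"] = field(default_factory=list)  # List *subqueries


@dataclass
class Sublink:
    link_type: str        # IN, NOTIN, ANY, EXISTS, OPERATOR...
    subquery: Query       # parsed subquery supplied by the parser
    subindex: int = 0     # 1-based index into the parent's subqueries list


def collect_sublinks(parent: Query, where_nodes: list) -> None:
    """Number each Sublink in the WHERE clause, then attach the subqueries."""
    found = []                                 # the "local List variable"
    for node in where_nodes:
        if isinstance(node, Sublink):
            found.append(node.subquery)
            node.subindex = len(found)         # first subquery is 1, second is 2
    # Once the parent Query exists, hang the subqueries off it.
    for sub in found:
        sub.parent_query = parent
        parent.subqueries.append(sub)


parent = Query("parent")
links = [Sublink("IN", Query("sub1")), Sublink("EXISTS", Query("sub2"))]
collect_sublinks(parent, ["x = 1"] + links)
print([(l.link_type, l.subindex) for l in links])   # [('IN', 1), ('EXISTS', 2)]
print([q.name for q in parent.subqueries])          # ['sub1', 'sub2']
```

The index is 1-based so that it matches the "first subquery we found, 1"
numbering in the text.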

Also, when parsing the subqueries, we need to keep track of correlated
references.  I recommend we add a field to the Var structure:

	Index	sublevel;	/* range table reference:
				   = 0  current level of query
				   < 0  parent above this many levels
				   > 0  index into subquery list
				 */

This way, a Var node with sublevel 0 refers to the current level, which
is the common case.  This helps us not have to change much code.
sublevel = -1 means it references the range table in the parent query.
sublevel = -2 means the parent's parent.  sublevel = 2 means it
references the range table of the second entry in Query->subqueries.
Varno and varattno are still meaningful.  Of course, we can't reference
variables in the subqueries from the parent in the parser code, but
Vadim may want to.

When doing a Var lookup in the parser, we look in the current level
first, but if it is not found there and we are in a subquery, we can
look at the parent and the parent's parent to set the sublevel, varno,
and varattno properly.
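
A minimal sketch of that lookup, in Python for illustration only.  The
QueryLevel class, the lookup_var function, and the table and column names
are hypothetical; ambiguous column names and the positive "index into
subquery list" case are not handled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QueryLevel:
    # Range table as (table name, column names); varno is the 1-based position.
    range_table: List[Tuple[str, List[str]]]
    parent: Optional["QueryLevel"] = None


def lookup_var(level: QueryLevel, column: str) -> Tuple[int, int, int]:
    """Return (sublevel, varno, varattno) for a column reference."""
    sublevel = 0
    current = level
    while current is not None:
        for varno, (_table, columns) in enumerate(current.range_table, start=1):
            if column in columns:
                return sublevel, varno, columns.index(column) + 1
        current = current.parent     # not found here: try one level up
        sublevel -= 1                # -1 is the parent, -2 the parent's parent
    raise LookupError(f"column {column!r} not found at any query level")


outer = QueryLevel([("taba", ["col1", "col3"])])
inner = QueryLevel([("tabb", ["x", "y"])], parent=outer)
print(lookup_var(inner, "y"))       # (0, 1, 2)
print(lookup_var(inner, "col3"))    # (-1, 1, 2)
```

A reference resolved at the current level keeps sublevel 0, so code that
only understands single-level queries sees nothing new.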

We create no phantom range table entries in the subquery, and no phantom
target list entries.   We can leave that all for the upper optimizer.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Tue Dec  9 12:14:09 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA16186
	for <maillist@candle.pha.pa.us>; Tue, 9 Dec 1997 12:14:05 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id MAA17524; Tue, 9 Dec 1997 12:05:31 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 09 Dec 1997 12:05:01 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id MAA17316 for pgsql-hackers-outgoing; Tue, 9 Dec 1997 12:04:55 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id MAA17304 for <hackers@postgresql.org>; Tue, 9 Dec 1997 12:04:40 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id MAA15973;
	Tue, 9 Dec 1997 12:05:03 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712091705.MAA15973@candle.pha.pa.us>
Subject: Re: [HACKERS] Items for 6.3
To: lockhart@alumni.caltech.edu (Thomas G. Lockhart)
Date: Tue, 9 Dec 1997 12:05:03 -0500 (EST)
Cc: hackers@postgreSQL.org, vadim@sable.krasnoyarsk.su
In-Reply-To: <348CE8BE.FE0F8AA1@alumni.caltech.edu> from "Thomas G. Lockhart" at Dec 9, 97 06:44:14 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Bruce Momjian wrote:
> 
> > Here are the items I think would make 6.3 a truly great release:
> >
> >         subselects
> >         outer joins
> 
> These two would be sufficient (along with the changes already in the
> tree) to address the most visible deficiencies in SQL functionality.
> 
> >         temp tables
> >         fix "Reliability" items attached to specific queries
> 
> Sure, why not?

We will need temp tables for subselects anyway.

I could implement them, but again we come up against the problem of
storing these plans and executing them later.  We need to do some of the
temp table stuff in the optimizer because the plan could be passed with
a temp table, and we can't bind the temp name to a real name in the
parser, especially if we save those plans in system tables that other
backends can execute.  Multiple backends would be using the same temp
name.

At the same time, we need some temp stuff in the parser so the parser
can recognize the temp table and its fields when it sees it.

The hardest part is:

select * into tmp mytmp from z where x=y;
select * from mytmp;

If they are passed together, and we have to plan them both, before
either is executed, you have to make the parser aware of the fields in
mytmp, even though you have not executed the select yet, you are just
storing the plan.

This was Vadim's point about not doing subselects in the parser.

> 
> >         postmaster sync's pglog, giving almost fsync reliability with
> >                 no-fsync performance
> 
> OK to save for v6.4.
> 
> Could we try to do the subselect/join/union features for 6.3? I know you
> have been looking at it, and found the deepest parts of the backend to
> be a bit murky. I'm not familiar with that area at all, but perhaps we
> could divert Vadim for a week or two or three when he has some time.
> Especially if we trade him for help on his favorite topics for v6.4??
> 

Sure.  I may be able to do some of the pglog change myself, though Vadim
has some definite ideas on this.

As for Vadim, trading help is a good idea, but what trade can we make? 
He can do most of these tough things without us, and in 1/4 the time. 
We can't even see where to start them.

Basically, without Vadim, this project would have really major problems.

He certainly likes working on PostgreSQL, so he must be busy with other
things.

It is not fair to keep counting on Vadim to do all these tough jobs.  We
really need to get other people up to Vadim's level of ability. 
Unfortunately, the odds of this happening are very slim.

This leaves me scratching my head.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Fri Dec 19 00:08:21 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25029
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 00:08:13 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA11825;
	Fri, 19 Dec 1997 12:13:15 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349A0265.7329D4EE@sable.krasnoyarsk.su>
Date: Fri, 19 Dec 1997 12:13:09 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
CC: Bruce Momjian <maillist@candle.pha.pa.us>,
        PostgreSQL-development <hackers@postgresql.org>
Subject: Re: [HACKERS] Items for 6.3
References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Thomas G. Lockhart wrote:
> 
> Could we try to do the subselect/join/union features for 6.3? I know you
> have been looking at it, and found the deepest parts of the backend to
> be a bit murky. I'm not familiar with that area at all, but perhaps we
> could divert Vadim for a week or two or three when he has some time.
                                          ^^^^^
More realistic... And this is for initial release only: tuning performance
of subselects is very hard, long work.

Ok - I'm ready to do subselects for 6.3, but this means that foreign keys
may appear only in 6.4. And I'll need help: could someone add support
for them in the parser ? Not handling - just parsing and common checking.
Also, it would be nice to have a better temp table implementation
(without affecting pg_class etc) - the Material node needs query-level
temp tables anyway. I'd really like to see temp table files created
only when the data must go to disk because the local buffer pool is full
and can't keep the table data in memory any more. Also, the local buffer
manager should be rewritten to use a hash table (like the shared bufmgr)
for buffer search, not a sequential scan as now (this is an item for
TODO) - this will speed things up and allow using more than 64 local
buffers.

I'm still sure that handling subselects in the parser is not the right way.
And the main problem is not in execution plans (we could use tricks
to resolve this) but in performance. Example:

select b from big where b in (select s from small);

If there are no duplicates in small then this is the same as

select b from big, small where b = s;

Without an index on big, postgres does a seq scan of big and uses a
hashjoin with the hash on small. Using a temp table makes the query only
20% slower (in my test). But with an index on big, postgres uses a
nestloop with a seq scan of small and an index scan of big => the select
runs faster, and the temp table stuff makes the query 2.5 times slower!
In the case of duplicates in small, handling in the parser will use
distinct (and so - sorting). But with a hashjoin plan, distinct may be
avoided! Who can analyze this ? Only the optimizer. It can be smart
enough to check whether there is a unique index on small or not. If not -
which costs less: a nestloop with sorting or a slower hashjoin without
sorting ? Only the optimizer can find the best way to execute a query;
the parser can't.
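
The equivalence between the IN form and the join form can be checked on
toy data.  The sketch below uses Python's sqlite3 module as a stand-in
engine (an assumption of this example, it is not PostgreSQL) and the
table contents are made up.  It shows only which rows come back, not
which plan an optimizer would pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table big (b integer);
    create table small (s integer);
    insert into big values (1), (2), (3);
    insert into small values (2), (2), (3);   -- note the duplicate 2
""")


def rows(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))


in_form = rows("select b from big where b in (select s from small)")
join_form = rows("select b from big, small where b = s")
distinct_form = rows(
    "select b from big, (select distinct s from small) as small2 where b = s"
)

print(in_form)        # [2, 3]
print(join_form)      # [2, 2, 3]
print(distinct_form)  # [2, 3]
```

With a duplicate in small the plain join returns an extra row, so the
rewrite is only safe when small is known to be unique or is made so.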

> Especially if we trade him for help on his favorite topics for v6.4??

Ok, I'd like to see shared catalog cache implemented in 6.4... -:)

Vadim

From owner-pgsql-hackers@hub.org Fri Dec 19 00:58:54 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA25460
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 00:58:52 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA27667; Fri, 19 Dec 1997 00:54:39 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:54:09 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA27633 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:54:04 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA27623 for <hackers@postgresql.org>; Fri, 19 Dec 1997 00:53:53 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA25415;
	Fri, 19 Dec 1997 00:53:15 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712190553.AAA25415@candle.pha.pa.us>
Subject: Re: [HACKERS] Items for 6.3
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Fri, 19 Dec 1997 00:53:15 -0500 (EST)
Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Thomas G. Lockhart wrote:
> > 
> > Could we try to do the subselect/join/union features for 6.3? I know you
> > have been looking at it, and found the deepest parts of the backend to
> > be a bit murky. I'm not familiar with that area at all, but perhaps we
> > could divert Vadim for a week or two or three when he has some time.
>                                           ^^^^^
> More realistic... And this is for initial release only: tuning performance
> of subselects is very hard, long work.
> 
> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys

Great.

> may appear in 6.4 only. And I'll need in help: could someone add support
> for them in parser ? Not handling - but parsing and common checking.
> Also, it would be nice to have better temp tables implementation 
> (without affecting pg_class etc) - node material need in query-level 
> temp tables anyway. I'd really like to see temp table files created
> only when its data must go to disk due to local buffer pool is full
> and can't more keep table data in memory. Also, local buffer manager
> should be re-written to use hash table (like shared bufmgr) for buffer search,
> not sequential scan as now (this is item for TODO) - this will speed up
> things and allow to use more than 64 local buffers.
> 
> I'm still sure that handling subselects in parser is not right way.
> And the main problem is not in execution plans (we could use tricks
> to resolve this) but in performance. Example:
> 
> select b from big where b in (select s from small);
> 
> If there is no duplicates in small then this is the same as
> 
> select b from big, small where b = s;
> 
> Without index on big postgres does seq scan of big and uses hashjoin with
> hash on small. Using temp table makes query only 20% slower (in my test). 
> But with index on big postgres uses nestloop with seq scan of small and 
> index scan of big => select run faster and temp table stuff makes query 
> 2.5 times slower! In the case of duplicates in small, handling in parser 
> will use distinct (and so - sorting). But using hashjoin plan distinct 
> may be avoided! Who can analyze this ? Optimizer only. He can be smart
> to check is there unique index on small or not. If not - what is more 
> costless: nestloop with sorting or slower hashjoin without sorting. 
> Only optimizer can find best way to execute query, parser can't.
> 

OK, let me comment on this.  Let's take your example:

> 	select b from big where b in (select s from small);
> 
> 	If there is no duplicates in small then this is the same as
> 
> 	select b from big, small where b = s;

My idea was to do this:

	select distinct s into temp table small2 from small;
	select b from big,small2 where b = s;

And let the optimizer decide how to do the join.  Is this what you are
saying?

The problem I see is that the temp table is already distinct, and was
sorted to do that, but you can't pass that information into the
optimizer.  Is that the problem with using the parser?

But you want the temp table never to hit disk unless it has to, but that
will not work unless we do a really good job with temp tables.

Also NOT IN will need some type of non-join operator, perhaps a flag in
the Plan to say "look for a match, but only output if you find it."  How
do we do that?

We definitely need temp tables, and I think we can stuff it into the
cache as LOCAL, which will make it usable without adding to pg_class.

Perhaps we could create a special Plan in the optimizer called IN, with
the outer and inner queries as plans, and work that plan into the
executor.

The problem with that is we need to specify a way to join the two plans,
and the same logic that determines what type of join to do can do this
too.  Maybe that's why you wanted stuff done in the optimizer and not the
parser.

At least now, I understand enough to come up with ideas, and can
understand what you are saying.

> > Especially if we trade him for help on his favorite topics for v6.4??
> 
> Ok, I'd like to see shared catalog cache implemented in 6.4... -:)
> 
> Vadim
> 


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Fri Dec 19 01:00:58 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25512
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 01:00:56 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA28102; Fri, 19 Dec 1997 00:56:52 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 19 Dec 1997 00:56:40 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA28077 for pgsql-hackers-outgoing; Fri, 19 Dec 1997 00:56:36 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA28065 for <hackers@postgresql.org>; Fri, 19 Dec 1997 00:56:19 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA25436;
	Fri, 19 Dec 1997 00:55:56 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712190555.AAA25436@candle.pha.pa.us>
Subject: Re: [HACKERS] Items for 6.3
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Fri, 19 Dec 1997 00:55:56 -0500 (EST)
Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
In-Reply-To: <349A0265.7329D4EE@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 19, 97 12:13:09 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> select b from big where b in (select s from small);
> 
> If there is no duplicates in small then this is the same as
> 
> select b from big, small where b = s;

I think I see the problem you are describing now.  If we put the
subselect into a temp table, we can't use an existing index on small.s,
if there is one, nor can we take advantage of any sorting that was
involved in creating the temp table.


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From lockhart@alumni.caltech.edu Fri Dec 19 01:34:26 1997
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA25750
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 01:34:23 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA15234;
	Fri, 19 Dec 1997 06:29:45 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <349A1459.EBFE2C84@alumni.caltech.edu>
Date: Fri, 19 Dec 1997 06:29:45 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
CC: Bruce Momjian <maillist@candle.pha.pa.us>,
        PostgreSQL-development <hackers@postgresql.org>
Subject: Re: [HACKERS] Items for 6.3
References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> > Could we try to do the subselect/join/union features for 6.3? I know you
> > have been looking at it, and found the deepest parts of the backend to
> > be a bit murky. I'm not familiar with that area at all, but perhaps we
> > could divert Vadim for a week or two or three when he has some time.
>                                           ^^^^^
> More realistic... And this is for initial release only: tuning performance
> of subselects is very hard, long work.
>
> Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
> may appear in 6.4 only. And I'll need in help: could someone add support
> for them in parser ? Not handling - but parsing and common checking.

Yes, I've already added subselect syntax in the parser, but we will need to
modify or add to the parse tree nodes to push that past the parser into the
backend. I'm happy to focus on that, since I understand those pieces pretty well.
There are several places where "subselect syntax" is used: subselects and unions
come to mind right away. If you have an opinion on how the parse nodes should be
structured I can start with that, or I can just put something in and then modify
it as you need later. Do you see unions as being similar to subselects, or are
they a separate problem? To me, they seem like a simpler case since (perhaps) not
as much optimization and internal reorganizing needs to happen.

> Also, it would be nice to have better temp tables implementation
> (without affecting pg_class etc) - node material need in query-level
> temp tables anyway. I'd really like to see temp table files created
> only when its data must go to disk due to local buffer pool is full
> and can't more keep table data in memory.

This sounds very desirable. I noticed that there are, or used to be, multiple
storage managers. Could a manager for temporary storage be written which stores
things in memory until it gets too big and then goes to disk? Could that manager
use the mm and md managers internally? Or is all of that at too low a level to be
helpful for this problem?

SQL92 has the concept of transaction-only and session-only tables and variables.
Could an implementation of "temporary tables" be used to implement this feature
at the same time (or form the basis for it later)? It seems like none of these
non-permanent tables need to go to any of the pg_ tables, since other backends do
not need to see them and they are allowed to disappear at the end of the session
(or at a crash). We would just need the "table manager" to cache information on
temporary stuff before looking at the permanent tables (??).

> Also, local buffer manager
> should be re-written to use hash table (like shared bufmgr) for buffer search,
> not sequential scan as now (this is item for TODO) - this will speed up
> things and allow to use more than 64 local buffers.
>
> I'm still sure that handling subselects in parser is not right way.
> And the main problem is not in execution plans (we could use tricks
> to resolve this) but in performance.

Seems to me that the subselect needs to stay untransformed (i.e. executable but
non-optimized) so that an optimizer can independently decide how to transform for
faster execution. That way, in the first implementation we have reliable but
stupid execution, but then can add a subselect optimizer which looks for cases
which can be transformed to run faster.

> > Especially if we trade him for help on his favorite topics for v6.4??
>
> Ok, I'd like to see shared catalog cache implemented in 6.4... -:)

Sure. (Tell me what it is later :)

                                              - Tom


From vadim@sable.krasnoyarsk.su Fri Dec 19 06:23:14 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27849
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 06:22:46 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id SAA12239;
	Fri, 19 Dec 1997 18:28:13 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349A5A4C.DA366B47@sable.krasnoyarsk.su>
Date: Fri, 19 Dec 1997 18:28:12 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: lockhart@alumni.caltech.edu, hackers@postgresql.org
Subject: Re: [HACKERS] Items for 6.3
References: <199712190553.AAA25415@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> OK, let me comment on this.  Let's take your example:
> 
> >       select b from big where b in (select s from small);
> >
> >       If there is no duplicates in small then this is the same as
> >
> >       select b from big, small where b = s;
> 
> My idea was to do this:
> 
>         select distinct s into temp table small2 from small;
>         select b from big,small2 where b = s;
> 
> And let the optimizer decide how to do the join.  Is this what you are
> saying?
> 
> The problem I see is that the temp table is already distinct, and was
> sorted to do that, but you can't pass that information into the
> optimizer.  Is that the problem with using the parser?

No. I said that in some cases we can avoid distinct altogether: either
when a unique index on small exists, or by using hashjoin plans with a
!new! HashUnique node (there was a mistake in my previous description -
not Hash but HashUnique should be used on small; HashUnique is a hash
table without duplicates, just another way to implement distinct without
sorting). This new node can be useful for "normal" queries (without
subselects) too.
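
The HashUnique idea can be sketched in a few lines.  This is a Python
illustration under simplifying assumptions (in-memory lists instead of
relations, single-column keys); the function names hash_unique and
hash_join_in are made up for the example and are not backend names.

```python
def hash_unique(values):
    """Build a duplicate-free hash table in one pass, with no sorting."""
    table = set()
    for v in values:
        table.add(v)          # a repeated key is simply absorbed
    return table


def hash_join_in(big, small):
    """Emit each row of big whose value appears in small (IN semantics)."""
    table = hash_unique(small)                # build side: small, deduplicated
    return [b for b in big if b in table]     # probe side: one pass over big


print(hash_join_in([1, 2, 3, 2], [2, 2, 3]))  # [2, 3, 2]
```

Because the build side holds each key once, a row of big is emitted at
most once no matter how many duplicates small contains, which is what
distinct plus a join would otherwise have to guarantee.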

My example is very simple. I just want to say that by handling subqueries
in the optimizer we will have more chances to do better optimization. Maybe
not now, but later. I'm sure that subqueries require some specific
optimization, and this is not a task for the parser.

> 
> But you want the temp table never to hit disk unless it has to, but that
> will not work unless we do a really good job with temp tables.

Of 'course.

> 
> Also NOT IN will need some type of non-join operator, perhaps a flag in
> the Plan to say "look for a match, but only output if you find it."  How
                                                           ^^
                                                          don't ?
> do we do that?

Just as you said - by using some flag.

> 
> We definitely need temp tables, and I think we can stuff it into the
> cache as LOCAL, which will make it usable without adding to pg_class.

We have Relation->rd_istemp flag... Just change it from bool to int:
0 -> is not temp, 1 -> session level temp table, etc...

Vadim

From vadim@sable.krasnoyarsk.su Fri Dec 19 08:09:11 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00349
	for <maillist@candle.pha.pa.us>; Fri, 19 Dec 1997 08:09:05 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id UAA12377;
	Fri, 19 Dec 1997 20:14:25 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349A7327.9A484B74@sable.krasnoyarsk.su>
Date: Fri, 19 Dec 1997 20:14:15 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
CC: Bruce Momjian <maillist@candle.pha.pa.us>,
        PostgreSQL-development <hackers@postgresql.org>
Subject: Re: [HACKERS] Items for 6.3
References: <199712090506.AAA05538@candle.pha.pa.us> <348CE8BE.FE0F8AA1@alumni.caltech.edu> <349A0265.7329D4EE@sable.krasnoyarsk.su> <349A1459.EBFE2C84@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Thomas G. Lockhart wrote:
> 
> > Ok - I'm ready to do subselects for 6.3 but this means that foreign keys
> > may appear in 6.4 only. And I'll need in help: could someone add support
> > for them in parser ? Not handling - but parsing and common checking.
> 
> Yes, I've already added subselect syntax in the parser, but we will need to
> modify or add to the parse tree nodes to push that past the parser into the
> backend. I'm happy to focus on that, since I understand those pieces pretty well.

Nice!

> There are several places where "subselect syntax" is used: subselects and unions
> come to mind right away. If you have an opinion on how the parse nodes should be
> structured I can start with that, or I can just put something in and then modify
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It's ok for me.

> it as you need later. Do you see unions as being similar to subselects, or are
> they a separate problem? To me, they seem like a simpler case since (perhaps) not
> as much optimization and internal reorganizing needs to happen.

I didn't think about unions at all... Yes, it's simpler to implement.
BTW, I recall Bruce mentioned that unions are used for selects from
superclass and all descendant classes (select ... from table* ) - maybe
something is already implemented ? Bruce ?

> 
> > Also, it would be nice to have better temp tables implementation
> > (without affecting pg_class etc) - node material need in query-level
> > temp tables anyway. I'd really like to see temp table files created
> > only when its data must go to disk due to local buffer pool is full
> > and can't more keep table data in memory.
> 
> This sounds very desirable. I noticed that there are, or used to be, multiple
> storage managers. Could a manager for temporary storage be written which stores
> things in memory until it gets too big and then go to disk? Could that manager
> use the mm and md managers internally? Or is all of that at too low a level to be
> helpful for this problem?

mm uses shmem... This feature could be implemented in the local bufmgr
directly: when the requested buffer is not found in the pool and there is
no free, !dirty buffer, then try to find some dirty buffer of an already
created relation, flush it to disk and use it (exception below); if there
is no such buffer -> create some relation (and flush its 1st block);
exception: also create some relation if the # of buffers occupied by
already created relations is too small (just so as not to break the
buffering of created relations).
(Note that using some additional in-memory storage manager would keep
some buffers in memory twice - in the local pool and in the manager.
The way above uses the local bufmgr as the storage manager.)

> >
> > I'm still sure that handling subselects in parser is not right way.
> > And the main problem is not in execution plans (we could use tricks
> > to resolve this) but in performance.
> 
> Seems to me that the subselect needs to stay untransformed (i.e. executable but
> non-optimized) so that an optimizer can independently decide how to transform for
> faster execution. That way, in the first implementation we have reliable but
> stupid execution, but then can add a subselect optimizer which looks for cases
> which can be transformed to run faster.

Yes, I believe that this is right way.

> 
> > > Especially if we trade him for help on his favorite topics for v6.4??
> >
> > Ok, I'd like to see shared catalog cache implemented in 6.4... -:)
> 
> Sure. (Tell me what it is later :)

Ok -:)

Vadim

From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:21 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08884
	for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:01:18 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA24250 for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 03:57:12 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23028;
	Tue, 23 Dec 1997 16:04:25 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349F7E97.48C63F17@sable.krasnoyarsk.su>
Date: Tue, 23 Dec 1997 16:04:23 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: lockhart@alumni.caltech.edu, hackers@postgresql.org
Subject: Re: [HACKERS] Items for 6.3
References: <199712191607.LAA02362@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> >
> > I didn't think about unions at all... Yes, it's simpler to implement.
> > BTW, I recall Bruce mentioned that unions are used for selects from
> > superclass and all descendant classes (select ... from table* ) - maybe
> > something is already implemented ? Bruce ?
> 
> Yes, it is already there.  See optimizer/prep/prepunion.c, and see the
> call to it from optimizer/plan/planner.c.  The current source tree has a
> cleaned up version that will be easier to understand.  Basically, if
> there are any inherited tables, it calls prepunion, and cycles
> through each inherited table, copying the Query plan, and calling the
> planner() for each one, then it returns to the planner() to do sorting
> and uniqueness.  I am working on fixing aggregates.

Could you try with unions ?
I would like to concentrate on single thing - subqueries.

> 
> > mm uses shmem... This feature could be implemented in local bufmgr
> > directly: when requested buffer is not found in pool and there is no free,
> > !dirty buffer then try to find some dirty buffer of created relation, flush
> > it to disk and use (exception below); if no such buffer -> create some relation
> > (and flush 1st block); exception: also create some relation if # of buffers
> > occupied by already created relations is too small (just to do not break
> > buffering of created relations).
> > (Note, that using some additional in-memory storage manager will cause
> > keeping some buffers in-memory twice - in local pool and in manager.
> > The way above is using local bufmgr as storage manager).
> 
> In the psort code, we do a nice job of keeping the stuff in files or
> memory.  Seems to work well.  Can we use that somehow?  Perhaps make it
> a separate module, or just force a psort rather than a hash!

I would like to be not restricted to psort only, but use what is better
in each case. I even can foresee using indices on temp tables: we could
put data in index without putting data in table itself!
In any case, we can leave in-memory tables for future.

Vadim


From aixssd!darrenk@abs.net Thu Dec  5 10:30:53 1996
Received: from abs.net (root@u1.abs.net [207.114.0.130]) by candle.pha.pa.us (8.8.3/8.7.3) with ESMTP id KAA06591 for <maillist@candle.pha.pa.us>; Thu, 5 Dec 1996 10:30:43 -0500 (EST)
Received: from aixssd.UUCP (nobody@localhost) by abs.net (8.8.3/8.7.3) with UUCP id KAA01387 for maillist@candle.pha.pa.us; Thu, 5 Dec 1996 10:13:56 -0500 (EST)
Received: by aixssd (AIX 3.2/UCB 5.64/4.03)
          id AA36963; Thu, 5 Dec 1996 10:10:24 -0500
Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
          id AA34942; Thu, 5 Dec 1996 10:07:56 -0500
Date: Thu, 5 Dec 1996 10:07:56 -0500
From: aixssd!darrenk@abs.net (Darren King)
Message-Id: <9612051507.AA34942@ceodev>
To: maillist@candle.pha.pa.us
Subject: Subselect info.
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Md5: jaWdPH2KYtdr7ESzqcOp5g==
Status: OR

> Any of them deal with implementing subselects?

There's a white paper at www.sybase.com that might
help a little.  It's just a copy of a presentation
given by the optimizer guru there.  Nothing code-wise,
but he gives a few ways of flattening them with temp
tables, etc...

Darren 

From vadim@sable.krasnoyarsk.su Thu Aug 21 23:42:50 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA04109
	for <maillist@candle.pha.pa.us>; Thu, 21 Aug 1997 23:42:43 -0400 (EDT)
Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04399; Fri, 22 Aug 1997 12:04:31 +0800 (KRD)
Sender: root@www.krasnet.ru
Message-ID: <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
Date: Fri, 22 Aug 1997 12:04:31 +0800
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: subselects
References: <199708220219.WAA23745@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> Considering the complexity of the primary/secondary changes you are
> making, I believe subselects will be easier than that.

I don't do changes for P/F keys - just thinking...
Yes, I think that impl of referential integrity is
more complex work.

As for subselects:

in plannodes.h

typedef struct Plan {
...
    struct Plan         *lefttree;
    struct Plan         *righttree;
} Plan;

/* ----------------
 *  these are are defined to avoid confusion problems with "left"
                                   ^^^^^^^^^^^^^^^^^^
 *  and "right" and "inner" and "outer".  The convention is that   
 *  the "left" plan is the "outer" plan and the "right" plan is
 *  the inner plan, but these make the code more readable.
 * ----------------
 */
#define innerPlan(node)         (((Plan *)(node))->righttree)
#define outerPlan(node)         (((Plan *)(node))->lefttree)

First thought is to avoid any confusion by re-defining

#define rightPlan(node)         (((Plan *)(node))->righttree)
#define leftPlan(node)          (((Plan *)(node))->lefttree)

and change all occurrences of 'outer' & 'inner' in the code
to 'left' & 'right' ones:

this will allow us to use 'outer' & 'inner' things for subselects
later, without confusion. My hope is that we can change the Executor
very easily by adding outer/inner plans/TupleSlots to
EState, CommonState, JoinState, etc. and by doing node
processing in the right order.

Subselects are mostly Planner problem.

Unfortunately, I haven't time at the moment: CHECK/DEFAULT...

Vadim

From vadim@sable.krasnoyarsk.su Fri Aug 22 00:00:59 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA04354
	for <maillist@candle.pha.pa.us>; Fri, 22 Aug 1997 00:00:51 -0400 (EDT)
Received: from www.krasnet.ru (localhost [127.0.0.1]) by www.krasnet.ru (8.7.5/8.7.3) with SMTP id MAA04425; Fri, 22 Aug 1997 12:22:37 +0800 (KRD)
Sender: root@www.krasnet.ru
Message-ID: <33FD140D.64880EEB@sable.krasnoyarsk.su>
Date: Fri, 22 Aug 1997 12:22:37 +0800
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: subselects
References: <199708220219.WAA23745@candle.pha.pa.us> <33FD0FCF.4DAA423A@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Vadim B. Mikheev wrote:
> 
> this will allow to use 'outer' & 'inner' things for subselects
> later, without confusion. My hope is that we may change Executor

Or maybe use 'high' & 'low' for subselects (to avoid confusion
with outer joins).

> very easy by adding outer/inner plans/TupleSlots to
> EState, CommonState, JoinState, etc and by doing node
> processing in right order.
             ^^^^^^^^^^^^^^
Rule is easy:
1. Uncorrelated subselect - do 'low' plan node first
2. Correlated             - do left/right first

- just some flag in structures.
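
The ordering rule above can be illustrated with a small sketch. This is
not executor code: run_filter, its arguments and the sample data are all
hypothetical, and the "subquery" is just a Python callable standing in
for the 'low' plan. It only shows how one flag decides whether the low
plan runs once up front or once per outer row.

```python
def run_filter(outer_rows, subquery, correlated):
    """Return (matching rows, number of subquery executions)."""
    executions = 0
    matches = []
    if not correlated:
        # Rule 1: uncorrelated, so run the 'low' plan first, exactly once.
        cached = subquery(None)
        executions += 1
    for row in outer_rows:
        if correlated:
            # Rule 2: correlated, so scan the left/right plan first and
            # re-run the 'low' plan with the current outer row.
            value = subquery(row)
            executions += 1
        else:
            value = cached
        if row == value:
            matches.append(row)
    return matches, executions


small = [0, 1]
big = [-2, -1, 0, 1, 2]

# Uncorrelated: ... where x = (select max(y) from small)
uncorrelated_result = run_filter(big, lambda _row: max(small), correlated=False)

# Correlated: ... where x = (select max(x * y) from small)
correlated_result = run_filter(
    big, lambda x: max(x * y for y in small), correlated=True
)

print(uncorrelated_result)
print(correlated_result)
```

The uncorrelated case pays for the subquery once regardless of the size
of the outer input, while the correlated case pays once per outer row.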

Vadim

From owner-pgsql-hackers@hub.org Thu Oct 30 17:02:30 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA09682
	for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 17:02:28 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA20688; Thu, 30 Oct 1997 16:58:40 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:58:24 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA20615 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:58:17 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA20495 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:57:54 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id QAA07726
	for hackers@postgreSQL.org; Thu, 30 Oct 1997 16:50:29 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199710302150.QAA07726@candle.pha.pa.us>
Subject: [HACKERS] subselects
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Thu, 30 Oct 1997 16:50:29 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

The only thing I have to add to what I had written earlier is that I
think it is best to have these subqueries executed as early in query
execution as possible.

Every piece of the backend: parser, optimizer, executor, is designed to
work on a single query.  The earlier we can split up the queries, the
better those pieces will work at doing their job.  You want to be able
to use the parser and optimizer on each part of the query separately, if
you can.


Forwarded message:
> I have done some thinking about subselects.  There are basically two
> issues:
> 
> 	Does the query return one row or several rows?  This can be
> 	determined by seeing if the user uses equals or 'IN' to join the
> 	subquery. 
> 
> 	Is the query correlated, meaning "Does the subquery reference
> 	values from the outer query?"
> 
> (We already have the third type of subquery, the INSERT...SELECT query.)
> 
> So we have these four combinations:
> 
> 	1) one row, no correlation
> 	2) multiple rows, no correlation
> 	3) one row, correlated
> 	4) multiple rows, correlated
> 
> 
> With #1, we can execute the subquery, get the value, replace the
> subquery with the constant returned from the subquery, and execute the
> outer query.
> 
> With #2, we can execute the subquery and put the result into a temporary
> table.  We then rewrite the outer query to access the temporary table
> and replace the subquery with the column name from the temporary table. 
> We probably put an index on the temp. table, which has only one
> column, because a subquery can only return one column.  We remove the
> temp. table after query execution.
> 
> With #3 and #4, we potentially need to execute the subquery for every
> row returned by the outer query.  Performance would be horrible for
> anything but the smallest query.  Another way to handle this is to
> execute the subquery WITHOUT using any of the outer-query columns to
> restrict the WHERE clause, and add those columns used to join the outer
> variables into the target list of the subquery.  So for query:
> 
> 	select t1.name
> 	from tab t1
> 	where t1.age = (select max(t2.age)
> 		        from tab2 t2
> 		        where t2.name = t1.name)
> 
> Execute the subquery and put it in a temporary table:
> 
> 	select t2.name, max(t2.age) as age
> 	into table temp999
> 	from tab2 t2
> 	group by t2.name
> 
> 	create index i_temp999 on temp999 (name)
> 
> Then re-write the outer query:
> 
> 	select t1.name
> 	from tab t1, temp999
> 	where t1.age = temp999.age and
> 	      t1.name = temp999.name
> 
> The only problem here is that the subselect is running for all entries
> in tab2, even if the outer query is only going to need a few rows. 
> Whether to execute the subquery each time or create a temp. table is
> often difficult to determine.  Even some non-correlated subqueries are
> better executed for each row rather than pre-executing the entire
> subquery, especially if the outer query returns few rows.
> 
> One requirement to handle these issues is better column statistics,
> which I am working on.
> 
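
A minimal sketch of the temp-table rewrite described in the forwarded
message, run against SQLite through Python's sqlite3 module rather than
against PostgreSQL. The sample rows are invented, tab2 is aliased as t2,
and the temp table is built with "create temp table ... as" plus a
"group by" on name (an assumption about what the rewrite intends), since
SQLite has no "select ... into table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tab  (name text, age integer);
    create table tab2 (name text, age integer);
    insert into tab  values ('alice', 30), ('bob', 25), ('carol', 40);
    insert into tab2 values ('alice', 30), ('alice', 20),
                            ('bob', 26),
                            ('carol', 40), ('carol', 35);
""")

# Correlated form: the subquery is logically evaluated once per row of tab.
correlated = sorted(r[0] for r in conn.execute("""
    select t1.name
    from tab t1
    where t1.age = (select max(t2.age)
                    from tab2 t2
                    where t2.name = t1.name)
"""))

# Rewritten form: run the subquery once for all names, without any
# reference to the outer query, and keep the join column in the result.
conn.executescript("""
    create temp table temp999 as
        select t2.name as name, max(t2.age) as age
        from tab2 t2
        group by t2.name;
    create index i_temp999 on temp999 (name);
""")

rewritten = sorted(r[0] for r in conn.execute("""
    select t1.name
    from tab t1, temp999
    where t1.age = temp999.age and
          t1.name = temp999.name
"""))

print(correlated, rewritten)
```

Both forms return the same names here. The rewritten form aggregates
over every name in tab2, including names the outer query never needs,
which is the cost the message points out.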


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Fri Oct 31 22:30:58 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA15643
	for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:30:56 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA24379 for <maillist@candle.pha.pa.us>; Fri, 31 Oct 1997 22:06:08 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id WAA15503; Fri, 31 Oct 1997 22:03:40 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 31 Oct 1997 22:01:38 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id WAA14136 for pgsql-hackers-outgoing; Fri, 31 Oct 1997 22:01:29 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id WAA13866 for <hackers@postgreSQL.org>; Fri, 31 Oct 1997 22:00:53 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id VAA14566;
	Fri, 31 Oct 1997 21:37:06 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199711010237.VAA14566@candle.pha.pa.us>
Subject: Re: [HACKERS] subselects
To: maillist@candle.pha.pa.us (Bruce Momjian)
Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <199710302150.QAA07726@candle.pha.pa.us> from "Bruce Momjian" at Oct 30, 97 04:50:29 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

One more issue I thought of.  You can have multiple subselects in a
single query, and subselects can have their own subselects.

This makes it particularly important that we define a system that always
is able to process the subselect BEFORE the upper select.  This will
allow us to handle all these cases without limitations.

> 
> The only thing I have to add to what I had written earlier is that I
> think it is best to have these subqueries executed as early in query
> execution as possible.
> 
> Every piece of the backend: parser, optimizer, executor, is designed to
> work on a single query.  The earlier we can split up the queries, the
> better those pieces will work at doing their job.  You want to be able
> to use the parser and optimizer on each part of the query separately, if
> you can.
> 


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From hannu@trust.ee Sun Nov  2 10:33:33 1997
Received: from sid.trust.ee (sid.trust.ee [194.204.23.180])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27619
	for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 10:32:04 -0500 (EST)
Received: from sid.trust.ee (wink.trust.ee [194.204.23.184])
	by sid.trust.ee (8.8.5/8.8.5) with ESMTP id RAA02233;
	Sun, 2 Nov 1997 17:30:11 +0200
Message-ID: <345C9BFD.986C68AA@sid.trust.ee>
Date: Sun, 02 Nov 1997 17:27:57 +0200
From: Hannu Krosing <hannu@trust.ee>
X-Mailer: Mozilla 4.02 [en] (Win95; I)
MIME-Version: 1.0
To: hackers-digest@postgresql.org
CC: maillist@candle.pha.pa.us
Subject: Re: [HACKERS] subselects
References: <199711010401.XAA09216@hub.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> Date: Fri, 31 Oct 1997 21:37:06 +1900 (EST)
> From: Bruce Momjian <maillist@candle.pha.pa.us>
> Subject: Re: [HACKERS] subselects
>
> One more issue I thought of.  You can have multiple subselects in a
> single query, and subselects can have their own subselects.
>
> This makes it particularly important that we define a system that always
> is able to process the subselect BEFORE the upper select.  This will
> allow us to handle all these cases without limitations.

This would severely limit what subselects can be used for, as you can't use
any of the fields in the upper select in the search criteria for the subselect;
for example, you can't do

update parts p1
set parts.current_id = (
    select new_id
    from parts p2
    where p1.old_id = p2.new_id);

or

select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
from parts p1;

there may be of course ways to rewrite these queries (which the optimiser should do
if it can) but IMHO, these kinds of subselects should still be allowed

> > The only thing I have to add to what I had written earlier is that I
> > think it is best to have these subqueries executed as early in query
> > execution as possible.
> >
> > Every piece of the backend: parser, optimizer, executor, is designed to
> > work on a single query.  The earlier we can split up the queries, the
> > better those pieces will work at doing their job.  You want to be able
> > to use the parser and optimizer on each part of the query separately, if
> > you can.
> >
>

Hannu


From vadim@sable.krasnoyarsk.su Sun Nov  2 21:30:59 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id VAA14831
	for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:30:57 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id VAA19683 for <maillist@candle.pha.pa.us>; Sun, 2 Nov 1997 21:20:13 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id JAA17259; Mon, 3 Nov 1997 09:22:38 +0700 (KRS)
Sender: root@www.krasnet.ru
Message-ID: <345D356E.353C51DE@sable.krasnoyarsk.su>
Date: Mon, 03 Nov 1997 09:22:38 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects
References: <199711021848.NAA08319@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > One more issue I thought of.  You can have multiple subselects in a
> > > single query, and subselects can have their own subselects.
> > >
> > > This makes it particularly important that we define a system that always
> > > is able to process the subselect BEFORE the upper select.  This will
> > > allow us to handle all these cases without limitations.
> >
> > This would severely limit what subselects can be used for, as you can't use
> > any of the fields in the upper select in the search criteria for the subselect;
> > for example, you can't do
> >
> > update parts p1
> > set parts.current_id = (
> >     select new_id
> >     from parts p2
> >     where p1.old_id = p2.new_id);
> >
> > or
> >
> > select id, price, (select sum(price) from parts p2 where p1.id=p2.id) as totalprice
> > from parts p1;
> >
> > there may be of course ways to rewrite these queries (which the optimiser should do
> > if it can) but IMHO, these kinds of subselects should still be allowed
> 
> I hadn't even gotten to this point yet, but it is a good thing to keep
> in mind.
> 
> In these cases, as in correlated subqueries in the where clause, we will
> create a temporary table, and add the proper join fields and tables to
> the clauses.  Our version of UPDATE accepts a FROM section, and we will
> certainly use this for this purpose.

We can't replace subselect with join if there is aggregate
in subselect.

Actually, I don't see any problems if we are going to process subselects
like sql-funcs: non-correlated subselects can be emulated by
funcs without args, for correlated subselects parser (analyze.c)
has to change all upper query references to $1, $2,...

Vadim

From vadim@sable.krasnoyarsk.su Mon Nov  3 06:07:12 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id GAA27433
	for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 06:07:03 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id SAA18519; Mon, 3 Nov 1997 18:09:44 +0700 (KRS)
Sender: root@www.krasnet.ru
Message-ID: <345DB0F7.5E652F78@sable.krasnoyarsk.su>
Date: Mon, 03 Nov 1997 18:09:43 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselects
References: <199711030316.WAA15401@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> >
> > > In these cases, as in correlated subqueries in the where clause, we will
> > > create a temporary table, and add the proper join fields and tables to
> > > the clauses.  Our version of UPDATE accepts a FROM section, and we will
> > > certainly use this for this purpose.
> >
> > We can't replace subselect with join if there is aggregate
> > in subselect.
> 
> I got lost here.  Why can't we handle aggregates?

Sorry, I missed the use of temp tables. Sybase uses joins (without
temp tables) for non-correlated subqueries:

    A noncorrelated subquery can be evaluated as if it were an independent query.
    Conceptually, the results of the subquery are substituted in the main statement, or
    outer query. This is not how SQL Server actually processes statements with
    subqueries. Noncorrelated subqueries can be alternatively stated as joins and
    are processed as joins by SQL Server. 

but this is not possible if there are aggregates in subquery.

> 
> My idea was this.  This is a non-correlated subquery.
...
No problems with it...

> 
> Here is a correlated example:
> 
>         select *
>         from table_a
>         where table_a.col_a in (select table_b.col_b
>                         from table_b
>                         where table_b.col_b = table_a.col_c)
> 
> rewrite as:
> 
>         select distinct table_b.col_b, table_a.col_c -- the distinct is needed
>         into table_sub
>         from table_a, table_b

First, could we add 'where table_b.col_b = table_a.col_c' here ?
Just to avoid Cartesian results ? I hope we can.

Note that for query

        select *
        from table_a
        where table_a.col_a in (select table_b.col_b * table_a.col_c
                        from table_b)

it's better to do

	select distinct table_a.col_a
	into table table_sub
	from table_b, table_a
        where table_a.col_a = table_b.col_b * table_a.col_c

once again - to avoid Cartesians.

But what could we do for

        select *
        from table_a
        where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
                        from table_b)
???
	select max(table_b.col_b * table_a.col_c), table_a.col_a
	into table table_sub
	from table_b, table_a
        group by table_a.col_a

first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
For tables big and small with 100 000 and 1000 tuples 

select max(x*y), x from big, small group by x

"ate" all free 140M in my file system after 20 minutes (just for
sorting - nothing more) and was killed...

select x from big where x = cor(x);
(cor(int4) is 'select max($1*y) from small') takes 20 minutes -
this is bad too.
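
The two plans compared above can be reproduced at toy scale. This is
only a sketch run on SQLite through Python's sqlite3 module: the table
contents are invented (5 and 2 rows instead of 100 000 and 1000), the
Python function cor() stands in for the SQL function, and '?' plays the
role of '$1'. It shows the shape of the two approaches, not their cost
in PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table big   (x integer);
    create table small (y integer);
    insert into big   values (-2), (-1), (0), (1), (2);
    insert into small values (0), (1);
""")


def cor(x):
    # Stand-in for the SQL function 'select max($1*y) from small'.
    return conn.execute("select max(? * y) from small", (x,)).fetchone()[0]


# Function-style plan: one execution of the subquery per row of big.
outer = [r[0] for r in conn.execute("select x from big order by x").fetchall()]
per_row = [x for x in outer if x == cor(x)]

# Join-style plan: aggregate over the whole cross product, then filter.
grouped = [r[0] for r in conn.execute("""
    select x
    from (select x, max(x * y) as m from big, small group by x)
    where x = m
    order by x
""")]

# Number of tuples the join-style plan has to group.
cross_product_rows = conn.execute(
    "select count(*) from big, small"
).fetchone()[0]

print(per_row, grouped, cross_product_rows)
```

The join-style plan groups sizeof(big) * sizeof(small) tuples, while the
function-style plan scans small once per row of big; both grow with the
product of the table sizes, which matches the observation that neither
is attractive for large inputs.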

> >
> > Actually, I don't see any problems if we are going to process subselects
> > like sql-funcs: non-correlated subselects can be emulated by
> > funcs without args, for correlated subselects parser (analyze.c)
> > has to change all upper query references to $1, $2,...
> 
> Yes, logically, they are SQL functions, but aren't we going to see
> terrible performance in such circumstances.  My experience is that when
  ^^^^^^^^^^^^^^^^^^^^
You're right.

> people are given subselects, they start to do huge jobs with them.
> 
> In fact, the final solution may be to have both methods available, and
> switch between them depending on the size of the query sets.  Each
> method has its advantages.  The function example lets the outside query
> be executed, and only calls the subquery when needed.
> 
> For large tables where the subselect is small and is the entire WHERE
> restriction, the SQL function gets called much too often.  A simple join
> of the subquery result and the large table would be much better.  This
> method also allows for sort/merge join of the subquery results, and
> index use.

...keep thinking...

Vadim

From owner-pgsql-hackers@hub.org Mon Nov  3 11:01:01 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id LAA03633
	for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 11:00:59 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id KAA12174 for <maillist@candle.pha.pa.us>; Mon, 3 Nov 1997 10:49:42 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id KAA26203; Mon, 3 Nov 1997 10:33:32 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 03 Nov 1997 10:31:43 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id KAA25514 for pgsql-hackers-outgoing; Mon, 3 Nov 1997 10:31:36 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA25449 for <hackers@postgreSQL.org>; Mon, 3 Nov 1997 10:31:23 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id KAA02262;
	Mon, 3 Nov 1997 10:25:34 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199711031525.KAA02262@candle.pha.pa.us>
Subject: Re: [HACKERS] subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 3 Nov 1997 10:25:34 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <345DB0F7.5E652F78@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 3, 97 06:09:43 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> Sorry, I missed using of temp tables. Sybase uses joins (without
> temp tables) for non-correlated subqueries:
> 
>     A noncorrelated subquery can be evaluated as if it were an independent query.
>     Conceptually, the results of the subquery are substituted in the main statement, or
>     outer query. This is not how SQL Server actually processes statements with
>     subqueries. Noncorrelated subqueries can be alternatively stated as joins and
>     are processed as joins by SQL Server. 
> 
> but this is not possible if there are aggregates in subquery.
> 
> > 
> > My idea was this.  This is a non-correlated subquery.
> ...
> No problems with it...
> 
> > 
> > Here is a correlated example:
> > 
> >         select *
> >         from table_a
> >         where table_a.col_a in (select table_b.col_b
> >                         from table_b
> >                         where table_b.col_b = table_a.col_c)
> > 
> > rewrite as:
> > 
> >         select distinct table_b.col_b, table_a.col_c -- the distinct is needed
> >         into table_sub
> >         from table_a, table_b
> 
> First, could we add 'where table_b.col_b = table_a.col_c' here ?
> Just to avoid Cartesian results ? I hope we can.

Yes, of course.  I forgot that line here.  We can also be fancy and move
some of the outer where restrictions on table_a into the subquery.

I think the classic subquery for this would be if someone wanted all
customer names that had invoices in the past month:

select custname
from customer
where custid in (select order.custid
		 from order
		 where order.date >= "09/01/97" and
		       order.date <= "09/30/97")

In this case, the subquery can use an index on 'date' to quickly
evaluate the query, and the resulting temp table can quickly be joined
to the customer table.  If we used SQL functions, every customer would
have an order query evaluated for it, and there may be no multi-column
index on customer and date, or even if there is, this could be many
query executions.
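
As a rough illustration of the temp-table approach, here is a sketch in
Python using the standard sqlite3 module rather than PostgreSQL. The
schema, the sample rows, the table name `orders` (used because `order` is
a reserved word), the column `orderdate`, and both helper names are
assumptions made up for this example. It evaluates the subquery once into
a distinct temp table, joins to it, and returns that result next to the
result of the IN form so the two can be compared.

```python
import sqlite3


def demo_connection():
    """Build a small in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (custid INTEGER, custname TEXT);
        CREATE TABLE orders (custid INTEGER, orderdate TEXT);
        INSERT INTO customer VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO orders VALUES (1, '1997-09-03'), (1, '1997-09-20'),
                                  (3, '1997-09-15'), (2, '1997-08-01');
    """)
    return conn


def customers_with_orders(conn, start, end):
    """Return (rows via IN subquery, rows via join to a temp table)."""
    # Form 1: the IN subquery, as in the message.
    via_in = conn.execute(
        "SELECT custname FROM customer WHERE custid IN "
        "(SELECT custid FROM orders WHERE orderdate >= ? AND orderdate <= ?) "
        "ORDER BY custname", (start, end)).fetchall()

    # Form 2: run the subquery once into a distinct temp table, then join.
    conn.execute("DROP TABLE IF EXISTS table_sub")
    conn.execute("CREATE TEMP TABLE table_sub (custid INTEGER)")
    conn.execute(
        "INSERT INTO table_sub "
        "SELECT DISTINCT custid FROM orders "
        "WHERE orderdate >= ? AND orderdate <= ?", (start, end))
    via_join = conn.execute(
        "SELECT c.custname FROM customer c, table_sub s "
        "WHERE c.custid = s.custid ORDER BY c.custname").fetchall()
    return via_in, via_join
```

The DISTINCT matters: without it, a customer with two orders in the date
range would appear twice in the joined result.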


> 
> Note that for query
> 
>         select *
>         from table_a
>         where table_a.col_a in (select table_b.col_b * table_a.col_c
>                         from table_b)
> 
> it's better to do
> 
> 	select distinct table_a.col_a
> 	into table table_sub
> 	from table_b, table_a
>         where table_a.col_a = table_b.col_b * table_a.col_c

Yes, I had not thought of cases where they are doing correlated column
arithmetic, but it looks like this would work.

> 
> once again - to avoid Cartesians.
> 
> But what could we do for
> 
>         select *
>         from table_a
>         where table_a.col_a = (select max(table_b.col_b * table_a.col_c)
>                         from table_b)

OK, who wrote this horrible query? :-)

Without a join of table_b and table_a, even an SQL function would die on
this.  You have to take the current value table_a.col_c, and multiply by
every value of table_b.col_b to get the maximum.

Trying to do a temp table on this is certainly going to be a Cartesian
product, but using an SQL function is also going to be a Cartesian
product, except that the product is generated in small pieces instead of
in one big query.  The SQL function example may eventually complete, but
it will take forever to do so in cases where the temp table would bomb.

I can recommend some SQL books for anyone who sends in a bug report on
this query. :-)


> ???
> 	select max(table_b.col_b * table_a.col_c), table_a.col_a
> 	into table table_sub
> 	from table_b, table_a
>         group by table_a.col_a
> 
> first tries to sort sizeof(table_a) * sizeof(table_b) tuples...
> For tables big and small with 100 000 and 1000 tuples 
> 
> select max(x*y), x from big, small group by x
> 
> "ate" all free 140M in my file system after 20 minutes (just for
> sorting - nothing more) and was killed...
> 
> select x from big where x = cor(x);
> (cor(int4) is 'select max($1*y) from small') takes 20 minutes -
> this is bad too.

Again, my feeling is that in cases where the temp table would bomb, the
SQL function will be so slow that neither will be acceptable.

> 
> > >
> > > Actually, I don't see any problems if we going to process subselect
> > > like sql-funcs: non-correlated subselects can be emulated by
> > > funcs without args, for correlated subselects parser (analyze.c)
> > > has to change all upper query references to $1, $2,...
> > 
> > Yes, logically, they are SQL functions, but aren't we going to see
> > terrible performance in such circumstances.  My experience is that when
>   ^^^^^^^^^^^^^^^^^^^^
> You're right.
> 
> > people are given subselects, they start to do huge jobs with them.
> > 
> > In fact, the final solution may be to have both methods available, and
> > switch between them depending on the size of the query sets.  Each
> > method has its advantages.  The function example lets the outside query
> > be executed, and only calls the subquery when needed.
> > 
> > For large tables where the subselect is small and is the entire WHERE
> > restriction, the SQL function gets called much too often.  A simple join
> > of the subquery result and the large table would be much better.  This
> > method also allows for sort/merge join of the subquery results, and
> > index use.
> 
> ...keep thinking...
> 
> Vadim
> 


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Thu Nov 20 00:09:18 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA05239
	for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 00:09:11 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id XAA13776; Wed, 19 Nov 1997 23:59:53 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 19 Nov 1997 23:58:49 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id XAA13599 for pgsql-hackers-outgoing; Wed, 19 Nov 1997 23:58:43 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id XAA13512 for <hackers@postgreSQL.org>; Wed, 19 Nov 1997 23:58:16 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id XAA03103
	for hackers@postgreSQL.org; Wed, 19 Nov 1997 23:57:44 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199711200457.XAA03103@candle.pha.pa.us>
Subject: [HACKERS] subselect
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Wed, 19 Nov 1997 23:57:44 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

I am going to overhaul all the /parser files, and I may give subselects
a try while I am in there.  This is where it is going to have to be done.

Two things I think I need are:

	temp tables that go away at the end of a statement, so if the
query elog's out, the temp file gets destroyed

	how do I implement "not in":

		select * from a where x not in (select y from b)

Using <> is not going to work because that returns multiple copies of
each row of a, one for every row of b that it doesn't equal.  It is like
we need a not-equals that doesn't return multiple rows.
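
The problem can be seen in a small sketch, written in Python with the
standard sqlite3 module (not PostgreSQL). The sample rows and the function
name are assumptions for illustration. It runs the NOT IN query, the naive
<> join, and, as one possible alternative, an outer join that keeps only
unmatched rows. The outer-join form matches NOT IN here only because b.y
contains no NULLs.

```python
import sqlite3


def not_in_versus_not_equal_join():
    """Compare NOT IN with a naive <> join and with an outer-join anti-join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (x INTEGER);
        CREATE TABLE b (y INTEGER);
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b VALUES (2), (4);
    """)
    # What the user asked for.
    wanted = conn.execute(
        "SELECT x FROM a WHERE x NOT IN (SELECT y FROM b) ORDER BY x"
    ).fetchall()
    # Naive join on <>: one output row per non-matching (a, b) pair.
    naive = conn.execute(
        "SELECT a.x FROM a, b WHERE a.x <> b.y ORDER BY a.x"
    ).fetchall()
    # Anti-join: keep rows of a that found no partner in b.
    anti = conn.execute(
        "SELECT a.x FROM a LEFT JOIN b ON a.x = b.y "
        "WHERE b.y IS NULL ORDER BY a.x"
    ).fetchall()
    return wanted, naive, anti
```

With this data the <> join returns 1 and 3 twice each and also returns 2
(because 2 <> 4), so it is wrong in both directions.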

Any ideas?

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From lockhart@alumni.caltech.edu Thu Nov 20 10:00:59 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA22019
	for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 10:00:56 -0500 (EST)
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21662 for <maillist@candle.pha.pa.us>; Thu, 20 Nov 1997 09:52:55 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id GAA22754;
	Thu, 20 Nov 1997 06:27:21 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <3473D849.16F67A2A@alumni.caltech.edu>
Date: Thu, 20 Nov 1997 06:27:21 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgresql.org>
Subject: Re: [HACKERS] subselect
References: <199711200457.XAA03103@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> I am going to overhaul all the /parser files

??

> , and I may give subselects
> a try while I am in there.  This is where it is going to have to be done.

A first cut at the subselect syntax is already in gram.y. I'm sure that the
e-mail you had sent which collected several items regarding subselects
covers some of this topic. I've been thinking about subselects also, and
had thought that there must be some existing mechanisms in the backend
which can be used to help implement subselects. It seems to me that UNION
might be a good thing to implement first, because it has a fairly
well-defined set of behaviors:

  select a union select b;

chooses elements from a and from b and then sorts/uniques the result.

  select a union all select b;

chooses all elements from a and from b, with no sort/unique step.

  select a union select b union all select c;

evaluates left to right, and first evaluates a union b, sorts/uniques, and
then evaluates

  (result) union all select c;
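
These behaviors can be checked against an existing engine. The sketch
below uses Python's standard sqlite3 module (not PostgreSQL); the tables
a, b, c with a single column f, their rows, and the function name are
assumptions for illustration. In SQLite, as in standard SQL as far as I
know, UNION removes duplicates, UNION ALL keeps every row from both
inputs, and a chain of the two is grouped left to right.

```python
import sqlite3


def union_demo():
    """Return sorted results of UNION, UNION ALL, and a mixed chain."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (f INTEGER);
        CREATE TABLE b (f INTEGER);
        CREATE TABLE c (f INTEGER);
        INSERT INTO a VALUES (1), (1), (2);
        INSERT INTO b VALUES (2), (3);
        INSERT INTO c VALUES (3);
    """)

    def run(sql):
        # Sort in Python so the comparison does not depend on output order.
        return sorted(row[0] for row in conn.execute(sql))

    plain = run("SELECT f FROM a UNION SELECT f FROM b")
    keep_all = run("SELECT f FROM a UNION ALL SELECT f FROM b")
    mixed = run("SELECT f FROM a UNION SELECT f FROM b "
                "UNION ALL SELECT f FROM c")
    return plain, keep_all, mixed
```

In the mixed case the value 3 appears twice: once from the de-duplicated
(a union b) result and once from c.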

There are several types of subselects. Examples of some are:

1) select a.f from a union select b.f from b order by 1;
Needs temporary table(s), optional sort/unique, final order by.

2) select a.f from a where a.f in (select b.f from b);
Needs temporary table(s). "in" can be first implemented by count(*) > 0 but
it would perform better to have the backend return after the first
match.

3) select a.f from a where exists (select b.f from b where b.f = a.f);
Need to do the select and do a subselect on _each_ of the returned values?
Again could use count(*) to help implement.
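
The difference between the count(*) approach and returning after the
first match can be sketched in plain Python. Both function names and the
"rows examined" counter are assumptions for illustration; this models the
idea only and says nothing about how the backend would actually do it.

```python
def in_by_count(value, rows):
    """Emulate IN via count(*) > 0: every row is always examined."""
    examined = 0
    matches = 0
    for row in rows:
        examined += 1
        if row == value:
            matches += 1
    return matches > 0, examined


def in_first_match(value, rows):
    """Emulate IN with an early exit: stop at the first matching row."""
    examined = 0
    for row in rows:
        examined += 1
        if row == value:
            return True, examined
    return False, examined
```

Both give the same answer; only the amount of work differs, and only
when a match exists.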

This brings up the point that perhaps the backend needs a row-counting
atomic operation and count(*) could be re-implemented using that. At the
moment count(*) is transformed to a select of OID columns and does not
quite work on table joins.

I would think that outer joins could use some of these support routines
also.

                                                       - Tom

> Two things I think I need are:
>
>         temp tables that go away at the end of a statement, so if the
> query elog's out, the temp file gets destroyed
>
>         how do I implement "not in":
>
>                 select * from a where x not in (select y from b)
>
> Using <> is not going to work because that returns multiple copies of a,
> one for every one that doesn't equal.  It is like we need not equals,
> but don't return multiple rows.
>
> Any ideas?
>
> --
> Bruce Momjian
> maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Mon Dec 22 00:49:03 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA13311
	for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 00:49:01 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id AAA11930; Mon, 22 Dec 1997 00:45:41 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 00:45:17 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id AAA11756 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 00:45:14 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA11624 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 00:44:57 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA11605
	for hackers@postgreSQL.org; Mon, 22 Dec 1997 00:45:23 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712220545.AAA11605@candle.pha.pa.us>
Subject: [HACKERS] subselects
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Mon, 22 Dec 1997 00:45:23 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

OK, a few questions:

	Should we use sortmerge, so we can use our psort as temp tables,
or do we use hashunique?

	How do we pass the query to the optimizer?  How do we represent
the range table for each, and the links between them in correlated
subqueries?

I have to think about this.  Comments are welcome.
-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Mon Dec 22 02:01:27 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA20608
	for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 02:01:25 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA25136 for <maillist@candle.pha.pa.us>; Mon, 22 Dec 1997 01:37:29 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA25289; Mon, 22 Dec 1997 01:31:18 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 22 Dec 1997 01:30:45 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA23854 for pgsql-hackers-outgoing; Mon, 22 Dec 1997 01:30:35 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22847 for <hackers@postgreSQL.org>; Mon, 22 Dec 1997 01:30:15 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id BAA17354
	for hackers@postgreSQL.org; Mon, 22 Dec 1997 01:05:04 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712220605.BAA17354@candle.pha.pa.us>
Subject: [HACKERS] subselects (fwd)
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Mon, 22 Dec 1997 01:05:03 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

Forwarded message:
> OK, a few questions:
> 
> 	Should we use sortmerge, so we can use our psort as temp tables,
> or do we use hashunique?
> 
> 	How do we pass the query to the optimizer?  How do we represent
> the range table for each, and the links between them in correlated
> subqueries?
> 
> I have to think about this.  Comments are welcome.

One more thing.  I guess I am seeing subselects as a different thing
than temp tables.  I can see people wanting to put indexes on their temp
tables, so I think they will need more system catalog support.  For
subselects, I think we can just stuff them into psort, perhaps, and do
the unique as we unload them.

Seems like a natural to me.


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Tue Dec 23 04:01:07 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA08876
	for <maillist@candle.pha.pa.us>; Tue, 23 Dec 1997 04:00:57 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA23042;
	Tue, 23 Dec 1997 16:08:56 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <349F7FA8.77F8DC55@sable.krasnoyarsk.su>
Date: Tue, 23 Dec 1997 16:08:56 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects (fwd)
References: <199712220605.BAA17354@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> Forwarded message:
> > OK, a few questions:
> >
> >       Should we use sortmerge, so we can use our psort as temp tables,
> > or do we use hashunique?
> >
> >       How do we pass the query to the optimizer?  How do we represent
> > the range table for each, and the links between them in correlated
> > subqueries?
> >
> > I have to think about this.  Comments are welcome.
> 
> One more thing.  I guess I am seeing subselects as a different thing
> than temp tables.  I can see people wanting to put indexes on their temp
> tables, so I think they will need more system catalog support.  For
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
What's the difference between temp tables and temp indices ?
Both of them are handled via catalog cache...

Vadim

From vadim@sable.krasnoyarsk.su Sat Jan  3 04:01:00 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA28565
	for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 04:00:58 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA19242 for <maillist@candle.pha.pa.us>; Sat, 3 Jan 1998 03:47:07 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA21017;
	Sat, 3 Jan 1998 16:08:55 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34AE0023.A477AEC5@sable.krasnoyarsk.su>
Date: Sat, 03 Jan 1998 16:08:51 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>,
        "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Subject: Re: subselects
References: <199712290516.AAA12579@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> With UNIONs done, how are things going with you on subselects?  UNIONs
> are much easier than subselects.
> 
> I am stumped on how to record the subselect query information in the
> parser and stuff.

   So am I. We definitely need an EXISTS node and maybe an IN node.
Also, we have to support the ANY and ALL modifiers of comparison operators
(it would be nice to support ANY and ALL for all operators returning
bool: >, =, ..., like, ~ and so on). Note that IN is the same as
= ANY (NOT IN ==> <> ALL) assuming that '=' means EQUAL for all data types,
so we could avoid an IN node, but I'm not sure that I like such an
assumption: postgres is an OO-like system that allows operators to be
overridden, so '=' can, in theory, mean not EQUAL but something else
(someday we could allow the "meaning" of an operator to be specified in
CREATE OPERATOR) - in short, I would like an IN node.
   Also, I would suggest nodes for ANY and ALL.
   (I need a few days to think more about how to record this stuff...)
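
The equivalences can be sketched with ANY and ALL written as generic
helpers over any bool-returning operator. This is plain Python, the names
any_op and all_op are assumptions for illustration, and NULL handling is
deliberately left out to keep the sketch short.

```python
import operator


def any_op(op, left, values):
    """left <op> ANY (values): true if op holds for at least one value."""
    return any(op(left, v) for v in values)


def all_op(op, left, values):
    """left <op> ALL (values): true if op holds for every value."""
    return all(op(left, v) for v in values)


def sql_in(left, values):
    """IN expressed as = ANY; valid only while '=' really means equality."""
    return any_op(operator.eq, left, values)


def sql_not_in(left, values):
    """NOT IN expressed as <> ALL, under the same assumption."""
    return all_op(operator.ne, left, values)
```

Because the operator is a parameter, the same two helpers also cover
> ANY, < ALL and any user-defined operator that returns bool.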

> 
> Please let me know what I can do to help, if anything.

Thanks. As I remember, Tom also wished to work here. Tom ?

Bye,
   Vadim

P.S. I'll be "on-line" Jan 5.

From owner-pgsql-hackers@hub.org Mon Jan  5 07:30:51 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA05466
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 07:30:49 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id HAA04700; Mon, 5 Jan 1998 07:22:06 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 07:21:45 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id HAA02846 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 07:21:35 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id HAA00903 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 07:20:57 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA24278;
	Mon, 5 Jan 1998 19:36:06 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Message-ID: <34B0D3AF.F31338B3@sable.krasnoyarsk.su>
Date: Mon, 05 Jan 1998 19:35:59 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselect
References: <199801050516.AAA28005@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Bruce Momjian wrote:
> 
> I was thinking about subselects, and how to attach the two queries.
> 
> What if the subquery makes a range table entry in the outer query, and
> the query is set up like the UNION queries where we put the scans in a
> row, but in this case we put them over/under each other.
> 
> And we push a temp table into the catalog cache that represents the
> result of the subquery, then we could join to it in the outer query as
> though it was a real table.
> 
> Also, can't we do the correlated subqueries by adding the proper
> target/output columns to the subquery, and have the outer query
> reference those columns in the subquery range table entry.

Yes, this is a way to handle subqueries: by joining to a temp table.
After getting the plan we could replace the temp table access path with
a Material node. On the other hand, it could be useful to let the optimizer
know about the cost of temp table creation (have to think more about it)...
Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
is one example of this - joining by <> will give us invalid results.
Setting a special NOT EQUAL flag is not enough: the subquery plan must
always be the inner one in this case. The same goes for handling the ALL
modifier. Note that we generally can't use aggregates here: we can't add
MAX to the subquery in the case of > ALL (subquery), because > ALL should
return FALSE if the subquery returns NULL(s) but aggregates don't take
NULLs into account.

> 
> Maybe I can write up a sample of this?  Vadim, would this help?  Is this
> the point we are stuck at?

Personally, I was stuck by holidays -:)
Now I can spend ~ 8 hours ~ each day for development...

Vadim


From owner-pgsql-hackers@hub.org Mon Jan  5 10:45:30 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA10769
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 10:45:28 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA17823; Mon, 5 Jan 1998 10:32:00 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 10:31:45 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA17757 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 10:31:38 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.5/8.7.5) with ESMTP id KAA17727 for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 10:31:06 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id KAA10375;
	Mon, 5 Jan 1998 10:28:48 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801051528.KAA10375@candle.pha.pa.us>
Subject: Re: [HACKERS] subselect
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 5 Jan 1998 10:28:48 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34B0D3AF.F31338B3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 5, 98 07:35:59 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> Yes, this is a way to handle subqueries by joining to temp table.
> After getting plan we could change temp table access path to
> node material. On the other hand, it could be useful to let optimizer
> know about cost of temp table creation (have to think more about it)...
> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
> is one example of this - joining by <> will give us invalid results.
> Setting special NOT EQUAL flag is not enough: subquery plan must be
> always inner one in this case. The same for handling ALL modifier.
> Note, that we generally can't use aggregates here: we can't add MAX to
> subquery in the case of > ALL (subquery), because > ALL should return FALSE
> if subquery returns NULL(s) but aggregates don't take NULLs into account.

OK, here are my ideas.  First, I think you have to handle subselects in
the outer node because a subquery could have its own subquery.  Also, we
now have a field in Aggreg to allow us to 'usenulls'.

OK, here it is.  I recommend we pass the outer and subquery through
the parser and optimizer separately.

We parse the subquery first.  If the subquery is not correlated, it
should parse fine.  If it is correlated, any columns we find in the
subquery that are not already in the FROM list, we add the table to the
subquery FROM list, and add the referenced column to the target list of
the subquery.

When we are finished parsing the subquery, we create a catalog cache
entry for it called 'sub1' and make its fields match the target
list of the subquery.

In the outer query, we add 'sub1' to its FROM list, and change
the subquery reference to point to the new range table entry.  We also add
WHERE clauses to do any correlated joins.

Here is a simple example:

	select *
	from taba
	where col1 = (select col2
		      from tabb)

This is not correlated, and the subquery parses easily.  We create a
'sub1' catalog cache entry, and add 'sub1' to the outer query FROM
clause.  We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'.

Here is a more complex correlated subquery:

	select *
	from taba
	where col1 = (select col2
		      from tabb
		      where taba.col3 = tabb.col4)

Here we must add 'taba' to the subquery's FROM list, and add col3 to the
target list of the subquery.  After we parse the subquery, add 'sub1' to
the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
The optimizer will do the correlation for us.
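
Here is a sketch of that rewrite, in Python with the standard sqlite3
module (not PostgreSQL). The sample rows and the function name are
assumptions for illustration, 'sub1' is written as a derived table rather
than a catalog cache entry, and a DISTINCT is added so that duplicate
taba.col3 values cannot multiply the result. The data is chosen so that
the subquery returns at most one row per outer row.

```python
import sqlite3


def correlated_versus_join():
    """Run the correlated subquery and its join rewrite on the same data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE taba (col1 INTEGER, col3 INTEGER);
        CREATE TABLE tabb (col2 INTEGER, col4 INTEGER);
        INSERT INTO taba VALUES (10, 1), (20, 2), (30, 3);
        INSERT INTO tabb VALUES (10, 1), (99, 2);
    """)
    # Original form: subquery re-evaluated for each row of taba.
    correlated = conn.execute(
        "SELECT col1, col3 FROM taba "
        "WHERE col1 = (SELECT col2 FROM tabb WHERE taba.col3 = tabb.col4) "
        "ORDER BY col1").fetchall()
    # Rewritten form: taba is added to the subquery's FROM list, col3 to
    # its target list, and the outer query joins to the result as sub1.
    rewritten = conn.execute(
        "SELECT taba.col1, taba.col3 "
        "FROM taba, "
        "     (SELECT DISTINCT tabb.col2 AS col2, taba.col3 AS col3 "
        "      FROM tabb, taba WHERE taba.col3 = tabb.col4) AS sub1 "
        "WHERE taba.col1 = sub1.col2 AND taba.col3 = sub1.col3 "
        "ORDER BY taba.col1").fetchall()
    return correlated, rewritten
```

Only the row (10, 1) qualifies in both forms: the second row of taba
finds 99 instead of 20, and the third finds no row in tabb at all.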

In the optimizer, we can plan the subquery first, then the outer query,
and then replace all 'sub1' references in the outer query to use the
subquery plan.

I realize merging the two plans and doing IN and NOT IN is the
real challenge, but I hoped this would give us a start.

What do you think?

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Mon Jan  5 15:02:46 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA28690
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 15:02:44 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA08811 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 14:28:43 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id CAA24904;
	Tue, 6 Jan 1998 02:56:00 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B13ACD.B1A95805@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 02:55:57 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801051528.KAA10375@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > always inner one in this case. The same for handling ALL modifier.
> > Note, that we generally can't use aggregates here: we can't add MAX to
> > subquery in the case of > ALL (subquery), because > ALL should return FALSE
> > if subquery returns NULL(s) but aggregates don't take NULLs into account.
> 
> OK, here are my ideas.  First, I think you have to handle subselects in
> the outer node because a subquery could have its own subquery.  Also, we

I hope that this doesn't matter: if the results of a subquery (with or
without sub-subqueries) go into a temp table, then that table will be
re-scanned for each outer tuple.

> now have a field in Aggreg to allow us to 'usenulls'.
                                             ^^^^^^^^
 This can't help:

vac=> select * from x;
y
-
1
2
3
 <<< this is NULL
(4 rows)

vac=> select max(y) from x;
max
---
  3

==> we can't replace 

select * from A where A.a > ALL (select y from x);
                                 ^^^^^^^^^^^^^^^
           (NULL will be returned and so A.a > ALL is FALSE - this is what 
            Sybase does, is it right ?)
with

select * from A where A.a > (select max(y) from x);
                             ^^^^^^^^^^^^^^^^^^^^
simply because we lose the knowledge about NULLs here.
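
The lost information can be made concrete with a small three-valued-logic
sketch in plain Python, where None stands for NULL / unknown. The function
names are assumptions for illustration, and the treatment of NULL follows
my understanding of standard SQL (an unknown result, which a WHERE clause
rejects just as it rejects FALSE), not any particular backend.

```python
def gt_all(left, values):
    """left > ALL (values): True, False, or None for unknown."""
    saw_unknown = False
    for v in values:
        if left is None or v is None:
            saw_unknown = True      # comparison with NULL is unknown
        elif not left > v:
            return False            # one definite failure decides it
    return None if saw_unknown else True


def gt_max(left, values):
    """left > (select max(...)): the aggregate skips NULLs entirely."""
    non_null = [v for v in values if v is not None]
    if left is None or not non_null:
        return None                 # max over no rows is NULL
    return left > max(non_null)
```

For the table x above and A.a = 5, gt_all gives unknown (the row is
rejected) while gt_max gives true (the row is returned), so the two
queries are not interchangeable.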

Also, I would like to handle the ANY and ALL modifiers for all bool
operators, either built-in or user-defined, for all data types -
isn't PostgreSQL an OO-like RDBMS? -:)

> OK, here it is.  I recommend we pass the outer and subquery through
> the parser and optimizer separately.

I don't like this. I would like to get a parse tree from the parser for
the entire query and let the optimizer (at its upper level) decide how to
rewrite the parse tree, what plans to produce and how these plans should
be merged. Note that I don't object to your methods below, only to where
this handling is placed. I don't understand why we should add a
new part to the system which will do the optimizer's work (parse tree -->
execution plan) and deal with optimizer nodes. IMHO, the upper optimizer
level is a nice place to do this.

> 
> We parse the subquery first.  If the subquery is not correlated, it
> should parse fine.  If it is correlated, any columns we find in the
> subquery that are not already in the FROM list, we add the table to the
> subquery FROM list, and add the referenced column to the target list of
> the subquery.
> 
> When we are finished parsing the subquery, we create a catalog cache
> entry for it called 'sub1' and make its fields match the target
> list of the subquery.
> 
> In the outer query, we add 'sub1' to its target list, and change
> the subquery reference to point to the new range table.  We also add
> WHERE clauses to do any correlated joins.
...
> Here is a more complex correlated subquery:
> 
>         select *
>         from taba
>         where col1 = (select col2
>                       from tabb
>                       where taba.col3 = tabb.col4)
> 
> Here we must add 'taba' to the subquery's FROM list, and add col3 to the
> target list of the subquery.  After we parse the subquery, add 'sub1' to
> the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
> sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
> The optimizer will do the correlation for us.
> 
> In the optimizer, we can parse the subquery first, then the outer query,
> and then replace all 'sub1' references in the outer query to use the
> subquery plan.
> 
> I realize making merging the two plans and doing IN and NOT IN is the
                   ^^^^^^^^^^^^^^^^^^^^^
This is very easy to do! As I already said, we just have to replace the
sub1 access path (SeqScan of sub1) with a SeqScan of a Material node over
the subquery plan.

> real challenge, but I hoped this would give us a start.

A decision about how to record the subquery stuff in the parse tree
would be a very good start -:)

BTW, note that for _expression_ subqueries (which are introduced without
IN, EXISTS, ALL, ANY - this follows Sybase's naming) - as in your examples -
we have to check that the subquery returns a single tuple...

Vadim

From owner-pgsql-hackers@hub.org Mon Jan  5 20:31:03 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id UAA06836
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 20:31:01 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id TAA29980 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 19:56:05 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id TAA28044; Mon, 5 Jan 1998 19:06:11 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 05 Jan 1998 19:03:16 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id TAA27203 for pgsql-hackers-outgoing; Mon, 5 Jan 1998 19:03:02 -0500 (EST)
Received: from clio.trends.ca (root@clio.trends.ca [209.47.148.2]) by hub.org (8.8.8/8.7.5) with ESMTP id TAA27049 for <hackers@postgresql.org>; Mon, 5 Jan 1998 19:02:30 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67])
	by clio.trends.ca (8.8.8/8.8.8) with ESMTP id RAA09337
	for <hackers@postgreSQL.org>; Mon, 5 Jan 1998 17:31:04 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id RAA02675;
	Mon, 5 Jan 1998 17:16:40 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801052216.RAA02675@candle.pha.pa.us>
Subject: Re: [HACKERS] subselect
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 5 Jan 1998 17:16:40 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34B15C23.B24D5CC@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 6, 98 05:18:11 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> > I am confused.  Do you want one flat query and want to pass the whole
> > thing into the optimizer?  That brings up some questions:
> 
> No. I just want to follow Tom's way: I would like to see new
> SubSelect node as shortened version of struct Query (or use
> Query structure for each subquery - no matter for me), some 
> subquery-related stuff added to Query (and SubSelect) to help
> optimizer to start, and see

OK, so you want the subquery to actually be INSIDE the outer query
expression.  Do they share a common range table?  If they don't, we
could very easily just fly through when processing the WHERE clause, and
start a new query using a new query structure for the subquery.  Believe
me, you don't want a separate SubQuery-type, just re-use Query for it. 
It allows you to call all the normal query stuff with a consistent
structure.

The parser will need to know it is in a subquery, so it can add the
proper target columns to the subquery.  Or are you going to do that in
the optimizer?  You can do it in the optimizer, and join the range table
references there too.

> 
> typedef struct A_Expr
> {
>     NodeTag     type;
>     int         oper;           /* type of operation
>                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>             IN, NOT IN, ANY, ALL, EXISTS here,
> 
>     char       *opname;         /* name of operator/function */
>     Node       *lexpr;          /* left argument */
>     Node       *rexpr;          /* right argument */
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>             and SubSelect (Query) here (as possible case).
> 
> One thought to follow this way: RULEs (and so - VIEWs) are handled by using
> Query - how else can we implement VIEWs on selects with subqueries ?

Views are stored as nodeout structures, and are merged into the query's
from list, target list, and where clause.  I am working out
readfunc,outfunc now to make sure they are up-to-date with all the
current fields.

> 
> BTW, is
> 
> select * from A where (select TRUE from B);
> 
> valid syntax ?

I don't think so.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Mon Jan  5 17:01:54 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA02066
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:01:47 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25063;
	Tue, 6 Jan 1998 05:18:13 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B15C23.B24D5CC@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 05:18:11 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801052051.PAA29341@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > OK, here it is.  I recommend we pass the outer and subquery through
> > > the parser and optimizer separately.
> >
> > I don't like this. I would like to get parse-tree from parser for
> > entire query and let optimizer (on upper level) decide how to rewrite
> > parse-tree and what plans to produce and how these plans should be
> > merged. Note, that I don't object your methods below, but only where
> > to place handling of this. I don't understand why should we add
> > new part to the system which will do optimizer' work (parse-tree -->
> > execution plan) and deal with optimizer nodes. Imho, upper optimizer
> > level is nice place to do this.
> 
> I am confused.  Do you want one flat query and want to pass the whole
> thing into the optimizer?  That brings up some questions:

No. I just want to follow Tom's way: I would like to see a new
SubSelect node as a shortened version of struct Query (or use the
Query structure for each subquery - it doesn't matter to me), some
subquery-related stuff added to Query (and SubSelect) to help the
optimizer get started, and see

typedef struct A_Expr
{
    NodeTag     type;
    int         oper;           /* type of operation
                                 * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            IN, NOT IN, ANY, ALL, EXISTS here,

    char       *opname;         /* name of operator/function */
    Node       *lexpr;          /* left argument */
    Node       *rexpr;          /* right argument */
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
            and SubSelect (Query) here (as possible case).

One thought to follow this way: RULEs (and so - VIEWs) are handled by using
Query - how else can we implement VIEWs on selects with subqueries ?

BTW, is

select * from A where (select TRUE from B);

valid syntax ?

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan  5 18:00:57 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03296
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:55 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA20716 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:22:21 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id FAA25094;
	Tue, 6 Jan 1998 05:49:02 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B1635A.94A172AD@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 05:48:58 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Goran Thyni <goran@bildbasen.se>
CC: maillist@candle.pha.pa.us, hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801050516.AAA28005@candle.pha.pa.us> <34B0D3AF.F31338B3@sable.krasnoyarsk.su> <19980105132825.28962.qmail@guevara.bildbasen.se>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Goran Thyni wrote:
> 
> Vadim,
> 
>    Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
>    is one example of this - joining by <> will give us invalid results.
> 
> What is your approach towards this problem?

Actually, this is a problem of the ALL modifier (NOT IN is _not_equal_ ALL),
so we need not just a NOT EQUAL flag but some ALL node
with a modified operator.

After that, one way is to put the subquery into the inner plan of a join node,
to be sure that for each outer tuple all corresponding subquery tuples
are tested with the modified operator (this will require either
changing the code of all join nodes or adding a new plan type - we'll see),
and another way is ... suggested by you:

> I got an idea that one could reverse the order,
> that is execute the outer first into a temptable
> and delete from that according to the result of the
> subquery and then return it.
> Probably this is too raw and slow. ;-)

This will be faster in some cases (when the subquery returns many results
and there are "not so many" results from the outer query) - thanks for the idea!
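
To make the difference concrete, here is a minimal C sketch of the three behaviours discussed above: a plain join on <>, the "<> ALL" test, and the reversed temp-table approach. It uses int arrays in place of tuples, ignores NULLs, and all function names are invented for illustration; it is not executor code.

```c
#include <assert.h>
#include <stddef.h>

/* Rows a plain join on <> would emit for one outer value (one per non-equal inner value). */
static size_t join_ne_matches(int outer, const int *inner, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (outer != inner[i])
            hits++;
    return hits;
}

/* NOT IN as "<> ALL": true only if outer differs from every inner value. */
static int ne_all(int outer, const int *inner, size_t n)
{
    assert(inner != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (outer == inner[i])
            return 0;               /* a single match rejects the outer tuple */
    return 1;
}

/*
 * Reversed order: temp[] holds the outer results; for each subquery value,
 * delete the matching entries in place. Returns the number of rows left.
 */
static size_t delete_matches(int *temp, size_t n, const int *inner, size_t n_inner)
{
    for (size_t j = 0; j < n_inner; j++)
    {
        size_t kept = 0;
        for (size_t i = 0; i < n; i++)
            if (temp[i] != inner[j])
                temp[kept++] = temp[i];
        n = kept;
    }
    return n;
}
```

With a subquery result of {1, 2}, the outer value 1 still produces a join row (because 1 <> 2) although NOT IN must reject it, and the outer value 3 produces two join rows where NOT IN should return it once.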

> 
>    Personally, I was stuck by holidays -:)
>    Now I can spend ~ 8 hours ~ each day for development...
> 
> Oh, isn't it Christmas Eve right now in Russia?

For historic reasons, New Year is a mu-u-u-uch more popular

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan  5 18:00:59 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03300
	for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 18:00:57 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id RAA21652 for <maillist@candle.pha.pa.us>; Mon, 5 Jan 1998 17:42:15 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id GAA25129;
	Tue, 6 Jan 1998 06:10:05 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B16844.B4F4BA92@sable.krasnoyarsk.su>
Date: Tue, 06 Jan 1998 06:09:56 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] subselect
References: <199801052216.RAA02675@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > > I am confused.  Do you want one flat query and want to pass the whole
> > > thing into the optimizer?  That brings up some questions:
> >
> > No. I just want to follow Tom's way: I would like to see new
> > SubSelect node as shortened version of struct Query (or use
> > Query structure for each subquery - no matter for me), some
> > subquery-related stuff added to Query (and SubSelect) to help
> > optimizer to start, and see
> 
> OK, so you want the subquery to actually be INSIDE the outer query
> expression.  Do they share a common range table?  If they don't, we
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
No.

> could very easily just fly through when processing the WHERE clause, and
> start a new query using a new query structure for the subquery.  Believe
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
... and filling some subquery-related stuff in upper query structure -
still don't know what exactly this could be -:)

> me, you don't want a separate SubQuery-type, just re-use Query for it.
> It allows you to call all the normal query stuff with a consistent
> structure.

No objections.

> 
> The parser will need to know it is in a subquery, so it can add the
> proper target columns to the subquery, or are you going to do that in

I don't think we need that, but a list of correlation clauses
could be a good thing - after all, the parser has to check all column
references...

> the optimizer.  You can do it in the optimizer, and join the range table
> references there too.

Yes.

> > typedef struct A_Expr
> > {
> >     NodeTag     type;
> >     int         oper;           /* type of operation
> >                                  * {OP,OR,AND,NOT,ISNULL,NOTNULL} */
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >             IN, NOT IN, ANY, ALL, EXISTS here,
> >
> >     char       *opname;         /* name of operator/function */
> >     Node       *lexpr;          /* left argument */
> >     Node       *rexpr;          /* right argument */
> >     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >             and SubSelect (Query) here (as possible case).
> >
> > One thought to follow this way: RULEs (and so - VIEWs) are handled by using
> > Query - how else can we implement VIEWs on selects with subqueries ?
> 
> Views are stored as nodeout structures, and are merged into the query's
> from list, target list, and where clause.  I am working out
> readfunc,outfunc now to make sure they are up-to-date with all the
> current fields.

Nice! This stuff has been out of date for too long.

> > BTW, is
> >
> > select * from A where (select TRUE from B);
> >
> > valid syntax ?
> 
> I don't think so.

And so *rexpr can be of Query type only when oper is one of OP, IN, NOT IN,
ANY, ALL, EXISTS - fine.

(Time to sleep -:)

Vadim

From owner-pgsql-hackers@hub.org Thu Jan  8 23:10:50 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA09707
	for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:10:48 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA19334 for <maillist@candle.pha.pa.us>; Thu, 8 Jan 1998 23:08:49 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id XAA14375; Thu, 8 Jan 1998 23:03:29 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 08 Jan 1998 23:03:10 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id XAA14345 for pgsql-hackers-outgoing; Thu, 8 Jan 1998 23:03:06 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id XAA14008 for <hackers@postgreSQL.org>; Thu, 8 Jan 1998 23:00:50 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id WAA09243;
	Thu, 8 Jan 1998 22:55:03 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801090355.WAA09243@candle.pha.pa.us>
Subject: [HACKERS] subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Thu, 8 Jan 1998 22:55:03 -0500 (EST)
Cc: hackers@postgreSQL.org (PostgreSQL-development)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Vadim, I know you are still thinking about subselects, but I have some
more clarification that may help.

We have to add phantom range table entries to correlated subselects so
they will pass the parser.  We might as well add those fields to the
target list of the subquery at the same time:

	select *
	from taba
	where col1 = (select col2
		      from tabb
		      where taba.col3 = tabb.col4)

becomes:

	select *
	from taba
	where col1 = (select col2, tabb.col4 <---
		      from tabb, taba  <---
		      where taba.col3 = tabb.col4)

We add a field to TargetEntry and RangeTblEntry to mark the fact that it
was entered as a correlation entry:

	bool	isCorrelated;

Second, we need to hook the subselect to the main query.  I recommend we
add two fields to Query for this:

	Query *parentQuery;
	List *subqueries;

The parentQuery pointer is used to resolve field names in the correlated
subquery.

	select *
	from taba
	where col1 = (select col2, tabb.col4 <---
		      from tabb, taba  <---
		      where taba.col3 = tabb.col4)

In the query above, the subquery can be easily parsed, and we add the
subquery to the parent's subqueries list.

In the parent query, to parse the WHERE clause, we create a new operator
type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
right side is an index to a slot in the subqueries List.

We can then do the rest in the upper optimizer.
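
A rough sketch of the hookup described above. It assumes a stripped-down query structure carrying only the two proposed fields plus an array of table names standing in for the range table. MiniQuery, add_subquery and find_owner are names invented for this illustration, and fixed-size arrays replace the backend's List type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ITEMS 8

typedef struct MiniQuery
{
    const char       *rtable[MAX_ITEMS];      /* stand-in for the range table */
    int               n_rtable;
    struct MiniQuery *parentQuery;            /* proposed field */
    struct MiniQuery *subqueries[MAX_ITEMS];  /* proposed field (a List in the backend) */
    int               n_subqueries;
} MiniQuery;

/*
 * Hook sub under parent. The returned slot index is what an IN / NOT_IN / ALL
 * operator node could store as its right-hand side.
 */
static int add_subquery(MiniQuery *parent, MiniQuery *sub)
{
    assert(parent->n_subqueries < MAX_ITEMS);
    sub->parentQuery = parent;
    parent->subqueries[parent->n_subqueries] = sub;
    return parent->n_subqueries++;
}

/*
 * Resolve a table name: search the query's own range table first, then
 * follow parentQuery upward. Returns the owning query, or NULL if unknown.
 */
static MiniQuery *find_owner(MiniQuery *q, const char *table)
{
    for (; q != NULL; q = q->parentQuery)
        for (int i = 0; i < q->n_rtable; i++)
            if (strcmp(q->rtable[i], table) == 0)
                return q;
    return NULL;
}
```

For the taba/tabb example, looking up 'taba' from inside the subquery finds the outer query, which marks the reference as correlated.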

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Fri Jan  9 10:01:01 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA27305
	for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 10:00:59 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA21583 for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 09:52:17 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id WAA01623;
	Fri, 9 Jan 1998 22:10:25 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B63DCD.73AA70C7@sable.krasnoyarsk.su>
Date: Fri, 09 Jan 1998 22:10:06 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgresql.org>
Subject: Re: subselects
References: <199801090355.WAA09243@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> Vadim, I know you are still thinking about subselects, but I have some
> more clarification that may help.
> 
> We have to add phantom range table entries to correlated subselects so
> they will pass the parser.  We might as well add those fields to the
> target list of the subquery at the same time:
> 
>         select *
>         from taba
>         where col1 = (select col2
>                       from tabb
>                       where taba.col3 = tabb.col4)
> 
> becomes:
> 
>         select *
>         from taba
>         where col1 = (select col2, tabb.col4 <---
>                       from tabb, taba  <---
>                       where taba.col3 = tabb.col4)
> 
> We add a field to TargetEntry and RangeTblEntry to mark the fact that it
> was entered as a correlation entry:
> 
>         bool    isCorrelated;

No, I would rather not add anything in the parser. Example:

        select *
        from tabA
        where col1 = (select col2
                      from tabB
                      where tabA.col3 = tabB.col4
                      and exists (select * 
                                  from tabC 
                                  where tabB.colX = tabC.colX and
                                        tabC.colY = tabA.col2)
                     )

: a column of tabA is referenced in the sub-subselect
(is this allowed by the standard?) - in this case it's better
not to add tabA to the 1st subselect but to add tabA to the second one
and change tabA.col3 in the 1st to reference col3 in the 2nd subquery's temp table -
this gives us a 2-table join in the 1st subquery instead of a 3-table join.
(And I'm still not sure that using temp tables is the best that can be
done in all cases...)

Instead of using isCorrelated in TE & RTE we can add 

Index varlevel;

to the Var node to record which (sub)query this Var comes from (that is,
which range table to search for the Var's relation using varno). The topmost
query will have varlevel = 0, all its (direct) children varlevel = 1, and so on.
                         ^^^                            ^^^^^^^^^^^^
(I don't see problems with distinguishing Vars of different children
on the same level...)

> 
> Second, we need to hook the subselect to the main query.  I recommend we
> add two fields to Query for this:
> 
>         Query *parentQuery;
>         List *subqueries;

Agreed. And maybe Index queryLevel.
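
One possible reading of varlevel together with queryLevel, as a sketch only: a Var whose varlevel is lower than the level of the query it appears in is a correlated reference, and its range table is found by climbing parentQuery until the levels match. LevelQuery, LevelVar and both functions are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;

typedef struct LevelQuery
{
    Index              queryLevel;    /* 0 for the topmost query */
    struct LevelQuery *parentQuery;   /* NULL for the topmost query */
} LevelQuery;

typedef struct LevelVar
{
    Index varlevel;   /* level of the query whose range table varno indexes */
    Index varno;      /* index into that query's range table */
} LevelVar;

/* Starting at the query where the Var appears, climb until the level matches. */
static LevelQuery *query_for_var(LevelQuery *current, const LevelVar *var)
{
    assert(current != NULL && var->varlevel <= current->queryLevel);
    while (current != NULL && current->queryLevel > var->varlevel)
        current = current->parentQuery;
    return current;
}

/* A Var is a correlated reference when it belongs to an enclosing query. */
static int var_is_correlated(const LevelQuery *current, const LevelVar *var)
{
    return var->varlevel < current->queryLevel;
}
```

In the tabA/tabB/tabC example, tabA.col2 inside the innermost subselect would carry varlevel = 0 while that subselect has queryLevel = 2, so the lookup climbs two parents.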

> In the parent query, to parse the WHERE clause, we create a new operator
> type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
                                               ^^^^^^^^^^^^^^^^^^
No. We have to handle (a,b,c) OP (select x, y, z ...) and
'_a_constant_' OP (select ...) - I don't know whether the latter is in the
standard, but Sybase has it.

Well,

typedef enum OpType
{
    OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR,

    OP_EXISTS, OP_ALL, OP_ANY       /* proposed additions */

} OpType;

typedef struct Expr
{
    NodeTag     type;
    Oid         typeOid;        /* oid of the type of this expr */
    OpType      opType;         /* type of the op */
    Node       *oper;           /* could be Oper or Func */
    List       *args;           /* list of argument nodes */
} Expr;

OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
           List, following your suggestion)

OP_ALL, OP_ANY:

oper is a List of Oper nodes. We need a list because the data types of
a, b, c (above) can be different, and so the Oper nodes will be different too.

lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
left side of subquery' operator.
lsecond(args) is SubSelect.
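
A minimal sketch of how OP_ANY / OP_ALL could be evaluated with one operator per column. Everything here is a stand-in: columns are ints, a function pointer plays the role of an Oper node, the subquery result is a flat array of rows, and the names are invented. Treating a row as matching only when every column operator holds is an assumption of this sketch (natural for equality); the thread does not specify how per-column results are combined.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*ColOper)(int left, int right);   /* stand-in for one Oper node */

static int int_eq(int l, int r) { return l == r; }

typedef enum { SUB_ANY, SUB_ALL } SubKind;

/* Apply opers[i] to column i; assumption: the row matches if all columns do. */
static int row_matches(const ColOper *opers, size_t ncols,
                       const int *left, const int *right)
{
    for (size_t i = 0; i < ncols; i++)
        if (!opers[i](left[i], right[i]))
            return 0;
    return 1;
}

/* rows holds nrows subquery tuples of ncols ints each, stored back to back. */
static int eval_sublink(SubKind kind, const ColOper *opers, size_t ncols,
                        const int *left, const int *rows, size_t nrows)
{
    assert(opers != NULL && left != NULL);
    for (size_t r = 0; r < nrows; r++)
    {
        int m = row_matches(opers, ncols, left, rows + r * ncols);
        if (kind == SUB_ANY && m)
            return 1;               /* one matching tuple is enough */
        if (kind == SUB_ALL && !m)
            return 0;               /* one failing tuple is enough */
    }
    return kind == SUB_ALL;         /* empty result: ANY is false, ALL is true */
}
```

Keeping the operators in an array mirrors the List of Oper nodes: each column can have its own operator when the types of a, b, c differ.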

Note that there are no OP_IN, OP_NOTIN in the OpTypes for Expr. We need
IN and NOTIN in A_Expr (the parser node), but the parser has to transform
both of them into the corresponding ANY and ALL. At the moment we can do:

IN --> = ANY, NOT IN --> <> ALL

but this will be a "known bug": it breaks the OO nature of Postgres, because
operators can be overridden and '=' can mean  s o m e t h i n g (not equality).
Example: the box data type. For boxes, = means equality of _areas_ and ~=
means that the boxes are the same ==> ~= ANY should be used for IN.

> right side is an index to a slot in the subqueries List.

Vadim

From owner-pgsql-hackers@hub.org Fri Jan  9 17:44:04 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id RAA24779
	for <maillist@candle.pha.pa.us>; Fri, 9 Jan 1998 17:44:01 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id RAA20728; Fri, 9 Jan 1998 17:32:34 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 09 Jan 1998 17:32:19 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id RAA20503 for pgsql-hackers-outgoing; Fri, 9 Jan 1998 17:32:15 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id RAA20008 for <hackers@postgresql.org>; Fri, 9 Jan 1998 17:31:24 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id RAA24282;
	Fri, 9 Jan 1998 17:31:41 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801092231.RAA24282@candle.pha.pa.us>
Subject: [HACKERS] Re: subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Fri, 9 Jan 1998 17:31:41 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34B63DCD.73AA70C7@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 9, 98 10:10:06 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Bruce Momjian wrote:
> > 
> > Vadim, I know you are still thinking about subselects, but I have some
> > more clarification that may help.
> > 
> > We have to add phantom range table entries to correlated subselects so
> > they will pass the parser.  We might as well add those fields to the
> > target list of the subquery at the same time:
> > 
> >         select *
> >         from taba
> >         where col1 = (select col2
> >                       from tabb
> >                       where taba.col3 = tabb.col4)
> > 
> > becomes:
> > 
> >         select *
> >         from taba
> >         where col1 = (select col2, tabb.col4 <---
> >                       from tabb, taba  <---
> >                       where taba.col3 = tabb.col4)
> > 
> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
> > was entered as a correlation entry:
> > 
> >         bool    isCorrelated;
> 
> No, I don't like to add anything in parser. Example:
> 
>         select *
>         from tabA
>         where col1 = (select col2
>                       from tabB
>                       where tabA.col3 = tabB.col4
>                       and exists (select * 
>                                   from tabC 
>                                   where tabB.colX = tabC.colX and
>                                         tabC.colY = tabA.col2)
>                      )
> 
> : a column of tabA is referenced in sub-subselect 

This is a strange case that I don't think we need to handle in our first
implementation.

> (is it allowable by standards ?) - in this case it's better 
> to don't add tabA to 1st subselect but add tabA to second one
> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> this gives us 2-tables join in 1st subquery instead of 3-tables join.
> (And I'm still not sure that using temp tables is best of what can be 
> done in all cases...)

I don't see any use for temp tables in subselects anymore.  After having
implemented UNIONS, I now see how much can be done in the upper
optimizer.  I see you just putting the subquery PLAN into the proper
place in the plan tree, with some proper JOIN nodes for IN, NOT IN.

> 
> Instead of using isCorrelated in TE & RTE we can add 
> 
> Index varlevel;

OK.  Sounds good.

> 
> to Var node to reflect (sub)query from where this Var is come
> (where is range table to find var's relation using varno). Upmost query
> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
>                         ^^^                          ^^^^^^^^^^^^
> (I don't see problems with distinguishing Vars of different children
> on the same level...)
> 
> > 
> > Second, we need to hook the subselect to the main query.  I recommend we
> > add two fields to Query for this:
> > 
> >         Query *parentQuery;
> >         List *subqueries;
> 
> Agreed. And maybe Index queryLevel.

Sure.  If it helps.

> 
> > In the parent query, to parse the WHERE clause, we create a new operator
> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
>                                                ^^^^^^^^^^^^^^^^^^
> No. We have to handle (a,b,c) OP (select x, y, z ...) and 
> '_a_constant_' OP (select ...) - I don't know is last in standards,
> Sybase has this.

I have never seen this in my eight years of SQL.  Perhaps we can leave
this for later, maybe much later.

> 
> Well,
> 
> typedef enum OpType
> {
>     OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
> 
> + OP_EXISTS, OP_ALL, OP_ANY
> 
> } OpType;
> 
> typedef struct Expr
> {
>     NodeTag     type;
>     Oid         typeOid;        /* oid of the type of this expr */
>     OpType      opType;         /* type of the op */
>     Node       *oper;           /* could be Oper or Func */
>     List       *args;           /* list of argument nodes */
> } Expr;
> 
> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
>            List, following your suggestion)
> 
> OP_ALL, OP_ANY:
> 
> oper is List of Oper nodes. We need in list because of data types of
> a, b, c (above) can be different and so Oper nodes will be different too.
> 
> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
> left side of subquery' operator.
> lsecond(args) is SubSelect.
> 
> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> by parser into corresponding ANY and ALL. At the moment we can do:
> 
> IN --> = ANY, NOT IN --> <> ALL
> 
> but this will be "known bug": this breaks OO-nature of Postgres, because of
> operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
> Example: box data type. For boxes, = means equality of _areas_ and =~
> means that boxes are the same ==> =~ ANY should be used for IN.

That is interesting, to use =~ for ANY.

Yes, but how many operators take a SUBQUERY as an operand?  This is a
special case to me.

I think I see where you are trying to go.  You want subselects to behave
like any other operator, with a subselect type, and you do all the
subselect handling in the optimizer, with special Nodes and actions.

I think this may be just too much of a leap.  We have such clean query
logic for single queries, I can't imagine having an operator that has a
Query operand, and trying to get everything to properly handle it. 
UNIONS were very easy to implement as a List off of Query, with some
foreach()'s in rewrite and the high optimizer.

Subselects are SQL standard, and are never going to be over-ridden by a
user.  Same with UNION.  They want UNION, they get UNION.  They want
Subselect, we are going to spin through the Query structure and give
them what they want.

The complexities of subselects and correlated queries and range tables
and stuff are so bizarre that trying to get it to work inside the type
system could be a huge project.

> 
> > right side is an index to a slot in the subqueries List.

I guess the question is what can we have by February 1?

I have been reading some postings, and it seems to me that subselects
are the litmus test for many evaluators when deciding if a database
engine is full-featured.

Sorry to be so straightforward, but I want to keep hashing this around
until we get a conclusion, so coding can start.

My suggestions have been, I believe, trying to get subselects working
with the fullest functionality by adding the least amount of code, and
keeping the logic clean.

Have you checked out the UNION code?  It is very small, but it works.  I
think it could make a good sample for subselects.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Sat Jan 10 12:00:51 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA28742
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:00:43 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05684;
	Sun, 11 Jan 1998 00:19:10 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
Date: Sun, 11 Jan 1998 00:19:08 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgresql.org, "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Subject: Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > No, I don't like to add anything in parser. Example:
> >
> >         select *
> >         from tabA
> >         where col1 = (select col2
> >                       from tabB
> >                       where tabA.col3 = tabB.col4
> >                       and exists (select *
> >                                   from tabC
> >                                   where tabB.colX = tabC.colX and
> >                                         tabC.colY = tabA.col2)
> >                      )
> >
> > : a column of tabA is referenced in sub-subselect
> 
> This is a strange case that I don't think we need to handle in our first
> implementation.

I don't know whether this is a strange case or not :)
But I would like to know whether it is allowed by the standard - can
someone comment on this?
And I don't see any problems with handling it...

> 
> > (is it allowable by standards ?) - in this case it's better
> > to don't add tabA to 1st subselect but add tabA to second one
> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> > this gives us 2-tables join in 1st subquery instead of 3-tables join.
> > (And I'm still not sure that using temp tables is best of what can be
> > done in all cases...)
> 
> I don't see any use for temp tables in subselects anymore.  After having
> implemented UNIONS, I now see how much can be done in the upper
> optimizer.  I see you just putting the subquery PLAN into the proper
> place in the plan tree, with some proper JOIN nodes for IN, NOT IN.

When talking about temp tables, I meant the tables created by a Material
node for the subquery plan. This is one of two ways: run the subquery once
for all possible upper plan tuples and then just join the result table
with the upper query. The other way is to re-run the subquery for each
upper query tuple, without a temp table but maybe with caching of the
results in some way. Actually, there is a special case - when the subquery
can alternatively be formulated as joins - but this is just a special case.

> > > In the parent query, to parse the WHERE clause, we create a new operator
> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
> >                                                ^^^^^^^^^^^^^^^^^^
> > No. We have to handle (a,b,c) OP (select x, y, z ...) and
> > '_a_constant_' OP (select ...) - I don't know is last in standards,
> > Sybase has this.
> 
> I have never seen this in my eight years of SQL.  Perhaps we can leave
> this for later, maybe much later.

Are you talking about (a, b, c) or about 'a_constant'?
Again, can someone comment on whether they are in the standard or not?
Tom?
If yes, then please add parser support for them now...

> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> > by parser into corresponding ANY and ALL. At the moment we can do:
> >
> > IN --> = ANY, NOT IN --> <> ALL
> >
> > but this will be "known bug": this breaks OO-nature of Postgres, because of
> > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
> > Example: box data type. For boxes, = means equality of _areas_ and =~
> > means that boxes are the same ==> =~ ANY should be used for IN.
> 
> That is interesting, to use =~ for ANY.
> 
> Yes, but how many operators take a SUBQUERY as an operand.  This is a
> special case to me.
> 
> I think I see where you are trying to go.  You want subselects to behave
> like any other operator, with a subselect type, and you do all the
> subselect handling in the optimizer, with special Nodes and actions.
> 
> I think this may be just too much of a leap.  We have such clean query
> logic for single queries, I can't imagine having an operator that has a
> Query operand, and trying to get everything to properly handle it.
> UNIONS were very easy to implement as a List off of Query, with some
> foreach()'s in rewrite and the high optimizer.
> 
> Subselects are SQL standard, and are never going to be over-ridden by a
> user.  Same with UNION.  They want UNION, they get UNION.  They want
> Subselect, we are going to spin through the Query structure and give
> them what they want.
> 
> The complexities of subselects and correlated queries and range tables
> and stuff is so bizarre that trying to get it to work inside the type
> system could be a huge project.

PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
derived from the Berkeley Postgres database management system. While
PostgreSQL retains the powerful object-relational data model, rich data types and
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
easy extensibility of Postgres, it replaces the PostQuel query language with an
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extended subset of SQL.
^^^^^^^^^^^^^^^^^^^^^^

Should we tell users that subselects will work for standard data types only?
I don't see why a subquery can't be used with the ~, ~*, @@, ... operators, do you?
Is there a difference between handling = ANY and ~ ANY? I don't see any.
Currently we can't get IN working properly for boxes (and maybe for others too)
and I don't want to try to resolve these problems now, but I hope that someday
we'll be able to do this. At the moment - just convert IN into = ANY and
NOT IN into <> ALL in the parser.
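
A minimal sketch of the conversion described above, in plain C rather
than against the real parser nodes. The names InKind, Quantifier,
QuantifiedOp and rewrite_in() are hypothetical and do not exist in the
PostgreSQL source; the sketch only shows the mapping itself.

```c
#include <assert.h>

/* Hypothetical input kinds; not the real parser enums. */
typedef enum { LINK_IN, LINK_NOT_IN } InKind;

/* Hypothetical quantifier kinds. */
typedef enum { QUANT_ANY, QUANT_ALL } Quantifier;

/* Hypothetical result of the rewrite: an operator name plus a quantifier. */
typedef struct
{
    const char *opname;     /* "=" or "<>" */
    Quantifier  quant;      /* ANY or ALL */
} QuantifiedOp;

/*
 * Map IN / NOT IN onto a quantified comparison:
 *   x IN (subselect)      -->  x =  ANY (subselect)
 *   x NOT IN (subselect)  -->  x <> ALL (subselect)
 */
QuantifiedOp
rewrite_in(InKind kind)
{
    QuantifiedOp result;

    assert(kind == LINK_IN || kind == LINK_NOT_IN);

    if (kind == LINK_NOT_IN)
    {
        result.opname = "<>";
        result.quant = QUANT_ALL;
    }
    else
    {
        result.opname = "=";
        result.quant = QUANT_ANY;
    }
    return result;
}
```

Keeping the operator name as data, instead of a fixed OP_IN node type,
is what would leave room for a different operator to be chosen later
for types such as box.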

(BTW, do you know how DISTINCT is implemented? It doesn't use = but
uses the type_out funcs and strcmp()... DISTINCT is a standard SQL thing...)

> >
> > > right side is an index to a slot in the subqueries List.
> 
> I guess the question is what can we have by February 1?
> 
> I have been reading some postings, and it seems to me that subselects
> are the litmus test for many evaluators when deciding if a database
> engine is full-featured.
> 
> Sorry to be so straightforward, but I want to keep hashing this around
> until we get a conclusion, so coding can start.
> 
> My suggestions have been, I believe, trying to get subselects working
> with the fullest functionality by adding the least amount of code, and
> keeping the logic clean.
> 
> Have you checked out the UNION code?  It is very small, but it works.  I
> think it could make a good sample for subselects.

There is a big difference between subqueries and queries in a UNION -
there are no dependencies between UNION queries.

Ok, open issues:

1. Is using upper query vars at all subquery levels in the standard?
2. Is (a, b, c) OP (subselect) in the standard?
3. What types of expressions (Var, Const, ...) are allowed on the left
   side of an operator with a subquery on the right?
4. What types of operators should we support (=, >, ..., like, ~, ...)?
   (My vote is for all boolean operators.)

And - did we reach consensus on the representation of subqueries in Query,
Expr and Var?
I would like to have something done in the parser around Jan 17 to get
subqueries working by Feb 1. I vote for supporting all the standard
things (1. - 3.) in the parser right now - if there is no time
to implement something like (a, b, c) then the optimizer will call
elog(WARN) (oh, sorry, - elog(ERROR)).

Vadim

From vadim@sable.krasnoyarsk.su Sat Jan 10 12:31:05 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id MAA29045
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:31:01 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA23364 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:22:30 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05725;
	Sun, 11 Jan 1998 00:41:22 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
Date: Sun, 11 Jan 1998 00:41:19 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects
References: <199712220545.AAA11605@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> OK, a few questions:
> 
>         Should we use sortmerge, so we can use our psort as temp tables,
> or do we use hashunique?
> 
>         How do we pass the query to the optimizer?  How do we represent
> the range table for each, and the links between them in correlated
> subqueries?

My suggestion is to just use varlevel in Var and not put the upper query's
relations into the subquery range table.
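
A minimal sketch of how such a lookup could work, assuming the
parentQuery and queryLevel fields discussed elsewhere in this thread.
The structs are heavily simplified stand-ins, not the real Query and Var
nodes, and find_var_query() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for Query: only the fields the lookup needs. */
typedef struct Query
{
    struct Query *parentQuery;  /* NULL for the upmost query */
    unsigned int  queryLevel;   /* 0 for the upmost query, 1 for its children, ... */
} Query;

/* Simplified stand-in for Var. */
typedef struct Var
{
    unsigned int varno;         /* index into the owning query's range table */
    unsigned int varlevel;      /* level of the query whose range table holds varno */
} Var;

/*
 * Starting from the query in which the Var appears, walk up the
 * parentQuery links until the query at the Var's level is reached.
 * Returns NULL if the Var names a level deeper than the current query
 * or if no ancestor has exactly that level.
 */
Query *
find_var_query(Query *current, const Var *var)
{
    assert(current != NULL && var != NULL);

    if (var->varlevel > current->queryLevel)
        return NULL;

    while (current != NULL && current->queryLevel > var->varlevel)
        current = current->parentQuery;

    if (current != NULL && current->queryLevel != var->varlevel)
        return NULL;

    return current;
}
```

With this layout a Var in a sub-subselect that refers to a table of the
upmost query carries varlevel = 0, and no entry for that table has to be
copied into the intermediate range tables.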

Vadim

From vadim@sable.krasnoyarsk.su Sat Jan 10 13:01:00 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29357
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:00:58 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id MAA24030 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 12:40:02 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id AAA05741;
	Sun, 11 Jan 1998 00:58:56 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B7B6DC.937E1B8D@sable.krasnoyarsk.su>
Date: Sun, 11 Jan 1998 00:58:52 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>,
        PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects
References: <199712220545.AAA11605@candle.pha.pa.us> <34B7B2BF.44FE7252@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Vadim B. Mikheev wrote:
> 
> Bruce Momjian wrote:
> >
> > OK, a few questions:
> >
> >         Should we use sortmerge, so we can use our psort as temp tables,
> > or do we use hashunique?
> >
> >         How do we pass the query to the optimizer?  How do we represent
> > the range table for each, and the links between them in correlated
> > subqueries?
> 
> My suggestion is just use varlevel in Var and don't put upper query'
> relations into subquery range table.

Hmm... Sorry, it seems that I replied to a very old message - forget it.

Vadim

From lockhart@alumni.caltech.edu Sat Jan 10 13:30:59 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id NAA29664
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:30:56 -0500 (EST)
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id NAA25109 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 13:05:09 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id SAA03623;
	Sat, 10 Jan 1998 18:01:03 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <34B7B75F.B49D7642@alumni.caltech.edu>
Date: Sat, 10 Jan 1998 18:01:03 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
Subject: Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> > > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> > > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> > > by parser into corresponding ANY and ALL. At the moment we can do:
> > >
> > > IN --> = ANY, NOT IN --> <> ALL
> > >
> > > but this will be "known bug": this breaks OO-nature of Postgres, because of
> > > operators can be overrided and '=' can mean  s o m e t h i n g (not equality).
> > > Example: box data type. For boxes, = means equality of _areas_ and =~
> > > means that boxes are the same ==> =~ ANY should be used for IN.
> >
> > That is interesting, to use =~ for ANY.

If I understand the discussion, I would think it is fine to make an assumption about
which operator is used to implement a subselect expression. If someone remaps an
operator to mean something different, then they will get a different result (or a
nonsensical one) from a subselect.

I'd be happy to remap existing operators to fit into a convention which would work
with subselects (especially if I got to help choose :).

> > Subselects are SQL standard, and are never going to be over-ridden by a
> > user.  Same with UNION.  They want UNION, they get UNION.  They want
> > Subselect, we are going to spin through the Query structure and give
> > them what they want.
>
> PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
> derived from the Berkeley Postgres database management system. While
> PostgreSQL retains the powerful object-relational data model, rich data types and
>            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> easy extensibility of Postgres, it replaces the PostQuel query language with an
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> extended subset of SQL.
> ^^^^^^^^^^^^^^^^^^^^^^
>
> Should we say users that subselect will work for standard data types only ?
> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
> Is there difference between handling = ANY and ~ ANY ? I don't see any.
> Currently we can't get IN working properly for boxes (and may be for others too)
> and I don't like to try to resolve these problems now, but hope that someday
> we'll be able to do this. At the moment - just convert IN into = ANY and
> NOT IN into <> ALL in parser.
>
> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)

?? I didn't know that. Wouldn't we want it to eventually use "=" through a sorted
list? That would give more consistent behavior...

> > I have been reading some postings, and it seems to me that subselects
> > are the litmus test for many evaluators when deciding if a database
> > engine is full-featured.
> >
> > Sorry to be so straightforward, but I want to keep hashing this around
> > until we get a conclusion, so coding can start.
> >
> > My suggestions have been, I believe, trying to get subselects working
> > with the fullest functionality by adding the least amount of code, and
> > keeping the logic clean.
> >
> > Have you checked out the UNION code?  It is very small, but it works.  I
> > think it could make a good sample for subselects.
>
> There is big difference between subqueries and queries in UNION -
> there are not dependences between UNION queries.
>
> Ok, opened issues:
>
> 1. Is using upper query' vars in all subquery levels in standard ?

I'm not certain. Let me know if you do not get an answer from someone else and I will
research it.

> 2. Is (a, b, c) OP (subselect) in standard ?

Yes. In fact, it _is_ the standard, and "a OP (subselect)" is a special case where
the parens are allowed to be omitted from a one element list.

> 3. What types of expressions (Var, Const, ...) are allowed on the left
>    side of operator with subquery on the right ?

I think most expressions are allowed. The "constant OP (subselect)" case you were
asking about is just a simplified case since "(a, b, constant) OP (subselect)" where
a and b are column references should be allowed. Of course, our optimizer could
perhaps change this to "(a, b) OP (subselect where x = constant)", or for the first
example "EXISTS (subselect where x = constant)".

> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
>    (My vote for all boolean operators).

Sounds good. But I'll vote with Bruce (and I'll bet you already agree) that it is
important to get an initial implementation for v6.3 which covers a little, some, or
all of the usual SQL subselect constructs. If we have to revisit this for v6.4 then
we will have the benefit of feedback from others in practical applications which
always uncovers new things to consider.

> And - did we get consensus on presentation subqueries stuff in Query,
> Expr and Var ?
> I would like to have something done in parser near Jan 17 to get
> subqueries working by Feb 1. I vote for support of all standard
> things (1. - 3.) in parser right now - if there will be no time
> to implement something like (a, b, c) then optimizer will call elog(WARN) (oh,
> sorry, - elog(ERROR)).

Great. I'd like to help with the remaining parser issues; at the moment "row_expr"
does the right thing with expression comparisons but just parses then ignores
subselect expressions. Let me know what structures you want passed back and I'll put
them in, or if you prefer put in the first one and I'll go through and clean up and
add the rest.

                                                  - Tom


From lockhart@alumni.caltech.edu Sat Jan 10 15:00:58 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id PAA00728
	for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 15:00:56 -0500 (EST)
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id OAA28438 for <maillist@candle.pha.pa.us>; Sat, 10 Jan 1998 14:35:19 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id TAA06002;
	Sat, 10 Jan 1998 19:31:30 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <34B7CC91.E6E331C7@alumni.caltech.edu>
Date: Sat, 10 Jan 1998 19:31:29 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
Subject: Re: [HACKERS] Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

> Are you saying about (a, b, c) or about 'a_constant' ?
> Again, can someone comment on are they in standards or not ?
> Tom ?
> If yes then please add parser' support for them now...

As I mentioned a few minutes ago in my last message, I parse the row descriptors and
the subselects but for subselect expressions (e.g. "(a,b) OP (subselect)") I currently
ignore the result. I didn't want to pass things back as lists until something in the
backend was ready to receive them.

If it is OK, I'll go ahead and start passing back a list of expressions when a row
descriptor is present. So, what you will find is lexpr or rexpr in the A_Expr node
being a list rather than an atomic node.

Also, I can start passing back the subselect expression as the rexpr; right now the
parser calls elog() and quits.

btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
If lists are handled farther back, this routine should move to there also and the
parser will just pass the lists. Note that some assumptions have to be made about the
meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
"a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
to disallow those cases or to look for specific appearance of the operator to guess
the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
it has "<>" or "!" then build as "or"s).
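
A minimal sketch of that guess, assuming the operator is available as a
plain C string. row_connective() is a hypothetical helper written for
illustration; it is not the code in makeRowExpr().

```c
#include <assert.h>
#include <string.h>

/*
 * Guess how "(a,b) OP (c,d)" should be combined from the spelling of OP.
 * Operators containing "<>" or "!" are treated as inequality tests and
 * joined with "or"; operators containing "<", "=" or ">" are joined with
 * "and".  Returns NULL when the spelling gives no hint.
 */
const char *
row_connective(const char *opname)
{
    assert(opname != NULL);

    /* Test for inequality first: "<>" also contains '<' and '>'. */
    if (strstr(opname, "<>") != NULL || strchr(opname, '!') != NULL)
        return "or";

    if (strpbrk(opname, "<=>") != NULL)
        return "and";

    return NULL;
}
```

The NULL result corresponds to the "disallow those cases" option: an
operator such as "~" carries no hint, so the caller would have to
reject the row expression rather than guess.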

Let me know what you want...

                                                       - Tom


From lockhart@alumni.caltech.edu Sun Jan 11 01:01:55 1998
Received: from golem.jpl.nasa.gov (root@gnet04.jpl.nasa.gov [128.149.70.168])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA11953
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:01:51 -0500 (EST)
Received: from alumni.caltech.edu (localhost [127.0.0.1])
	by golem.jpl.nasa.gov (8.8.5/8.8.5) with ESMTP id FAA23797;
	Sun, 11 Jan 1998 05:58:01 GMT
Sender: tgl@gnet04.jpl.nasa.gov
Message-ID: <34B85F68.9C015ED9@alumni.caltech.edu>
Date: Sun, 11 Jan 1998 05:58:01 +0000
From: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
Organization: Caltech/JPL
X-Mailer: Mozilla 4.03 [en] (X11; I; Linux 2.0.30 i686)
MIME-Version: 1.0
To: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgresql.org
Subject: Re: [HACKERS] Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su>
Content-Type: multipart/mixed; boundary="------------D8B38A0D1F78A10C0023F702"
Status: OR

This is a multi-part message in MIME format.
--------------D8B38A0D1F78A10C0023F702
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Here are context diffs of gram.y and keywords.c; sorry about sending the full files.
These start sending lists of arguments toward the backend from the parser to
implement row descriptors and subselects.

They should apply OK even over Bruce's recent changes...

                                             - Tom

--------------D8B38A0D1F78A10C0023F702
Content-Type: text/plain; charset=us-ascii; name="gram.y.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="gram.y.patch"

*** ../src/backend/parser/gram.y.orig	Sat Jan 10 05:44:36 1998
--- ../src/backend/parser/gram.y	Sat Jan 10 19:29:37 1998
***************
*** 195,200 ****
--- 195,201 ----
  				having_clause
  %type <list>	row_descriptor, row_list
  %type <node>	row_expr
+ %type <str>		RowOp, row_opt
  %type <list>	OptCreateAs, CreateAsList
  %type <node>	CreateAsElement
  %type <value>	NumConst
***************
*** 242,248 ****
   */
  
  /* Keywords (in SQL92 reserved words) */
! %token	ACTION, ADD, ALL, ALTER, AND, AS, ASC,
  		BEGIN_TRANS, BETWEEN, BOTH, BY,
  		CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, 
  		CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, 
--- 243,249 ----
   */
  
  /* Keywords (in SQL92 reserved words) */
! %token	ACTION, ADD, ALL, ALTER, AND, ANY, AS, ASC,
  		BEGIN_TRANS, BETWEEN, BOTH, BY,
  		CASCADE, CAST, CHAR, CHARACTER, CHECK, CLOSE, COLLATE, COLUMN, COMMIT, 
  		CONSTRAINT, CREATE, CROSS, CURRENT, CURRENT_DATE, CURRENT_TIME, 
***************
*** 258,264 ****
  		ON, OPTION, OR, ORDER, OUTER_P,
  		PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
  		REFERENCES, REVOKE, RIGHT, ROLLBACK,
! 		SECOND_P, SELECT, SET, SUBSTRING,
  		TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
  		UNION, UNIQUE, UPDATE, USING,
  		VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
--- 259,265 ----
  		ON, OPTION, OR, ORDER, OUTER_P,
  		PARTIAL, POSITION, PRECISION, PRIMARY, PRIVILEGES, PROCEDURE, PUBLIC,
  		REFERENCES, REVOKE, RIGHT, ROLLBACK,
! 		SECOND_P, SELECT, SET, SOME, SUBSTRING,
  		TABLE, TIME, TIMESTAMP, TO, TRAILING, TRANSACTION, TRIM,
  		UNION, UNIQUE, UPDATE, USING,
  		VALUES, VARCHAR, VARYING, VERBOSE, VERSION, VIEW,
***************
*** 2853,2866 ****
  /* Expressions using row descriptors
   * Define row_descriptor to allow yacc to break the reduce/reduce conflict
   *  with singleton expressions.
   */
  row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
  				{
! 					$$ = NULL;
  				}
  		| '(' row_descriptor ')' NOT IN '(' SubSelect ')'
  				{
! 					$$ = NULL;
  				}
  		| '(' row_descriptor ')' '=' '(' row_descriptor ')'
  				{
--- 2854,2878 ----
  /* Expressions using row descriptors
   * Define row_descriptor to allow yacc to break the reduce/reduce conflict
   *  with singleton expressions.
+  *
+  * Note that "SOME" is the same as "ANY" in syntax.
+  * - thomas 1998-01-10
   */
  row_expr: '(' row_descriptor ')' IN '(' SubSelect ')'
  				{
! 					$$ = makeA_Expr(OP, "=any", (Node *)$2, (Node *)$6);
  				}
  		| '(' row_descriptor ')' NOT IN '(' SubSelect ')'
  				{
! 					$$ = makeA_Expr(OP, "<>all", (Node *)$2, (Node *)$7);
! 				}
! 		| '(' row_descriptor ')' RowOp row_opt '(' SubSelect ')'
! 				{
! 					char *opr;
! 					opr = palloc(strlen($4)+strlen($5)+1);
! 					strcpy(opr, $4);
! 					strcat(opr, $5);
! 					$$ = makeA_Expr(OP, opr, (Node *)$2, (Node *)$7);
  				}
  		| '(' row_descriptor ')' '=' '(' row_descriptor ')'
  				{
***************
*** 2880,2885 ****
--- 2892,2907 ----
  				}
  		;
  
+ RowOp:  '='						{ $$ = "="; }
+ 		| '<'					{ $$ = "<"; }
+ 		| '>'					{ $$ = ">"; }
+ 		;
+ 
+ row_opt:  ALL					{ $$ = "all"; }
+ 		| ANY					{ $$ = "any"; }
+ 		| SOME					{ $$ = "any"; }
+ 		;
+ 
  row_descriptor:  row_list ',' a_expr
  				{
  					$$ = lappend($1, $3);
***************
*** 3432,3441 ****
  		;
  
  in_expr:  SubSelect
! 				{
! 					elog(ERROR,"IN (SUBSELECT) not yet implemented");
! 					$$ = $1;
! 				}
  		| in_expr_nodes
  				{	$$ = $1; }
  		;
--- 3454,3460 ----
  		;
  
  in_expr:  SubSelect
! 				{	$$ = makeA_Expr(OP, "=", saved_In_Expr, (Node *)$1); }
  		| in_expr_nodes
  				{	$$ = $1; }
  		;
***************
*** 3449,3458 ****
  		;
  
  not_in_expr:  SubSelect
! 				{
! 					elog(ERROR,"NOT IN (SUBSELECT) not yet implemented");
! 					$$ = $1;
! 				}
  		| not_in_expr_nodes
  				{	$$ = $1; }
  		;
--- 3468,3474 ----
  		;
  
  not_in_expr:  SubSelect
! 				{	$$ = makeA_Expr(OP, "<>", saved_In_Expr, (Node *)$1); }
  		| not_in_expr_nodes
  				{	$$ = $1; }
  		;

--------------D8B38A0D1F78A10C0023F702
Content-Type: text/plain; charset=us-ascii; name="keywords.c.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="keywords.c.patch"

*** ../src/backend/parser/keywords.c.orig	Mon Jan  5 07:51:33 1998
--- ../src/backend/parser/keywords.c	Sat Jan 10 19:22:07 1998
***************
*** 39,44 ****
--- 39,45 ----
  	{"alter", ALTER},
  	{"analyze", ANALYZE},
  	{"and", AND},
+ 	{"any", ANY},
  	{"append", APPEND},
  	{"archive", ARCHIVE},
  	{"as", AS},
***************
*** 178,183 ****
--- 179,185 ----
  	{"set", SET},
  	{"setof", SETOF},
  	{"show", SHOW},
+ 	{"some", SOME},
  	{"stdin", STDIN},
  	{"stdout", STDOUT},
  	{"substring", SUBSTRING},

--------------D8B38A0D1F78A10C0023F702--


From owner-pgsql-hackers@hub.org Sun Jan 11 01:31:13 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA12255
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:31:10 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA20396 for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 01:10:48 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA22176; Sun, 11 Jan 1998 01:03:15 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 11 Jan 1998 01:02:34 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA22151 for pgsql-hackers-outgoing; Sun, 11 Jan 1998 01:02:26 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA22077 for <hackers@postgresql.org>; Sun, 11 Jan 1998 01:01:05 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA11801;
	Sun, 11 Jan 1998 00:59:23 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801110559.AAA11801@candle.pha.pa.us>
Subject: [HACKERS] Re: subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Sun, 11 Jan 1998 00:59:23 -0500 (EST)
Cc: hackers@postgresql.org, lockhart@alumni.caltech.edu
In-Reply-To: <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 11, 98 00:19:08 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> I would like to have something done in parser near Jan 17 to get
> subqueries working by Feb 1. I vote for support of all standard
> things (1. - 3.) in parser right now - if there will be no time
> to implement something like (a, b, c) then optimizer will call
> elog(WARN) (oh, sorry, - elog(ERROR)).

First, let me say I am glad we are still on schedule for Feb 1.  I was
panicking because I thought we wouldn't make it in time.


> > > (is it allowable by standards ?) - in this case it's better
> > > to don't add tabA to 1st subselect but add tabA to second one
> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
> > > (And I'm still not sure that using temp tables is best of what can be
> > > done in all cases...)
> > 
> > I don't see any use for temp tables in subselects anymore.  After having
> > implemented UNIONS, I now see how much can be done in the upper
> > optimizer.  I see you just putting the subquery PLAN into the proper
> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
> 
> When saying about temp tables, I meant tables created by node Material
> for subquery plan. This is one of two ways - run subquery once for all
> possible upper plan tuples and then just join result table with upper
> query. Another way is re-run subquery for each upper query tuple,
> without temp table but may be with caching results by some ways.
> Actually, there is special case - when subquery can be alternatively 
> formulated as joins, - but this is just special case.

This is interesting.  It really only applies to correlated subqueries,
and it may sometimes help to evaluate the subquery only for the values
that actually come from the upper query rather than for all possible
values.  Perhaps we can use the 'cost' value of each query to decide
how to handle this.

> 
> > > > In the parent query, to parse the WHERE clause, we create a new operator
> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
> > >                                                ^^^^^^^^^^^^^^^^^^
> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
> > > '_a_constant_' OP (select ...) - I don't know is last in standards,
> > > Sybase has this.
> > 
> > I have never seen this in my eight years of SQL.  Perhaps we can leave
> > this for later, maybe much later.
> 
> Are you saying about (a, b, c) or about 'a_constant' ?
> Again, can someone comment on are they in standards or not ?
> Tom ?
> If yes then please add parser' support for them now...

OK, Thomas says it is, so we will put in as much code as we can to handle
it.

> Should we say users that subselect will work for standard data types only ?
> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
> Is there difference between handling = ANY and ~ ANY ? I don't see any.
> Currently we can't get IN working properly for boxes (and may be for others too)
> and I don't like to try to resolve these problems now, but hope that someday
> we'll be able to do this. At the moment - just convert IN into = ANY and
> NOT IN into <> ALL in parser.

OK.
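
The mapping discussed above (IN becomes = ANY, NOT IN becomes <> ALL) can
be sketched in plain C.  This is only an illustration under stated
assumptions: the sketch_* names are hypothetical and are not part of the
backend, the subquery result is modelled as an array of ints, and NULL
handling (SQL three-valued logic) is ignored entirely.  The operator is
passed as a function pointer, so = ANY and any other boolean operator are
handled by the same loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical boolean operator: returns nonzero when "left OP right" holds. */
typedef int (*sketch_binop) (int left, int right);

int sketch_op_eq(int left, int right) { return left == right; }
int sketch_op_ne(int left, int right) { return left != right; }

/* left OP ANY (rows): true if the operator holds for at least one row. */
int
sketch_any(sketch_binop op, int left, const int *rows, size_t nrows)
{
    size_t i;

    assert(op != NULL);
    for (i = 0; i < nrows; i++)
        if (op(left, rows[i]))
            return 1;
    return 0;
}

/* left OP ALL (rows): true if the operator holds for every row. */
int
sketch_all(sketch_binop op, int left, const int *rows, size_t nrows)
{
    size_t i;

    assert(op != NULL);
    for (i = 0; i < nrows; i++)
        if (!op(left, rows[i]))
            return 0;
    return 1;
}

/* x IN (rows) is x = ANY (rows). */
int
sketch_in(int left, const int *rows, size_t nrows)
{
    return sketch_any(sketch_op_eq, left, rows, nrows);
}

/* x NOT IN (rows) is x <> ALL (rows). */
int
sketch_not_in(int left, const int *rows, size_t nrows)
{
    return sketch_all(sketch_op_ne, left, rows, nrows);
}
```

Note that <> ANY is not the same thing as NOT IN: for a subquery result
of more than one distinct value, <> ANY holds for every left-hand value,
while <> ALL holds only when no row matches.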

> 
> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)

I did not know that either.

> There is big difference between subqueries and queries in UNION - 
> there are not dependences between UNION queries.

Yes, I know UNIONS are trivial compared to subselects.

> 
> Ok, opened issues:
> 
> 1. Is using upper query' vars in all subquery levels in standard ?
> 2. Is (a, b, c) OP (subselect) in standard ?
> 3. What types of expressions (Var, Const, ...) are allowed on the left
>    side of operator with subquery on the right ?
> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
>    (My vote for all boolean operators).
> 
> And - did we get consensus on presentation subqueries stuff in Query,
> Expr and Var ?

OK, here are my concrete ideas on changes and structures.

I think we all agreed that Query needs new fields:

        Query *parentQuery;
        List *subqueries;

Maybe query level too, but I don't think so (see later ideas on Var).

We need a new Node structure, call it Sublink:

	int	linkType;	/* IN, NOTIN, ANY, EXISTS, OPERATOR... */
	Oid	operator;	/* subquery must return single row */
	List	*lefthand;	/* parent stuff */
	Node	*subquery;	/* represents nodes from parser */
	Index	Subindex;	/* filled in to index Query->subqueries */

Of course, the names are just suggestions.  Every time we run through
the parsenodes of a query to create a Query* structure, when we do the
WHERE clause, if we come upon one of these Sublink nodes (created in the
parser), we move the supplied Query* in Sublink->subquery to a local
List variable, and we set Sublink->Subindex to the index of the new
query: 1 for the first subquery we found, 2 for the second, etc.

After we have created the parent Query structure, we run through our
local List variable of subquery parsenodes we created above, and add
Query* entries to Query->subqueries.  In each subquery Query*, we set
the parentQuery pointer.

Also, when parsing the subqueries, we need to keep track of correlated
references.  I recommend we add a field to the Var structure:

	Index	sublevel;	/* range table reference:
				   = 0  current level of query
				   < 0  parent above this many levels
				   > 0  index into subquery list
				 */

This way, a Var node with sublevel 0 refers to the current level, which
is the case for most Vars.  This helps us not have to change much code.
sublevel = -1 means it references the range table in the parent query.
sublevel = -2 means the parent's parent.  sublevel = 2 means it
references the range table of the second entry in Query->subqueries.
varno and varattno are still meaningful.  Of course, we can't reference
variables in the subqueries from the parent in the parser code, but
Vadim may want to.

When doing a Var lookup in the parser, we look in the current level
first, but if it is not found and this is a subquery, we can look at the
parent and the parent's parent to set sublevel, varno, and varattno
properly.

We create no phantom range table entries in the subquery, and no phantom
target list entries.   We can leave that all for the upper optimizer.
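
The Var lookup described above can be sketched as a walk up the
parentQuery chain.  This is a minimal illustration, not backend code:
SketchQuery, SketchVar and sketch_lookup_var are hypothetical names, the
range table is flattened to a list of column names, and sublevel is a
signed int here because the field has to hold negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_COLS 8

/* Hypothetical, simplified stand-in for a Query with a parent pointer. */
typedef struct SketchQuery
{
    struct SketchQuery *parentQuery;        /* NULL at the top level */
    const char *colnames[SKETCH_MAX_COLS];  /* flattened "range table" */
    int         ncols;
} SketchQuery;

/* Hypothetical, simplified stand-in for a Var node. */
typedef struct SketchVar
{
    int         sublevel;   /* 0 = current, -1 = parent, -2 = parent's parent */
    int         varattno;   /* 1-based column position at that level */
} SketchVar;

/*
 * Resolve a column name by searching the current query first and then
 * each enclosing query in turn.  Returns 1 and fills *var on success,
 * 0 if no level knows the name.
 */
int
sketch_lookup_var(const SketchQuery *query, const char *name, SketchVar *var)
{
    int         level = 0;

    assert(name != NULL && var != NULL);

    for (; query != NULL; query = query->parentQuery, level--)
    {
        int         i;

        for (i = 0; i < query->ncols; i++)
        {
            if (strcmp(query->colnames[i], name) == 0)
            {
                var->sublevel = level;
                var->varattno = i + 1;
                return 1;
            }
        }
    }
    return 0;
}
```

Because the search starts at the innermost level, a name that exists in
both the subquery and its parent resolves to the subquery's own column,
and only names missing from the current level become correlated
references.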

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Fri Nov 28 16:34:03 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA17454
	for <maillist@candle.pha.pa.us>; Fri, 28 Nov 1997 16:33:59 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA10553; Fri, 28 Nov 1997 16:20:03 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 28 Nov 1997 16:17:50 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA10116 for pgsql-hackers-outgoing; Fri, 28 Nov 1997 16:17:45 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA09997 for <hackers@postgreSQL.org>; Fri, 28 Nov 1997 16:17:26 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id QAA17309
	for hackers@postgreSQL.org; Fri, 28 Nov 1997 16:18:08 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199711282118.QAA17309@candle.pha.pa.us>
Subject: [HACKERS] querytrees and multiple statements
To: hackers@postgreSQL.org (PostgreSQL-development)
Date: Fri, 28 Nov 1997 16:18:08 -0500 (EST)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

Currently, if a query string arrives that has multiple sql statements in
it, the parser breaks it down into separate queries, analyzes each one,
then executes them in order.  (psql automatically breaks things down
into separate queries, so this will not work there.)  The problem is
that if the first query creates a table, and the second query goes to
access it, the parser analysis fails because the table is not yet
created.  See the attached pginterface source for an example.  The real
problem is that all the queries in the string are analyzed first, then
executed, rather than having one analyzed then executed, then the next.

I am going to have trouble with subselects and temp tables.  I want to
pull out the subselect, change it into a SELECT ... INTO TEMP, and add
it to the QueryTree before the outer select.  But when the outer select
is analyzed by the parser, the temp table doesn't exist yet, and that
will cause an error.

Currently postgres.c does each step on all queries before moving to the
next step.  Does anyone know what the ramifications would be if I
changed this to do the full set of operations on each statement first
before moving to the next?
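
The difference between the two orderings can be shown with a toy model.
This is only a sketch under stated assumptions: the sketch_* names and
the catalog structure are hypothetical, "analysis" is reduced to checking
that a selected table exists, and "execution" of a CREATE simply records
the table name.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_MAX_TABLES 16

typedef enum { SKETCH_CREATE, SKETCH_SELECT } SketchKind;

typedef struct SketchStmt
{
    SketchKind  kind;
    const char *table;
} SketchStmt;

typedef struct SketchCatalog
{
    const char *tables[SKETCH_MAX_TABLES];
    int         ntables;
} SketchCatalog;

/* Analysis succeeds unless a SELECT names a table the catalog lacks. */
static int
sketch_analyze(const SketchCatalog *cat, const SketchStmt *stmt)
{
    int         i;

    if (stmt->kind == SKETCH_CREATE)
        return 1;
    for (i = 0; i < cat->ntables; i++)
        if (strcmp(cat->tables[i], stmt->table) == 0)
            return 1;
    return 0;
}

/* Executing a CREATE registers the table; a SELECT changes nothing. */
static void
sketch_execute(SketchCatalog *cat, const SketchStmt *stmt)
{
    if (stmt->kind == SKETCH_CREATE)
    {
        assert(cat->ntables < SKETCH_MAX_TABLES);
        cat->tables[cat->ntables++] = stmt->table;
    }
}

/* Analyze every statement first, then execute them all. */
int
sketch_run_batched(SketchCatalog *cat, const SketchStmt *stmts, int n)
{
    int         i;

    for (i = 0; i < n; i++)
        if (!sketch_analyze(cat, &stmts[i]))
            return 0;
    for (i = 0; i < n; i++)
        sketch_execute(cat, &stmts[i]);
    return 1;
}

/* Analyze and execute one statement before looking at the next. */
int
sketch_run_interleaved(SketchCatalog *cat, const SketchStmt *stmts, int n)
{
    int         i;

    for (i = 0; i < n; i++)
    {
        if (!sketch_analyze(cat, &stmts[i]))
            return 0;
        sketch_execute(cat, &stmts[i]);
    }
    return 1;
}
```

With the statement pair "create table test; select from test", the
batched driver fails during analysis of the second statement, while the
interleaved driver succeeds because the table is registered before the
SELECT is analyzed.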

---------------------------------------------------------------------------


/*
 * pgnulltest.c
 *
*/

#include <stdio.h>
#include <signal.h>
#include <time.h>
#include <halt.h>
#include <postgres.h>
#include <libpq-fe.h>
#include <pginterface.h>

int main(int argc, char **argv)
{
	char query[4000];
	int i;
	
	if (argc != 2)
		halt("Usage:  %s database\n",argv[0]);

	connectdb(argv[1],NULL,NULL,NULL,NULL);

	sprintf(query,"create table test(x int); select x from test;");
	doquery(query);

	disconnectdb();
	return 0;
}


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Sat Nov 29 05:01:01 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA27942
	for <maillist@candle.pha.pa.us>; Sat, 29 Nov 1997 05:00:58 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA13666 for <maillist@candle.pha.pa.us>; Sat, 29 Nov 1997 04:35:08 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id QAA17107; Sat, 29 Nov 1997 16:38:58 +0700 (KRS)
Sender: root@www.krasnet.ru
Message-ID: <347FE2B1.167EB0E7@sable.krasnoyarsk.su>
Date: Sat, 29 Nov 1997 16:38:57 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] querytrees and multiple statements
References: <199711282118.QAA17309@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> Currently, if a query string arrives that has multiple sql statements in
> it, the parser breaks it down into separate queries, analyzes each one,
> then executes them in order.  (psql automatically breaks things down
> into separate queries, so this will not work there.)  The problem is
> that if the first query creates a table, and the second query goes to
> access it, the parser analysis fails because the table is not yet
> created.  See the attached pginterface source for an example.  The real
> problem is that all the queries in the string are analyzed first, then
> executed, rather than having one analyzed then executed, then the next.
> 
> I am going to have trouble with subselects and temp tables.  I want to
> pull out the subselect, change it into a SELECT ... INTO TEMP, and add
> it to the QueryTree before the outer select.  But when the outer select
> is analyzed by the parser, the temp table doesn't exist yet, and that
> will cause an error.
> 
> Currently postgres.c does each step on all queries before moving to the
> next step.  Does anyone know what the ramifications would be if I
> changed this to do the full set of operations on each statement first
> before moving to the next?

This will break ability to prepare plan (parser + optimizer) for later
execution. This ability is used by RULEs (and so - by VIEWs) and will be
used by PL(s)...

Please, take a look at nodeMaterial.c:

/*-------------------------------------------------------------------------
 *
 * nodeMaterial.c--
 *    Routines to handle materialization nodes.
...
/*
 * INTERFACE ROUTINES
 *      ExecMaterial            - generate a temporary relation
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(I'm still very busy. Hope to return soon.)

Vadim

From vadim@sable.krasnoyarsk.su Sun Nov 30 02:30:56 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA15439
	for <maillist@candle.pha.pa.us>; Sun, 30 Nov 1997 02:30:55 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id CAA17743 for <maillist@candle.pha.pa.us>; Sun, 30 Nov 1997 02:27:40 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id OAA18937; Sun, 30 Nov 1997 14:32:14 +0700 (KRS)
Sender: root@www.krasnet.ru
Message-ID: <3481167E.2781E494@sable.krasnoyarsk.su>
Date: Sun, 30 Nov 1997 14:32:14 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] querytrees and multiple statements
References: <199711291854.NAA05185@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> > This will break ability to prepare plan (parser + optimizer) for later
> > execution. This ability is used by RULEs (and so - by VIEWs) and will be
> > used by PL(s)...
> >
> > Please, take a look at nodeMaterial.c:
> >
> > /*-------------------------------------------------------------------------
> >  *
> >  * nodeMaterial.c--
> >  *    Routines to handle materialization nodes.
> > ...
> > /*
> >  * INTERFACE ROUTINES
> >  *      ExecMaterial            - generate a temporary relation
> >                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> I understand what you are saying here.  The temp table has transaction
> scope, and breaking each query into multiple commands, each with its own
> transaction scope will cause the temp table to go away.

No. I just said that there will be no ability to prepare queries with
subselects for later execution: will be no ability to get execution plan which
could be passed to executor to get results without additional parser/planner
invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
(==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.

Ability to have execution plans seems important to me. Other DBMS-es use
this for stored procedures and views.

Vadim

From owner-pgsql-hackers@hub.org Mon Dec  1 01:30:57 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA10903
	for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:30:55 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26262 for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:21:28 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA05263; Mon, 1 Dec 1997 01:02:12 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:00:12 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA03357 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:00:07 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id AAA03290 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 00:59:45 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA10395;
	Mon, 1 Dec 1997 00:57:07 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712010557.AAA10395@candle.pha.pa.us>
Subject: Re: [HACKERS] querytrees and multiple statements
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 1 Dec 1997 00:57:07 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <3481167E.2781E494@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Nov 30, 97 02:32:14 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> 
> No. I just said that there will be no ability to prepare queries with
> subselects for later execution: will be no ability to get execution plan which
> could be passed to executor to get results without additional parser/planner
> invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
> (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
> in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
> 
> Ability to have execution plans seems important to me. Other DBMS-es use
> this for stored procedures and views.
> 
> Vadim
> 

I see what you are saying about other people calling pg_plan().  pg_plan
returns the query rewritten, and a plan, and some areas use that.  I
will have to make sure I honor that functionality in any changes I make
to it.  I will think more about this.  I may have to add an 'execute me'
flag to it.  However, I am unsure how I am going to generate 'just a
plan or rewritten query structure' without actually running the query
and having the temp table created so the rest can be parsed.


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Mon Dec  1 02:00:58 1997
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11221
	for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 02:00:57 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA26994 for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 01:55:19 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA23269; Mon, 1 Dec 1997 01:47:13 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 01:45:31 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA22653 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 01:45:25 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA22590 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 01:45:13 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA21318; Mon, 1 Dec 1997 13:49:58 +0700 (KRS)
Message-ID: <34825E16.446B9B3D@sable.krasnoyarsk.su>
Date: Mon, 01 Dec 1997 13:49:58 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org
Subject: Re: [HACKERS] querytrees and multiple statements
References: <199712010557.AAA10395@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

Bruce Momjian wrote:
> 
> >
> > No. I just said that there will be no ability to prepare queries with
> > subselects for later execution: will be no ability to get execution plan which
> > could be passed to executor to get results without additional parser/planner
> > invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
> > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
> > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
> >
> > Ability to have execution plans seems important to me. Other DBMS-es use
> > this for stored procedures and views.
> >
> > Vadim
> >
> 
> I see what you are saying about other people calling pg_plan().  pg_plan
> returns the query rewritten, and a plan, and some areas use that.  I
> will have to make sure I honor that functionality in any changes I make
> to it.  I will think more about this.  I may have to add an 'execute me'
> flag to it.  However, I am unsure how I am going to generate 'just a
> plan or rewritten query structure' without actually running the query
> and having the temp table created so the rest can be parsed.

That's why I suggest to try with nodeMaterial(): this could allow to handle
subqueries on optimizer level and get single execution plan for
single user query.

Vadim


From owner-pgsql-hackers@hub.org Mon Dec  1 02:46:23 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id CAA11762
	for <maillist@candle.pha.pa.us>; Mon, 1 Dec 1997 02:46:21 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id CAA11681; Mon, 1 Dec 1997 02:35:00 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 01 Dec 1997 02:33:17 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id CAA11451 for pgsql-hackers-outgoing; Mon, 1 Dec 1997 02:33:09 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id CAA11110 for <hackers@postgreSQL.org>; Mon, 1 Dec 1997 02:32:10 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id CAA11574;
	Mon, 1 Dec 1997 02:32:45 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712010732.CAA11574@candle.pha.pa.us>
Subject: Re: [HACKERS] querytrees and multiple statements
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 1 Dec 1997 02:32:45 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34825E16.446B9B3D@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 1, 97 01:49:58 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Bruce Momjian wrote:
> > 
> > >
> > > No. I just said that there will be no ability to prepare queries with
> > > subselects for later execution: will be no ability to get execution plan which
> > > could be passed to executor to get results without additional parser/planner
> > > invocations. This ability is used by SQL-functions and SPI_prepare()/SPI_execp()
> > > (==> PLs). RULEs don't use execution plan, but use parsed query tree (stored
> > > in pg_rewrite) -> I foresee problems with VIEWs on queries with subselects.
> > >
> > > Ability to have execution plans seems important to me. Other DBMS-es use
> > > this for stored procedures and views.
> > >
> > > Vadim
> > >
> > 
> > I see what you are saying about other people calling pg_plan().  pg_plan
> > returns the query rewritten, and a plan, and some areas use that.  I
> > will have to make sure I honor that functionality in any changes I make
> > to it.  I will think more about this.  I may have to add an 'execute me'
> > flag to it.  However, I am unsure how I am going to generate 'just a
> > plan or rewritten query structure' without actually running the query
> > and having the temp table created so the rest can be parsed.
> 
> That's why I suggest to try with nodeMaterial(): this could allow to handle
> subqueries on optimizer level and get single execution plan for
> single user query.

Can you give me more details on this?  I realize I can create an empty
tmp table to get through the parser analysis stuff, but how do I do
something in nodeMaterial?

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Tue Dec  2 00:04:05 1997
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id AAA00350
	for <maillist@candle.pha.pa.us>; Tue, 2 Dec 1997 00:03:58 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id MAA22889; Tue, 2 Dec 1997 12:09:57 +0700 (KRS)
Sender: root@www.krasnet.ru
Message-ID: <34839824.3F54BC7E@sable.krasnoyarsk.su>
Date: Tue, 02 Dec 1997 12:09:56 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: "Vadim B. Mikheev" <vadim@post.krasnet.ru>, hackers@postgreSQL.org
Subject: Re: [HACKERS] querytrees and multiple statements
References: <199712010732.CAA11574@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> >
> > That's why I suggest to try with nodeMaterial(): this could allow to handle
> > subqueries on optimizer level and get single execution plan for
> > single user query.
> 
> Can you give me more details on this?  I realize I can create an empty
> tmp table to get through the parser analysis stuff, but how do I do
> something in nodeMaterial?

 *      ExecMaterial
 *
 *      The first time this is called, ExecMaterial retrieves tuples from
 *      this node's outer subplan and inserts them into a temporary
                          ^^^^^^^

 *      relation.  After this is done, a flag is set indicating that
 *      the subplan has been materialized.  Once the relation is
 *      materialized, the first tuple is then returned.  Successive
 *      calls to ExecMaterial return successive tuples from the temp 
 *      relation.

As you see, this node materializes some plan results into temp relation:
instead of doing SELECT ... INTO temp FROM ... WHERE ... you could
create Material node using plan for 'SELECT ... FROM ... WHERE ...' as
its subplan. SeqScan of this materialized relation can be used in any
join plans just like scan of normal relation, e.g. - NESTLOOP plan:

	NESTLOOP
		SeqScan A
		SeqScan B

becomes

	NESTLOOP
		SeqScan
			Material
				...subplan here...
		SeqScan B (or other Material)

and so on...
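
The materialize-on-first-call behaviour described in the ExecMaterial
comment can be sketched as follows.  This is a minimal illustration under
stated assumptions, not the code in nodeMaterial.c: all sketch_* names
are hypothetical, tuples are plain ints, the "temporary relation" is an
in-memory array, and the child plan is a callback.  SketchCounter is a
toy child plan included only so the sketch can be exercised.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical child-plan interface: returns 1 and stores a value in *out,
 * or returns 0 when the child plan is exhausted. */
typedef int (*sketch_next_fn) (void *state, int *out);

typedef struct SketchMaterial
{
    sketch_next_fn child_next;
    void       *child_state;
    int        *tuples;         /* stands in for the temporary relation */
    size_t      ntuples;
    size_t      capacity;
    size_t      pos;            /* next stored tuple to return */
    int         materialized;   /* set once the child has been drained */
} SketchMaterial;

/* Toy child plan that yields next, next+1, ..., limit and counts calls. */
typedef struct SketchCounter
{
    int         next;
    int         limit;
    int         calls;
} SketchCounter;

int
sketch_counter_next(void *state, int *out)
{
    SketchCounter *c = (SketchCounter *) state;

    c->calls++;
    if (c->next > c->limit)
        return 0;
    *out = c->next++;
    return 1;
}

void
sketch_material_init(SketchMaterial *m, sketch_next_fn next, void *state)
{
    assert(m != NULL && next != NULL);
    m->child_next = next;
    m->child_state = state;
    m->tuples = NULL;
    m->ntuples = m->capacity = m->pos = 0;
    m->materialized = 0;
}

/* Returns 1 with a tuple in *out, 0 at end, -1 on allocation failure. */
int
sketch_material_next(SketchMaterial *m, int *out)
{
    if (!m->materialized)
    {
        int         value;

        /* First call: drain the whole child plan into the stored array. */
        while (m->child_next(m->child_state, &value))
        {
            if (m->ntuples == m->capacity)
            {
                size_t      newcap = m->capacity ? m->capacity * 2 : 16;
                int        *grown = realloc(m->tuples, newcap * sizeof(int));

                if (grown == NULL)
                    return -1;
                m->tuples = grown;
                m->capacity = newcap;
            }
            m->tuples[m->ntuples++] = value;
        }
        m->materialized = 1;
    }
    if (m->pos >= m->ntuples)
        return 0;
    *out = m->tuples[m->pos++];
    return 1;
}

/* Restart the scan over the stored tuples without re-running the child. */
void
sketch_material_rescan(SketchMaterial *m)
{
    m->pos = 0;
}

void
sketch_material_end(SketchMaterial *m)
{
    free(m->tuples);
    m->tuples = NULL;
    m->ntuples = m->capacity = m->pos = 0;
}
```

The point of the rescan function is the one made above for join plans:
the inner side of a nested loop can be scanned repeatedly while the
subplan itself runs only once.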

Vadim

From owner-pgsql-hackers@hub.org Tue Dec  2 01:28:02 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA02313
	for <maillist@candle.pha.pa.us>; Tue, 2 Dec 1997 01:28:00 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA00346; Tue, 2 Dec 1997 01:03:55 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 02 Dec 1997 01:03:04 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28750 for pgsql-hackers-outgoing; Tue, 2 Dec 1997 01:02:57 -0500 (EST)
Received: from candle.pha.pa.us (maillist@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA28254 for <hackers@postgreSQL.org>; Tue, 2 Dec 1997 01:02:38 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id BAA01042;
	Tue, 2 Dec 1997 01:02:15 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199712020602.BAA01042@candle.pha.pa.us>
Subject: Re: [HACKERS] querytrees and multiple statements
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Tue, 2 Dec 1997 01:02:15 -0500 (EST)
Cc: vadim@post.krasnet.ru, hackers@postgreSQL.org
In-Reply-To: <34839824.3F54BC7E@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Dec 2, 97 12:09:56 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Bruce Momjian wrote:
> > 
> > >
> > > That's why I suggest to try with nodeMaterial(): this could allow to handle
> > > subqueries on optimizer level and get single execution plan for
> > > single user query.
> > 
> > Can you give me more details on this?  I realize I can create an empty
> > tmp table to get through the parser analysis stuff, but how do I do
> > something in nodeMaterial?
> 
>  *      ExecMaterial
>  *
>  *      The first time this is called, ExecMaterial retrieves tuples
>  *      this node's outer subplan and inserts them into a temporary
>                           ^^^^^^^
> 
>  *      relation.  After this is done, a flag is set indicating that
>  *      the subplan has been materialized.  Once the relation is
>  *      materialized, the first tuple is then returned.  Successive
>  *      calls to ExecMaterial return successive tuples from the temp 
>  *      relation.
> 
> As you see, this node materializes some plan's results into a temp relation:
> instead of doing SELECT ... INTO temp FROM ... WHERE ... you could
> create a Material node using the plan for 'SELECT ... FROM ... WHERE ...' as
> its subplan. A SeqScan of this materialized relation can be used in any
> join plan just like a scan of a normal relation, e.g. a NESTLOOP plan:
> 
> 	NESTLOOP
> 		SeqScan A
> 		SeqScan B
> 
> becomes
> 
> 	NESTLOOP
> 		SeqScan
> 			Material
> 				...subplan here...
> 		SeqScan B (or other Material)
> 
> and so on...
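
The ExecMaterial comment quoted above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: SubplanSketch,
MaterialSketch and the three functions are hypothetical names, not the
executor's API, and an int array stands in for the temp relation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TUPLES 16

/* Hypothetical subplan: yields the ints next..limit, one per call. */
typedef struct {
    int next;
    int limit;
    int calls;              /* how many tuples were pulled from the subplan */
} SubplanSketch;

/* Returns true and stores a tuple in *out, or false when exhausted. */
bool subplan_next(SubplanSketch *sp, int *out)
{
    if (sp->next > sp->limit)
        return false;
    sp->calls++;
    *out = sp->next++;
    return true;
}

/* Hypothetical Material node: buffers the subplan output on first use. */
typedef struct {
    SubplanSketch *subplan;
    int    temp[MAX_TUPLES];    /* stands in for the temp relation */
    size_t ntuples;
    size_t pos;
    bool   materialized;        /* the "flag" from the comment */
} MaterialSketch;

bool material_next(MaterialSketch *m, int *out)
{
    if (!m->materialized) {
        int t;
        /* First call: drain the subplan into the temp storage. */
        while (m->ntuples < MAX_TUPLES && subplan_next(m->subplan, &t))
            m->temp[m->ntuples++] = t;
        m->materialized = true;
        m->pos = 0;
    }
    if (m->pos >= m->ntuples)
        return false;
    *out = m->temp[m->pos++];   /* successive calls: successive tuples */
    return true;
}

/* Rewind for another scan, e.g. as the inner side of a nested loop. */
void material_rescan(MaterialSketch *m)
{
    m->pos = 0;
}
```

The point of the sketch is that a rescan reads the buffered tuples again
and never re-runs the subplan.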

The problem now is that I don't understand much about what happens
inside the optimizer or executor.  I am sure you are correct that we can
have the subselect as a subnode, and if you think that is best, then it
is.

This pretty much stops me in developing subselects.  I have the concepts
down of what has to happen, but I can not implement it.  It will take me
several months to learn how the optimizer and executor work in enough
detail to implement this.

I usually allot 2-3 days a month for PostgreSQL development.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Thu Oct 30 01:30:59 1997
Received: from renoir.op.net (root@renoir.op.net [206.84.208.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA17986
	for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 01:30:58 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA27090 for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 01:19:49 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id BAA28901; Thu, 30 Oct 1997 01:16:38 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 01:16:17 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id BAA28673 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 01:16:10 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.5/8.7.5) with ESMTP id BAA27557 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 01:15:27 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by www.krasnet.ru (8.8.7/8.7.3) with SMTP id NAA20275; Thu, 30 Oct 1997 13:16:10 +0700 (KRS)
Message-ID: <34582629.33590565@sable.krasnoyarsk.su>
Date: Thu, 30 Oct 1997 13:16:09 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 3.01 (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: PostgreSQL Developers List <hackers@postgreSQL.org>
Subject: [HACKERS] Subqueries?
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

Hi!

Bruce, did you begin with them?
I agree that subqueries should be implemented like SQL functions, but
I would suggest not using CREATE FUNCTION - this is quite bad for
performance - and instead using some new node (VirtualFunc or SubQuery or ...)
and handling such nodes the way SQL functions are handled in function.c
(but without parser/planner invocation on each call - should be
fixed!). Also, uncorrelated subqueries returning a single result
can't be replaced in the parser/planner by a constant node: rules (and so
views), SPI and PL use _prepared_ plans...
It seems that this is not hard work...

Vadim


From owner-pgsql-hackers@hub.org Thu Oct 30 16:31:59 1997
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id QAA07360
	for <maillist@candle.pha.pa.us>; Thu, 30 Oct 1997 16:31:49 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.5/8.7.5) with SMTP id QAA11483; Thu, 30 Oct 1997 16:27:11 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 30 Oct 1997 16:26:14 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.5/8.7.5) id QAA11163 for pgsql-hackers-outgoing; Thu, 30 Oct 1997 16:26:07 -0500 (EST)
Received: from candle.pha.pa.us (root@s3-03.ppp.op.net [206.84.210.195]) by hub.org (8.8.5/8.7.5) with ESMTP id QAA10874 for <hackers@postgreSQL.org>; Thu, 30 Oct 1997 16:25:12 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id QAA06370;
	Thu, 30 Oct 1997 16:07:52 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199710302107.QAA06370@candle.pha.pa.us>
Subject: Re: [HACKERS] Subqueries?
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Thu, 30 Oct 1997 16:07:51 -0500 (EST)
Cc: hackers@postgreSQL.org
In-Reply-To: <34582629.33590565@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Oct 30, 97 01:16:09 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Hi!
> 
> Bruce, did you begin with them?
> I agree that subqueries should be implemented like SQL functions, but
> I would suggest not using CREATE FUNCTION - this is quite bad for
> performance - and instead using some new node (VirtualFunc or SubQuery or ...)
> and handling such nodes the way SQL functions are handled in function.c
> (but without parser/planner invocation on each call - should be
> fixed!). Also, uncorrelated subqueries returning a single result
> can't be replaced in the parser/planner by a constant node: rules (and so
> views), SPI and PL use _prepared_ plans...
> It seems that this is not hard work...
> 
> Vadim
> 
> 

OK, here is what I have collected over the months about subqueries.
The Sybase whitepaper is also attached.

This should get us thinking about how to implement each subquery type,
what operations need to be performed, and in what order.

---------------------------------------------------------------------------

From: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: [PG95-DEV] Need info on other databases.
To: pg95-dev@ki.net
Date: Fri, 22 Nov 1996 12:49:24 -0500 (EST)

> 
> 
> What I'm specifically interested in is the SQL-92 spec
> for the ANSI things that postgres95 is missing and the
> syntax/limitations on systems like Informix, Sybase,
> Microsoft, et.al...
> 
> Any technical info such as performance hits, disabling
> the use of indices, stuff like that would be _greatly_
> appreciated.  I have a decent understanding of this for
> Oracle, but not for any other systems.  I want to get
> an idea of the work load of adding the IN, BETWEEN/AND
> and HAVING clauses.

I have done some thinking about subselects.  There are basically two
issues:

	Does the query return one row or several rows?  This can be
	determined by seeing if the user uses equals or 'IN' to join the
	subquery.

	Is the query correlated, meaning "Does the subquery reference
	values from the outer query?"

(We already have the third type of subquery, the INSERT...SELECT query.)

So we have these four combinations:

	1) one row, no correlation
	2) multiple rows, no correlation
	3) one row, correlated
	4) multiple rows, correlated


With #1, we can execute the subquery, get the value, replace the
subquery with the constant returned from the subquery, and execute the
outer query.

With #2, we can execute the subquery and put the result into a temporary
table.  We then rewrite the outer query to access the temporary table
and replace the subquery with the column name from the temporary table. 
We probably put an index on the temp. table, which has only one
column, because a subquery can only return one column.  We remove the
temp. table after query execution.

With #3 and #4, we potentially need to execute the subquery for every
row returned by the outer query.  Performance would be horrible for
anything but the smallest query.  Another way to handle this is to
execute the subquery WITHOUT using any of the outer-query columns to
restrict the WHERE clause, and add those columns used to join the outer
variables into the target list of the subquery.  So for query:

	select t1.name
	from tab t1
	where t1.age = (select max(t2.age)
		        from tab2 t2
		        where t2.name = t1.name)

Execute the subquery and put it in a temporary table:

	select t2.name, max(t2.age) as age
	into table temp999
	from tab2 t2
	group by t2.name

	create index i_temp999 on temp999 (name)

Then re-write the outer query:

	select t1.name
	from tab t1, temp999
	where t1.age = temp999.age and
	      t1.name = temp999.name

The only problem here is that the subselect is running for all entries
in tab2, even if the outer query is only going to need a few rows. 
Deciding whether to execute the subquery each time or to create a temp.
table is often difficult.  Even some non-correlated
subqueries are better executed for each row rather than pre-executing the
entire subquery, especially if the outer query returns few rows.
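
The temp-table approach for the correlated max() example above can be
sketched in C as two passes.  This is a toy illustration, not planner code:
Row, build_temp and join_with_temp are made-up names, "name" is assumed to
be a small integer key in 0..MAX_KEYS-1, and ages are assumed non-negative
so that -1 can mark "no row".

```c
#include <stddef.h>

#define MAX_KEYS 8
#define NO_ROW   (-1)

/* name is a small integer key (assumption), age is non-negative */
typedef struct {
    int name;
    int age;
} Row;

/* Pass 1, the "temp999" step: max(age) per name over ALL of tab2,
 * with no reference to the outer query. */
void build_temp(const Row *tab2, size_t n2, int temp_age[MAX_KEYS])
{
    for (size_t k = 0; k < MAX_KEYS; k++)
        temp_age[k] = NO_ROW;
    for (size_t i = 0; i < n2; i++) {
        int k = tab2[i].name;
        if (temp_age[k] == NO_ROW || tab2[i].age > temp_age[k])
            temp_age[k] = tab2[i].age;
    }
}

/* Pass 2, the rewritten outer query: join tab with the temp result on
 * name and age.  Writes matching names to out_names, returns the count. */
size_t join_with_temp(const Row *tab, size_t n1,
                      const int temp_age[MAX_KEYS], int *out_names)
{
    size_t n = 0;
    for (size_t i = 0; i < n1; i++) {
        int max_age = temp_age[tab[i].name];
        if (max_age != NO_ROW && tab[i].age == max_age)
            out_names[n++] = tab[i].name;
    }
    return n;
}
```

Note that pass 1 touches every row of tab2 even if the outer query would
only have needed a few names, which is exactly the cost discussed above.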

One requirement to handle these issues is better column statistics,
which I am working on.

------------------------------------------------------------------------------

Date: Thu, 5 Dec 1996 10:07:56 -0500
From: aixssd!darrenk@abs.net (Darren King)
To: maillist@candle.pha.pa.us
Subject: Subselect info.

> Any of them deal with implementing subselects?

There's a white paper at the www.sybase.com that might
help a little.  It's just a copy of a presentation
given by the optimizer guru there.  Nothing code-wise,
but he gives a few ways of flattening them with temp
tables, etc...

Darren 

------------------------------------------------------------------------------

Date: Fri, 22 Aug 1997 12:04:31 +0800
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: subselects

Bruce Momjian wrote:
> 
> Considering the complexity of the primary/secondary changes you are
> making, I believe subselects will be easier than that.

I don't do changes for P/F keys - just thinking...
Yes, I think that impl of referential integrity is
more complex work.

As for subselects:

in plannodes.h

typedef struct Plan {
...
    struct Plan         *lefttree;
    struct Plan         *righttree;
} Plan;

/* ----------------
 *  these are are defined to avoid confusion problems with "left"
                                   ^^^^^^^^^^^^^^^^^^
 *  and "right" and "inner" and "outer".  The convention is that   
 *  the "left" plan is the "outer" plan and the "right" plan is
 *  the inner plan, but these make the code more readable.
 * ----------------
 */
#define innerPlan(node)         (((Plan *)(node))->righttree)
#define outerPlan(node)         (((Plan *)(node))->lefttree)

First thought is to avoid any confusion by re-defining

#define rightPlan(node)         (((Plan *)(node))->righttree)
#define leftPlan(node)          (((Plan *)(node))->lefttree)

and change all occurrences of 'outer' & 'inner' in the code
to 'left' & 'right':

this will allow us to use 'outer' & 'inner' for subselects
later, without confusion. My hope is that we can change the Executor
very easily by adding outer/inner plans/TupleSlots to
EState, CommonState, JoinState, etc. and by doing node
processing in the right order.

Subselects are mostly Planner problem.

Unfortunately, I haven't time at the moment: CHECK/DEFAULT...

Vadim

------------------------------------------------------------------------------

Date: Fri, 22 Aug 1997 12:22:37 +0800
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: subselects

Vadim B. Mikheev wrote:
> 
> this will allow to use 'outer' & 'inner' things for subselects
> later, without confusion. My hope is that we may change Executor

Or maybe use 'high' & 'low' for subselects (to avoid confusion
with outer joins).

> very easy by adding outer/inner plans/TupleSlots to
> EState, CommonState, JoinState, etc and by doing node
> processing in right order.
             ^^^^^^^^^^^^^^
Rule is easy:
1. Uncorrelated subselect - do 'low' plan node first
2. Correlated             - do left/right first

- just some flag in structures.
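
A rough sketch of that rule in C, assuming a plan node that carries a
third child for the subselect plus a flag.  PlanSketch, lowtree,
low_is_correlated and child_order are hypothetical names used only for
illustration; only lefttree/righttree come from plannodes.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical plan node: lefttree/righttree as in plannodes.h, plus an
 * assumed 'lowtree' for a subselect and a flag saying whether that
 * subselect references the parent query. */
typedef struct PlanSketch {
    const char *name;
    struct PlanSketch *lefttree;
    struct PlanSketch *righttree;
    struct PlanSketch *lowtree;
    bool low_is_correlated;
} PlanSketch;

#define leftPlan(node)  (((PlanSketch *)(node))->lefttree)
#define rightPlan(node) (((PlanSketch *)(node))->righttree)
#define lowPlan(node)   (((PlanSketch *)(node))->lowtree)

/* Fill order[] with the children of node in processing order and
 * return how many there are (at most 3). */
size_t child_order(PlanSketch *node, PlanSketch *order[3])
{
    size_t n = 0;
    bool has_low = lowPlan(node) != NULL;

    if (has_low && !node->low_is_correlated)
        order[n++] = lowPlan(node);     /* rule 1: 'low' plan first */
    if (leftPlan(node) != NULL)
        order[n++] = leftPlan(node);
    if (rightPlan(node) != NULL)
        order[n++] = rightPlan(node);
    if (has_low && node->low_is_correlated)
        order[n++] = lowPlan(node);     /* rule 2: left/right first */
    return n;
}
```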

Vadim


---------------------------------------------------------------------------


Performance Tips for Transact-SQL

Slides from a presentation by Jeff Lichtman

----------------------------------------------------------------------------

Table of Contents

Overview
> versus >=
Exists Versus Not Exists
Exists Versus Not Exists II
Correlated Subqueries with Restrictive Outer Joins
Correlated Subqueries with Restrictive Outer Joins Example
Correlated Subqueries with Restrictive Outer Joins III
Correlated Subqueries with Restrictive Outer Joins IV
Correlated Subqueries with Restrictive Outer Joins V
Correlated Subqueries with Restrictive Outer Joins Example
Creating Tables in Stored Procedures
Creating Tables in Stored Procedures Example
Variables versus Parameters in Where Clause
Variables versus Parameters in Where Clause Example
Count versus Exists
Count versus Exists II
Or versus Union
Or versus Union Example
MAX and MIN Aggregates
MAX and MIN Aggregates II
MAX and MIN Aggregates Example
MAX and MIN Aggregates III
Joins and Datatypes
Joins and Datatypes Example
Joins and Datatypes II
Joins and Datatypes III
Parameters and Datatypes
Parameters and Datatypes Example
Summary
----------------------------------------------------------------------------

Overview

   * Goal Is to Learn Some Tips to Help You Improve the Performance of Your
     Queries.
   * Emphasis Is on Queries, Not on Schema.
   * Many Tips Are Not Related to Query Optimizer.
   * Tips Are Based on Actual Customer Cases Seen by SQL Server Development
     Engineer.
   * These Tips Are Intended As Suggestions and Guidelines, Not Absolute
     Rules.
   * Some of These Tips Could Become Obsolete As Sybase Improves the SQL
     Server.

----------------------------------------------------------------------------

> versus >=

Given the query:

select * from tab where x > 3

with an index on x. This query works by using the index to find the first
value where x = 3, and scanning forward.

Suppose there are many rows in tab where x = 3.

In this case, the server has to scan many pages before finding the first row
where x > 3.

It is more efficient to write the query like this:

select * from tab where x >= 4
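
The effect can be illustrated with a sorted int array standing in for the
index on x.  This is a sketch under assumptions, not how SQL Server is
implemented; lower_bound and wasted_reads are illustrative names.

```c
#include <stddef.h>

/* Position of the first entry >= key in a sorted array (binary search);
 * this models positioning the index scan. */
size_t lower_bound(const int *a, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Number of index entries read and discarded before the first
 * qualifying one.  strict != 0 models "x > key", otherwise "x >= key". */
size_t wasted_reads(const int *idx, size_t n, int key, int strict)
{
    size_t i = lower_bound(idx, n, key);
    size_t wasted = 0;

    if (strict) {
        /* The scan starts at the first x = key and must step over
         * every duplicate of key before reaching x > key. */
        while (i < n && idx[i] == key) {
            i++;
            wasted++;
        }
    }
    return wasted;
}
```

With idx = {1, 2, 3, 3, 3, 3, 4, 5}, the "x > 3" form discards the four
entries equal to 3, while "x >= 4" lands directly on the first qualifying
entry.  The two forms are only equivalent when x is an integer column.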

----------------------------------------------------------------------------

Exists Versus Not Exists

In subqueries and IF statements, EXISTS and IN are faster than NOT EXISTS
and NOT IN.

With IF statements, one can easily avoid NOT EXISTS:

if not exists (select * from ...)
begin /* Statement group 1 */
...
end else begin /* Statement group 2 */
...
end

can be re-written as:

if exists (select * from ...)
begin /* Statement group 2 */
...
end else begin /* Statement group 1 */
...
end

----------------------------------------------------------------------------

Exists versus Not Exists (cont.)

Even without an ELSE clause, it is possible to avoid NOT EXISTS in IF
statements:

if not exists (select * from ...)
begin
               /* Statement group */
               ...
end
...

can be re-written as:

if exists (select * from ...)
begin
     goto exists_label
end
/* Statement group */
...
exists_label:
...

----------------------------------------------------------------------------

Correlated Subqueries with Restrictive Outer Joins

   * SQL Server Processes Subqueries "Inside-Out"
   * For Correlated Subqueries, It Creates a Worktable Containing Subquery
     Results
   * The Worktable Is Grouped on the Correlation Columns

----------------------------------------------------------------------------

Correlated Subqueries with Restrictive Outer Joins

For example:

select w from outer where x =
     (select sum(a) from inner
      where inner.b = outer.z)

becomes:

select outer.z, summ = sum(inner.a)
into #work
from outer, inner
where inner.b = outer.z
group by outer.z
select outer.w
from outer, #work
where outer.z = #work.z
and outer.x = #work.summ

----------------------------------------------------------------------------

Correlated Subqueries with Restrictive Outer Joins (cont.)

The SQL Server copies search clauses from the outer query to the subquery to
improve performance:

select w from outer
where y = 1
and x = (select sum(a)
     from inner
     where inner.b = outer.z)

becomes:

select outer.z, summ = sum(inner.a)
into #work
from outer, inner
where inner.b = outer.z and outer.y = 1
group by outer.z
select outer.w
from outer, #work
where outer.z = #work.z and outer.y = 1 and outer.x = #work.summ

----------------------------------------------------------------------------

Correlated Subqueries with Restrictive Outer Joins (cont.)

   * The SQL Server Does Not Copy Join Clauses Into Correlated Subqueries As
     It Does With Search Clauses.
   * Copying Search Clauses Will Always Make the Query Run Faster, but
     Copying a Join Clause Might Make It Run Slower.
   * Copying the Join Clause Is Beneficial Only If the Join Clause Is Very
     Restrictive.
   * Only the Query Optimizer Knows Whether a Join Clause Is Restrictive,
     but the SQL Server Breaks the Query Into Steps Before Optimization.
   * Since You Know Your Data, You Can Copy Join Clauses Into Subqueries
     When You Know It Will Help.

----------------------------------------------------------------------------

Correlated Subqueries with Restrictive Outer Joins (cont.)

An example of when to copy join clause:

select *
from huge_tab, single_row_tab
where huge_tab.unique_column = single_row_tab.a
and huge_tab.b = (select sum(...)
       from inner
       where huge_tab.d = inner.e)

should be re-written as:

select *
from huge_tab, single_row_tab
where huge_tab.unique_column = single_row_tab.a
and huge_tab.b = (select sum(...)
        from inner
        where huge_tab.d = inner.e
        and huge_tab.unique_column = single_row_tab.a)

----------------------------------------------------------------------------

Correlated Subqueries with Restrictive Outer Joins (cont.)

An example of when not to copy join clause:

select *
from huge_tab, single_row_tab
where huge_tab.many_duplicates_in_column = single_row_tab.a and
single_row_tab.b = (select sum(...)
     from inner
     where single_row_tab.d = inner.e)

Should not be re-written as:

select *
from huge_tab, single_row_tab
where huge_tab.many_duplicates_in_column = single_row_tab.a and
single_row_tab.b = (select sum(...)
      from inner
      where single_row_tab.d = inner.e
      and huge_tab.many_duplicates_in_column = single_row_tab.a)

----------------------------------------------------------------------------

Creating Tables in Stored Procedures

   * When You Create a Table in the Same Stored Procedure Where It Is Used,
     the Query Optimizer Cannot Know How Big the Table Is.
   * The Optimizer Assumes That Any Such Table Has 10 Data Pages and 100
     Rows.
   * If the Table Is Really Big, This Assumption Can Lead the Optimizer to
     Choose a Sub-Optimal Query Plan.
   * In Cases Like This, It Is Better to Create the Table Outside the
     Procedure, Which Allows the Optimizer to See How Large the Table Is.

----------------------------------------------------------------------------

Creating Tables in Stored Procedures (cont)

For example:

create proc p as
      select * into #huge_result from ...
      select * from tab, #huge_result where
 ...

can be re-written as:

create proc p as
      select * into #huge_result from ...
      exec s
create proc s as
      select * from tab, #huge_result where
 ...

----------------------------------------------------------------------------

Variables versus Parameters in Where Clause

   * The Query Optimizer Cannot Predict the Value of a Declared Variable.
   * The Query Optimizer Does Know the Value of a Parameter to a Stored Procedure at
     Compile Time.
   * Knowing the Values in the WHERE Clause of a Query Can Help the
     Optimizer Make Better Choices.
   * To Avoid Putting Variables Into WHERE Clauses, One Can Split up Stored
     Procedures.

----------------------------------------------------------------------------

Variables versus Parameters in Where Clause (cont)

For example:

create procedure p as
       declare @x int
       select @x = col from tab where ...
       select * from tab2 where col2 = @x

can be re-written as:

create procedure p as
       declare @x int
       select @x = col from tab where ...
       exec s @x
create procedure s @x int as
       select * from tab2 where col2 = @x

----------------------------------------------------------------------------

Count versus Exists

It is possible to use the COUNT aggregate in a subquery to do an existence
check:

select * from tab where 0 <
        (select count(*) from tab2 where ...)

It is possible to write this same query using EXISTS (or IN):

select * from tab where exists
       (select * from tab2 where ...)

----------------------------------------------------------------------------

Count versus Exists (cont)

   * Using COUNT to Do an Existence Check Is Slower Than Using EXISTS.
   * When You Use COUNT, the SQL Server Does Not Know That You Are Doing an
     Existence Check. It Counts All of the Matching Values.
   * When You Use EXISTS, the SQL Server Knows You Are Doing an Existence
     Check, So It Stops Looking When It Finds the First Matching Value.
   * The Same Applies to Using COUNT Instead of IN or ANY.
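
The difference described in the bullets above comes down to an early exit.
A minimal C sketch, with an int array standing in for the subquery's rows
and hypothetical function names; the "examined" counter is there only to
make the cost visible.

```c
#include <stdbool.h>
#include <stddef.h>

/* COUNT-style check: every row is examined, even after the first match. */
size_t count_matches(const int *rows, size_t n, int key, size_t *examined)
{
    size_t count = 0;

    *examined = 0;
    for (size_t i = 0; i < n; i++) {
        (*examined)++;
        if (rows[i] == key)
            count++;
    }
    return count;
}

/* EXISTS-style check: stop looking at the first matching row. */
bool exists_match(const int *rows, size_t n, int key, size_t *examined)
{
    *examined = 0;
    for (size_t i = 0; i < n; i++) {
        (*examined)++;
        if (rows[i] == key)
            return true;
    }
    return false;
}
```

For rows {7, 3, 3, 9, 3} and key 3, the COUNT form examines all five rows
while the EXISTS form stops after two.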

----------------------------------------------------------------------------

Or versus Union

   * The SQL Server Cannot Optimize Join Clauses That Are Linked With OR.
   * The SQL Server Can Optimize Selects That Are Linked With UNION.
   * The Result of OR Is Somewhat Like the Result of UNION, Except For the
     Treatment of Duplicate Rows and Empty Tables.

----------------------------------------------------------------------------

Or versus Union (cont)

For example:

select * from tab1, tab2
where tab1.a = tab2.b
or tab1.x = tab2.y

can be re-written as:

select * from tab1, tab2
where tab1.a = tab2.b
union all
select * from tab1, tab2
where tab1.x = tab2.y

You can use UNION instead of UNION ALL if you want to eliminate duplicates,
but this will eliminate all duplicates. It may not be possible to get
exactly the same set of duplicates from the re-written query.
----------------------------------------------------------------------------

MAX and MIN Aggregates

   * The SQL Server Uses Special Optimizations for the MAX and MIN
     Aggregates When There Is an Index on the Aggregated Column.
   * For MIN, It Stops the Scan on the First Qualifying Row.
   * For MAX, It Goes Directly to the End of the Index to Find the Last Row.
   * The Optimization Is Not Applied If:
        o The Expression Inside the MAX or MIN Is Anything but a Column
        o The Column Inside the MAX or MIN Is Not the First Column of an
          Index
        o There Is Another Aggregate in the Query
        o There Is a GROUP BY Clause
   * In Addition, the MAX Optimization Is Not Applied If There Is a WHERE
     Clause.

----------------------------------------------------------------------------

MAX and MIN Aggregates (cont)

If you have an optimizable MAX or MIN aggregate, it can pay to put it in a
query separate from other aggregates. For example:

select max(x), min(x) from tab

will result in a full scan of tab, even if there is an index on x. The query
can be re-written as:

select max(x) from tab
select min(x) from tab

This can result in using the index twice, rather than scanning the entire
table once.
----------------------------------------------------------------------------

MAX and MIN Aggregates (cont)

The MIN optimization can backfire if the where clause is highly selective.
For example:

select min(index_col)
from tab
where
       col_in_other_index = "value only at end of first index"

The MIN optimization will result in a nearly complete scan of the entire
index.

This is counter-intuitive. The more selective the WHERE clause, the slower
the query.
----------------------------------------------------------------------------

MAX and MIN Aggregates (cont)

In cases like this, it can pay to disable the MIN optimization by combining
it with another aggregate:

select min(index_col), max(index_col)
from tab
where
       col_in_other_index = "value only at end of first index"

This convinces the optimizer not to use the MIN optimization, so it chooses
the next best plan, which might be the other index.
----------------------------------------------------------------------------

Joins and Datatypes

   * When Joining Between Two Columns of Different Datatypes, One of the
     Columns Must Be Converted to the Type of the Other.
   * The Commands Reference Manual Shows the Hierarchy of Types.
   * The Column Whose Type Is Lower in the Hierarchy Is the One That Is
     Converted.
   * The Query Optimizer Cannot Choose an Index on the Column That Is
     Converted.

----------------------------------------------------------------------------

Joins and Datatypes (cont)

For example:

select *
from tab1, tab2
where tab1.float_column = tab2.int_column

In this case, no index on tab2.int_column can be used, because int is lower
in the hierarchy than float.

Note that CHAR NULL is really VARCHAR, and BINARY NULL is really VARBINARY.

Joining CHAR NOT NULL with CHAR NULL involves a conversion (BINARY too).
----------------------------------------------------------------------------

Joins and Datatypes (cont)

It's best to avoid datatype problems in joins by designing the schema
accordingly.

If a join between different datatypes is unavoidable, and it hurts
performance, you can force the conversion to be on the other side of the
join.

For example:

select *
from tab1, tab2
where tab1.char_column = convert(char(75),tab2.varchar_column)

----------------------------------------------------------------------------

Joins and Datatypes (cont)

Be careful! This tactic can change the meaning of the query.

For example:

select *
from tab1, tab2
where tab1.int_column = convert(int, tab2.float_column)

This will not return the same results as the join without the convert. It
can be salvaged by adding:

and tab2.float_column = convert(int, tab2.float_column)

This assumes that all values in tab2.float_column can be converted to int.
----------------------------------------------------------------------------

Parameters and Datatypes

   * The Query Optimizer Can Use the Values of Parameters to Stored
     Procedures to Help Determine Costs.
   * If a Parameter Is Not of the Same Type As the Column in The WHERE
     Clause That It Is Being Compared to, the Server Has to Convert the
     Parameter.
   * The Optimizer Cannot Use the Value of a Converted Parameter.
   * It Pays to Make Sure That Parameters Have the Same Type As the Columns
     They Are Compared To.

----------------------------------------------------------------------------

Parameters and Datatypes (cont)

For example:

create proc p @x varchar(30) as
select * from tab where char_column = @x

may get a poorer query plan than:

create proc p @x char(30) as
select * from tab where char_column = @x

Remember that CHAR NULL is really VARCHAR, and BINARY NULL is really
VARBINARY.
----------------------------------------------------------------------------

Summary

   * How you write your queries can make a big difference in performance.
   * Two different queries that do the same thing may perform differently.
   * There are few absolutes to improving performance, but the tips given
     here can help.
   * These tips are not all there is to know about performance.

About the Author

Jeff Lichtman has worked at Sybase since 1987. In 1994, he was given the new
position of architect of query processing for SQL Server. He is informally
known as Sybase's optimizer guru.

For more info send email to webmaster@sybase.com

Copyright 1995 (c) Sybase, Inc. All Rights Reserved.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Sun Jan 11 23:49:44 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA19252
	for <maillist@candle.pha.pa.us>; Sun, 11 Jan 1998 23:49:02 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id MAA08095;
	Mon, 12 Jan 1998 12:09:24 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B9A580.55DD4645@sable.krasnoyarsk.su>
Date: Mon, 12 Jan 1998 12:09:20 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
Subject: Re: [HACKERS] Re: subselects
References: <199801110559.AAA11801@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> We need a new Node structure, call it Sublink:
> 
>         int     linkType        (IN, NOTIN, ANY, EXISTS, OPERATOR...)
>         Oid     operator        /* subquery must return single row */
>         List    *lefthand;      /* parent stuff */
>         Node    *subquery;      /* represents nodes from parser */
>         Index   Subindex;       /* filled in to index Query->subqueries */

Ok, I agree that it's better to have a new node and not put subquery stuff
into the Expr node.

int linkType
        is one of EXISTS, ANY, ALL, EXPR. EXPR is for the case of expression
        subqueries (following Sybase naming) which must return a single row -
        (a, b, c) = (subquery).
        Note again that there is no linkType for IN and NOTIN here.
        The user's IN and NOT IN must be converted to = ANY and <> ALL by the parser.

We don't need Oid operator! In all cases we need

List *oper
        list of Oper nodes for each of a, b, c, ... and operator (=, ...)
        corresponding to data type of a, b, c, ...

List *lefthand
        is list of Var/Const nodes - representation of (a, b, c, ...)

What is Node *subquery ?
In the optimizer we need either Subindex (to get the subquery from Query->subqueries
when being in Sublink) or Node *subquery inside Sublink itself.
BTW, after some thought I don't see how Query->subqueries will be useful.
So, maybe just add bool hassubqueries to Query (and Query *parentQuery)
and use Query *subquery in Sublink, but not Subindex ?
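
To make the IN / NOT IN conversion concrete, here is a small C sketch.
It is an assumption-laden illustration, not the proposed node definition:
the enum member names, SubLinkSketch and sublink_from_keyword are made up,
and a single operator string stands in for List *oper.

```c
#include <string.h>

/* The four link types named above; member names are assumed. */
typedef enum {
    EXISTS_SUBLINK,
    ANY_SUBLINK,
    ALL_SUBLINK,
    EXPR_SUBLINK
} SubLinkTypeSketch;

typedef struct {
    SubLinkTypeSketch linkType;
    const char *oper;       /* stands in for List *oper; NULL for EXISTS */
} SubLinkSketch;

/* Map user-level syntax to (linkType, operator), as the parser would.
 * Returns 0 on success, -1 if the keyword is not handled here. */
int sublink_from_keyword(const char *keyword, SubLinkSketch *out)
{
    if (strcmp(keyword, "IN") == 0) {
        out->linkType = ANY_SUBLINK;    /* x IN (...)     ->  x = ANY (...)  */
        out->oper = "=";
    } else if (strcmp(keyword, "NOT IN") == 0) {
        out->linkType = ALL_SUBLINK;    /* x NOT IN (...) ->  x <> ALL (...) */
        out->oper = "<>";
    } else if (strcmp(keyword, "EXISTS") == 0) {
        out->linkType = EXISTS_SUBLINK;
        out->oper = NULL;
    } else {
        return -1;
    }
    return 0;
}
```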

> 
> Also, when parsing the subqueries, we need to keep track of correlated
> references.  I recommend we add a field to the Var structure:
> 
>         Index   sublevel;       /* range table reference:
>                                    = 0  current level of query
>                                    < 0  parent above this many levels
>                                    > 0  index into subquery list
>                                  */
> 
> This way, a Var node with sublevel 0 is the current level, and is true
> in most cases.  This helps us not have to change much code.  sublevel =
> -1 means it references the range table in the parent query. sublevel =
> -2 means the parent's parent. sublevel = 2 means it references the range
> table of the second entry in Query->subqueries.  Varno and varattno are
> still meaningful.  Of course, we can't reference variables in the
> subqueries from the parent in the parser code, but Vadim may want to.
                                                     ^^^^^^^^^^^^^^^^^
No. So, just use sublevel >= 0: 0 - current level, 1 - one level up, ...
sublevel is for optimizer only - executor will not use it.

> 
> When doing a Var lookup in the parser, we look in the current level
> first, but if not found, if it is a subquery, we can look at the parent
> and parent's parent to set the sublevel, varno, and varattno properly.
> 
> We create no phantom range table entries in the subquery, and no phantom
> target list entries.   We can leave that all for the upper optimizer.

Ok.
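
A minimal sketch of the lookup described above, assuming a simplified
parse state in which each query level holds a list of column names and a
pointer to its parent. Level, VarRef and lookup_var are hypothetical names
for illustration, not PostgreSQL's; the numbering follows the sublevel >= 0
convention (0 = current level, 1 = one level up).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified parse state: one column list per query level. */
typedef struct Level
{
    const char        **columns;   /* column names visible at this level */
    int                 ncolumns;
    const struct Level *parent;    /* enclosing query, NULL at the top */
} Level;

/* Hypothetical result of a lookup, mirroring the sublevel/varattno idea. */
typedef struct VarRef
{
    int sublevel;   /* 0 = current level, 1 = parent, 2 = parent's parent */
    int attno;      /* 1-based position of the column at that level */
} VarRef;

/*
 * Search the current level first, then each enclosing level in turn.
 * Returns 1 and fills *out when the column is found, 0 otherwise.
 */
int
lookup_var(const Level *level, const char *name, VarRef *out)
{
    int up = 0;

    assert(name != NULL && out != NULL);
    for (; level != NULL; level = level->parent, up++)
    {
        for (int i = 0; i < level->ncolumns; i++)
        {
            if (strcmp(level->columns[i], name) == 0)
            {
                out->sublevel = up;
                out->attno = i + 1;
                return 1;
            }
        }
    }
    return 0;   /* not visible at any level */
}
```

No phantom range table or target list entries are created here: the
lookup only records how many levels up the column was found.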

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:41 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00786
	for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 08:06:39 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12270 for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 04:16:10 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08460;
	Mon, 12 Jan 1998 16:34:54 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su>
Date: Mon, 12 Jan 1998 16:34:45 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Thomas G. Lockhart wrote:
> 
> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
> If lists are handled farther back, this routine should move to there also and the
> parser will just pass the lists. Note that some assumptions have to be made about the
> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
> to disallow those cases or to look for specific appearance of the operator to guess
> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
> it has "<>" or "!" then build as "or"s.

Oh, god! I never thought about this!
Ok, I have to agree:

1. Only <, <=, =, >, >=, <> are allowed with subselects
2. Use ORs for <>, so we need bool useor in SubLink
   for <>, <> ANY and <> ALL:

typedef struct SubLink {
	NodeTag		type;
	int		linkType; /* EXISTS, ALL, ANY, EXPR */
	bool		useor;    /* TRUE for <> */
	List	        *lefthand; /* List of Var/Const nodes on the left */
	List	        *oper;     /* List of Oper nodes */
	Query	        *subquery; /* */
} SubLink;
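
To illustrate what useor is meant to control, here is a minimal sketch
assuming integer columns and ignoring NULLs. row_compare, col_eq and col_ne
are hypothetical names; the function pointer only stands in for one Oper
node per column pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one Oper node: compares one column pair. */
typedef bool (*ColumnCmp)(int left, int right);

bool col_eq(int left, int right) { return left == right; }
bool col_ne(int left, int right) { return left != right; }

/*
 * Combine the per-column results of (l1, l2, ...) OP (r1, r2, ...).
 * useor == false: every column comparison must hold (AND), as for "=".
 * useor == true:  one column comparison is enough (OR), as for "<>".
 */
bool
row_compare(const int *left, const int *right, size_t ncols,
            ColumnCmp cmp, bool useor)
{
    assert(ncols > 0 && cmp != NULL);
    for (size_t i = 0; i < ncols; i++)
    {
        bool holds = cmp(left[i], right[i]);

        if (useor && holds)
            return true;    /* OR: first success decides */
        if (!useor && !holds)
            return false;   /* AND: first failure decides */
    }
    return !useor;          /* AND: all held; OR: none held */
}
```

With this, (1, 2) = (1, 3) is false because the second pair fails, while
(1, 2) <> (1, 3) is true because one differing pair is enough.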

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan 12 08:06:38 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id IAA00783
	for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 08:06:36 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id EAA12377 for <maillist@candle.pha.pa.us>; Mon, 12 Jan 1998 04:21:55 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA08470;
	Mon, 12 Jan 1998 16:40:49 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34B9E520.4C0EA6BC@sable.krasnoyarsk.su>
Date: Mon, 12 Jan 1998 16:40:48 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
CC: Bruce Momjian <maillist@candle.pha.pa.us>, hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: subselects
References: <199801092231.RAA24282@candle.pha.pa.us> <34B7AD8C.5ED59CB5@sable.krasnoyarsk.su> <34B7CC91.E6E331C7@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Thomas G. Lockhart wrote:
> 
> btw, to implement "(a,b,c) OP (d,e,f)" I made a new routine in the parser called
> makeRowExpr() which breaks this up into a sequence of "and" and/or "or" expressions.
> If lists are handled farther back, this routine should move to there also and the
> parser will just pass the lists. Note that some assumptions have to be made about the
> meaning of "(a,b) OP (c,d)", since usually we only have knowledge of the behavior of
> "a OP c". Easy for the standard SQL operators, unknown for others, but maybe it is OK
> to disallow those cases or to look for specific appearance of the operator to guess
> the behavior (e.g. if the operator has "<" or "=" or ">" then build as "and"s and if
> it has "<>" or "!" then build as "or"s.

Sorry, I forgot something: is (a, b) OP (x, y) in the standard ?
If not, then I suggest not implementing it at all and allowing
(a, b) OP [ANY|ALL] (subselect) only.

Vadim

From vadim@sable.krasnoyarsk.su Tue Jan 13 09:30:58 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id JAA28551
	for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 09:30:56 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id JAA26483 for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 09:21:36 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id VAA04356;
	Tue, 13 Jan 1998 21:20:31 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34BB7829.2B18D4B5@sable.krasnoyarsk.su>
Date: Tue, 13 Jan 1998 21:20:25 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
Subject: Re: [HACKERS] Re: subselects
References: <199801121424.JAA02440@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Ok. I don't see how Query->subqueries could help me, but I foresee
that Query->sublinks can do it. Could you add this ? 

Bruce Momjian wrote:
> 
> >
> > What is Node *subquery ?
> > In the optimizer we need either Subindex (to get the subquery from Query->subqueries
> > when we are in a Sublink) or Node *subquery inside the Sublink itself.
> > BTW, after some thought I don't see how Query->subqueries will be useful.
> > So, maybe just add bool hassubqueries to Query (and Query *parentQuery)
> > and use Query *subquery in Sublink, but not subindex ?
> 
> OK, I originally created it because the parser would have trouble
> filling in a List* field in SelectStmt while it was parsing a WHERE
> clause.  I decided to just stick the SelectStmt* into Sublink->subquery.
> 
> While we are going through the parse output to fill in the Query*, I
> thought we should move the actual subquery parse output to a separate
> place, and once the Query* was completed, spin through the saved
> subquery parse list and stuff Query->subqueries with a list of Query*
> for the subqueries.  I thought this would be easier, because we would
> then have all the subqueries in a nice list that we can manage easier.
> 
> In fact, we can fill Query->subqueries with SelectStmt* as we process
> the WHERE clause, then convert them to Query* at the end.
> 
> If you would rather keep the subquery Query* entries in the Sublink
> structure, we can do that.  The only issue I see is that when you want
> to get to them, you have to wade through the WHERE clause to find them.
> For example, we will have to run the subquery Query* through the rewrite
> system.  Right now, for UNION, I have a nice union List* in Query, and I
> just spin through it in postgres.c for each Union query.  If we keep the
> subquery Query* inside Sublink, we have to have some logic to go through
> and find them.
> 
> If we just have an Index in Sublink to the Query->subqueries, we can use
> the nth() macro to find them quite easily.
> 
> But it is up to you.  I really don't know how you are going to handle
> things like:
> 
>         select *
>         from taba
>         where x = 3 and y = 5 and (z=6 or q in (select g from tabb ))

No problems.

> 
> My logic was to break the problem down to single queries as much as
> possible, so we would be breaking the problem up into pieces.  Whatever
> is easier for you.

Vadim

From owner-pgsql-hackers@hub.org Tue Jan 13 10:32:35 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA29523
	for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 10:32:33 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA03743; Tue, 13 Jan 1998 10:32:13 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Tue, 13 Jan 1998 10:31:57 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA03708 for pgsql-hackers-outgoing; Tue, 13 Jan 1998 10:31:51 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA03628 for <hackers@postgreSQL.org>; Tue, 13 Jan 1998 10:31:20 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id JAA28747;
	Tue, 13 Jan 1998 09:48:00 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801131448.JAA28747@candle.pha.pa.us>
Subject: Re: [HACKERS] Re: subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Tue, 13 Jan 1998 09:48:00 -0500 (EST)
Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
In-Reply-To: <34BB7829.2B18D4B5@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 13, 98 09:20:25 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Ok. I don't see how Query->subqueries could help me, but I foresee
> that Query->sublinks can do it. Could you add this ? 

OK, so instead of moving the query out of the SubLink structure, you
want the Query* in the Sublink structure, and a List* of SubLink
pointers in the query structure?

	Query
	{
		...
		List *sublink;  /* list of pointers to Sublinks */
		...
	}

I can do that.  Let me know.
-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Tue Jan 13 22:23:46 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id WAA08806
	for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 22:23:45 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id WAA11486 for <maillist@candle.pha.pa.us>; Tue, 13 Jan 1998 22:09:55 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id KAA05660;
	Wed, 14 Jan 1998 10:09:07 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34BC2C4E.83E92D82@sable.krasnoyarsk.su>
Date: Wed, 14 Jan 1998 10:09:02 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
Subject: Re: [HACKERS] Re: subselects
References: <199801131448.JAA28747@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> >
> > Ok. I don't see how Query->subqueries could help me, but I foresee
> > that Query->sublinks can do it. Could you add this ?
> 
> OK, so instead of moving the query out of the SubLink structure, you
> want the Query* in the Sublink structure, and a List* of SubLink
> pointers in the query structure?

Yes.

> 
>         Query
>         {
>                 ...
>                 List *sublink;  /* list of pointers to Sublinks */
>                 ...
>         }
> 
> I can do that.  Let me know.

Thanks!

Are there any open issues ?

Vadim

From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:40 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21676
	for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 19:00:39 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23948 for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 18:35:59 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27814; Thu, 15 Jan 1998 18:32:40 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:32:20 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27668 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:32:08 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27425 for <hackers@postgreSQL.org>; Thu, 15 Jan 1998 18:31:32 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id SAA12920;
	Thu, 15 Jan 1998 18:18:32 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801152318.SAA12920@candle.pha.pa.us>
Subject: Re: [HACKERS] Re: subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Thu, 15 Jan 1998 18:18:31 -0500 (EST)
Cc: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
In-Reply-To: <34BC2C4E.83E92D82@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 14, 98 10:09:02 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> 
> Bruce Momjian wrote:
> > 
> > >
> > > Ok. I don't see how Query->subqueries could help me, but I foresee
> > > that Query->sublinks can do it. Could you add this ?
> > 
> > OK, so instead of moving the query out of the SubLink structure, you
> > want the Query* in the Sublink structure, and a List* of SubLink
> > pointers in the query structure?
> 
> Yes.
> 
> > 
> >         Query
> >         {
> >                 ...
> > >                 List *sublink;  /* list of pointers to Sublinks */
> >                 ...
> >         }
> > 
> > I can do that.  Let me know.
> 
> Thanks!
> 
> Are there any open issues ?

OK, what do you need me to do?  Do you want me to create the Sublink
support stuff, fill them in in the parser, and pass them through the
rewrite section and into the optimizer?  I will prepare a list of
changes.


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Thu Jan 15 19:00:38 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id TAA21663
	for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 19:00:36 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id SAA23925 for <maillist@candle.pha.pa.us>; Thu, 15 Jan 1998 18:35:42 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA27796; Thu, 15 Jan 1998 18:32:37 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Thu, 15 Jan 1998 18:31:52 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27463 for pgsql-hackers-outgoing; Thu, 15 Jan 1998 18:31:37 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27167 for <hackers@postgreSQL.org>; Thu, 15 Jan 1998 18:31:06 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id SAA26747;
	Thu, 15 Jan 1998 18:26:42 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801152326.SAA26747@candle.pha.pa.us>
Subject: Re: [HACKERS] Re: subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Thu, 15 Jan 1998 18:26:41 -0500 (EST)
Cc: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
In-Reply-To: <34B9E3B5.CF9AC8E3@sable.krasnoyarsk.su> from "Vadim B. Mikheev" at Jan 12, 98 04:34:45 pm
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> typedef struct SubLink {
> 	NodeTag		type;
> 	int		linkType; /* EXISTS, ALL, ANY, EXPR */
> 	bool		useor;    /* TRUE for <> */
> 	List	        *lefthand; /* List of Var/Const nodes on the left */
> 	List	        *oper;     /* List of Oper nodes */
> 	Query	        *subquery; /* */
> } SubLink;

OK, we add this structure above.  During parsing, *subquery actually
will hold Node *parsetree, not Query *.

And add to Query:

	bool	hasSubLinks;

Also need a function to return a List* of SubLink*.  I just did a
similar thing with Aggreg*.  And Var gets:

	int uplevels;

Is that it?
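
A function returning the list of SubLinks could look roughly like the
sketch below. It assumes a simplified binary expression tree; ExprNode,
the T_* tags and collect_sublinks are hypothetical names, and a fixed array
stands in for a List*. The walker does not descend into a SubLink, on the
assumption that nested sublinks belong to the subquery's own Query.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node tags for a simplified WHERE-clause tree. */
typedef enum { T_LEAF, T_BOOLOP, T_SUBLINK } Tag;

typedef struct ExprNode
{
    Tag              tag;
    struct ExprNode *left;    /* NULL for leaves and sublinks */
    struct ExprNode *right;
} ExprNode;

/*
 * Append every T_SUBLINK node found under 'node' to 'found'.
 * 'count' is the number of entries already stored and 'max' the capacity.
 * Returns the new count.
 */
size_t
collect_sublinks(ExprNode *node, ExprNode **found, size_t count, size_t max)
{
    if (node == NULL)
        return count;
    if (node->tag == T_SUBLINK)
    {
        assert(count < max);
        found[count++] = node;
        return count;   /* do not descend into the sublink itself */
    }
    count = collect_sublinks(node->left, found, count, max);
    count = collect_sublinks(node->right, found, count, max);
    return count;
}
```

For a clause shaped like "x = 3 and y = 5 and (z = 6 or q in (select ...))"
the walker returns a single entry, the node for the IN subselect, without
the caller having to know where in the clause it sits.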


-- 
Bruce Momjian
maillist@candle.pha.pa.us


From owner-pgsql-hackers@hub.org Fri Jan 16 04:36:05 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09604
	for <maillist@candle.pha.pa.us>; Fri, 16 Jan 1998 04:36:03 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id EAA07040; Fri, 16 Jan 1998 04:35:27 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Fri, 16 Jan 1998 04:35:18 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id EAA06936 for pgsql-hackers-outgoing; Fri, 16 Jan 1998 04:35:13 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id EAA06823 for <hackers@postgreSQL.org>; Fri, 16 Jan 1998 04:34:22 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10384;
	Fri, 16 Jan 1998 16:34:15 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Message-ID: <34BF2997.97B40172@sable.krasnoyarsk.su>
Date: Fri, 16 Jan 1998 16:34:15 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: lockhart@alumni.caltech.edu, hackers@postgreSQL.org
Subject: Re: [HACKERS] Re: subselects
References: <199801152326.SAA26747@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Bruce Momjian wrote:
> 
> > typedef struct SubLink {
> >       NodeTag         type;
> >       int             linkType; /* EXISTS, ALL, ANY, EXPR */
> >       bool            useor;    /* TRUE for <> */
> >       List            *lefthand; /* List of Var/Const nodes on the left */
> >       List            *oper;     /* List of Oper nodes */
> >       Query           *subquery; /* */
> > } SubLink;
> 
> OK, we add this structure above.  During parsing, *subquery actually
> will hold Node *parsetree, not Query *.
            ^^^^^^^^^^^^^^^
But the optimizer will get a Query node here, yes ?

> 
> And add to Query:
> 
>         bool    hasSubLinks;
> 
> Also need a function to return a List* of SubLink*.  I just did a
> similar thing with Aggreg*.  And Var gets:
> 
>         int uplevels;
> 
> Is that it?

Yes.

Vadim


From vadim@sable.krasnoyarsk.su Fri Jan 16 04:36:21 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA09607
	for <maillist@candle.pha.pa.us>; Fri, 16 Jan 1998 04:36:06 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id QAA10396;
	Fri, 16 Jan 1998 16:37:21 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34BF2A50.A357A16D@sable.krasnoyarsk.su>
Date: Fri, 16 Jan 1998 16:37:20 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: hackers@postgreSQL.org, lockhart@alumni.caltech.edu
Subject: Re: [HACKERS] Re: subselects
References: <199801152318.SAA12920@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> >
> > Are there any open issues ?
> 
> OK, what do you need me to do?  Do you want me to create the Sublink
> support stuff, fill them in in the parser, and pass them through the
> rewrite section and into the optimizer?  I will prepare a list of
> changes.

Please do this. I'm ready to start coding things in the optimizer.

Vadim

From vadim@sable.krasnoyarsk.su Sun Jan 18 07:32:52 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id HAA14786
	for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 07:32:51 -0500 (EST)
Received: from www.krasnet.ru ([193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id HAA29385 for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 07:25:55 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id TAA15780;
	Sun, 18 Jan 1998 19:27:14 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34C1F51D.E9CF0A39@sable.krasnoyarsk.su>
Date: Sun, 18 Jan 1998 19:27:09 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: "Thomas G. Lockhart" <lockhart@alumni.caltech.edu>
CC: Bruce Momjian <maillist@candle.pha.pa.us>,
        PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: [HACKERS] subselects coding started
References: <199801170500.AAA12837@candle.pha.pa.us> <34C044D5.C21FE707@alumni.caltech.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Thomas G. Lockhart wrote:
> 
> Bruce Momjian wrote:
> 
> > OK, I have created the SubLink structure with supporting routines, and
> > have added code to create the SubLink structures in the parser, and have
> > added Query->hasSubLink.
> >
> > I changed gram.y to support:
> >
> >         (x,y,z) OP (subselect)
> >
> > where OP is any operator.  Is that right, or are we doing only certain
> > ones, and if so, do we limit it in the parser?
> 
> Seems like we would want to pass most operators and expressions through
> gram.y, and then call elog() in either the transformation or in the
> optimizer if it is an operator which can't be supported.

Not in the optimizer, in the parser, please.
Remember that for <> SubLink->useor must be TRUE and this is parser work
(the optimizer doesn't know about "=", "<>", etc., only about Oper nodes).

IN ("=" ANY) and NOT IN ("<>" ALL) transformations are also parser work.
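
The two transformations can be sketched as follows, assuming the subquery
result is already available as an array of integers and ignoring NULLs.
LinkKind, eval_sublink, value_in and value_not_in are hypothetical names
used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LINK_ANY, LINK_ALL } LinkKind;   /* hypothetical names */
typedef bool (*IntCmp)(int lhs, int rhs);

bool int_eq(int lhs, int rhs) { return lhs == rhs; }
bool int_ne(int lhs, int rhs) { return lhs != rhs; }

/*
 * lhs OP ANY (rows): true if OP holds for at least one row.
 * lhs OP ALL (rows): true if OP holds for every row (also for no rows).
 */
bool
eval_sublink(int lhs, const int *rows, size_t nrows, LinkKind kind, IntCmp cmp)
{
    assert(cmp != NULL);
    for (size_t i = 0; i < nrows; i++)
    {
        bool holds = cmp(lhs, rows[i]);

        if (kind == LINK_ANY && holds)
            return true;
        if (kind == LINK_ALL && !holds)
            return false;
    }
    return kind == LINK_ALL;
}

/* x IN (subselect) is rewritten as x = ANY (subselect). */
bool
value_in(int lhs, const int *rows, size_t nrows)
{
    return eval_sublink(lhs, rows, nrows, LINK_ANY, int_eq);
}

/* x NOT IN (subselect) is rewritten as x <> ALL (subselect). */
bool
value_not_in(int lhs, const int *rows, size_t nrows)
{
    return eval_sublink(lhs, rows, nrows, LINK_ALL, int_ne);
}
```

Once the parser has done this rewrite, later stages only need to handle
the ANY and ALL link types with a list of operators.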

Vadim

From vadim@sable.krasnoyarsk.su Sun Jan 18 23:59:08 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id XAA10497
	for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 23:59:07 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id XAA06941 for <maillist@candle.pha.pa.us>; Sun, 18 Jan 1998 23:44:32 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id LAA16745
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 11:46:28 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34C2DAA3.78E54042@sable.krasnoyarsk.su>
Date: Mon, 19 Jan 1998 11:46:27 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: SubLink->oper
References: <199801190419.XAA04367@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> In SubLink->oper, do you want the oid of the pg_operator, or the oid of
> the pg_proc assigned to the operator?
> 
> Currently, I am giving you the oid of pg_operator.

No! I need Oper nodes here. For "normal" operators the parser
returns an Expr node with opType = OP_EXPR and the corresponding Oper
in Node *oper. Nearly the same for SubLink: I need an Oper node
for each pair of Var/Const from the left side and target entry from
the subquery.
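
As a rough sketch of "one operator per pair", the following resolves an
operator entry for each left-hand column and the matching subquery target
by name and argument types. The catalog table, its ids, TypeId, OperEntry
and resolve_opers are all made up for this illustration; they only loosely
mimic a lookup keyed on operator name and operand types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TYPE_INT4, TYPE_TEXT } TypeId;   /* hypothetical type ids */

/* Hypothetical operator catalog entry. */
typedef struct OperEntry
{
    const char *name;
    TypeId      left;
    TypeId      right;
    int         id;      /* made-up identifier for the illustration */
} OperEntry;

static const OperEntry catalog[] = {
    {"=",  TYPE_INT4, TYPE_INT4, 1},
    {"=",  TYPE_TEXT, TYPE_TEXT, 2},
    {"<>", TYPE_INT4, TYPE_INT4, 3},
    {"<>", TYPE_TEXT, TYPE_TEXT, 4},
};

/*
 * Fill opers[i] with the operator for the pair (lefthand[i], targets[i]).
 * Returns 0 on success, -1 if some pair has no matching operator.
 */
int
resolve_opers(const char *opname, const TypeId *lefthand,
              const TypeId *targets, size_t n, const OperEntry **opers)
{
    assert(opname != NULL);
    for (size_t i = 0; i < n; i++)
    {
        opers[i] = NULL;
        for (size_t j = 0; j < sizeof(catalog) / sizeof(catalog[0]); j++)
        {
            if (strcmp(catalog[j].name, opname) == 0 &&
                catalog[j].left == lefthand[i] &&
                catalog[j].right == targets[i])
            {
                opers[i] = &catalog[j];
                break;
            }
        }
        if (opers[i] == NULL)
            return -1;   /* the parser would report an error here */
    }
    return 0;
}
```

The point of the sketch is that a single operator oid is not enough when
the columns of the row have different types: each pair gets its own entry.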

Vadim

From owner-pgsql-hackers@hub.org Mon Jan 19 01:02:23 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24036
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:02:21 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id BAA13913; Mon, 19 Jan 1998 01:02:16 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Mon, 19 Jan 1998 01:01:41 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id BAA13824 for pgsql-hackers-outgoing; Mon, 19 Jan 1998 01:01:34 -0500 (EST)
Received: from candle.pha.pa.us (root@s5-03.ppp.op.net [209.152.195.67]) by hub.org (8.8.8/8.7.5) with ESMTP id BAA13699 for <hackers@postgreSQL.org>; Mon, 19 Jan 1998 01:00:59 -0500 (EST)
Received: (from maillist@localhost)
	by candle.pha.pa.us (8.8.5/8.8.5) id AAA23866;
	Mon, 19 Jan 1998 00:54:49 -0500 (EST)
From: Bruce Momjian <maillist@candle.pha.pa.us>
Message-Id: <199801190554.AAA23866@candle.pha.pa.us>
Subject: [HACKERS] subselects
To: vadim@sable.krasnoyarsk.su (Vadim B. Mikheev)
Date: Mon, 19 Jan 1998 00:54:49 -0500 (EST)
Cc: hackers@postgreSQL.org (PostgreSQL-development)
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR


OK, I have added code to allow the SubLinks to make it to the optimizer.

I implemented ParseState->parentParseState, but not parentQuery, because
the parentParseState is much more valuable to me, and Vadim thought it
might be useful, but was not positive.  Also, keeping that parentQuery
pointer valid through rewrite may be difficult, so I dropped it. 
ParseState is only valid in the parser.

I have not yet:

	handled correlated subquery column references
	added Var->sublevels_up
	gotten this to work in the rewrite system
	added full CopyNode support

I will address these in the next few days.

-- 
Bruce Momjian
maillist@candle.pha.pa.us


From vadim@sable.krasnoyarsk.su Mon Jan 19 01:32:54 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24335
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:32:52 -0500 (EST)
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id BAA10610 for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:23:02 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16879
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 13:25:28 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34C2F1D2.9CD191CC@sable.krasnoyarsk.su>
Date: Mon, 19 Jan 1998 13:25:22 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
Subject: Re: SubLink->oper
References: <199801190500.AAA10576@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> >
> > Bruce Momjian wrote:
> > >
> > > In SubLink->oper, do you want the oid of the pg_operator, or the oid of
> > > the pg_proc assigned to the operator?
> > >
> > > Currently, I am giving you the oid of pg_operator.
> >
> > No! I need Oper nodes here. For "normal" operators the parser
> > returns an Expr node with opType = OP_EXPR and the corresponding Oper
> > in Node *oper. It is nearly the same for SubLink: I need an Oper node
> > for each pair of Var/Const from the left side and target entry from
> > the subquery.
> >
> > Vadim
> >
> 
> OK, can I give you an Oper* for each field.

Nice! But what's this:

typedef struct SubLink
{
struct Query;
^^^^^^^^^^^^^
    NodeTag     type;

Vadim

From vadim@sable.krasnoyarsk.su Mon Jan 19 01:34:39 1998
Received: from www.krasnet.ru (www.krasnet.ru [193.125.44.86])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id BAA24346
	for <maillist@candle.pha.pa.us>; Mon, 19 Jan 1998 01:34:33 -0500 (EST)
Received: from sable.krasnoyarsk.su (www.krasnet.ru [193.125.44.86])
	by www.krasnet.ru (8.8.7/8.8.7) with ESMTP id NAA16904;
	Mon, 19 Jan 1998 13:37:42 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Sender: root@www.krasnet.ru
Message-ID: <34C2F4B4.7BBA1DB2@sable.krasnoyarsk.su>
Date: Mon, 19 Jan 1998 13:37:41 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: Re: subselects
References: <199801190554.AAA23866@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: OR

Bruce Momjian wrote:
> 
> OK, I have added code to allow the SubLinks to make it to the optimizer.
> 
> I implemented ParseState->parentParseState, but not parentQuery, because
> the parentParseState is much more valuable to me, and Vadim thought it
> might be useful, but was not positive.  Also, keeping that parentQuery
> pointer valid through rewrite may be difficult, so I dropped it.
> ParseState is only valid in the parser.
> 
> I have not yet:
> 
>         handled correlated subquery column references
>         added Var->sublevels_up
>         gotten this to work in the rewrite system
>         added full CopyNode support
> 
> I will address these in the next few days.

Nice! I'm starting with non-correlated subqueries...

Vadim


From owner-pgsql-hackers@hub.org Wed Jan 21 04:00:59 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id EAA14981
	for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 04:00:56 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id DAA02432 for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 03:46:22 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id DAA12583; Wed, 21 Jan 1998 03:45:43 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 03:44:07 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id DAA12288 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 03:44:02 -0500 (EST)
Received: from gandalf.sd.spardat.at (gandalf.telecom.at [194.118.26.84]) by hub.org (8.8.8/8.7.5) with ESMTP id DAA12263 for <pgsql-hackers@hub.org>; Wed, 21 Jan 1998 03:43:18 -0500 (EST)
Received: from sdgtw.sd.spardat.at (sdgtw.sd.spardat.at [172.18.99.31])
	by gandalf.sd.spardat.at (8.8.8/8.8.8) with ESMTP id JAA38408
	for <pgsql-hackers@hub.org>; Wed, 21 Jan 1998 09:42:55 +0100
Received: by sdgtw.sd.spardat.at with Internet Mail Service (5.0.1458.49)
	id <DAF4ZATD>; Wed, 21 Jan 1998 09:42:55 +0100
Message-ID: <219F68D65015D011A8E000006F8590C6010A51A2@sdexcsrv1.sd.spardat.at>
From: Zeugswetter Andreas DBT <Andreas.Zeugswetter@telecom.at>
To: "'pgsql-hackers@hub.org'" <pgsql-hackers@hub.org>
Subject: [HACKERS] Re: subselects
Date: Wed, 21 Jan 1998 09:42:52 +0100
X-Priority: 3
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.0.1458.49)
Content-Type: text/plain
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Bruce wrote:
> I have completed adding Var.varlevelsup, and have added code to the
> parser to properly set the field.  It will allow correlated references
> in the WHERE clause, but not in the target list.
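
One way to picture how a field like Var.varlevelsup could be set is to
walk the chain of parent parse states until the referenced table alias
is found, counting levels on the way. This is only a sketch under
stated assumptions: ToyParseState and toy_resolve_levelsup are
hypothetical names, and only the parentParseState link and the
varlevelsup idea come from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy parse state: the aliases visible at one query level. */
typedef struct ToyParseState
{
    struct ToyParseState *parentParseState; /* enclosing query, or NULL */
    const char **aliases;                   /* range table aliases here */
    size_t      naliases;
} ToyParseState;

/*
 * Return how many query levels up the alias is defined
 * (0 = current level), or -1 if it is not visible at all.
 */
static int
toy_resolve_levelsup(const ToyParseState *pstate, const char *alias)
{
    int levelsup = 0;

    assert(alias != NULL);
    for (; pstate != NULL; pstate = pstate->parentParseState, levelsup++)
    {
        size_t i;

        for (i = 0; i < pstate->naliases; i++)
            if (strcmp(pstate->aliases[i], alias) == 0)
                return levelsup;
    }
    return -1;
}
```

In this model a subquery can see the aliases of its parent (levels up
= 1), but the parent cannot see an alias that exists only inside the
subquery.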

select i2.ip1, i1.ip4 from nameip i1 where ip1 = (select ip1 from nameip i2);
   522: Table (i2) not selected in query.
select i1.ip4 from nameip i1 where ip1 = (select i1.ip1 from nameip i2);
   284: A subquery has returned not exactly one row.
select i1.ip4 from nameip i1 where ip1 = (select i1.ip1 from nameip i2 where name='zeus');
 2 row(s) retrieved.

Informix allows correlated references in the target list. It also allows
subselects in the target list as in:
select i1.ip4, (select i1.ip1 from nameip i2) from nameip i1;
   284: A subquery has returned not exactly one row.
select i1.ip4, (select i1.ip1 from nameip i2 where name='zeus') from nameip i1;
 2 row(s) retrieved.

Is this what you were looking for?

Andreas


From owner-pgsql-hackers@hub.org Wed Jan 21 05:31:02 1998
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id FAA15884
	for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 05:31:01 -0500 (EST)
Received: from hub.org (hub.org [209.47.148.200]) by renoir.op.net (o1/$ Revision: 1.14 $) with ESMTP id FAA04709 for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 05:16:16 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id FAA05191; Wed, 21 Jan 1998 05:15:42 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 05:14:02 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id FAA04951 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 05:13:57 -0500 (EST)
Received: from dune.krasnet.ru (www.krasnet.ru [193.125.44.86]) by hub.org (8.8.8/8.7.5) with ESMTP id FAA04610 for <hackers@postgreSQL.org>; Wed, 21 Jan 1998 05:12:18 -0500 (EST)
Received: from sable.krasnoyarsk.su (dune.krasnet.ru [193.125.44.86])
	by dune.krasnet.ru (8.8.7/8.8.7) with ESMTP id RAA01918;
	Wed, 21 Jan 1998 17:10:24 +0700 (KRS)
	(envelope-from vadim@sable.krasnoyarsk.su)
Message-ID: <34C5C98E.3E085F52@sable.krasnoyarsk.su>
Date: Wed, 21 Jan 1998 17:10:22 +0700
From: "Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>
Organization: ITTS (Krasnoyarsk)
X-Mailer: Mozilla 4.04 [en] (X11; I; FreeBSD 2.2.5-RELEASE i386)
MIME-Version: 1.0
To: Bruce Momjian <maillist@candle.pha.pa.us>
CC: PostgreSQL-development <hackers@postgreSQL.org>
Subject: [HACKERS] Re: subselects
References: <199801210324.WAA02161@candle.pha.pa.us>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

Bruce Momjian wrote:
> 
> We are only going to have subselects in the WHERE clause, not in the
> target list, right?
> 
> The standard says we can have them either place, but I didn't think we
> were implementing the target list subselects.
> 
> Is that correct?

Yes, this is right for 6.3. I hope that we'll support subselects in the
target list, FROM, etc. in the future.

BTW, I'm going to implement subselects in a (let's say) "natural" way:
without substituting parent query relations into the subselect and so on,
but by executing (correlated) subqueries for each upper query row
(maybe with caching of results in a hash table for better performance).
This is a much cleaner way, and it is much clearer how to do it.
It resembles the SQL-function approach, but functions start/run/stop the
Executor each time they are called, and this hurts performance.
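
The per-row execution with result caching described above can be
sketched roughly as follows. This is a toy model, not executor code:
ToySubplan, toy_run_subquery and toy_eval_subplan are hypothetical
names, the "subquery" simply counts inner rows equal to the outer
value, and the cache is assumed to be a small fixed-size
open-addressing table.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 64

typedef struct ToyCacheEntry
{
    int used;
    int key;        /* outer value the result was computed for */
    int result;
} ToyCacheEntry;

typedef struct ToySubplan
{
    const int  *inner_rows;
    size_t      ninner;
    int         executions;     /* times the subquery really ran */
    ToyCacheEntry cache[TOY_CACHE_SLOTS];
} ToySubplan;

/* Stand-in for the correlated subquery: count inner rows = outer value. */
static int
toy_run_subquery(ToySubplan *sp, int outer_value)
{
    size_t i;
    int count = 0;

    sp->executions++;
    for (i = 0; i < sp->ninner; i++)
        if (sp->inner_rows[i] == outer_value)
            count++;
    return count;
}

/* Evaluate the subplan for one outer row, reusing a cached result if any. */
static int
toy_eval_subplan(ToySubplan *sp, int outer_value)
{
    size_t slot = (size_t) ((unsigned) outer_value % TOY_CACHE_SLOTS);
    size_t probes;

    assert(sp != NULL);
    for (probes = 0; probes < TOY_CACHE_SLOTS; probes++)
    {
        ToyCacheEntry *e = &sp->cache[slot];

        if (!e->used)
        {
            /* First time we see this outer value: run and remember. */
            e->used = 1;
            e->key = outer_value;
            e->result = toy_run_subquery(sp, outer_value);
            return e->result;
        }
        if (e->key == outer_value)
            return e->result;       /* cache hit, no re-execution */
        slot = (slot + 1) % TOY_CACHE_SLOTS;
    }
    /* Cache is full: fall back to running the subquery uncached. */
    return toy_run_subquery(sp, outer_value);
}
```

With repeated outer values the subquery runs once per distinct value
rather than once per outer row, which is where the caching would pay
off.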

Vadim


From owner-pgsql-hackers@hub.org Wed Jan 21 10:02:02 1998
Received: from hub.org (hub.org [209.47.148.200])
	by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id KAA20456
	for <maillist@candle.pha.pa.us>; Wed, 21 Jan 1998 10:02:01 -0500 (EST)
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id KAA06778; Wed, 21 Jan 1998 10:02:13 -0500 (EST)
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Wed, 21 Jan 1998 10:00:41 -0500 (EST)
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id KAA06544 for pgsql-hackers-outgoing; Wed, 21 Jan 1998 10:00:37 -0500 (EST)
Received: from u1.abs.net (root@u1.abs.net [207.114.0.131]) by hub.org (8.8.8/8.7.5) with ESMTP id KAA06326 for <pgsql-hackers@postgresql.org>; Wed, 21 Jan 1998 10:00:03 -0500 (EST)
Received: from insightdist.com (nobody@localhost)
	by u1.abs.net (8.8.5/8.8.5) with UUCP id JAA08009
	for pgsql-hackers@postgresql.org; Wed, 21 Jan 1998 09:40:29 -0500 (EST)
X-Authentication-Warning: u1.abs.net: nobody set sender to insightdist.com!darrenk using -f
Received: by insightdist.com (AIX 3.2/UCB 5.64/4.03)
          id AA33174; Wed, 21 Jan 1998 09:26:09 -0500
Received: by ceodev (AIX 4.1/UCB 5.64/4.03)
          id AA36452; Wed, 21 Jan 1998 09:13:05 -0500
Date: Wed, 21 Jan 1998 09:13:05 -0500
From: darrenk@insightdist.com (Darren King)
Message-Id: <9801211413.AA36452@ceodev>
To: pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] subselects
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Md5: 4wI6dUsUAXei+yg3JycjGw==
Sender: owner-pgsql-hackers@hub.org
Precedence: bulk
Status: OR

> We are only going to have subselects in the WHERE clause, not in the
> target list, right?
> 
> The standard says we can have them either place, but I didn't think we
> were implementing the target list subselects.
> 
> Is that correct?

What about the HAVING clause?  Currently not in, but someone here wants
to take a stab at it.

It doesn't seem that tough: loop over the tuples returned from the GROUP
BY node and check an expression such as "x > 5" or "x = (subselect)".
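
A rough sketch of that loop, assuming the groups are already computed:
ToyGroup, ToyHavingQual, toy_apply_having and toy_x_greater_than are
hypothetical names, and the qualification is modeled as a plain
callback rather than an expression tree.

```c
#include <assert.h>
#include <stddef.h>

/* One tuple as it might come out of the grouping step. */
typedef struct ToyGroup
{
    int key;    /* grouping column value */
    int x;      /* aggregate value tested by HAVING */
} ToyGroup;

/* A HAVING qualification: returns nonzero if the group passes. */
typedef int (*ToyHavingQual) (const ToyGroup *group, const void *arg);

/* Copy the groups that satisfy the qualification into out; return count. */
static size_t
toy_apply_having(const ToyGroup *groups, size_t n,
                 ToyHavingQual qual, const void *arg, ToyGroup *out)
{
    size_t i;
    size_t kept = 0;

    assert(qual != NULL && out != NULL);
    for (i = 0; i < n; i++)
        if (qual(&groups[i], arg))
            out[kept++] = groups[i];
    return kept;
}

/* Example qualification: "x > limit", with the limit passed via arg. */
static int
toy_x_greater_than(const ToyGroup *group, const void *arg)
{
    return group->x > *(const int *) arg;
}
```

A qualification like "x = (subselect)" would fit the same shape, with
the callback evaluating the subselect instead of comparing to a
constant.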

The cost analysis in the optimizer could be tricky, come to think of it.
If a subselect has a HAVING, we would need a formula to determine
its selectivity.  Hmmm...

darrenk