From pgsql-performance-owner+M17204@postgresql.org Wed Feb 15 16:28:34 2006
|
|
Return-path: <pgsql-performance-owner+M17204@postgresql.org>
|
|
Received: from ams.hub.org (ams.hub.org [200.46.204.13])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k1FLSV527014
|
|
for <pgman@candle.pha.pa.us>; Wed, 15 Feb 2006 16:28:31 -0500 (EST)
|
|
Received: from postgresql.org (postgresql.org [200.46.204.71])
|
|
by ams.hub.org (Postfix) with ESMTP id 168C967B584;
|
|
Wed, 15 Feb 2006 17:28:29 -0400 (AST)
|
|
X-Original-To: pgsql-performance-postgresql.org@localhost.postgresql.org
|
|
Received: from localhost (av.hub.org [200.46.204.144])
|
|
by postgresql.org (Postfix) with ESMTP id BB0AB9DCB9E
|
|
for <pgsql-performance-postgresql.org@localhost.postgresql.org>; Wed, 15 Feb 2006 17:27:56 -0400 (AST)
|
|
Received: from postgresql.org ([200.46.204.71])
|
|
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
|
with ESMTP id 22055-07
|
|
for <pgsql-performance-postgresql.org@localhost.postgresql.org>;
|
|
Wed, 15 Feb 2006 17:27:57 -0400 (AST)
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130])
|
|
by postgresql.org (Postfix) with ESMTP id F385E9DCB98
|
|
for <pgsql-performance@postgresql.org>; Wed, 15 Feb 2006 17:27:53 -0400 (AST)
|
|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
|
|
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k1FLRsqd019780;
|
|
Wed, 15 Feb 2006 16:27:54 -0500 (EST)
|
|
To: Gary Doades <gpd@gpdnet.co.uk>
|
|
cc: pgsql-performance@postgresql.org
|
|
Subject: Re: [PERFORM] Strange Create Index behaviour
|
|
In-Reply-To: <19510.1140036968@sss.pgh.pa.us>
|
|
References: <43F38867.6010701@gpdnet.co.uk> <19510.1140036968@sss.pgh.pa.us>
|
|
Comments: In-reply-to Tom Lane <tgl@sss.pgh.pa.us>
|
|
message dated "Wed, 15 Feb 2006 15:56:08 -0500"
|
|
Date: Wed, 15 Feb 2006 16:27:54 -0500
|
|
Message-ID: <19779.1140038874@sss.pgh.pa.us>
|
|
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
X-Virus-Scanned: by amavisd-new at hub.org
|
|
X-Spam-Status: No, score=0.11 required=5 tests=[AWL=0.110]
|
|
X-Spam-Score: 0.11
|
|
X-Mailing-List: pgsql-performance
|
|
List-Archive: <http://archives.postgresql.org/pgsql-performance>
|
|
List-Help: <mailto:majordomo@postgresql.org?body=help>
|
|
List-Id: <pgsql-performance.postgresql.org>
|
|
List-Owner: <mailto:pgsql-performance-owner@postgresql.org>
|
|
List-Post: <mailto:pgsql-performance@postgresql.org>
|
|
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-performance>
|
|
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-performance>
|
|
Precedence: bulk
|
|
Sender: pgsql-performance-owner@postgresql.org
|
|
Status: ORr
|
|
|
|
I wrote:
> Interesting.  I tried your test script and got fairly close times
> for all the cases on two different machines:
>     old HPUX machine: shortest 5800 msec, longest 7960 msec
>     new Fedora 4 machine: shortest 461 msec, longest 608 msec

> So what this looks like to me is a corner case that FreeBSD's qsort
> fails to handle well.

I tried forcing PG to use src/port/qsort.c on the Fedora machine,
and lo and behold:
    new Fedora 4 machine: shortest 434 msec, longest 8530 msec

So it sure looks like this script does expose a problem on BSD-derived
qsorts.  Curiously, the case that's much the worst for me is the third
in the script, while the shortest time is the first case, which was slow
for Gary.  So I'd venture that the *BSD code has been tweaked somewhere
along the way, in a manner that moves the problem around without really
fixing it.  (Anyone want to compare the actual FreeBSD source to what
we have?)

This is pretty relevant stuff, because there was a thread recently
advocating that we stop using the platform qsort on all platforms:
http://archives.postgresql.org/pgsql-hackers/2005-12/msg00610.php

It's really interesting to see a case where port/qsort is radically
worse than other qsorts ... unless we figure that out and fix it,
I think the idea of using port/qsort everywhere has just taken a
major hit.

			regards, tom lane
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 5: don't forget to increase your free space map settings
|
|
|
|
From pgsql-performance-owner+M17212@postgresql.org Wed Feb 15 18:29:07 2006
|
|
Return-path: <pgsql-performance-owner+M17212@postgresql.org>
|
|
Received: from ams.hub.org (ams.hub.org [200.46.204.13])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k1FNT6509074
|
|
for <pgman@candle.pha.pa.us>; Wed, 15 Feb 2006 18:29:06 -0500 (EST)
|
|
Received: from postgresql.org (postgresql.org [200.46.204.71])
|
|
by ams.hub.org (Postfix) with ESMTP id 2BE6267B58B;
|
|
Wed, 15 Feb 2006 19:29:04 -0400 (AST)
|
|
X-Original-To: pgsql-performance-postgresql.org@localhost.postgresql.org
|
|
Received: from localhost (av.hub.org [200.46.204.144])
|
|
by postgresql.org (Postfix) with ESMTP id 7C3D49DC803;
|
|
Wed, 15 Feb 2006 19:28:30 -0400 (AST)
|
|
Received: from postgresql.org ([200.46.204.71])
|
|
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
|
with ESMTP id 47149-10; Wed, 15 Feb 2006 19:28:32 -0400 (AST)
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130])
|
|
by postgresql.org (Postfix) with ESMTP id C56AD9DC843;
|
|
Wed, 15 Feb 2006 19:28:27 -0400 (AST)
|
|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
|
|
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k1FNSTkm020782;
|
|
Wed, 15 Feb 2006 18:28:29 -0500 (EST)
|
|
To: Gary Doades <gpd@gpdnet.co.uk>
|
|
cc: pgsql-performance@postgresql.org, pgsql-hackers@postgresql.org
|
|
Subject: qsort again (was Re: [PERFORM] Strange Create Index behaviour)
|
|
In-Reply-To: <43F39E53.1020009@gpdnet.co.uk>
|
|
References: <43F38867.6010701@gpdnet.co.uk> <19510.1140036968@sss.pgh.pa.us> <19779.1140038874@sss.pgh.pa.us> <43F39E53.1020009@gpdnet.co.uk>
|
|
Comments: In-reply-to Gary Doades <gpd@gpdnet.co.uk>
|
|
message dated "Wed, 15 Feb 2006 21:34:11 +0000"
|
|
Date: Wed, 15 Feb 2006 18:28:29 -0500
|
|
Message-ID: <20781.1140046109@sss.pgh.pa.us>
|
|
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
X-Virus-Scanned: by amavisd-new at hub.org
|
|
X-Spam-Status: No, score=0.11 required=5 tests=[AWL=0.110]
|
|
X-Spam-Score: 0.11
|
|
X-Mailing-List: pgsql-performance
|
|
List-Archive: <http://archives.postgresql.org/pgsql-performance>
|
|
List-Help: <mailto:majordomo@postgresql.org?body=help>
|
|
List-Id: <pgsql-performance.postgresql.org>
|
|
List-Owner: <mailto:pgsql-performance-owner@postgresql.org>
|
|
List-Post: <mailto:pgsql-performance@postgresql.org>
|
|
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-performance>
|
|
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-performance>
|
|
Precedence: bulk
|
|
Sender: pgsql-performance-owner@postgresql.org
|
|
Status: OR
|
|
|
|
Gary Doades <gpd@gpdnet.co.uk> writes:
> If I run the script again, it is not always the first case that is slow,
> it varies from run to run, which is why I repeated it quite a few times
> for the test.

For some reason I hadn't immediately twigged to the fact that your test
script is just N repetitions of the exact same structure with random data.
So it's not so surprising that you get random variations in behavior
with different test data sets.

I did some experimentation comparing the qsort from Fedora Core 4
(glibc-2.3.5-10.3) with our src/port/qsort.c.  For those who weren't
following the pgsql-performance thread, the test case is just this
repeated a lot of times:

create table atest(i int4, r int4);
insert into atest (i,r) select generate_series(1,100000), 0;
insert into atest (i,r) select generate_series(1,100000), random()*100000;
\timing
create index idx on atest(r);
\timing
drop table atest;

I did this 100 times and sorted the reported runtimes.  (Investigation
with trace_sort = on confirms that the runtime is almost entirely spent
in qsort() called from our performsort --- the Postgres overhead is
about 100msec on this machine.)  Results are below.

It seems clear that our qsort.c is doing a pretty awful job of picking
qsort pivots, while glibc is mostly managing not to make that mistake.
I haven't looked at the glibc code yet to see what they are doing
differently.

I'd say this puts a considerable damper on my enthusiasm for using our
qsort all the time, as was recently debated in this thread:
http://archives.postgresql.org/pgsql-hackers/2005-12/msg00610.php
We need to fix our qsort.c before pushing ahead with that idea.

			regards, tom lane
|
|
|
|
|
|
100 runtimes for glibc qsort, sorted ascending:
|
|
|
|
Time: 459.860 ms
|
|
Time: 460.209 ms
|
|
Time: 460.704 ms
|
|
Time: 461.317 ms
|
|
Time: 461.538 ms
|
|
Time: 461.652 ms
|
|
Time: 461.988 ms
|
|
Time: 462.573 ms
|
|
Time: 462.638 ms
|
|
Time: 462.716 ms
|
|
Time: 462.917 ms
|
|
Time: 463.219 ms
|
|
Time: 463.455 ms
|
|
Time: 463.650 ms
|
|
Time: 463.723 ms
|
|
Time: 463.737 ms
|
|
Time: 463.750 ms
|
|
Time: 463.852 ms
|
|
Time: 463.964 ms
|
|
Time: 463.988 ms
|
|
Time: 464.003 ms
|
|
Time: 464.135 ms
|
|
Time: 464.372 ms
|
|
Time: 464.458 ms
|
|
Time: 464.496 ms
|
|
Time: 464.551 ms
|
|
Time: 464.599 ms
|
|
Time: 464.655 ms
|
|
Time: 464.656 ms
|
|
Time: 464.722 ms
|
|
Time: 464.814 ms
|
|
Time: 464.827 ms
|
|
Time: 464.878 ms
|
|
Time: 464.899 ms
|
|
Time: 464.905 ms
|
|
Time: 464.987 ms
|
|
Time: 465.055 ms
|
|
Time: 465.138 ms
|
|
Time: 465.159 ms
|
|
Time: 465.194 ms
|
|
Time: 465.310 ms
|
|
Time: 465.316 ms
|
|
Time: 465.375 ms
|
|
Time: 465.450 ms
|
|
Time: 465.535 ms
|
|
Time: 465.595 ms
|
|
Time: 465.680 ms
|
|
Time: 465.769 ms
|
|
Time: 465.865 ms
|
|
Time: 465.892 ms
|
|
Time: 465.903 ms
|
|
Time: 466.003 ms
|
|
Time: 466.154 ms
|
|
Time: 466.164 ms
|
|
Time: 466.203 ms
|
|
Time: 466.305 ms
|
|
Time: 466.344 ms
|
|
Time: 466.364 ms
|
|
Time: 466.388 ms
|
|
Time: 466.502 ms
|
|
Time: 466.593 ms
|
|
Time: 466.725 ms
|
|
Time: 466.794 ms
|
|
Time: 466.798 ms
|
|
Time: 466.904 ms
|
|
Time: 466.971 ms
|
|
Time: 466.997 ms
|
|
Time: 467.122 ms
|
|
Time: 467.146 ms
|
|
Time: 467.221 ms
|
|
Time: 467.224 ms
|
|
Time: 467.244 ms
|
|
Time: 467.277 ms
|
|
Time: 467.587 ms
|
|
Time: 468.142 ms
|
|
Time: 468.207 ms
|
|
Time: 468.237 ms
|
|
Time: 468.471 ms
|
|
Time: 468.663 ms
|
|
Time: 468.700 ms
|
|
Time: 469.235 ms
|
|
Time: 469.840 ms
|
|
Time: 470.472 ms
|
|
Time: 471.140 ms
|
|
Time: 472.811 ms
|
|
Time: 472.959 ms
|
|
Time: 474.858 ms
|
|
Time: 477.210 ms
|
|
Time: 479.571 ms
|
|
Time: 479.671 ms
|
|
Time: 482.797 ms
|
|
Time: 488.852 ms
|
|
Time: 514.639 ms
|
|
Time: 529.287 ms
|
|
Time: 612.185 ms
|
|
Time: 660.748 ms
|
|
Time: 742.227 ms
|
|
Time: 866.814 ms
|
|
Time: 1234.848 ms
|
|
Time: 1267.398 ms
|
|
|
|
|
|
100 runtimes for port/qsort.c, sorted ascending:
|
|
|
|
Time: 418.905 ms
|
|
Time: 420.611 ms
|
|
Time: 420.764 ms
|
|
Time: 420.904 ms
|
|
Time: 421.706 ms
|
|
Time: 422.466 ms
|
|
Time: 422.627 ms
|
|
Time: 423.189 ms
|
|
Time: 423.302 ms
|
|
Time: 425.096 ms
|
|
Time: 425.731 ms
|
|
Time: 425.851 ms
|
|
Time: 427.253 ms
|
|
Time: 430.113 ms
|
|
Time: 432.756 ms
|
|
Time: 432.963 ms
|
|
Time: 440.502 ms
|
|
Time: 440.640 ms
|
|
Time: 450.452 ms
|
|
Time: 458.143 ms
|
|
Time: 459.212 ms
|
|
Time: 467.706 ms
|
|
Time: 468.006 ms
|
|
Time: 468.574 ms
|
|
Time: 470.003 ms
|
|
Time: 472.313 ms
|
|
Time: 483.622 ms
|
|
Time: 492.395 ms
|
|
Time: 509.564 ms
|
|
Time: 531.037 ms
|
|
Time: 533.366 ms
|
|
Time: 535.610 ms
|
|
Time: 575.523 ms
|
|
Time: 582.688 ms
|
|
Time: 593.545 ms
|
|
Time: 647.364 ms
|
|
Time: 660.612 ms
|
|
Time: 677.312 ms
|
|
Time: 680.288 ms
|
|
Time: 697.626 ms
|
|
Time: 833.066 ms
|
|
Time: 834.511 ms
|
|
Time: 851.819 ms
|
|
Time: 920.443 ms
|
|
Time: 926.731 ms
|
|
Time: 954.289 ms
|
|
Time: 1045.214 ms
|
|
Time: 1059.200 ms
|
|
Time: 1062.328 ms
|
|
Time: 1136.018 ms
|
|
Time: 1260.091 ms
|
|
Time: 1276.883 ms
|
|
Time: 1319.351 ms
|
|
Time: 1438.854 ms
|
|
Time: 1475.457 ms
|
|
Time: 1538.211 ms
|
|
Time: 1549.004 ms
|
|
Time: 1744.642 ms
|
|
Time: 1771.258 ms
|
|
Time: 1959.530 ms
|
|
Time: 2300.140 ms
|
|
Time: 2589.641 ms
|
|
Time: 2612.780 ms
|
|
Time: 3100.024 ms
|
|
Time: 3284.125 ms
|
|
Time: 3379.792 ms
|
|
Time: 3750.278 ms
|
|
Time: 4302.278 ms
|
|
Time: 4780.624 ms
|
|
Time: 5000.056 ms
|
|
Time: 5092.604 ms
|
|
Time: 5168.722 ms
|
|
Time: 5292.941 ms
|
|
Time: 5895.964 ms
|
|
Time: 7003.164 ms
|
|
Time: 7099.449 ms
|
|
Time: 7115.083 ms
|
|
Time: 7384.940 ms
|
|
Time: 8214.010 ms
|
|
Time: 8700.771 ms
|
|
Time: 9331.225 ms
|
|
Time: 10503.360 ms
|
|
Time: 12496.026 ms
|
|
Time: 12982.474 ms
|
|
Time: 15192.390 ms
|
|
Time: 15392.161 ms
|
|
Time: 15958.295 ms
|
|
Time: 18375.693 ms
|
|
Time: 18617.706 ms
|
|
Time: 18927.515 ms
|
|
Time: 19898.018 ms
|
|
Time: 20865.979 ms
|
|
Time: 21000.907 ms
|
|
Time: 21297.585 ms
|
|
Time: 21714.518 ms
|
|
Time: 25423.235 ms
|
|
Time: 27543.052 ms
|
|
Time: 28314.182 ms
|
|
Time: 29400.278 ms
|
|
Time: 34142.534 ms
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 1: if posting/reading through Usenet, please send an appropriate
|
|
subscribe-nomail command to majordomo@postgresql.org so that your
|
|
message can get through to the mailing list cleanly
|
|
|
|
From pgsql-hackers-owner+M79733@postgresql.org Wed Feb 15 20:22:07 2006
|
|
Return-path: <pgsql-hackers-owner+M79733@postgresql.org>
|
|
Received: from ams.hub.org (ams.hub.org [200.46.204.13])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k1G1M6529533
|
|
for <pgman@candle.pha.pa.us>; Wed, 15 Feb 2006 20:22:06 -0500 (EST)
|
|
Received: from postgresql.org (postgresql.org [200.46.204.71])
|
|
by ams.hub.org (Postfix) with ESMTP id E5C5467B58F;
|
|
Wed, 15 Feb 2006 21:22:03 -0400 (AST)
|
|
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
|
|
Received: from localhost (av.hub.org [200.46.204.144])
|
|
by postgresql.org (Postfix) with ESMTP id 3DAA69DCACE;
|
|
Wed, 15 Feb 2006 21:21:34 -0400 (AST)
|
|
Received: from postgresql.org ([200.46.204.71])
|
|
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
|
with ESMTP id 76351-01; Wed, 15 Feb 2006 21:21:36 -0400 (AST)
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
Received: from sss.pgh.pa.us (sss.pgh.pa.us [66.207.139.130])
|
|
by postgresql.org (Postfix) with ESMTP id 2FBB59DCA3F;
|
|
Wed, 15 Feb 2006 21:21:31 -0400 (AST)
|
|
Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1])
|
|
by sss.pgh.pa.us (8.13.1/8.13.1) with ESMTP id k1G1LXXi021616;
|
|
Wed, 15 Feb 2006 20:21:33 -0500 (EST)
|
|
To: Ron <rjpeace@earthlink.net>
|
|
cc: pgsql-performance@postgresql.org, pgsql-hackers@postgresql.org
|
|
Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
|
|
In-Reply-To: <7.0.1.0.2.20060215194635.03b55da0@earthlink.net>
|
|
References: <43F38867.6010701@gpdnet.co.uk> <19510.1140036968@sss.pgh.pa.us> <19779.1140038874@sss.pgh.pa.us> <43F39E53.1020009@gpdnet.co.uk> <20781.1140046109@sss.pgh.pa.us> <7.0.1.0.2.20060215194635.03b55da0@earthlink.net>
|
|
Comments: In-reply-to Ron <rjpeace@earthlink.net>
|
|
message dated "Wed, 15 Feb 2006 19:57:51 -0500"
|
|
Date: Wed, 15 Feb 2006 20:21:33 -0500
|
|
Message-ID: <21615.1140052893@sss.pgh.pa.us>
|
|
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
X-Virus-Scanned: by amavisd-new at hub.org
|
|
X-Spam-Status: No, score=0.11 required=5 tests=[AWL=0.110]
|
|
X-Spam-Score: 0.11
|
|
X-Mailing-List: pgsql-hackers
|
|
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
|
|
List-Help: <mailto:majordomo@postgresql.org?body=help>
|
|
List-Id: <pgsql-hackers.postgresql.org>
|
|
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
|
|
List-Post: <mailto:pgsql-hackers@postgresql.org>
|
|
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
|
|
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: OR
|
|
|
|
Ron <rjpeace@earthlink.net> writes:
> How are we choosing our pivots?

See qsort.c: it looks like median of nine equally spaced inputs (ie,
the 1/8th points of the initial input array, plus the end points),
implemented as two rounds of median-of-three choices.  With half of the
data inputs zero, it's not too improbable for two out of the three
samples to be zeroes in which case I think the med3 result will be zero
--- so choosing a pivot of zero is much more probable than one would
like, and doing so in many levels of recursion causes the problem.

I think.  I'm not too sure if the code isn't just being sloppy about the
case where many data values are equal to the pivot --- there's a special
case there to switch to insertion sort, and maybe that's getting invoked
too soon.  It'd be useful to get a line-level profile of the behavior of
this code in the slow cases...
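
The pivot argument above can be checked with a small simulation. This is a sketch in Python, not the C code in src/port/qsort.c: the names med3, ninther_pivot and zero_pivot_rate are made up for illustration, the sample positions are an assumption taken from the description above (both end points plus the 1/8th points), and the data is shuffled first, which the real index build does not do. It only estimates how often a median-of-nine pivot comes out as zero when half the inputs are zero; it does not reproduce the slowdown itself.

```python
import random


def med3(a, b, c):
    """Return the median of three values."""
    return sorted((a, b, c))[1]


def ninther_pivot(data):
    """Median of nine equally spaced samples, taken as two rounds of med3.

    Assumes len(data) >= 1. The sample positions are illustrative only.
    """
    n = len(data)
    # Both end points plus the seven interior 1/8th points.
    samples = [data[(k * (n - 1)) // 8] for k in range(9)]
    return med3(
        med3(samples[0], samples[1], samples[2]),
        med3(samples[3], samples[4], samples[5]),
        med3(samples[6], samples[7], samples[8]),
    )


def zero_pivot_rate(n=1000, trials=2000, seed=0):
    """Fraction of shuffled half-zero arrays whose chosen pivot is zero."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Half zeros, half non-zero random values, as in the test script.
        data = [0] * (n // 2) + [rng.randint(1, 100000) for _ in range(n - n // 2)]
        rng.shuffle(data)  # assumption: samples behave as if drawn at random
        if ninther_pivot(data) == 0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"pivot was zero in {zero_pivot_rate():.1%} of trials")
```

Since zeros and non-zeros are equally common in this model, a zero pivot should turn up in roughly half of the trials. Whether that alone explains the slow cases is a separate question, which is why a line-level profile would still be useful.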
|
|
|
|
regards, tom lane
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 3: Have you checked our extensive FAQ?
|
|
|
|
http://www.postgresql.org/docs/faq
|
|
|
|
From pgsql-performance-owner+M17282@postgresql.org Fri Feb 17 23:11:11 2006
|
|
Return-path: <pgsql-performance-owner+M17282@postgresql.org>
|
|
Received: from ams.hub.org (ams.hub.org [200.46.204.13])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id k1I4BA515503
|
|
for <pgman@candle.pha.pa.us>; Fri, 17 Feb 2006 23:11:10 -0500 (EST)
|
|
Received: from postgresql.org (postgresql.org [200.46.204.71])
|
|
by ams.hub.org (Postfix) with ESMTP id 2825F67B5F5;
|
|
Sat, 18 Feb 2006 00:11:07 -0400 (AST)
|
|
X-Original-To: pgsql-performance-postgresql.org@localhost.postgresql.org
|
|
Received: from localhost (av.hub.org [200.46.204.144])
|
|
by postgresql.org (Postfix) with ESMTP id 7BB8A9DCC4F;
|
|
Wed, 15 Feb 2006 21:37:57 -0400 (AST)
|
|
Received: from postgresql.org ([200.46.204.71])
|
|
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
|
with ESMTP id 79365-02; Wed, 15 Feb 2006 21:38:00 -0400 (AST)
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
Received: from postal.corporate.connx.com (postal.corporate.connx.com [65.212.159.187])
|
|
by postgresql.org (Postfix) with ESMTP id 33BEA9DCACE;
|
|
Wed, 15 Feb 2006 21:37:54 -0400 (AST)
|
|
X-MimeOLE: Produced By Microsoft Exchange V6.5
|
|
Content-class: urn:content-classes:message
|
|
MIME-Version: 1.0
|
|
Content-Type: text/plain;
|
|
charset="us-ascii"
|
|
Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
|
|
Date: Wed, 15 Feb 2006 17:37:58 -0800
|
|
Message-ID: <D425483C2C5C9F49B5B7A41F8944154757D54C@postal.corporate.connx.com>
|
|
Thread-Topic: [HACKERS] qsort again (was Re: [PERFORM] Strange Create Index behaviour)
|
|
Thread-Index: AcYyl2fPgxfNXHIRRyOEN4ZGeHtA3wAAEaNQ
|
|
From: "Dann Corbit" <DCorbit@connx.com>
|
|
To: "Tom Lane" <tgl@sss.pgh.pa.us>, "Ron" <rjpeace@earthlink.net>
|
|
cc: <pgsql-performance@postgresql.org>, <pgsql-hackers@postgresql.org>
|
|
X-Virus-Scanned: by amavisd-new at hub.org
|
|
X-Spam-Status: No, score=0.075 required=5 tests=[AWL=0.075]
|
|
X-Spam-Score: 0.075
|
|
X-Mailing-List: pgsql-performance
|
|
List-Archive: <http://archives.postgresql.org/pgsql-performance>
|
|
List-Help: <mailto:majordomo@postgresql.org?body=help>
|
|
List-Id: <pgsql-performance.postgresql.org>
|
|
List-Owner: <mailto:pgsql-performance-owner@postgresql.org>
|
|
List-Post: <mailto:pgsql-performance@postgresql.org>
|
|
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-performance>
|
|
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-performance>
|
|
Precedence: bulk
|
|
Sender: pgsql-performance-owner@postgresql.org
|
|
Content-Transfer-Encoding: 8bit
|
|
X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id k1I4BA515503
|
|
Status: ORr
|
|
|
|
|
|
|
|
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
> Sent: Wednesday, February 15, 2006 5:22 PM
> To: Ron
> Cc: pgsql-performance@postgresql.org; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] qsort again (was Re: [PERFORM] Strange Create
> Index behaviour)
>
> Ron <rjpeace@earthlink.net> writes:
> > How are we choosing our pivots?
>
> See qsort.c: it looks like median of nine equally spaced inputs (ie,
> the 1/8th points of the initial input array, plus the end points),
> implemented as two rounds of median-of-three choices.  With half of the
> data inputs zero, it's not too improbable for two out of the three
> samples to be zeroes in which case I think the med3 result will be zero
> --- so choosing a pivot of zero is much more probable than one would
> like, and doing so in many levels of recursion causes the problem.
|
|
|
|
Adding some randomness to the selection of the pivot is a known
technique to fix the oddball partitions problem.  However, Bentley and
Sedgewick proved that every quick sort algorithm has some input set that
makes it go quadratic, hence the recent popularity of introspective
sort, which switches to heapsort if quadratic behavior is detected.  (The
C++ template I submitted was an example of introspective sort, but
PostgreSQL does not use C++ so it was not helpful.)
|
|
|
|
> I think.  I'm not too sure if the code isn't just being sloppy about the
> case where many data values are equal to the pivot --- there's a special
> case there to switch to insertion sort, and maybe that's getting invoked
> too soon.

Here are some cases known to make qsort go quadratic:
1. Data already sorted
2. Data reverse sorted
3. Data organ-pipe sorted or ramp
4. Almost all data of the same value

There are probably other cases.  Randomizing the pivot helps some, as
does checking for in-order or reverse-order partitions.
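
Three of the cases listed above (sorted, reverse sorted, all values equal) are easy to demonstrate by counting comparisons. The sketch below is Python and uses a deliberately naive quicksort that always takes the first element as the pivot; that is an assumption made to keep the example short and is not how src/port/qsort.c or glibc choose pivots. The name quicksort_comparisons is hypothetical.

```python
import random


def quicksort_comparisons(items):
    """Sort a copy with a first-element pivot; return the comparison count."""
    data = list(items)
    comparisons = 0
    stack = [(0, len(data) - 1)]  # explicit stack avoids deep recursion
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        pivot = data[lo]
        i = lo  # data[lo+1 .. i] holds the elements smaller than the pivot
        for j in range(lo + 1, hi + 1):
            comparisons += 1
            if data[j] < pivot:
                i += 1
                data[i], data[j] = data[j], data[i]
        data[lo], data[i] = data[i], data[lo]  # put the pivot in its place
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    assert data == sorted(data)
    return comparisons


if __name__ == "__main__":
    n = 1000
    shuffled = list(range(n))
    random.Random(0).shuffle(shuffled)
    for label, case in [
        ("sorted", range(n)),
        ("reverse sorted", range(n, 0, -1)),
        ("all equal", [0] * n),
        ("shuffled", shuffled),
    ]:
        print(f"{label:>15}: {quicksort_comparisons(case)} comparisons")
```

With this pivot rule each of the three bad inputs peels off only one element per partition step, so the count is N*(N-1)/2, while shuffled input stays close to N*log(N).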
|
|
|
|
Imagine if 1/3 of the partitions fall into a category that causes
quadratic behavior (have one of the above formats and have more than
CUTOFF elements in them).

It is doubtful that the switch to insertion sort is causing any sort of
problems.  It is only going to be invoked on tiny sets, for which it has
a fixed cost that is probably less than qsort() function calls on sets
of the same size.

> It'd be useful to get a line-level profile of the behavior of
> this code in the slow cases...

I guess that my in-order or presorted tests [which often arise when
there are very few distinct values] may solve the bad partition
problems.  Don't forget that the algorithm is called recursively.

> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 2: Don't 'kill -9' the postmaster
|
|
|
|
From kleptog@svana.org Mon Dec 19 06:37:51 2005
|
|
Return-path: <kleptog@svana.org>
|
|
Received: from svana.org (mail@svana.org [203.20.62.76])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBJBboe20936
|
|
for <pgman@candle.pha.pa.us>; Mon, 19 Dec 2005 06:37:51 -0500 (EST)
|
|
Received: from kleptog by svana.org with local (Exim 3.35 #1 (Debian))
|
|
id 1EoJKc-00045V-00; Mon, 19 Dec 2005 22:37:30 +1100
|
|
Date: Mon, 19 Dec 2005 12:37:30 +0100
|
|
From: Martijn van Oosterhout <kleptog@svana.org>
|
|
To: Dann Corbit <DCorbit@connx.com>
|
|
cc: Tom Lane <tgl@sss.pgh.pa.us>, Qingqing Zhou <zhouqq@cs.toronto.edu>,
|
|
Bruce Momjian <pgman@candle.pha.pa.us>,
|
|
Luke Lonergan <llonergan@greenplum.com>, Neil Conway <neilc@samurai.com>,
|
|
pgsql-hackers@postgresql.org
|
|
Subject: Re: [HACKERS] Re: Which qsort is used
|
|
Message-ID: <20051219113724.GD12251@svana.org>
|
|
Reply-To: Martijn van Oosterhout <kleptog@svana.org>
|
|
References: <D425483C2C5C9F49B5B7A41F8944154757D38D@postal.corporate.connx.com>
|
|
MIME-Version: 1.0
|
|
Content-Type: multipart/signed; micalg=pgp-sha1;
|
|
protocol="application/pgp-signature"; boundary="5gxpn/Q6ypwruk0T"
|
|
Content-Disposition: inline
|
|
In-Reply-To: <D425483C2C5C9F49B5B7A41F8944154757D38D@postal.corporate.connx.com>
|
|
User-Agent: Mutt/1.3.28i
|
|
X-PGP-Key-ID: Length=1024; ID=0x0DC67BE6
|
|
X-PGP-Key-Fingerprint: 295F A899 A81A 156D B522 48A7 6394 F08A 0DC6 7BE6
|
|
X-PGP-Key-URL: <http://svana.org/kleptog/0DC67BE6.pgp.asc>
|
|
Status: OR
|
|
|
|
|
|
--5gxpn/Q6ypwruk0T
|
|
Content-Type: text/plain; charset=us-ascii
|
|
Content-Disposition: inline
|
|
Content-Transfer-Encoding: quoted-printable
|
|
|
|
On Fri, Dec 16, 2005 at 10:43:58PM -0800, Dann Corbit wrote:
> I am actually quite impressed with the excellence of Bentley's sort out
> of the box.  It's definitely the best library implementation of a sort I
> have seen.

I'm not sure whether we have a conclusion here, but I do have one
question: is there a significant difference in the number of times the
comparison routines are called? Comparisons in PostgreSQL are fairly
expensive given the fmgr overhead and when comparing tuples it's even
worse.

We don't want to accidentally pick a routine that saves data shuffling by
adding extra comparisons. The stats at [1] don't say. They try to
factor in CPU cost but they seem to use unrealistically small values. I
would think a number around 50 (or higher) would be more
representative.

[1] http://www.cs.toronto.edu/~zhouqq/postgresql/sort/sort.html
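
One way to answer the question about comparison counts is to wrap the comparator in a counter and run each candidate sort through it. The sketch below shows the idea in Python; it is not tied to PostgreSQL's comparison functions, and count_comparisons, builtin_sort and insertion_sort are hypothetical names. The two sorts compared here are only stand-ins for whichever qsort variants are under test.

```python
import functools
import random


def count_comparisons(sort_fn, items):
    """Run sort_fn(list, cmp) and return how many times cmp was called."""
    calls = 0

    def cmp(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    result = sort_fn(list(items), cmp)
    assert result == sorted(items)  # make sure the sort under test is correct
    return calls


def builtin_sort(data, cmp):
    """Python's built-in sort, driven by an explicit comparator."""
    return sorted(data, key=functools.cmp_to_key(cmp))


def insertion_sort(data, cmp):
    """Plain insertion sort: little data movement logic, many comparisons."""
    for i in range(1, len(data)):
        item = data[i]
        j = i
        while j > 0 and cmp(data[j - 1], item) > 0:
            data[j] = data[j - 1]
            j -= 1
        data[j] = item
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.randint(0, 10**6) for _ in range(1000)]
    for fn in (builtin_sort, insertion_sort):
        print(f"{fn.__name__}: {count_comparisons(fn, data)} comparator calls")
```

Multiplying the call counts by a realistic per-comparison cost, rather than timing integer comparisons directly, would give a fairer picture when each comparison is expensive.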
|
|
|
|
Have a nice day,
|
|
-- 
|
|
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
|
|
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
|
|
> tool for doing 5% of the work and then sitting around waiting for someone
|
|
> else to do the other 95% so you can sue them.
|
|
|
|
--5gxpn/Q6ypwruk0T
|
|
Content-Type: application/pgp-signature
|
|
Content-Disposition: inline
|
|
|
|
-----BEGIN PGP SIGNATURE-----
|
|
Version: GnuPG v1.0.6 (GNU/Linux)
|
|
Comment: For info see http://www.gnupg.org
|
|
|
|
iD8DBQFDpptzIB7bNG8LQkwRAmC6AJ4qYrIm3SYnBV3BybSmm+Gl4vpEywCfRDxg
|
|
bnIK4INRqOVFNBAKR/gDPcM=
|
|
=92qA
|
|
-----END PGP SIGNATURE-----
|
|
|
|
--5gxpn/Q6ypwruk0T--
|
|
|
|
From mkoi-pg@aon.at Wed Dec 21 19:44:03 2005
|
|
Return-path: <mkoi-pg@aon.at>
|
|
Received: from email.aon.at (warsl404pip5.highway.telekom.at [195.3.96.77])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBM0i2e05649
|
|
for <pgman@candle.pha.pa.us>; Wed, 21 Dec 2005 19:44:02 -0500 (EST)
|
|
Received: (qmail 12703 invoked from network); 22 Dec 2005 00:43:51 -0000
|
|
Received: from m148p015.dipool.highway.telekom.at (HELO Sokrates) ([62.46.8.111])
|
|
(envelope-sender <mkoi-pg@aon.at>)
|
|
by smarthub78.highway.telekom.at (qmail-ldap-1.03) with SMTP
|
|
for <tgl@sss.pgh.pa.us>; 22 Dec 2005 00:43:51 -0000
|
|
From: Manfred Koizar <mkoi-pg@aon.at>
|
|
To: Tom Lane <tgl@sss.pgh.pa.us>
|
|
cc: "Dann Corbit" <DCorbit@connx.com>, "Qingqing Zhou" <zhouqq@cs.toronto.edu>,
|
|
"Bruce Momjian" <pgman@candle.pha.pa.us>,
|
|
"Luke Lonergan" <llonergan@greenplum.com>,
|
|
"Neil Conway" <neilc@samurai.com>, pgsql-hackers@postgresql.org
|
|
Subject: Re: [HACKERS] Re: Which qsort is used
|
|
Date: Thu, 22 Dec 2005 01:43:34 +0100
|
|
Message-ID: <odqjq1tv6cb77ri4df0aehqal8o0ljtkar@4ax.com>
|
|
References: <D425483C2C5C9F49B5B7A41F8944154757D386@postal.corporate.connx.com> <3148.1134795805@sss.pgh.pa.us>
|
|
In-Reply-To: <3148.1134795805@sss.pgh.pa.us>
|
|
X-Mailer: Forte Agent 3.1/32.783
|
|
MIME-Version: 1.0
|
|
Content-Type: text/plain; charset=us-ascii
|
|
Content-Transfer-Encoding: 7bit
|
|
Status: OR
|
|
|
|
On Sat, 17 Dec 2005 00:03:25 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
>I've still got a problem with these checks; I think they are a net
>waste of cycles on average.  [...]
> and when they fail, those cycles are entirely wasted;
>you have not advanced the state of the sort at all.

How can we make the initial check "advance the state of the sort"?
One answer might be to exclude the sorted sequence at the start of the
array from the qsort, and merge the two sorted lists as the final
stage of the sort.
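
The proposal above can be sketched as follows. This is a Python illustration only: sorted_prefix_length and prefix_merge_sort are hypothetical names, the built-in sorted() stands in for qsort on the unsorted remainder, and the merge uses extra memory instead of working in place.

```python
import heapq


def sorted_prefix_length(data):
    """Length H of the longest non-decreasing run at the start of data."""
    n = len(data)
    if n == 0:
        return 0
    h = 1
    while h < n and data[h - 1] <= data[h]:
        h += 1
    return h


def prefix_merge_sort(items):
    """Sort by keeping the sorted prefix, sorting the rest, then merging."""
    data = list(items)
    h = sorted_prefix_length(data)  # the initial check; costs at most N-1 comparisons
    head = data[:h]                 # already in order, excluded from the sort
    tail = sorted(data[h:])         # stand-in for qsort on the other N-H elements
    return list(heapq.merge(head, tail))  # final O(N) merge of two sorted lists


if __name__ == "__main__":
    print(prefix_merge_sort([1, 2, 3, 4, 9, 0, 7, 2]))
```

The comparisons spent finding the prefix are no longer wasted when the check "fails" partway through, because the prefix found so far is still skipped by the sort.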
|
|
|
|
Qsorting N elements costs O(N*lnN), so excluding H elements from the
sort reduces the cost by at least O(H*lnN).  The merge step costs O(N)
plus some (<=50%) more memory, unless someone knows a fast in-place
merge.  So depending on the constant factors involved there might be a
usable solution.

I've been playing with some numbers and assuming the constant factors
to be equal for all the O()'s this method starts to pay off at
       H  for N
      20     100
     130    1000
    8000  100000
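
A break-even point of this kind can be computed from a simple cost model. The Python sketch below is a guess at the sort of calculation involved, not a reconstruction of how the figures above were obtained: it assumes a plain sort costs N*ln(N), the prefix variant costs (N-H)*ln(N-H) for the sort plus N for the merge, and all constant factors are 1. The function names are hypothetical, and the cost of detecting the sorted prefix is left out.

```python
import math


def cost_plain(n):
    """Modelled cost of sorting all n elements."""
    return n * math.log(n)


def cost_with_prefix(n, h):
    """Modelled cost of sorting n - h elements and then merging n elements."""
    rest = n - h
    sort_cost = rest * math.log(rest) if rest > 1 else 0.0
    return sort_cost + n


def break_even_prefix(n):
    """Smallest sorted-prefix length h for which the prefix variant is cheaper."""
    for h in range(1, n + 1):
        if cost_with_prefix(n, h) < cost_plain(n):
            return h
    return None


if __name__ == "__main__":
    for n in (100, 1000, 100000):
        print(f"N = {n:6d}: pays off from H = {break_even_prefix(n)}")
```

Under these assumptions the thresholds land close to the figures in the table. Adding a term for the prefix check, or a larger constant for the merge, would push them higher.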
|
|
Servus
|
|
Manfred
|
|
|
|
From pgsql-hackers-owner+M77795=pgman=candle.pha.pa.us@postgresql.org Thu Dec 22 02:02:28 2005
|
|
Return-path: <pgsql-hackers-owner+M77795=pgman=candle.pha.pa.us@postgresql.org>
|
|
Received: from ams.hub.org (ams.hub.org [200.46.204.13])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBM72Re16910
|
|
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 02:02:28 -0500 (EST)
|
|
Received: from postgresql.org (postgresql.org [200.46.204.71])
|
|
by ams.hub.org (Postfix) with ESMTP id A31E067AAA0
|
|
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 03:02:22 -0400 (AST)
|
|
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
|
|
Received: from localhost (av.hub.org [200.46.204.144])
|
|
by postgresql.org (Postfix) with ESMTP id 2C8EC9DCA92
|
|
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Thu, 22 Dec 2005 03:01:56 -0400 (AST)
|
|
Received: from postgresql.org ([200.46.204.71])
|
|
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
|
with ESMTP id 26033-04
|
|
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
|
|
Thu, 22 Dec 2005 03:01:55 -0400 (AST)
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
Received: from svana.org (svana.org [203.20.62.76])
|
|
by postgresql.org (Postfix) with ESMTP id 800859DC81D
|
|
for <pgsql-hackers@postgresql.org>; Thu, 22 Dec 2005 03:01:51 -0400 (AST)
|
|
Received: from kleptog by svana.org with local (Exim 3.35 #1 (Debian))
|
|
id 1EpKRg-0005ox-00; Thu, 22 Dec 2005 18:01:00 +1100
|
|
Date: Thu, 22 Dec 2005 08:01:00 +0100
|
|
From: Martijn van Oosterhout <kleptog@svana.org>
|
|
To: Manfred Koizar <mkoi-pg@aon.at>
|
|
cc: Tom Lane <tgl@sss.pgh.pa.us>, Dann Corbit <DCorbit@connx.com>,
|
|
Qingqing Zhou <zhouqq@cs.toronto.edu>,
|
|
Bruce Momjian <pgman@candle.pha.pa.us>,
|
|
Luke Lonergan <llonergan@greenplum.com>, Neil Conway <neilc@samurai.com>,
|
|
pgsql-hackers@postgresql.org
|
|
Subject: Re: [HACKERS] Re: Which qsort is used
|
|
Message-ID: <20051222070057.GA21783@svana.org>
|
|
Reply-To: Martijn van Oosterhout <kleptog@svana.org>
|
|
References: <D425483C2C5C9F49B5B7A41F8944154757D386@postal.corporate.connx.com> <3148.1134795805@sss.pgh.pa.us> <odqjq1tv6cb77ri4df0aehqal8o0ljtkar@4ax.com>
|
|
MIME-Version: 1.0
|
|
Content-Type: multipart/signed; micalg=pgp-sha1;
|
|
protocol="application/pgp-signature"; boundary="FL5UXtIhxfXey3p5"
|
|
Content-Disposition: inline
|
|
In-Reply-To: <odqjq1tv6cb77ri4df0aehqal8o0ljtkar@4ax.com>
|
|
User-Agent: Mutt/1.3.28i
|
|
X-PGP-Key-ID: Length=1024; ID=0x0DC67BE6
|
|
X-PGP-Key-Fingerprint: 295F A899 A81A 156D B522 48A7 6394 F08A 0DC6 7BE6
|
|
X-PGP-Key-URL: <http://svana.org/kleptog/0DC67BE6.pgp.asc>
|
|
X-Virus-Scanned: by amavisd-new at hub.org
|
|
X-Spam-Status: No, score=0.065 required=5 tests=[AWL=0.065]
|
|
X-Spam-Score: 0.065
|
|
X-Mailing-List: pgsql-hackers
|
|
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
|
|
List-Help: <mailto:majordomo@postgresql.org?body=help>
|
|
List-Id: <pgsql-hackers.postgresql.org>
|
|
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
|
|
List-Post: <mailto:pgsql-hackers@postgresql.org>
|
|
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
|
|
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: OR
|
|
|
|
|
|
--FL5UXtIhxfXey3p5
|
|
Content-Type: text/plain; charset=us-ascii
|
|
Content-Disposition: inline
|
|
Content-Transfer-Encoding: quoted-printable
|
|
|
|
On Thu, Dec 22, 2005 at 01:43:34AM +0100, Manfred Koizar wrote:
|
|
> Qsorting N elements costs O(N*lnN), so excluding H elements from the
|
|
> sort reduces the cost by at least O(H*lnN). The merge step costs O(N)
|
|
> plus some (<=50%) more memory, unless someone knows a fast in-place
|
|
> merge. So depending on the constant factors involved there might be a
|
|
> usable solution.
|
|
|
|
But where are you including the cost to check how many cells are
|
|
already sorted? That would be O(H), right? This is where we come back
|
|
to the issue that comparisons in PostgreSQL are expensive. The cpu_cost
|
|
in the tests I saw so far is unrealistically low.
|
|
|
|
> I've been playing with some numbers and assuming the constant factors
|
|
> to be equal for all the O()'s this method starts to pay off at
|
|
> H for N
|
|
> 20 100 20%
|
|
> 130 1000 13%
|
|
> 8000 100000 8%
|
|
|
|
Hmm, what are the chances you have 100000 unordered items to sort and
|
|
that the first 8% will already be in order. ISTM that that probability
|
|
will be close enough to zero to not matter...
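
The break-even points in the quoted table can be roughly reproduced with a
small calculation. This is only a sketch under an assumed cost model, which
is not necessarily the one Manfred used: a full sort costs N*ln(N), the
alternative costs (N-H)*ln(N-H) for sorting the unsorted tail plus N for the
merge, all constant factors are 1, and the O(H) prefix check is ignored. The
function name break_even is made up for illustration.

```python
import math

def break_even(n):
    """Smallest H at which sorting only the tail and merging beats a full sort.

    Assumed cost model (all constant factors 1, prefix check ignored):
      full sort:          n * ln(n)
      tail sort + merge:  (n - h) * ln(n - h) + n
    """
    full = n * math.log(n)
    for h in range(1, n):
        if (n - h) * math.log(n - h) + n < full:
            return h
    return None  # no presorted prefix short of the whole input pays off

for n in (100, 1000, 100000):
    h = break_even(n)
    print(n, h, f"{100.0 * h / n:.1f}%")
```

Under these assumptions the results land close to the 20, 130 and 8000 in
the table. Adding a per-element cost for the O(H) check pushes the break-even
points higher, which is the objection raised above.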
|
|
|
|
Have a nice day,
|
|
-- 
|
|
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
|
|
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
|
|
> tool for doing 5% of the work and then sitting around waiting for someone
|
|
> else to do the other 95% so you can sue them.
|
|
|
|
--FL5UXtIhxfXey3p5
|
|
Content-Type: application/pgp-signature
|
|
Content-Disposition: inline
|
|
|
|
-----BEGIN PGP SIGNATURE-----
|
|
Version: GnuPG v1.0.6 (GNU/Linux)
|
|
Comment: For info see http://www.gnupg.org
|
|
|
|
iD8DBQFDqk8oIB7bNG8LQkwRAjJhAJ47eXRi1DJ02cfKcnN2iPkaBB0eaQCeIiF+
|
|
HOAYIPQrU2gpUUiGT3aGUUw=
|
|
=R0hU
|
|
-----END PGP SIGNATURE-----
|
|
|
|
--FL5UXtIhxfXey3p5--
|
|
|
|
From pgsql-hackers-owner+M77831=pgman=candle.pha.pa.us@postgresql.org Thu Dec 22 16:59:19 2005
|
|
Return-path: <pgsql-hackers-owner+M77831=pgman=candle.pha.pa.us@postgresql.org>
|
|
Received: from ams.hub.org (ams.hub.org [200.46.204.13])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id jBMLxJe07480
|
|
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 16:59:19 -0500 (EST)
|
|
Received: from postgresql.org (postgresql.org [200.46.204.71])
|
|
by ams.hub.org (Postfix) with ESMTP id D1DBE67AC1B
|
|
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 17:59:16 -0400 (AST)
|
|
X-Original-To: pgsql-hackers-postgresql.org@localhost.postgresql.org
|
|
Received: from localhost (av.hub.org [200.46.204.144])
|
|
by postgresql.org (Postfix) with ESMTP id BE8249DCBEB
|
|
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>; Thu, 22 Dec 2005 17:58:53 -0400 (AST)
|
|
Received: from postgresql.org ([200.46.204.71])
|
|
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
|
with ESMTP id 64765-01
|
|
for <pgsql-hackers-postgresql.org@localhost.postgresql.org>;
|
|
Thu, 22 Dec 2005 17:58:54 -0400 (AST)
|
|
X-Greylist: from auto-whitelisted by SQLgrey-
|
|
Received: from email.aon.at (warsl404pip7.highway.telekom.at [195.3.96.91])
|
|
by postgresql.org (Postfix) with ESMTP id 3E08E9DCA5C
|
|
for <pgsql-hackers@postgresql.org>; Thu, 22 Dec 2005 17:58:49 -0400 (AST)
|
|
Received: (qmail 6986 invoked from network); 22 Dec 2005 21:58:49 -0000
|
|
Received: from m150p015.dipool.highway.telekom.at (HELO Sokrates) ([62.46.8.175])
|
|
(envelope-sender <mkoi-pg@aon.at>)
|
|
by smarthub76.highway.telekom.at (qmail-ldap-1.03) with SMTP
|
|
for <kleptog@svana.org>; 22 Dec 2005 21:58:49 -0000
|
|
From: Manfred Koizar <mkoi-pg@aon.at>
|
|
To: Martijn van Oosterhout <kleptog@svana.org>
|
|
cc: Tom Lane <tgl@sss.pgh.pa.us>, Dann Corbit <DCorbit@connx.com>,
|
|
Qingqing Zhou <zhouqq@cs.toronto.edu>,
|
|
Bruce Momjian <pgman@candle.pha.pa.us>,
|
|
Luke Lonergan <llonergan@greenplum.com>, Neil Conway <neilc@samurai.com>,
|
|
pgsql-hackers@postgresql.org
|
|
Subject: Re: [HACKERS] Re: Which qsort is used
|
|
Date: Thu, 22 Dec 2005 22:58:31 +0100
|
|
Message-ID: <4r6mq19fe6937mu9130h45ip3oeg135qo3@4ax.com>
|
|
References: <D425483C2C5C9F49B5B7A41F8944154757D386@postal.corporate.connx.com> <3148.1134795805@sss.pgh.pa.us> <odqjq1tv6cb77ri4df0aehqal8o0ljtkar@4ax.com> <20051222070057.GA21783@svana.org>
|
|
In-Reply-To: <20051222070057.GA21783@svana.org>
|
|
X-Mailer: Forte Agent 3.1/32.783
|
|
MIME-Version: 1.0
|
|
Content-Type: text/plain; charset=us-ascii
|
|
Content-Transfer-Encoding: 7bit
|
|
X-Virus-Scanned: by amavisd-new at hub.org
|
|
X-Spam-Status: No, score=0.398 required=5 tests=[AWL=0.398]
|
|
X-Spam-Score: 0.398
|
|
X-Mailing-List: pgsql-hackers
|
|
List-Archive: <http://archives.postgresql.org/pgsql-hackers>
|
|
List-Help: <mailto:majordomo@postgresql.org?body=help>
|
|
List-Id: <pgsql-hackers.postgresql.org>
|
|
List-Owner: <mailto:pgsql-hackers-owner@postgresql.org>
|
|
List-Post: <mailto:pgsql-hackers@postgresql.org>
|
|
List-Subscribe: <mailto:majordomo@postgresql.org?body=sub%20pgsql-hackers>
|
|
List-Unsubscribe: <mailto:majordomo@postgresql.org?body=unsub%20pgsql-hackers>
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: OR
|
|
|
|
On Thu, 22 Dec 2005 08:01:00 +0100, Martijn van Oosterhout
|
|
<kleptog@svana.org> wrote:
|
|
>But where are you including the cost to check how many cells are
|
|
>already sorted? That would be O(H), right?
|
|
|
|
Yes. I didn't mention it, because H < N.
|
|
|
|
> This is where we come back
|
|
>to the issue that comparisons in PostgreSQL are expensive.
|
|
|
|
So we agree that we should try to reduce the number of comparisons.
|
|
How many comparisons does it take to sort 100000 items? 1.5 million?
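
As a rough check on that figure, the sketch below computes the
information-theoretic lower bound log2(N!) for a comparison sort and counts
the comparator calls that Python's built-in sort makes on random input.
Python's sort stands in for qsort here purely for illustration, and the
helper names are hypothetical. For N = 100000 the bound is about 1.52
million, so 1.5 million is the right order of magnitude.

```python
import functools
import math
import random

def comparison_lower_bound(n):
    """Worst-case minimum number of comparisons for any comparison sort
    of n distinct items: log2(n!)."""
    return math.lgamma(n + 1) / math.log(2)

def count_comparisons(items):
    """Sort a copy of items with the built-in sort and count comparator calls."""
    calls = 0

    def cmp(a, b):
        nonlocal calls
        calls += 1
        return (a > b) - (a < b)

    sorted(items, key=functools.cmp_to_key(cmp))
    return calls

n = 100000
data = random.sample(range(10 * n), n)  # n distinct values in random order
print(f"log2(n!)      ~ {comparison_lower_bound(n):,.0f}")
print(f"built-in sort : {count_comparisons(data):,} comparisons")
```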
|
|
|
|
>Hmm, what are the chances you have 100000 unordered items to sort and
|
|
>that the first 8% will already be in order. ISTM that that probability
|
|
>will be close enough to zero to not matter...
|
|
|
|
If the items are totally unordered, the check is so cheap you won't
|
|
even notice. OTOH in Tom's example ...
|
|
|
|
|What I think is much more probable in the Postgres environment
|
|
|is almost-but-not-quite-ordered inputs --- eg, a table that was
|
|
|perfectly ordered by key when filled, but some of the tuples have since
|
|
|been moved by UPDATEs.
|
|
|
|
... I'd not be surprised if H is 90% of N.
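
A minimal sketch of the idea under discussion: find the presorted prefix,
sort only the remainder, then merge. The function names are hypothetical,
and the test data is just one assumed way to model Tom's case, with a table
filled in key order and 10% of the tuples later moved to the end by UPDATEs.

```python
import heapq
import random

def presorted_prefix_len(items):
    """Length H of the longest non-decreasing prefix; at most H comparisons."""
    h = 1 if items else 0
    while h < len(items) and items[h - 1] <= items[h]:
        h += 1
    return h

def prefix_aware_sort(items):
    """Keep the presorted prefix, sort only the tail, then merge the two."""
    h = presorted_prefix_len(items)
    head = items[:h]          # already in order, excluded from the sort
    tail = sorted(items[h:])  # only N - H elements go through the sort
    return list(heapq.merge(head, tail))  # O(N) merge, uses extra memory

# Assumed model of an almost-ordered table: keys inserted in order,
# then 10% of them "updated" and therefore appended at the end.
keys = list(range(1000))
moved = random.sample(keys, 100)
moved_set = set(moved)
data = [k for k in keys if k not in moved_set] + moved

h = presorted_prefix_len(data)
print(f"H = {h} of N = {len(data)}")
assert prefix_aware_sort(data) == sorted(data)
```

With this input the untouched 900 keys form the presorted prefix, so H comes
out at 90% of N or slightly more.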
|
|
Servus
|
|
Manfred
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 2: Don't 'kill -9' the postmaster
|
|
|
|
From DCorbit@connx.com Thu Dec 22 17:22:03 2005
|
|
Return-path: <DCorbit@connx.com>
|
|
Received: from postal.corporate.connx.com (postal.corporate.connx.com [65.212.159.187])
|
|
by candle.pha.pa.us (8.11.6/8.11.6) with SMTP id jBMMLve11671
|
|
for <pgman@candle.pha.pa.us>; Thu, 22 Dec 2005 17:22:03 -0500 (EST)
|
|
Content-class: urn:content-classes:message
|
|
MIME-Version: 1.0
|
|
Content-Type: text/plain;
|
|
charset="us-ascii"
|
|
Subject: RE: [HACKERS] Re: Which qsort is used
|
|
X-MimeOLE: Produced By Microsoft Exchange V6.5
|
|
Date: Thu, 22 Dec 2005 14:21:49 -0800
|
|
Message-ID: <D425483C2C5C9F49B5B7A41F8944154757D3AC@postal.corporate.connx.com>
|
|
Thread-Topic: [HACKERS] Re: Which qsort is used
|
|
Thread-Index: AcYHQuXJdKs8JVgmSKywUqld6KYccQAAfWAA
|
|
From: "Dann Corbit" <DCorbit@connx.com>
|
|
To: "Manfred Koizar" <mkoi-pg@aon.at>,
|
|
"Martijn van Oosterhout" <kleptog@svana.org>
|
|
cc: "Tom Lane" <tgl@sss.pgh.pa.us>, "Qingqing Zhou" <zhouqq@cs.toronto.edu>,
|
|
"Bruce Momjian" <pgman@candle.pha.pa.us>,
|
|
"Luke Lonergan" <llonergan@greenplum.com>,
|
|
"Neil Conway" <neilc@samurai.com>, <pgsql-hackers@postgresql.org>
|
|
Content-Transfer-Encoding: 8bit
|
|
X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id jBMMLve11671
|
|
Status: OR
|
|
|
|
An interesting article on sorting and comparison count:
|
|
http://www.acm.org/jea/ARTICLES/Vol7Nbr5.pdf
|
|
|
|
Here is the article, the code, and an implementation that I have been
|
|
toying with:
|
|
http://cap.connx.com/chess-engines/new-approach/algos.zip
|
|
|
|
Algorithm quickheap is especially interesting because it does not
|
|
require much additional space (just an array of integers up to size
|
|
log(element_count)), and in addition it has very few data movements.
|
|
|
|
> -----Original Message-----
|
|
> From: Manfred Koizar [mailto:mkoi-pg@aon.at]
|
|
> Sent: Thursday, December 22, 2005 1:59 PM
|
|
> To: Martijn van Oosterhout
|
|
> Cc: Tom Lane; Dann Corbit; Qingqing Zhou; Bruce Momjian; Luke Lonergan;
|
|
> Neil Conway; pgsql-hackers@postgresql.org
|
|
> Subject: Re: [HACKERS] Re: Which qsort is used
|
|
>
|
|
> On Thu, 22 Dec 2005 08:01:00 +0100, Martijn van Oosterhout
|
|
> <kleptog@svana.org> wrote:
|
|
> >But where are you including the cost to check how many cells are
|
|
> >already sorted? That would be O(H), right?
|
|
>
|
|
> Yes. I didn't mention it, because H < N.
|
|
>
|
|
> > This is where we come back
|
|
> >to the issue that comparisons in PostgreSQL are expensive.
|
|
>
|
|
> So we agree that we should try to reduce the number of comparisons.
|
|
> How many comparisons does it take to sort 100000 items? 1.5 million?
|
|
>
|
|
> >Hmm, what are the chances you have 100000 unordered items to sort and
|
|
> >that the first 8% will already be in order. ISTM that that probability
|
|
> >will be close enough to zero to not matter...
|
|
>
|
|
> If the items are totally unordered, the check is so cheap you won't
|
|
> even notice. OTOH in Tom's example ...
|
|
>
|
|
> |What I think is much more probable in the Postgres environment
|
|
> |is almost-but-not-quite-ordered inputs --- eg, a table that was
|
|
> |perfectly ordered by key when filled, but some of the tuples have since
|
|
> |been moved by UPDATEs.
|
|
>
|
|
> ... I'd not be surprised if H is 90% of N.
|
|
> Servus
|
|
> Manfred
|
|
|