555 lines
21 KiB
Plaintext
555 lines
21 KiB
Plaintext
From owner-pgsql-hackers@hub.org Sun Jun 14 18:45:04 1998
|
|
Received: from hub.org (hub.org [209.47.148.200])
|
|
by candle.pha.pa.us (8.8.5/8.8.5) with ESMTP id SAA03690
|
|
for <maillist@candle.pha.pa.us>; Sun, 14 Jun 1998 18:45:00 -0400 (EDT)
|
|
Received: from localhost (majordom@localhost) by hub.org (8.8.8/8.7.5) with SMTP id SAA28049; Sun, 14 Jun 1998 18:39:42 -0400 (EDT)
|
|
Received: by hub.org (TLB v0.10a (1.23 tibbs 1997/01/09 00:29:32)); Sun, 14 Jun 1998 18:36:06 +0000 (EDT)
|
|
Received: (from majordom@localhost) by hub.org (8.8.8/8.7.5) id SAA27943 for pgsql-hackers-outgoing; Sun, 14 Jun 1998 18:36:04 -0400 (EDT)
|
|
Received: from angular.illustra.com (ifmxoak.illustra.com [206.175.10.34]) by hub.org (8.8.8/8.7.5) with ESMTP id SAA27925 for <pgsql-hackers@postgresql.org>; Sun, 14 Jun 1998 18:35:47 -0400 (EDT)
|
|
Received: from hawk.illustra.com (hawk.illustra.com [158.58.61.70]) by angular.illustra.com (8.7.4/8.7.3) with SMTP id PAA21293 for <pgsql-hackers@postgresql.org>; Sun, 14 Jun 1998 15:35:12 -0700 (PDT)
|
|
Received: by hawk.illustra.com (5.x/smail2.5/06-10-94/S)
|
|
id AA07922; Sun, 14 Jun 1998 15:35:13 -0700
|
|
From: dg@illustra.com (David Gould)
|
|
Message-Id: <9806142235.AA07922@hawk.illustra.com>
|
|
Subject: [HACKERS] performance tests, initial results
|
|
To: pgsql-hackers@postgreSQL.org
|
|
Date: Sun, 14 Jun 1998 15:35:13 -0700 (PDT)
|
|
Mime-Version: 1.0
|
|
Content-Type: text/plain; charset=US-ASCII
|
|
Content-Transfer-Encoding: 7bit
|
|
Sender: owner-pgsql-hackers@hub.org
|
|
Precedence: bulk
|
|
Status: RO
|
|
|
|
|
|
I have been playing a little with the performance tests found in
|
|
pgsql/src/tests/performance and have a few observations that might be of
|
|
minor interest.
|
|
|
|
The tests themselves are simple enough although the result parsing in the
|
|
driver did not work on Linux. I am enclosing a patch below to fix this. I
|
|
think it will also work better on the other systems.
|
|
|
|
A summary of results from my testing are below. Details are at the bottom
|
|
of this message.
|
|
|
|
My test system is 'leslie':
|
|
|
|
linux 2.0.32, gcc version 2.7.2.3
|
|
P133, HX chipset, 512K L2, 32MB mem
|
|
NCR810 fast scsi, Quantum Atlas 2GB drive (7200 rpm).
|
|
|
|
|
|
Results Summary (times in seconds)
|
|
|
|
Single txn 8K txn Create 8K idx 8K random Simple
|
|
Case Description 8K insert 8K insert Index Insert Scans Orderby
|
|
=================== ========== ========= ====== ====== ========= =======
|
|
1 From Distribution
|
|
P90 FreeBsd -B256 39.56 1190.98 3.69 46.65 65.49 2.27
|
|
IDE
|
|
|
|
2 Running on leslie
|
|
P133 Linux 2.0.32 15.48 326.75 2.99 20.69 35.81 1.68
|
|
SCSI 32M
|
|
|
|
3 leslie, -o -F
|
|
no forced writes 15.90 24.98 2.63 20.46 36.43 1.69
|
|
|
|
4 leslie, -o -F
|
|
no ASSERTS 14.92 23.23 1.38 18.67 33.79 1.58
|
|
|
|
5 leslie, -o -F -B2048
|
|
more buffers 21.31 42.28 2.65 25.74 42.26 1.72
|
|
|
|
6 leslie, -o -F -B2048
|
|
more bufs, no ASSERT 20.52 39.79 1.40 24.77 39.51 1.55
|
|
|
|
|
|
|
|
|
|
Case to Case Difference Factors (+ is faster)
|
|
|
|
Single txn 8K txn Create 8K idx 8K random Simple
|
|
Case Description 8K insert 8K insert Index Insert Scans Orderby
|
|
=================== ========== ========= ====== ====== ========= =======
|
|
|
|
leslie vs BSD P90. 2.56 3.65 1.23 2.25 1.83 1.35
|
|
|
|
(noflush -F) vs no -F -1.03 13.08 1.14 1.01 -1.02 1.00
|
|
|
|
No Assert vs Assert 1.05 1.07 1.90 1.06 1.07 1.09
|
|
|
|
-B256 vs -B2048 1.34 1.69 1.01 1.26 1.16 1.02
|
|
|
|
|
|
Observations:
|
|
|
|
- leslie (P133 linux) appears to be about 1.8 times faster than the
|
|
P90 BSD system used for the test result distributed with the source, not
|
|
counting the 8K txn insert case which was completely disk bound.
|
|
|
|
- SCSI disks make a big (factor of 3.6) difference. During this test the
|
|
disk was hammering and cpu utilization was < 10%.
|
|
|
|
- Assertion checking seems to cost about 7% except for create index where
|
|
it costs 90%
|
|
|
|
- the -F option to avoid flushing buffers has tremendous effect if there are
|
|
many very small transactions. Or, another way, flushing at the end of the
|
|
transaction is a major disaster for performance.
|
|
|
|
- Something is very wrong with our buffer cache implementation. Going from
|
|
256 buffers to 2048 buffers costs an average of 25%. In the 8K txn case
|
|
it costs about 70%. I see looking at the code and profiling that in the 8K
|
|
txn case this is in BufferSync() which examines all the buffers at commit
|
|
time. I don't quite understand why it is so costly for the single 8K row
|
|
txn (35%) though.
|
|
|
|
It would be nice to have some more tests. Maybe the Wisconsin stuff will
|
|
be useful.
|
|
|
|
|
|
|
|
----------------- patch to test harness. apply from pgsql ------------
|
|
*** src/test/performance/runtests.pl.orig Sun Jun 14 11:34:04 1998
|
|
|
|
Differences %
|
|
|
|
|
|
----------------- patch to test harness. apply from pgsql ------------
|
|
*** src/test/performance/runtests.pl.orig Sun Jun 14 11:34:04 1998
|
|
--- src/test/performance/runtests.pl Sun Jun 14 12:07:30 1998
|
|
***************
|
|
*** 84,123 ****
|
|
open (STDERR, ">$TmpFile") or die;
|
|
select (STDERR); $| = 1;
|
|
|
|
! for ($i = 0; $i <= $#perftests; $i++)
|
|
! {
|
|
$test = $perftests[$i];
|
|
($test, $XACTBLOCK) = split (/ /, $test);
|
|
$runtest = $test;
|
|
! if ( $test =~ /\.ntm/ )
|
|
! {
|
|
! #
|
|
# No timing for this queries
|
|
- #
|
|
close (STDERR); # close $TmpFile
|
|
open (STDERR, ">/dev/null") or die;
|
|
$runtest =~ s/\.ntm//;
|
|
}
|
|
! else
|
|
! {
|
|
close (STDOUT);
|
|
open(STDOUT, ">&SAVEOUT");
|
|
print STDOUT "\nRunning: $perftests[$i+1] ...";
|
|
close (STDOUT);
|
|
open (STDOUT, ">/dev/null") or die;
|
|
select (STDERR); $| = 1;
|
|
! printf "$perftests[$i+1]: ";
|
|
}
|
|
|
|
do "sqls/$runtest";
|
|
|
|
# Restore STDERR to $TmpFile
|
|
! if ( $test =~ /\.ntm/ )
|
|
! {
|
|
close (STDERR);
|
|
open (STDERR, ">>$TmpFile") or die;
|
|
}
|
|
-
|
|
select (STDERR); $| = 1;
|
|
$i++;
|
|
}
|
|
--- 84,116 ----
|
|
open (STDERR, ">$TmpFile") or die;
|
|
select (STDERR); $| = 1;
|
|
|
|
! for ($i = 0; $i <= $#perftests; $i++) {
|
|
$test = $perftests[$i];
|
|
($test, $XACTBLOCK) = split (/ /, $test);
|
|
$runtest = $test;
|
|
! if ( $test =~ /\.ntm/ ) {
|
|
# No timing for this queries
|
|
close (STDERR); # close $TmpFile
|
|
open (STDERR, ">/dev/null") or die;
|
|
$runtest =~ s/\.ntm//;
|
|
}
|
|
! else {
|
|
close (STDOUT);
|
|
open(STDOUT, ">&SAVEOUT");
|
|
print STDOUT "\nRunning: $perftests[$i+1] ...";
|
|
close (STDOUT);
|
|
open (STDOUT, ">/dev/null") or die;
|
|
select (STDERR); $| = 1;
|
|
! print "$perftests[$i+1]: ";
|
|
}
|
|
|
|
do "sqls/$runtest";
|
|
|
|
# Restore STDERR to $TmpFile
|
|
! if ( $test =~ /\.ntm/ ) {
|
|
close (STDERR);
|
|
open (STDERR, ">>$TmpFile") or die;
|
|
}
|
|
select (STDERR); $| = 1;
|
|
$i++;
|
|
}
|
|
***************
|
|
*** 128,138 ****
|
|
open (TMPF, "<$TmpFile") or die;
|
|
open (RESF, ">$ResFile") or die;
|
|
|
|
! while (<TMPF>)
|
|
! {
|
|
! $str = $_;
|
|
! ($test, $rtime) = split (/:/, $str);
|
|
! ($tmp, $rtime, $rest) = split (/[ ]+/, $rtime);
|
|
! print RESF "$test: $rtime\n";
|
|
}
|
|
|
|
--- 121,130 ----
|
|
open (TMPF, "<$TmpFile") or die;
|
|
open (RESF, ">$ResFile") or die;
|
|
|
|
! while (<TMPF>) {
|
|
! if (m/^(.*: ).* ([0-9:.]+) *elapsed/) {
|
|
! ($test, $rtime) = ($1, $2);
|
|
! print RESF $test, $rtime, "\n";
|
|
! }
|
|
}
|
|
|
|
------------------------------------------------------------------------
|
|
|
|
|
|
------------------------- testcase detail --------------------------
|
|
|
|
1. from distribution
|
|
DBMS: PostgreSQL 6.2b10
|
|
OS: FreeBSD 2.1.5-RELEASE
|
|
HardWare: i586/90, 24M RAM, IDE
|
|
StartUp: postmaster -B 256 '-o -S 2048' -S
|
|
Compiler: gcc 2.6.3
|
|
Compiled: -O, without CASSERT checking, with
|
|
-DTBL_FREE_CMD_MEMORY (to free memory
|
|
if BEGIN/END after each query execution)
|
|
DB connection startup: 0.20
|
|
8192 INSERTs INTO SIMPLE (1 xact): 39.58
|
|
8192 INSERTs INTO SIMPLE (8192 xacts): 1190.98
|
|
Create INDEX on SIMPLE: 3.69
|
|
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 46.65
|
|
8192 random INDEX scans on SIMPLE (1 xact): 65.49
|
|
ORDER BY SIMPLE: 2.27
|
|
|
|
|
|
2. run on leslie with asserts
|
|
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
|
|
OS: Linux 2.0.32 leslie
|
|
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
|
|
StartUp: postmaster -B 256 '-o -S 2048' -S
|
|
Compiler: gcc 2.7.2.3
|
|
Compiled: -O, WITH CASSERT checking, with
|
|
-DTBL_FREE_CMD_MEMORY (to free memory
|
|
if BEGIN/END after each query execution)
|
|
DB connection startup: 0.10
|
|
8192 INSERTs INTO SIMPLE (1 xact): 15.48
|
|
8192 INSERTs INTO SIMPLE (8192 xacts): 326.75
|
|
Create INDEX on SIMPLE: 2.99
|
|
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 20.69
|
|
8192 random INDEX scans on SIMPLE (1 xact): 35.81
|
|
ORDER BY SIMPLE: 1.68
|
|
|
|
|
|
3. with -F to avoid forced i/o
|
|
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
|
|
OS: Linux 2.0.32 leslie
|
|
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
|
|
StartUp: postmaster -B 256 '-o -S 2048 -F' -S
|
|
Compiler: gcc 2.7.2.3
|
|
Compiled: -O, WITH CASSERT checking, with
|
|
-DTBL_FREE_CMD_MEMORY (to free memory
|
|
if BEGIN/END after each query execution)
|
|
DB connection startup: 0.10
|
|
8192 INSERTs INTO SIMPLE (1 xact): 15.90
|
|
8192 INSERTs INTO SIMPLE (8192 xacts): 24.98
|
|
Create INDEX on SIMPLE: 2.63
|
|
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 20.46
|
|
8192 random INDEX scans on SIMPLE (1 xact): 36.43
|
|
ORDER BY SIMPLE: 1.69
|
|
|
|
|
|
4. no asserts, -F to avoid forced I/O
|
|
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
|
|
OS: Linux 2.0.32 leslie
|
|
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
|
|
StartUp: postmaster -B 256 '-o -S 2048' -S
|
|
Compiler: gcc 2.7.2.3
|
|
Compiled: -O, No CASSERT checking, with
|
|
-DTBL_FREE_CMD_MEMORY (to free memory
|
|
if BEGIN/END after each query execution)
|
|
DB connection startup: 0.10
|
|
8192 INSERTs INTO SIMPLE (1 xact): 14.92
|
|
8192 INSERTs INTO SIMPLE (8192 xacts): 23.23
|
|
Create INDEX on SIMPLE: 1.38
|
|
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 18.67
|
|
8192 random INDEX scans on SIMPLE (1 xact): 33.79
|
|
ORDER BY SIMPLE: 1.58
|
|
|
|
|
|
5. with more buffers (2048 vs 256) and -F to avoid forced i/o
|
|
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
|
|
OS: Linux 2.0.32 leslie
|
|
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
|
|
StartUp: postmaster -B 2048 '-o -S 2048 -F' -S
|
|
Compiler: gcc 2.7.2.3
|
|
Compiled: -O, WITH CASSERT checking, with
|
|
-DTBL_FREE_CMD_MEMORY (to free memory
|
|
if BEGIN/END after each query execution)
|
|
DB connection startup: 0.11
|
|
8192 INSERTs INTO SIMPLE (1 xact): 21.31
|
|
8192 INSERTs INTO SIMPLE (8192 xacts): 42.28
|
|
Create INDEX on SIMPLE: 2.65
|
|
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 25.74
|
|
8192 random INDEX scans on SIMPLE (1 xact): 42.26
|
|
ORDER BY SIMPLE: 1.72
|
|
|
|
|
|
6. No Asserts, more buffers (2048 vs 256) and -F to avoid forced i/o
|
|
DBMS: PostgreSQL 6.3.2 (plus changes to 98/06/01)
|
|
OS: Linux 2.0.32 leslie
|
|
HardWare: i586/133 HX 512, 32M RAM, fast SCSI, 7200rpm
|
|
StartUp: postmaster -B 2048 '-o -S 2048 -F' -S
|
|
Compiler: gcc 2.7.2.3
|
|
Compiled: -O, No CASSERT checking, with
|
|
-DTBL_FREE_CMD_MEMORY (to free memory
|
|
if BEGIN/END after each query execution)
|
|
DB connection startup: 0.11
|
|
8192 INSERTs INTO SIMPLE (1 xact): 20.52
|
|
8192 INSERTs INTO SIMPLE (8192 xacts): 39.79
|
|
Create INDEX on SIMPLE: 1.40
|
|
8192 INSERTs INTO SIMPLE with INDEX (1 xact): 24.77
|
|
8192 random INDEX scans on SIMPLE (1 xact): 39.51
|
|
ORDER BY SIMPLE: 1.55
|
|
---------------------------------------------------------------------
|
|
|
|
-dg
|
|
|
|
David Gould dg@illustra.com 510.628.3783 or 510.305.9468
|
|
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
|
|
"Don't worry about people stealing your ideas. If your ideas are any
|
|
good, you'll have to ram them down people's throats." -- Howard Aiken
|
|
|
|
|
|
From owner-pgsql-hackers@hub.org Tue Oct 19 10:31:10 1999
|
|
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
|
|
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id KAA29087
|
|
for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:31:08 -0400 (EDT)
|
|
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id KAA27535 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 10:19:47 -0400 (EDT)
|
|
Received: from localhost (majordom@localhost)
|
|
by hub.org (8.9.3/8.9.3) with SMTP id KAA30328;
|
|
Tue, 19 Oct 1999 10:12:10 -0400 (EDT)
|
|
(envelope-from owner-pgsql-hackers)
|
|
Received: by hub.org (bulk_mailer v1.5); Tue, 19 Oct 1999 10:11:55 -0400
|
|
Received: (from majordom@localhost)
|
|
by hub.org (8.9.3/8.9.3) id KAA30030
|
|
for pgsql-hackers-outgoing; Tue, 19 Oct 1999 10:11:00 -0400 (EDT)
|
|
(envelope-from owner-pgsql-hackers@postgreSQL.org)
|
|
Received: from sss.sss.pgh.pa.us (sss.pgh.pa.us [209.114.166.2])
|
|
by hub.org (8.9.3/8.9.3) with ESMTP id KAA29914
|
|
for <pgsql-hackers@postgreSQL.org>; Tue, 19 Oct 1999 10:10:33 -0400 (EDT)
|
|
(envelope-from tgl@sss.pgh.pa.us)
|
|
Received: from sss.sss.pgh.pa.us (localhost [127.0.0.1])
|
|
by sss.sss.pgh.pa.us (8.9.1/8.9.1) with ESMTP id KAA09038;
|
|
Tue, 19 Oct 1999 10:09:15 -0400 (EDT)
|
|
To: "Hiroshi Inoue" <Inoue@tpf.co.jp>
|
|
cc: "Vadim Mikheev" <vadim@krs.ru>, pgsql-hackers@postgreSQL.org
|
|
Subject: Re: [HACKERS] mdnblocks is an amazing time sink in huge relations
|
|
In-reply-to: Your message of Tue, 19 Oct 1999 19:03:22 +0900
|
|
<000801bf1a19$2d88ae20$2801007e@cadzone.tpf.co.jp>
|
|
Date: Tue, 19 Oct 1999 10:09:15 -0400
|
|
Message-ID: <9036.940342155@sss.pgh.pa.us>
|
|
From: Tom Lane <tgl@sss.pgh.pa.us>
|
|
Sender: owner-pgsql-hackers@postgreSQL.org
|
|
Status: OR
|
|
|
|
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
|
|
> 1. shared cache holds committed system tuples.
|
|
> 2. private cache holds uncommitted system tuples.
|
|
> 3. relpages of shared cache are updated immediately by
|
|
> phisical change and corresponding buffer pages are
|
|
> marked dirty.
|
|
> 4. on commit, the contents of uncommitted tuples except
|
|
> relpages,reltuples,... are copied to correponding tuples
|
|
> in shared cache and the combined contents are
|
|
> committed.
|
|
> If so,catalog cache invalidation would be no longer needed.
|
|
> But synchronization of the step 4. may be difficult.
|
|
|
|
I think the main problem is that relpages and reltuples shouldn't
|
|
be kept in pg_class columns at all, because they need to have
|
|
very different update behavior from the other pg_class columns.
|
|
|
|
The rest of pg_class is update-on-commit, and we can lock down any one
|
|
row in the normal MVCC way (if transaction A has modified a row and
|
|
transaction B also wants to modify it, B waits for A to commit or abort,
|
|
so it can know which version of the row to start from). Furthermore,
|
|
there can legitimately be several different values of a row in use in
|
|
different places: the latest committed, an uncommitted modification, and
|
|
one or more old values that are still being used by active transactions
|
|
because they were current when those transactions started. (BTW, the
|
|
present relcache is pretty bad about maintaining pure MVCC transaction
|
|
semantics like this, but it seems clear to me that that's the direction
|
|
we want to go in.)
|
|
|
|
relpages cannot operate this way. To be useful for avoiding lseeks,
|
|
relpages *must* change exactly when the physical file changes. It
|
|
matters not at all whether the particular transaction that extended the
|
|
file ultimately commits or not. Moreover there can be only one correct
|
|
value (per relation) across the whole system, because there is only one
|
|
length of the relation file.
|
|
|
|
If we want to take reltuples seriously and try to maintain it
|
|
on-the-fly, then I think it needs still a third behavior. Clearly
|
|
it cannot be updated using MVCC rules, or we lose all writer
|
|
concurrency (if A has added tuples to a rel, B would have to wait
|
|
for A to commit before it could update reltuples...). Furthermore
|
|
"updating" isn't a simple matter of storing what you think the new
|
|
value is; otherwise two transactions adding tuples in parallel would
|
|
leave the wrong answer after B commits and overwrites A's value.
|
|
I think it would work for each transaction to keep track of a net delta
|
|
in reltuples for each table it's changed (total tuples added less total
|
|
tuples deleted), and then atomically add that value to the table's
|
|
shared reltuples counter during commit. But that still leaves the
|
|
problem of how you use the counter during a transaction to get an
|
|
accurate answer to the question "If I scan this table now, how many tuples
|
|
will I see?" At the time the question is asked, the current shared
|
|
counter value might include the effects of transactions that have
|
|
committed since your transaction started, and therefore are not visible
|
|
under MVCC rules. I think getting the correct answer would involve
|
|
making an instantaneous copy of the current counter at the start of
|
|
your xact, and then adding your own private net-uncommitted-delta to
|
|
the saved shared counter value when asked the question. This doesn't
|
|
look real practical --- you'd have to save the reltuples counts of
|
|
*all* tables in the database at the start of each xact, on the off
|
|
chance that you might need them. Ugh. Perhaps someone has a better
|
|
idea. In any case, reltuples clearly needs different mechanisms than
|
|
the ordinary fields in pg_class do, because updating it will be a
|
|
performance bottleneck otherwise.
|
|
|
|
If we allow reltuples to be updated only by vacuum-like events, as
|
|
it is now, then I think keeping it in pg_class is still OK.
|
|
|
|
In short, it seems clear to me that relpages should be removed from
|
|
pg_class and kept somewhere else if we want to make it more reliable
|
|
than it is now, and the same for reltuples (but reltuples doesn't
|
|
behave the same as relpages, and probably ought to be handled
|
|
differently).
|
|
|
|
regards, tom lane
|
|
|
|
************
|
|
|
|
From owner-pgsql-hackers@hub.org Tue Oct 19 21:25:30 1999
|
|
Received: from renoir.op.net (root@renoir.op.net [209.152.193.4])
|
|
by candle.pha.pa.us (8.9.0/8.9.0) with ESMTP id VAA28130
|
|
for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:25:26 -0400 (EDT)
|
|
Received: from hub.org (hub.org [216.126.84.1]) by renoir.op.net (o1/$Revision: 1.2 $) with ESMTP id VAA10512 for <maillist@candle.pha.pa.us>; Tue, 19 Oct 1999 21:15:28 -0400 (EDT)
|
|
Received: from localhost (majordom@localhost)
|
|
by hub.org (8.9.3/8.9.3) with SMTP id VAA50745;
|
|
Tue, 19 Oct 1999 21:07:23 -0400 (EDT)
|
|
(envelope-from owner-pgsql-hackers)
|
|
Received: by hub.org (bulk_mailer v1.5); Tue, 19 Oct 1999 21:07:01 -0400
|
|
Received: (from majordom@localhost)
|
|
by hub.org (8.9.3/8.9.3) id VAA50644
|
|
for pgsql-hackers-outgoing; Tue, 19 Oct 1999 21:06:06 -0400 (EDT)
|
|
(envelope-from owner-pgsql-hackers@postgreSQL.org)
|
|
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
|
|
by hub.org (8.9.3/8.9.3) with ESMTP id VAA50584
|
|
for <pgsql-hackers@postgreSQL.org>; Tue, 19 Oct 1999 21:05:26 -0400 (EDT)
|
|
(envelope-from Inoue@tpf.co.jp)
|
|
Received: from cadzone ([126.0.1.40] (may be forged))
|
|
by sd.tpf.co.jp (2.5 Build 2640 (Berkeley 8.8.6)/8.8.4) with SMTP
|
|
id KAA01715; Wed, 20 Oct 1999 10:05:14 +0900
|
|
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
|
|
To: "Tom Lane" <tgl@sss.pgh.pa.us>
|
|
Cc: <pgsql-hackers@postgreSQL.org>
|
|
Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge relations
|
|
Date: Wed, 20 Oct 1999 10:09:13 +0900
|
|
Message-ID: <000501bf1a97$b925a860$2801007e@cadzone.tpf.co.jp>
|
|
MIME-Version: 1.0
|
|
Content-Type: text/plain;
|
|
charset="iso-8859-1"
|
|
Content-Transfer-Encoding: 7bit
|
|
X-Priority: 3 (Normal)
|
|
X-MSMail-Priority: Normal
|
|
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
|
|
X-Mimeole: Produced By Microsoft MimeOLE V4.72.2106.4
|
|
Importance: Normal
|
|
Sender: owner-pgsql-hackers@postgreSQL.org
|
|
Status: ORr
|
|
|
|
> -----Original Message-----
|
|
> From: Hiroshi Inoue [mailto:Inoue@tpf.co.jp]
|
|
> Sent: Tuesday, October 19, 1999 6:45 PM
|
|
> To: Tom Lane
|
|
> Cc: pgsql-hackers@postgreSQL.org
|
|
> Subject: RE: [HACKERS] mdnblocks is an amazing time sink in huge
|
|
> relations
|
|
>
|
|
>
|
|
> >
|
|
> > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
|
|
>
|
|
> [snip]
|
|
>
|
|
> >
|
|
> > > Deletion is necessary only not to consume disk space.
|
|
> > >
|
|
> > > For example vacuum could remove not deleted files.
|
|
> >
|
|
> > Hmm ... interesting idea ... but I can hear the complaints
|
|
> > from users already...
|
|
> >
|
|
>
|
|
> My idea is only an analogy of PostgreSQL's simple recovery
|
|
> mechanism of tuples.
|
|
>
|
|
> And my main point is
|
|
> "delete fails after commit" doesn't harm the database
|
|
> except that not deleted files consume disk space.
|
|
>
|
|
> Of cource,it's preferable to delete relation files immediately
|
|
> after(or just when) commit.
|
|
> Useless files are visible though useless tuples are invisible.
|
|
>
|
|
|
|
Anyway I don't need "DROP TABLE inside transactions" now
|
|
and my idea is originally for that issue.
|
|
|
|
After a thought,I propose the following solution.
|
|
|
|
1. mdcreate() couldn't create existent relation files.
|
|
If the existent file is of length zero,we would overwrite
|
|
the file.(seems the comment in md.c says so but the
|
|
code doesn't do so).
|
|
If the file is an Index relation file,we would overwrite
|
|
the file.
|
|
|
|
2. mdunlink() couldn't unlink non-existent relation files.
|
|
mdunlink() doesn't call elog(ERROR) even if the file
|
|
doesn't exist,though I couldn't find where to change
|
|
now.
|
|
mdopen() doesn't call elog(ERROR) even if the file
|
|
doesn't exist and leaves the relation as CLOSED.
|
|
|
|
Comments ?
|
|
|
|
Regards.
|
|
|
|
Hiroshi Inoue
|
|
Inoue@tpf.co.jp
|
|
|
|
************
|
|
|