From pgsql-hackers-owner+M11649@postgresql.org Wed Aug 1 15:22:46 2001
|
|
Return-path: <pgsql-hackers-owner+M11649@postgresql.org>
|
|
Received: from postgresql.org (webmail.postgresql.org [216.126.85.28])
|
|
by candle.pha.pa.us (8.10.1/8.10.1) with ESMTP id f71JMjN09768
|
|
for <pgman@candle.pha.pa.us>; Wed, 1 Aug 2001 15:22:45 -0400 (EDT)
|
|
Received: from postgresql.org.org (webmail.postgresql.org [216.126.85.28])
|
|
by postgresql.org (8.11.3/8.11.1) with SMTP id f71JMUf62338;
|
|
Wed, 1 Aug 2001 15:22:30 -0400 (EDT)
|
|
(envelope-from pgsql-hackers-owner+M11649@postgresql.org)
|
|
Received: from sectorbase2.sectorbase.com (sectorbase2.sectorbase.com [63.88.121.62] (may be forged))
|
|
by postgresql.org (8.11.3/8.11.1) with SMTP id f71J4df57086
|
|
for <pgsql-hackers@postgresql.org>; Wed, 1 Aug 2001 15:04:40 -0400 (EDT)
|
|
(envelope-from vmikheev@SECTORBASE.COM)
|
|
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
|
|
id <PG1LSSPZ>; Wed, 1 Aug 2001 12:04:31 -0700
|
|
Message-ID: <3705826352029646A3E91C53F7189E32016705@sectorbase2.sectorbase.com>
|
|
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
|
|
To: "'pgsql-hackers@postgresql.org'" <pgsql-hackers@postgresql.org>
|
|
Subject: [HACKERS] Using POSIX mutex-es
|
|
Date: Wed, 1 Aug 2001 12:04:24 -0700
|
|
MIME-Version: 1.0
|
|
X-Mailer: Internet Mail Service (5.5.2653.19)
|
|
Content-Type: text/plain;
|
|
charset="koi8-r"
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: RO
|
|
|
|
1. Just changed
     TAS(lock)      to pthread_mutex_trylock(lock)
     S_LOCK(lock)   to pthread_mutex_lock(lock)
     S_UNLOCK(lock) to pthread_mutex_unlock(lock)
   (and S_INIT_LOCK to share mutexes between processes).
|
|
|
|
2. pgbench was initialized with scale 10.
   SUN WS 10 (512 MB), Solaris 2.6 (I'm unable to test on E4500 -:()
   -B 16384, wal_files 8, wal_buffers 256,
   checkpoint_segments 64, checkpoint_timeout 3600
   50 clients x 100 transactions
   (after initialization the DB dir was saved, and before each test
   it was copied back and vacuumed).
|
|
|
|
3. No difference.
   The mutex version is maybe 0.5-1% faster (e.g. 37.264238 tps vs 37.083339 tps).
|
|
|
|
So - no gain, but no performance loss "from using pthread library"
(I've also run tests with 1 client), at least on Solaris.

And so - looks like we can use POSIX mutexes and condition variables
(not semaphores; see man pthread_cond_wait) and should implement a light
lmgr, probably with priority locking.
|
|
|
|
Vadim
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 2: you can get off all lists at once with the unregister command
|
|
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
|
|
|
|
From pgsql-hackers-owner+M18052=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 13:39:19 2002
|
|
Return-path: <pgsql-hackers-owner+M18052=candle.pha.pa.us=pgman@postgresql.org>
|
|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0NIdIU26480
|
|
for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 13:39:18 -0500 (EST)
|
|
Received: (qmail 59371 invoked by alias); 23 Jan 2002 18:39:18 -0000
|
|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
|
|
by www.postgresql.org with SMTP; 23 Jan 2002 18:39:18 -0000
|
|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
|
|
by postgresql.org (8.11.3/8.11.4) with ESMTP id g0NIJ8l47400
|
|
for <pgsql-hackers@postgreSQL.org>; Wed, 23 Jan 2002 13:19:08 -0500 (EST)
|
|
(envelope-from pgman@candle.pha.pa.us)
|
|
Received: (from pgman@localhost)
|
|
by candle.pha.pa.us (8.11.6/8.10.1) id g0NIJ5i24508
|
|
for pgsql-hackers@postgreSQL.org; Wed, 23 Jan 2002 13:19:05 -0500 (EST)
|
|
From: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
Message-ID: <200201231819.g0NIJ5i24508@candle.pha.pa.us>
|
|
Subject: [HACKERS] Savepoints
|
|
To: PostgreSQL-development <pgsql-hackers@postgresql.org>
|
|
Date: Wed, 23 Jan 2002 13:19:05 -0500 (EST)
|
|
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
|
|
MIME-Version: 1.0
|
|
Content-Transfer-Encoding: 7bit
|
|
Content-Type: text/plain; charset=US-ASCII
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: RO
|
|
|
|
I have talked in the past about a possible implementation of
savepoints/nested transactions. I would like to more formally outline
my ideas below.

We have talked about using WAL for such a purpose, but that requires WAL
files to remain for the life of a transaction, which seems unacceptable.
Other database systems do that, and it is a pain for administrators. I
realized we could do some sort of WAL compaction, but that seems quite
complex too.

Basically, under my plan, WAL would be unchanged. WAL's function is
crash recovery, and it would retain that. There would also be no
on-disk changes. I would use the command counter in certain cases to
identify savepoints.

My idea is to keep savepoint undo information in a private area per
backend, either in memory or on disk. We can either save the
relid/tids of modified rows, or if there are too many, discard the
saved ones and just remember the modified relids. On rollback to a
savepoint, either clear up the modified relid/tids, or sequentially scan
the relation and clear up all the tuples that have our transaction
id and have command counters that are part of the undone savepoint.
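
A minimal sketch of the bookkeeping described above, not actual
PostgreSQL code. The names SavepointUndo, HeapRow, ABORTED_XID and
MAX_TIDS_PER_REL are hypothetical, the heap is modeled as a plain
dictionary, and "clearing up" a row is modeled by stamping it with a
fixed aborted transaction id:

```python
from dataclasses import dataclass, field

ABORTED_XID = 1        # hypothetical fixed "always aborted" transaction id
MAX_TIDS_PER_REL = 3   # hypothetical limit before falling back to relid-only


@dataclass
class HeapRow:
    xmin: int   # id of the transaction that wrote the row
    cmin: int   # command counter of the command that wrote it
    data: str


@dataclass
class SavepointUndo:
    """Private, per-backend record of rows modified since a savepoint."""
    start_cid: int                                 # command counter at the savepoint
    tids: dict = field(default_factory=dict)       # relid -> set of tids
    relid_only: set = field(default_factory=set)   # relids that had too many tids

    def remember(self, relid, tid):
        if relid in self.relid_only:
            return
        tids = self.tids.setdefault(relid, set())
        tids.add(tid)
        if len(tids) > MAX_TIDS_PER_REL:
            # Too many rows: discard the saved tids, keep only the relid.
            del self.tids[relid]
            self.relid_only.add(relid)


def rollback_to_savepoint(heap, undo, my_xid):
    """heap maps relid -> {tid: HeapRow}; undo rows written after the savepoint."""
    # Few rows modified: visit exactly the remembered relid/tids.
    for relid, tids in undo.tids.items():
        for tid in tids:
            heap[relid][tid].xmin = ABORTED_XID
    # Many rows modified: sequentially scan the relation and pick out rows
    # with our xid and a command counter at or after the savepoint.
    for relid in undo.relid_only:
        for row in heap[relid].values():
            if row.xmin == my_xid and row.cmin >= undo.start_cid:
                row.xmin = ABORTED_XID
```

Nothing here is shared between backends or written to WAL; the undo
record can simply be dropped when the transaction ends.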
|
|
|
|
It seems marking undone savepoint rows with a fixed aborted transaction
id would be the easiest solution.

Of course, we only remember modified rows when we are in savepoints, and
only undo them when we roll back to a savepoint. Transaction processing
remains the same.

There is no reason for other backends to be able to see savepoint undo
information, and keeping it private greatly simplifies the
implementation.
|
|
|
|
--
|
|
Bruce Momjian | http://candle.pha.pa.us
|
|
pgman@candle.pha.pa.us | (610) 853-3000
|
|
+ If your life is a hard drive, | 830 Blythe Avenue
|
|
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 6: Have you searched our list archives?
|
|
|
|
http://archives.postgresql.org
|
|
|
|
From hstenger@adinet.com.uy Wed Jan 23 14:13:33 2002
|
|
Return-path: <hstenger@adinet.com.uy>
|
|
Received: from correo.adinet.com.uy (fecorreo01.adinet.com.uy [206.99.44.217])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0NJDWU29832
|
|
for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 14:13:33 -0500 (EST)
|
|
Received: from adinet.com.uy (200.61.76.155) by correo.adinet.com.uy (5.5.052) (authenticated as hstenger@adinet.com.uy)
|
|
id 3C4DBC5C00017E9F; Wed, 23 Jan 2002 16:13:25 -0300
|
|
Message-ID: <3C4F0BC0.5CFBB919@adinet.com.uy>
|
|
Date: Wed, 23 Jan 2002 16:15:12 -0300
|
|
From: Haroldo Stenger <hstenger@adinet.com.uy>
|
|
X-Mailer: Mozilla 4.78 [en] (Win98; U)
|
|
X-Accept-Language: en
|
|
MIME-Version: 1.0
|
|
To: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
|
|
Subject: Re: [HACKERS] Savepoints
|
|
References: <200201231819.g0NIJ5i24508@candle.pha.pa.us>
|
|
Content-Type: text/plain; charset=us-ascii
|
|
Content-Transfer-Encoding: 7bit
|
|
Status: OR
|
|
|
|
Bruce Momjian wrote:
|
|
>
|
|
> Basically, under my plan, WAL would be unchanged. WAL's function is
|
|
> crash recovery, and it would retain that. There would also be no
|
|
> on-disk changes. I would use the command counter in certain cases to
|
|
> identify savepoints.
|
|
|
|
Here is a pointer to last August's thread, where your original proposal
was posted and some WAL/non-WAL discussion took place, just so we don't
repeat points already made. Oh, it's the Google archive, just for fun,
and to avoid overloading hub.org ;-)
|
|
|
|
http://groups.google.com/groups?hl=en&threadm=200108050432.f754Wdo11696%40candle.pha.pa.us&rnum=1&prev=/groups%3Fhl%3Den%26selm%3D200108050432.f754Wdo11696%2540candle.pha.pa.us
|
|
|
|
Regards,
|
|
Haroldo.
|
|
|
|
From vmikheev@SECTORBASE.COM Wed Jan 23 18:23:04 2002
|
|
Return-path: <vmikheev@SECTORBASE.COM>
|
|
Received: from sectorbase2.sectorbase.com ([66.106.163.120])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0NNN3U21442
|
|
for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 18:23:04 -0500 (EST)
|
|
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
|
|
id <DKXVZ14S>; Wed, 23 Jan 2002 15:22:52 -0800
|
|
Message-ID: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com>
|
|
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
|
|
To: "'Bruce Momjian'" <pgman@candle.pha.pa.us>,
|
|
PostgreSQL-development
|
|
<pgsql-hackers@postgreSQL.org>
|
|
Subject: RE: [HACKERS] Savepoints
|
|
Date: Wed, 23 Jan 2002 15:22:42 -0800
|
|
MIME-Version: 1.0
|
|
X-Mailer: Internet Mail Service (5.5.2653.19)
|
|
Content-Type: text/plain;
|
|
charset="iso-8859-1"
|
|
Status: ORr
|
|
|
|
> I have talked in the past about a possible implementation of
|
|
> savepoints/nested transactions. I would like to more formally outline
|
|
> my ideas below.
|
|
|
|
Well, I would like to do the same -:)
|
|
|
|
> ...
|
|
> There is no reason for other backend to be able to see savepoint undo
|
|
> information, and keeping it private greatly simplifies the
|
|
> implementation.
|
|
|
|
Yes... and it requires additional memory/disk space: we keep old records
in data files and we'll store them again...

How about: use an overwriting smgr + put old records into rollback
segments - RS - (you have to keep them somewhere while the TX is running
anyway) + use WAL only as REDO log (RS will be used to roll back a TX's
changes and WAL will be used for RS/data files recovery).
Something like what Oracle does.
|
|
|
|
Vadim
|
|
|
|
From pgsql-hackers-owner+M18085=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 20:15:02 2002
|
|
Return-path: <pgsql-hackers-owner+M18085=candle.pha.pa.us=pgman@postgresql.org>
|
|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0O1F1U26461
|
|
for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 20:15:02 -0500 (EST)
|
|
Received: (qmail 92866 invoked by alias); 24 Jan 2002 01:14:59 -0000
|
|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
|
|
by www.postgresql.org with SMTP; 24 Jan 2002 01:14:59 -0000
|
|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
|
|
by postgresql.org (8.11.3/8.11.4) with ESMTP id g0O18ml91949
|
|
for <pgsql-hackers@postgresql.org>; Wed, 23 Jan 2002 20:08:50 -0500 (EST)
|
|
(envelope-from pgman@candle.pha.pa.us)
|
|
Received: (from pgman@localhost)
|
|
by candle.pha.pa.us (8.11.6/8.10.1) id g0O18jV26044;
|
|
Wed, 23 Jan 2002 20:08:45 -0500 (EST)
|
|
From: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
Message-ID: <200201240108.g0O18jV26044@candle.pha.pa.us>
|
|
Subject: Re: [HACKERS] Savepoints
|
|
In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com>
|
|
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
|
|
Date: Wed, 23 Jan 2002 20:08:45 -0500 (EST)
|
|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
|
|
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
|
|
MIME-Version: 1.0
|
|
Content-Transfer-Encoding: 7bit
|
|
Content-Type: text/plain; charset=US-ASCII
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: OR
|
|
|
|
Mikheev, Vadim wrote:
|
|
> > I have talked in the past about a possible implementation of
|
|
> > savepoints/nested transactions. I would like to more formally outline
|
|
> > my ideas below.
|
|
>
|
|
> Well, I would like to do the same -:)
|
|
|
|
Good.
|
|
|
|
> > ...
|
|
> > There is no reason for other backend to be able to see savepoint undo
|
|
> > information, and keeping it private greatly simplifies the
|
|
> > implementation.
|
|
>
|
|
> Yes... and requires additional memory/disk space: we keep old records
|
|
> in data files and we'll store them again...
|
|
|
|
I was suggesting keeping only relid/tid or in some cases only relid.
Seems like one or the other will fit all needs: relid/tid for update of
a few rows, relid for many rows updated in the same table. I saw no
need to store the actual data.
|
|
|
|
> How about: use overwriting smgr + put old records into rollback
|
|
> segments - RS - (you have to keep them somewhere till TX's running
|
|
> anyway) + use WAL only as REDO log (RS will be used to rollback TX'
|
|
> changes and WAL will be used for RS/data files recovery).
|
|
> Something like what Oracle does.
|
|
|
|
Why record the old data rows rather than the tids? While the
transaction is running, the rows can't be moved anyway. Also, why store
them in a shared area? That has additional requirements because one old
transaction can require all transactions to keep their stuff around.
Why not just make it a private data file for each backend?
|
|
|
|
--
|
|
Bruce Momjian | http://candle.pha.pa.us
|
|
pgman@candle.pha.pa.us | (610) 853-3000
|
|
+ If your life is a hard drive, | 830 Blythe Avenue
|
|
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
|
|
|
From pgsql-hackers-owner+M18086=candle.pha.pa.us=pgman@postgresql.org Wed Jan 23 20:25:47 2002
|
|
Return-path: <pgsql-hackers-owner+M18086=candle.pha.pa.us=pgman@postgresql.org>
|
|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0O1PkU26964
|
|
for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 20:25:47 -0500 (EST)
|
|
Received: (qmail 94878 invoked by alias); 24 Jan 2002 01:25:44 -0000
|
|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
|
|
by www.postgresql.org with SMTP; 24 Jan 2002 01:25:44 -0000
|
|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
|
|
by postgresql.org (8.11.3/8.11.4) with ESMTP id g0O1L1l94075
|
|
for <pgsql-hackers@postgreSQL.org>; Wed, 23 Jan 2002 20:21:01 -0500 (EST)
|
|
(envelope-from pgman@candle.pha.pa.us)
|
|
Received: (from pgman@localhost)
|
|
by candle.pha.pa.us (8.11.6/8.10.1) id g0O1Kwm26748;
|
|
Wed, 23 Jan 2002 20:20:58 -0500 (EST)
|
|
From: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
Message-ID: <200201240120.g0O1Kwm26748@candle.pha.pa.us>
|
|
Subject: Re: [HACKERS] Savepoints
|
|
In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com>
|
|
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
|
|
Date: Wed, 23 Jan 2002 20:20:58 -0500 (EST)
|
|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
|
|
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
|
|
MIME-Version: 1.0
|
|
Content-Transfer-Encoding: 7bit
|
|
Content-Type: text/plain; charset=US-ASCII
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: OR
|
|
|
|
> > There is no reason for other backend to be able to see savepoint undo
|
|
> > information, and keeping it private greatly simplifies the
|
|
> > implementation.
|
|
>
|
|
> Yes... and requires additional memory/disk space: we keep old records
|
|
> in data files and we'll store them again...
|
|
>
|
|
> How about: use overwriting smgr + put old records into rollback
|
|
> segments - RS - (you have to keep them somewhere till TX's running
|
|
> anyway) + use WAL only as REDO log (RS will be used to rollback TX'
|
|
> changes and WAL will be used for RS/data files recovery).
|
|
> Something like what Oracle does.
|
|
|
|
I am sorry. I see what you are saying now. I missed the words
"overwriting smgr". You are suggesting going to an overwriting storage
manager. Is this to be done only because of savepoints? Doesn't seem
worth it when I have a possible solution without such a drastic change.
Also, an overwriting storage manager will require MVCC to read through
the rollback segments to get accurate MVCC visibility, right?
|
|
|
|
--
|
|
Bruce Momjian | http://candle.pha.pa.us
|
|
pgman@candle.pha.pa.us | (610) 853-3000
|
|
+ If your life is a hard drive, | 830 Blythe Avenue
|
|
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
|
|
|
|
|
|
From vmikheev@SECTORBASE.COM Wed Jan 23 21:03:29 2002
|
|
Return-path: <vmikheev@SECTORBASE.COM>
|
|
Received: from sectorbase2.sectorbase.com ([66.106.163.120])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0O23TU28813
|
|
for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 21:03:29 -0500 (EST)
|
|
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
|
|
id <DKXVZFBY>; Wed, 23 Jan 2002 18:03:18 -0800
|
|
Message-ID: <3705826352029646A3E91C53F7189E32518487@sectorbase2.sectorbase.com>
|
|
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
|
|
To: "'Bruce Momjian'" <pgman@candle.pha.pa.us>
|
|
cc: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
|
|
Subject: RE: [HACKERS] Savepoints
|
|
Date: Wed, 23 Jan 2002 18:03:11 -0800
|
|
MIME-Version: 1.0
|
|
X-Mailer: Internet Mail Service (5.5.2653.19)
|
|
Content-Type: text/plain;
|
|
charset="iso-8859-1"
|
|
Status: ORr
|
|
|
|
> > How about: use overwriting smgr + put old records into rollback
|
|
> > segments - RS - (you have to keep them somewhere till TX's running
|
|
> > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
|
|
> > changes and WAL will be used for RS/data files recovery).
|
|
> > Something like what Oracle does.
|
|
>
|
|
> I am sorry. I see what you are saying now. I missed the words
|
|
|
|
And I'm sorry for missing your notes about storing relid+tid only.
|
|
|
|
> "overwriting smgr". You are suggesting going to an overwriting
|
|
> storage manager. Is this to be done only because of savepoints.
|
|
|
|
No. One point I made a few months ago (and never got objections)
is - why keep old data in data files sooooo long?
Imagine a long-running TX (e.g. pg_dump). Why must other TXs read
again and again completely useless (for them) old data we keep
for pg_dump?
|
|
|
|
> Doesn't seem worth it when I have a possible solution without
|
|
> such a drastic change.
|
|
> Also, overwriting storage manager will require MVCC to read
|
|
> through there to get accurate MVCC visibility, right?
|
|
|
|
Right... just like now the non-overwriting smgr requires *ALL*
TXs to read old data in data files. But with an overwriting smgr
a TX will read the RS only when it is required and only as far
(as much) as it is required.

Simple solutions are not always the best ones.
Compare Oracle and InterBase. Both have MVCC.
Their smgrs are different. Which RDBMS is cooler?
Why doesn't Oracle use the simpler non-overwriting smgr
(as InterBase... and we do)?
|
|
|
|
Vadim
|
|
|
|
From dhogaza@pacifier.com Wed Jan 23 21:05:37 2002
|
|
Return-path: <dhogaza@pacifier.com>
|
|
Received: from comet.pacifier.com ([199.2.117.155])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0O25bU28962
|
|
for <pgman@candle.pha.pa.us>; Wed, 23 Jan 2002 21:05:37 -0500 (EST)
|
|
Received: from pacifier.com (dsl-dhogaza.pacifier.net [207.202.226.68])
|
|
by comet.pacifier.com (8.11.2/8.11.1) with ESMTP id g0O24qX29917;
|
|
Wed, 23 Jan 2002 18:04:52 -0800 (PST)
|
|
Message-ID: <3C4F6BF0.2010406@pacifier.com>
|
|
Date: Wed, 23 Jan 2002 18:05:36 -0800
|
|
From: Don Baccus <dhogaza@pacifier.com>
|
|
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.7) Gecko/20011221
|
|
X-Accept-Language: en-us
|
|
MIME-Version: 1.0
|
|
To: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
cc: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>,
|
|
PostgreSQL-development <pgsql-hackers@postgresql.org>
|
|
Subject: Re: [HACKERS] Savepoints
|
|
References: <200201240120.g0O1Kwm26748@candle.pha.pa.us>
|
|
Content-Type: text/plain; charset=us-ascii; format=flowed
|
|
Content-Transfer-Encoding: 7bit
|
|
Status: OR
|
|
|
|
Bruce Momjian wrote:
|
|
|
|
|
|
> I am sorry. I see what you are saying now. I missed the words
|
|
> "overwriting smgr". You are suggesting going to an overwriting storage
|
|
> manager.
|
|
|
|
|
|
Overwriting storage managers don't suffer from unbounded growth of
datafiles until garbage collection (vacuum) is performed. In fact,
there's no need for a vacuum-style utility. The rollback segments only
need to keep around enough past history to roll back transactions that
are executing.

Of course, then the size of your transactions is limited by the size of
your rollback segments, which in Oracle are fixed in length when you
build your database (there are ways to change this when you figure out
that you didn't pick a good number when creating it).
|
|
|
|
>Is this to be done only because of savepoints.
|
|
|
|
Not in traditional storage managers such as the one Oracle uses. The
complexity of managing visibility and the like is traded off against the
fact that you're not stuck ever needing to garbage collect a database
that occupies a roomful of disks.

It's a trade-off. PG's current storage manager seems to work awfully
well in a lot of common database scenarios, and Tom's new vacuum is
meant to help mitigate the drawbacks. But overwriting storage
managers certainly have their advantages, too.
|
|
|
|
> Doesn't seem
|
|
|
|
> worth it when I have a possible solution without such a drastic change.
|
|
> Also, overwriting storage manager will require MVCC to read through
|
|
> there to get accurate MVCC visibility, right?
|
|
|
|
|
|
Yep...
|
|
|
|
--
|
|
Don Baccus
|
|
Portland, OR
|
|
http://donb.photo.net, http://birdnotes.net, http://openacs.org
|
|
|
|
|
|
From Inoue@tpf.co.jp Thu Jan 24 11:34:48 2002
|
|
Return-path: <Inoue@tpf.co.jp>
|
|
Received: from p2272.nsk.ne.jp (p2272.nsk.ne.jp [210.145.18.145])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0OGYjU23980
|
|
for <pgman@candle.pha.pa.us>; Thu, 24 Jan 2002 11:34:47 -0500 (EST)
|
|
Received: from mcadnote1 (ppm132.noc.fukui.nsk.ne.jp [61.198.95.32])
|
|
by p2272.nsk.ne.jp (8.9.3/3.7W-20000722) with SMTP id BAA12147;
|
|
Fri, 25 Jan 2002 01:34:24 +0900 (JST)
|
|
From: "Hiroshi Inoue" <Inoue@tpf.co.jp>
|
|
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
|
|
cc: "PostgreSQL-development" <pgsql-hackers@postgreSQL.org>,
|
|
"'Bruce Momjian'" <pgman@candle.pha.pa.us>
|
|
Subject: RE: [HACKERS] Savepoints
|
|
Date: Fri, 25 Jan 2002 01:34:29 +0900
|
|
Message-ID: <EKEJJICOHDIEMGPNIFIJKEFBGJAA.Inoue@tpf.co.jp>
|
|
MIME-Version: 1.0
|
|
Content-Type: text/plain;
|
|
charset="iso-8859-1"
|
|
Content-Transfer-Encoding: 7bit
|
|
X-Priority: 3 (Normal)
|
|
X-MSMail-Priority: Normal
|
|
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
|
|
In-Reply-To: <3705826352029646A3E91C53F7189E32518483@sectorbase2.sectorbase.com>
|
|
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
|
|
Importance: Normal
|
|
Status: OR
|
|
|
|
> -----Original Message-----
|
|
> From: Mikheev, Vadim
|
|
>
|
|
> How about: use overwriting smgr + put old records into rollback
|
|
> segments - RS - (you have to keep them somewhere till TX's running
|
|
> anyway) + use WAL only as REDO log (RS will be used to rollback TX'
|
|
> changes and WAL will be used for RS/data files recovery).
|
|
> Something like what Oracle does.
|
|
|
|
As long as we use a non-overwriting manager
1) Rollback(data) isn't needed in case of a db crash.
2) Rollback(data) isn't needed to cancel a transaction entirely.
3) We don't need to mind the transaction size so much.

We can't use the db any longer if a REDO recovery fails now.
Under an overwriting smgr we can't use the db any longer either
if rollback fails. How could PG avoid being less reliable than now?
|
|
|
|
regards,
|
|
Hiroshi Inoue
|
|
|
|
From pgsql-hackers-owner+M18123=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 14:15:11 2002
|
|
Return-path: <pgsql-hackers-owner+M18123=candle.pha.pa.us=pgman@postgresql.org>
|
|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0OJFAU12547
|
|
for <pgman@candle.pha.pa.us>; Thu, 24 Jan 2002 14:15:10 -0500 (EST)
|
|
Received: (qmail 43413 invoked by alias); 24 Jan 2002 19:13:48 -0000
|
|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
|
|
by www.postgresql.org with SMTP; 24 Jan 2002 19:13:48 -0000
|
|
Received: from sectorbase2.sectorbase.com ([66.106.163.120])
|
|
by postgresql.org (8.11.3/8.11.4) with ESMTP id g0OJC4l42011
|
|
for <pgsql-hackers@postgreSQL.org>; Thu, 24 Jan 2002 14:12:04 -0500 (EST)
|
|
(envelope-from vmikheev@SECTORBASE.COM)
|
|
Received: by sectorbase2.sectorbase.com with Internet Mail Service (5.5.2653.19)
|
|
id <DKXVZF9P>; Thu, 24 Jan 2002 11:11:54 -0800
|
|
Message-ID: <3705826352029646A3E91C53F7189E3251848B@sectorbase2.sectorbase.com>
|
|
From: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
|
|
To: "'Hiroshi Inoue'" <Inoue@tpf.co.jp>
|
|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>,
|
|
"'Bruce Momjian'"
|
|
<pgman@candle.pha.pa.us>
|
|
Subject: Re: [HACKERS] Savepoints
|
|
Date: Thu, 24 Jan 2002 11:11:52 -0800
|
|
MIME-Version: 1.0
|
|
X-Mailer: Internet Mail Service (5.5.2653.19)
|
|
Content-Type: text/plain;
|
|
charset="iso-8859-1"
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: OR
|
|
|
|
> > How about: use overwriting smgr + put old records into rollback
|
|
> > segments - RS - (you have to keep them somewhere till TX's running
|
|
> > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
|
|
> > changes and WAL will be used for RS/data files recovery).
|
|
> > Something like what Oracle does.
|
|
>
|
|
> As long as we use no overwriting manager
|
|
> 1) Rollback(data) isn't needed in case of a db crash.
|
|
> 2) Rollback(data) isn't needed to cancal a transaction entirely.
|
|
|
|
-1) But vacuum must read a huge amount of data to remove dirt.
-2) But TXs must read data they are not interested in at all.
|
|
|
|
> 3) We don't need to mind the transaction size so much.
|
|
|
|
-3) The same with overwriting smgr and WAL used *only as REDO log*:
we are not required to keep WAL files for the duration of a transaction
- as soon as the server knows that the changes logged in some WAL file
have been applied to data files and RS on disk (and archived, for
WAL-based BAR), that file may be reused/removed. Old data will still
occupy space in RS but its space in data files will be available
for reuse.
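
The recycling rule in -3) can be written as a small predicate. This is
an illustrative sketch only; WalFile and its fields are hypothetical
names, not PostgreSQL structures:

```python
from dataclasses import dataclass


@dataclass
class WalFile:
    name: str
    applied_to_data_files: bool         # logged changes are in data files on disk
    applied_to_rollback_segments: bool  # logged changes are in RS on disk
    archived: bool                      # file has been copied to the archive


def can_recycle(wal, archiving_enabled):
    """A WAL file used only as a REDO log may be reused or removed once its
    changes are on disk (and archived, when WAL-based backup is in use)."""
    if not (wal.applied_to_data_files and wal.applied_to_rollback_segments):
        return False
    return wal.archived or not archiving_enabled
```

Note that the predicate never asks whether the transactions that wrote
the file are still running; that is the point being made.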
|
|
|
|
> We can't use the db any longer if a REDO recovery fails now.
|
|
|
|
Reset the WAL and use/dump the db. Annoying? Agreed. Fix bugs and/or
use good RAM - whatever caused the problem with restart.
|
|
|
|
> Under overwriting smgr we can't use the db any longer either
|
|
> if rollback fails.
|
|
|
|
Why should it fail? Bugs? Fix them.
|
|
|
|
> How could PG be not less reliable than now ?
|
|
|
|
Is today's PG more reliable than Oracle, Informix, DB2?
|
|
|
|
Vadim
|
|
|
|
---------------------------(end of broadcast)---------------------------
|
|
TIP 3: if posting/reading through Usenet, please send an appropriate
|
|
subscribe-nomail command to majordomo@postgresql.org so that your
|
|
message can get through to the mailing list cleanly
|
|
|
|
From pgsql-hackers-owner+M18125=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 14:23:42 2002
|
|
Return-path: <pgsql-hackers-owner+M18125=candle.pha.pa.us=pgman@postgresql.org>
|
|
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
|
|
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0OJNfU13481
|
|
for <pgman@candle.pha.pa.us>; Thu, 24 Jan 2002 14:23:42 -0500 (EST)
|
|
Received: (qmail 49604 invoked by alias); 24 Jan 2002 19:23:40 -0000
|
|
Received: from unknown (HELO postgresql.org) (64.49.215.8)
|
|
by www.postgresql.org with SMTP; 24 Jan 2002 19:23:40 -0000
|
|
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
|
|
by postgresql.org (8.11.3/8.11.4) with ESMTP id g0OJMTl48885
|
|
for <pgsql-hackers@postgreSQL.org>; Thu, 24 Jan 2002 14:22:29 -0500 (EST)
|
|
(envelope-from pgman@candle.pha.pa.us)
|
|
Received: (from pgman@localhost)
|
|
by candle.pha.pa.us (8.11.6/8.10.1) id g0OJMJf13378;
|
|
Thu, 24 Jan 2002 14:22:19 -0500 (EST)
|
|
From: Bruce Momjian <pgman@candle.pha.pa.us>
|
|
Message-ID: <200201241922.g0OJMJf13378@candle.pha.pa.us>
|
|
Subject: Re: [HACKERS] Savepoints
|
|
In-Reply-To: <3705826352029646A3E91C53F7189E32518487@sectorbase2.sectorbase.com>
|
|
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
|
|
Date: Thu, 24 Jan 2002 14:22:19 -0500 (EST)
|
|
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
|
|
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
|
|
MIME-Version: 1.0
|
|
Content-Transfer-Encoding: 7bit
|
|
Content-Type: text/plain; charset=US-ASCII
|
|
Precedence: bulk
|
|
Sender: pgsql-hackers-owner@postgresql.org
|
|
Status: OR
|
|
|
|
|
|
OK, I have had time to think about this, and I think I can put the two
proposals into perspective. I will use Vadim's terminology.

In our current setup, rollback/undo data is kept in the same file as our
live data. This data is used for two purposes: first, for rollback of
transactions, and perhaps subtransactions in the future, and second, for
MVCC visibility for backends making changes.

So, it seems the real question is whether a database modification should
write the old data into a separate rollback segment and modify the heap
data, or just create a new row and require the old row to be removed
later by vacuum.
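
To make the two alternatives concrete, here is a toy model of each. It
is a sketch under simplifying assumptions, not how PostgreSQL or Oracle
is implemented: all class and field names are hypothetical, every
transaction is assumed committed, and a snapshot is a single number
that sees exactly the changes made by smaller transaction ids.

```python
class NonOverwritingTable:
    """Each UPDATE appends a new version; old versions stay in the heap."""

    def __init__(self):
        self.heap = []   # rows are [key, value, xmin, xmax]

    def update(self, key, value, xid):
        for row in self.heap:
            if row[0] == key and row[3] is None:
                row[3] = xid               # expire the current version
        self.heap.append([key, value, xid, None])

    def read(self, key, snapshot):
        # Every reader steps over every version of the key, live or dead.
        for k, value, xmin, xmax in self.heap:
            if k == key and xmin < snapshot and (xmax is None or xmax >= snapshot):
                return value
        return None


class OverwritingTable:
    """UPDATE overwrites the heap row; the old version moves to a rollback segment."""

    def __init__(self):
        self.heap = {}               # key -> (value, xmin)
        self.rollback_segment = {}   # key -> [(value, xmin), ...], newest last

    def update(self, key, value, xid):
        if key in self.heap:
            self.rollback_segment.setdefault(key, []).append(self.heap[key])
        self.heap[key] = (value, xid)

    def read(self, key, snapshot):
        if key not in self.heap:
            return None
        value, xmin = self.heap[key]
        if xmin < snapshot:
            return value             # common case: rollback segment untouched
        # Older snapshot: walk back only as far as needed.
        for value, xmin in reversed(self.rollback_segment.get(key, [])):
            if xmin < snapshot:
                return value
        return None
```

After three updates of the same key, the first table holds three heap
rows until something like vacuum removes the dead ones, while the second
holds one heap row plus two rollback-segment entries that only readers
with old snapshots ever touch.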
|
|
|
|
Let's look at this behavior without MVCC. In such cases, if someone
tries to read a modified row, it will block and wait for the modifying
backend to commit or roll back, and then continue. In such
cases, there is no reason for the waiting transaction to read the old
data in the rollback segment because it can't continue anyway.

Now, with MVCC, the backend has to read through the rollback segment to
get the original data value for that row.

Now, while rollback segments do help with cleaning out old UPDATE rows,
how do they improve DELETE performance? Seems it would just mark the row
as expired like we do now.

One objection I always had to rollback segments was that if I start a
transaction in the morning and walk away, none of the rollback segments
can be recycled. I was going to ask if we can force some type of
rollback segment compaction to keep old active rows and delete rows no
longer visible to any transaction. However, I now realize that our
VACUUM has the same problem. Tuples with XID >= GetOldestXmin() are not
recycled, meaning we have this problem in our current implementation
too. (I wonder if our vacuum could be smarter about knowing which rows
are visible, perhaps by creating a sorted list of xid's and doing a
binary search on the list to determine visibility.)
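
One way the sorted-list idea might look, as a rough sketch rather than
an actual vacuum algorithm. It assumes a simplified model in which each
open snapshot is a single number that sees exactly the committed
transaction ids below it; real snapshots also carry a list of
in-progress xids, and xid wraparound is ignored. The function names are
hypothetical:

```python
from bisect import bisect_right


def removable_by_oldest_xmin(snapshot_xmins, xmax):
    """Current-style rule: keep anything deleted at or after the oldest snapshot."""
    return not snapshot_xmins or xmax < snapshot_xmins[0]


def removable_by_binary_search(snapshot_xmins, xmin, xmax):
    """snapshot_xmins is sorted. A dead row version created by xmin and
    deleted by xmax is visible to snapshot s exactly when xmin < s <= xmax,
    so it is removable when no snapshot falls in that range."""
    i = bisect_right(snapshot_xmins, xmin)   # first snapshot that sees the insert
    return i == len(snapshot_xmins) or snapshot_xmins[i] > xmax
```

With open snapshots [100, 500], a row version created by xid 200 and
deleted by xid 300 is kept by the first rule (300 >= 100) but is
removable under the second, because no open snapshot lies between 200
and 300.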
|
|
|
|
So, I guess the issue is, do we want to keep undo information in the
main table, or split it out into rollback segments? Certainly we have to
eliminate the Oracle restriction that rollback segment size is fixed at
install time.

The advantage of rollback segments is that hopefully we don't have
transactions reading through irrelevant undo information. The
disadvantage is that today undo information is grouped into table
files, where a sequential scan can be performed. (Index scans of undo
info are a performance problem currently.) We would have to somehow
efficiently access undo information grouped into the rollback segments.
Perhaps a hash based on relid would help here. Another disadvantage is
concurrency. When we start modifying heap data in place, we have to
prevent other backends from seeing that modification while we move the
old data to the rollback segment.

I guess my feeling is that if we can get vacuum to happen automatically,
how is our current non-overwriting storage manager different from
rollback segments?

One big advantage of rollback segments would be that right now, if
someone updates a row repeatedly, there are lots of heap versions of the
row that are difficult to shrink in the table, while if they are in the
rollback segments, we can more efficiently remove them, and there is
only one heap row.

How is recovery handled with rollback segments? Do we write old and new
data to WAL? We just write new data to WAL now, right? Do we fsync
rollback segments?

Have I outlined this accurately?
|
|
|
|
---------------------------------------------------------------------------

Mikheev, Vadim wrote:
> > > How about: use overwriting smgr + put old records into rollback
> > > segments - RS - (you have to keep them somewhere till TX's running
> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
> > > changes and WAL will be used for RS/data files recovery).
> > > Something like what Oracle does.
> >
> > I am sorry. I see what you are saying now. I missed the words
>
> And I'm sorry for missing your notes about storing relid+tid only.
>
> > "overwriting smgr". You are suggesting going to an overwriting
> > storage manager. Is this to be done only because of savepoints?
>
> No. One point I made a few months ago (and never got objections)
> is - why to keep old data in data files sooooo long?
> Imagine long running TX (eg pg_dump). Why other TX-s must read
> again and again completely useless (for them) old data we keep
> for pg_dump?
>
> > Doesn't seem worth it when I have a possible solution without
> > such a drastic change.
> > Also, overwriting storage manager will require MVCC to read
> > through there to get accurate MVCC visibility, right?
>
> Right... just like now non-overwriting smgr requires *ALL*
> TX-s to read old data in data files. But with overwriting smgr
> TX will read RS only when it is required and as far (much) as
> it is required.
>
> Simple solutions are not always the best ones.
> Compare Oracle and InterBase. Both have MVCC.
> Smgr-s are different. What RDBMS is more cool?
> Why doesn't Oracle use more simple non-overwriting smgr
> (as InterBase... and we do)?
>
> Vadim
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

From pgsql-hackers-owner+M18141=candle.pha.pa.us=pgman@postgresql.org Thu Jan 24 19:43:38 2002
Return-path: <pgsql-hackers-owner+M18141=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0P0hbU15026
for <pgman@candle.pha.pa.us>; Thu, 24 Jan 2002 19:43:38 -0500 (EST)
Received: (qmail 28642 invoked by alias); 25 Jan 2002 00:43:24 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
by www.postgresql.org with SMTP; 25 Jan 2002 00:43:24 -0000
Received: from sd.tpf.co.jp (sd.tpf.co.jp [210.161.239.34])
by postgresql.org (8.11.3/8.11.4) with SMTP id g0P0YIl27208
for <pgsql-hackers@postgreSQL.org>; Thu, 24 Jan 2002 19:34:18 -0500 (EST)
(envelope-from Inoue@tpf.co.jp)
Received: (qmail 3661 invoked from network); 25 Jan 2002 00:34:19 -0000
Received: from unknown (HELO viscomail.tpf.co.jp) (100.0.0.108)
by sd2.tpf-fw-c.co.jp with SMTP; 25 Jan 2002 00:34:19 -0000
Received: from tpf.co.jp (3dgateway1 [126.0.1.60])
by viscomail.tpf.co.jp (8.8.8+Sun/8.8.8) with ESMTP id JAA00756;
Fri, 25 Jan 2002 09:34:18 +0900 (JST)
Message-ID: <3C50A807.32A29E09@tpf.co.jp>
Date: Fri, 25 Jan 2002 09:34:15 +0900
From: Hiroshi Inoue <Inoue@tpf.co.jp>
X-Mailer: Mozilla 4.73 [ja] (Windows NT 5.0; U)
X-Accept-Language: ja
MIME-Version: 1.0
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
cc: PostgreSQL-development <pgsql-hackers@postgresql.org>,
"'Bruce Momjian'" <pgman@candle.pha.pa.us>
Subject: Re: [HACKERS] Savepoints
References: <3705826352029646A3E91C53F7189E3251848B@sectorbase2.sectorbase.com>
Content-Type: text/plain; charset=iso-2022-jp
Content-Transfer-Encoding: 7bit
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

"Mikheev, Vadim" wrote:
>
> > > How about: use overwriting smgr + put old records into rollback
> > > segments - RS - (you have to keep them somewhere till TX's running
> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
> > > changes and WAL will be used for RS/data files recovery).
> > > Something like what Oracle does.
> >
> > As long as we use no overwriting manager
> > 1) Rollback(data) isn't needed in case of a db crash.
> > 2) Rollback(data) isn't needed to cancel a transaction entirely.
>
> -1) But vacuum must read a huge amount of data to remove dirt.
> -2) But TX-s must read data they are not interested in at all.
>
> > 3) We don't need to mind the transaction size so much.
>
> -3) The same with overwriting smgr and WAL used *only as REDO log*:

The larger RS becomes, the longer it would take to cancel
the transaction, whereas cancellation is executed in a moment under a
non-overwriting smgr. And, for example, if RS exhausted all disk space,
is PG really safe? Other backends would also fail because they
couldn't write to RS any more. Many transactions would execute
UNDO operations simultaneously, but there would be no space to write
WAL (UNDO operations must be written to WAL also) and the PG
system would abort. And could PG restart in such a situation?
Even if there is a way to recover from the situation, I
think we should avoid such dangerous situations from the
start. Basically, recovery operations should never fail.

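To make the cost argument concrete, here is a toy comparison in Python. It is only a sketch under loud assumptions: `NonOverwritingStore`, `OverwritingStore` and their methods are invented for illustration, there is no WAL, no disk, and no commit/in-progress distinction (any non-aborted version is treated as readable).

```python
_MISSING = object()  # marks "key did not exist before this update"


class NonOverwritingStore:
    """Old versions stay in the heap; abort only flips a status entry."""

    def __init__(self):
        self.heap = []        # list of (xid, key, value) row versions
        self.aborted = set()  # stand-in for a transaction status log

    def update(self, xid, key, value):
        self.heap.append((xid, key, value))

    def abort(self, xid):
        self.aborted.add(xid)  # cost does not depend on transaction size
        return 1               # number of undo steps performed

    def read(self, key):
        for xid, k, v in reversed(self.heap):
            if k == key and xid not in self.aborted:
                return v
        return None


class OverwritingStore:
    """Heap is overwritten in place; old values go to a rollback segment."""

    def __init__(self):
        self.heap = {}  # key -> current value
        self.rs = {}    # xid -> list of (key, old_value) undo records

    def update(self, xid, key, value):
        old = self.heap.get(key, _MISSING)
        self.rs.setdefault(xid, []).append((key, old))
        self.heap[key] = value

    def abort(self, xid):
        steps = 0
        for key, old in reversed(self.rs.pop(xid, [])):
            if old is _MISSING:
                self.heap.pop(key, None)
            else:
                self.heap[key] = old  # each undo record is applied in turn
            steps += 1
        return steps

    def read(self, key):
        return self.heap.get(key)


a, b = NonOverwritingStore(), OverwritingStore()
for store in (a, b):
    store.update(1, "k", "old")   # TX 1, treated as committed
    for i in range(1000):
        store.update(2, "k", i)   # TX 2 updates the row 1000 times
print(a.abort(2), b.abort(2))     # 1 1000
print(a.read("k"), b.read("k"))   # old old
```

Both models end up returning the original value, but the overwriting model has to apply one undo record per update, and in a real system each of those steps would itself need space in RS and WAL.
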
>
> > We can't use the db any longer if a REDO recovery fails now.
>
> Reset WAL and use/dump it. Annoying? Agreed. Fix bugs and/or
> use good RAM - whatever caused problem with restart.

As I already mentioned, recovery operations should never fail.

>
> > Under overwriting smgr we can't use the db any longer either
> > if rollback fails.
>
> Why should it fail? Bugs? Fix them.

Rollback operations are executed much more often than
REDO recovery, and it is hard to fix such bugs once PG
has been released. Most people in such trouble have no
time to pursue the cause. In fact, I replied to
PG restart troubles twice in Japan (with --wal-debug and pg_resetxlog
suggestions) but got no further replies.

>
> > How could PG be not less reliable than now ?
>
> Is today's PG more reliable than Oracle, Informix, DB2?

I have never been and would never be optimistic
about recovery. Is 7.1 more reliable than 7.0 from the
recovery POV? I see no reason why an overwriting smgr would be
more reliable than a non-overwriting smgr as far as recovery goes.

regards,
Hiroshi Inoue

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

From ZeugswetterA@spardat.at Fri Jan 25 09:21:40 2002
Return-path: <ZeugswetterA@spardat.at>
Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1])
by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g0PELde10640
for <pgman@candle.pha.pa.us>; Fri, 25 Jan 2002 09:21:39 -0500 (EST)
Received: from m01x1.s-mxs.net [10.3.55.201]
by smxsat1.smxs.net
with XWall v3.18f ;
Fri, 25 Jan 2002 15:22:51 +0100
Received: from m0103.s-mxs.net [10.3.55.3]
by m01x1.s-mxs.net
with XWall v3.18a ;
Fri, 25 Jan 2002 15:21:23 +0100
Received: from m0114.s-mxs.net ([10.3.55.14]) by m0103.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966);
Fri, 25 Jan 2002 15:21:22 +0100
X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
charset="iso-8859-1"
Subject: RE: [HACKERS] Savepoints
Date: Fri, 25 Jan 2002 15:21:22 +0100
Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA42128DE@m0114.s-mxs.net>
Thread-Topic: [HACKERS] Savepoints
Thread-Index: AcGkZ8SMKn//UUTjS3mi+qC7+gZAwwBQ4YMA
From: "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>,
"Bruce Momjian" <pgman@candle.pha.pa.us>,
"PostgreSQL-development" <pgsql-hackers@postgresql.org>
X-OriginalArrivalTime: 25 Jan 2002 14:21:22.0648 (UTC) FILETIME=[9090BD80:01C1A5AB]
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by candle.pha.pa.us id g0PELde10640
Status: OR

Vadim wrote:
> How about: use overwriting smgr + put old records into rollback
> segments - RS - (you have to keep them somewhere till TX's running
> anyway) + use WAL only as REDO log (RS will be used to rollback TX'
> changes and WAL will be used for RS/data files recovery).
> Something like what Oracle does.

We have all the info we need in WAL and in the old rows,
why would you want to write them to RS ?
You only need RS for overwriting smgr.

Andreas

From pgsql-hackers-owner+M18209=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 16:14:02 2002
Return-path: <pgsql-hackers-owner+M18209=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0PLE1e19182
for <pgman@candle.pha.pa.us>; Fri, 25 Jan 2002 16:14:01 -0500 (EST)
Received: (qmail 85111 invoked by alias); 25 Jan 2002 21:13:59 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
by www.postgresql.org with SMTP; 25 Jan 2002 21:13:59 -0000
Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1])
by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PL48l79366
for <pgsql-hackers@postgresql.org>; Fri, 25 Jan 2002 16:04:09 -0500 (EST)
(envelope-from ZeugswetterA@spardat.at)
Received: from m01x1.s-mxs.net [10.3.55.201]
by smxsat1.smxs.net
with XWall v3.18f ;
Fri, 25 Jan 2002 22:05:21 +0100
Received: from m0102.s-mxs.net [10.3.55.2]
by m01x1.s-mxs.net
with XWall v3.18a ;
Fri, 25 Jan 2002 22:03:54 +0100
Received: from m0114.s-mxs.net ([10.3.55.14]) by m0102.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966);
Fri, 25 Jan 2002 22:03:53 +0100
X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
charset="iso-8859-1"
Subject: Re: [HACKERS] Savepoints
Date: Fri, 25 Jan 2002 22:03:53 +0100
Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C4@m0114.s-mxs.net>
Thread-Topic: [HACKERS] Savepoints
Thread-Index: AcGlDMGVwSWndt4kT1C7QhclLvQPWgA1arbw
From: "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>,
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
cc: "PostgreSQL-development" <pgsql-hackers@postgresql.org>
X-OriginalArrivalTime: 25 Jan 2002 21:03:53.0685 (UTC) FILETIME=[CBB48850:01C1A5E3]
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g0PLDAm83732
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: ORr

> Now, with MVCC, the backend has to read through the redo segment to get

You mean rollback segment, but ...

> the original data value for that row.

Will only need to be looked up if the row is currently being modified by
a not yet committed txn (at least in the default read committed mode)

>
> Now, while rollback segments do help with cleaning out old UPDATE rows,
> how does it improve DELETE performance? Seems it would just mark it as
> expired like we do now.

delete would probably be:
1. mark original deleted and write whole row to RS

I don't think you would like to mix looking up deleted rows in heap
but updated rows in RS

Andreas

PS: not that I like overwrite with MVCC now
If you think of VACUUM as garbage collection PG is highly trendy with
the non-overwriting smgr.

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

From pgsql-hackers-owner+M18211=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 16:53:45 2002
Return-path: <pgsql-hackers-owner+M18211=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0PLrie22174
for <pgman@candle.pha.pa.us>; Fri, 25 Jan 2002 16:53:44 -0500 (EST)
Received: (qmail 96831 invoked by alias); 25 Jan 2002 21:53:43 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
by www.postgresql.org with SMTP; 25 Jan 2002 21:53:43 -0000
Received: from smxsat1.smxs.net (smxsat1.smxs.net [213.150.10.1])
by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PLpRl96298
for <pgsql-hackers@postgresql.org>; Fri, 25 Jan 2002 16:51:27 -0500 (EST)
(envelope-from ZeugswetterA@spardat.at)
Received: from m01x1.s-mxs.net [10.3.55.201]
by smxsat1.smxs.net
with XWall v3.18f ;
Fri, 25 Jan 2002 22:52:54 +0100
Received: from m0103.s-mxs.net [10.3.55.3]
by m01x1.s-mxs.net
with XWall v3.18a ;
Fri, 25 Jan 2002 22:51:25 +0100
Received: from m0114.s-mxs.net ([10.3.55.14]) by m0103.s-mxs.net with Microsoft SMTPSVC(5.0.2195.2966);
Fri, 25 Jan 2002 22:51:25 +0100
X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
charset="iso-8859-1"
Subject: Re: [HACKERS] Savepoints
Date: Fri, 25 Jan 2002 22:51:24 +0100
Message-ID: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C5@m0114.s-mxs.net>
Thread-Topic: [HACKERS] Savepoints
Thread-Index: AcGlznYKFcqoYpMnSlGQHhQuEf6LuAAGpxnQ
From: "Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>
To: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>
cc: <pgsql-hackers@postgresql.org>
X-OriginalArrivalTime: 25 Jan 2002 21:51:25.0008 (UTC) FILETIME=[6F39E500:01C1A5EA]
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by postgresql.org id g0PLrP196418
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

> > > How about: use overwriting smgr + put old records into rollback
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > segments - RS - (you have to keep them somewhere till TX's running
> > > anyway) + use WAL only as REDO log (RS will be used to rollback TX'
> > > changes and WAL will be used for RS/data files recovery).
> > > Something like what Oracle does.
> >
> > We have all the info we need in WAL and in the old rows,
> > why would you want to write them to RS ?
> > You only need RS for overwriting smgr.
>
> This is what I'm saying - implement Overwriting smgr...

Yes, I am sorry. I am catching up on email and had not read Bruce's
comment (nor yours correctly) :-(

I was also in the pro-overwriting camp for a long time, because I am
used to non-MVCC dbs like DB/2 and Informix (which I like very much).
But I am starting to doubt that overwriting is really so good for
an MVCC db. And I don't think PG wants to switch to non-MVCC :-)

Imho it would only need a much more aggressive VACUUM backend
(aka garbage collector :-). Maybe it could be designed to sniff the
redo log (buffer) to get a hint at what to actually clean out next.

Andreas

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

From pgsql-hackers-owner+M18218=candle.pha.pa.us=pgman@postgresql.org Fri Jan 25 19:14:24 2002
Return-path: <pgsql-hackers-owner+M18218=candle.pha.pa.us=pgman@postgresql.org>
Received: from server1.pgsql.org (www.postgresql.org [64.49.215.9])
by candle.pha.pa.us (8.11.6/8.10.1) with SMTP id g0Q0ENe03543
for <pgman@candle.pha.pa.us>; Fri, 25 Jan 2002 19:14:23 -0500 (EST)
Received: (qmail 22482 invoked by alias); 26 Jan 2002 00:13:55 -0000
Received: from unknown (HELO postgresql.org) (64.49.215.8)
by www.postgresql.org with SMTP; 26 Jan 2002 00:13:55 -0000
Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35])
by postgresql.org (8.11.3/8.11.4) with ESMTP id g0PNw1l20714
for <pgsql-hackers@postgresql.org>; Fri, 25 Jan 2002 18:58:01 -0500 (EST)
(envelope-from pgman@candle.pha.pa.us)
Received: (from pgman@localhost)
by candle.pha.pa.us (8.11.6/8.10.1) id g0PNvoL02515;
Fri, 25 Jan 2002 18:57:50 -0500 (EST)
From: Bruce Momjian <pgman@candle.pha.pa.us>
Message-ID: <200201252357.g0PNvoL02515@candle.pha.pa.us>
Subject: Re: [HACKERS] Savepoints
In-Reply-To: <46C15C39FEB2C44BA555E356FBCD6FA41EB4C4@m0114.s-mxs.net>
To: Zeugswetter Andreas SB SD <ZeugswetterA@spardat.at>
Date: Fri, 25 Jan 2002 18:57:50 -0500 (EST)
cc: "Mikheev, Vadim" <vmikheev@SECTORBASE.COM>,
PostgreSQL-development <pgsql-hackers@postgresql.org>
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Precedence: bulk
Sender: pgsql-hackers-owner@postgresql.org
Status: OR

Zeugswetter Andreas SB SD wrote:
>
> > Now, with MVCC, the backend has to read through the redo segment to get
>
> You mean rollback segment, but ...

Sorry, yes. I get redo/undo/rollback mixed up sometimes. :-)

> > the original data value for that row.
>
> Will only need to be looked up if the row is currently being modified by
> a not yet committed txn (at least in the default read committed mode)

Uh, not really. The modifying transaction may have completed after my
transaction started, meaning that even though it is marked committed, its
changes are not visible to me. Most MVCC visibility will require undo lookup.

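A small Python sketch of that visibility point, with invented names (`Snapshot`, `visible_value`) and a deliberately simplified model: a snapshot is just the set of xids that had committed when it was taken, and each row carries a newest-first chain of older versions standing in for the undo data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot: the xids already committed when it was taken."""
    committed_at_start: frozenset

    def sees(self, xid):
        return xid in self.committed_at_start


def visible_value(snapshot, current, undo_chain):
    """Return the newest value written by a transaction the snapshot sees.

    current: (writer_xid, value) stored in the heap row.
    undo_chain: older (writer_xid, value) pairs, newest first.
    """
    for writer_xid, value in [current, *undo_chain]:
        if snapshot.sees(writer_xid):
            return value
    return None  # the row did not exist for this snapshot


snap = Snapshot(frozenset({10}))  # taken while xid 20 was still running
current = (20, "new")             # xid 20 has since committed in place
undo = [(10, "old")]
print(visible_value(snap, current, undo))  # old
```

Even though xid 20 is committed by the time the row is read, the snapshot predates that commit, so the reader has to walk back to the version written by xid 10.
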
>
> >
> > Now, while rollback segments do help with cleaning out old UPDATE rows,
> > how does it improve DELETE performance? Seems it would just mark it as
> > expired like we do now.
>
> delete would probably be:
> 1. mark original deleted and write whole row to RS
>
> I don't think you would like to mix looking up deleted rows in heap
> but updated rows in RS

Yes, so really the overwriting is only a big win for UPDATE. Right now,
UPDATE is DELETE/INSERT, and that DELETE makes MVCC happy. :-)

My whole goal was to simplify this so we can see the differences.

> PS: not that I like overwrite with MVCC now
> If you think of VACUUM as garbage collection PG is highly trendy with
> the non-overwriting smgr.

Yes, that is basically what it is now, a garbage collector that collects
in heap rather than in undo.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

From pgman Wed Jan 23 10:36:13 2002
Subject: Savepoints
To: PostgreSQL-development <pgsql-hackers@postgreSQL.org>
Date: Wed, 23 Jan 2002 13:19:05 -0500 (EST)
X-Mailer: ELM [version 2.4ME+ PL96 (25)]
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=US-ASCII
Content-Length: 1829
Status: OR

I have talked in the past about a possible implementation of
savepoints/nested transactions. I would like to more formally outline
my ideas below.

We have talked about using WAL for such a purpose, but that requires WAL
files to remain for the life of a transaction, which seems unacceptable.
Other database systems do that, and it is a pain for administrators. I
realized we could do some sort of WAL compaction, but that seems quite
complex too.

Basically, under my plan, WAL would be unchanged. WAL's function is
crash recovery, and it would retain that. There would also be no
on-disk changes. I would use the command counter in certain cases to
identify savepoints.

My idea is to keep savepoint undo information in a private area per
backend, either in memory or on disk. We can either save the
relid/tids of modified rows, or if there are too many, discard the
saved ones and just remember the modified relids. On rollback to a
savepoint, either clear up the modified relid/tids, or sequentially scan
each remembered relid and clear up all the tuples that have our
transaction id and have command counters that are part of the undone
savepoint.

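The bookkeeping described above could look roughly like this in Python. It is a sketch under stated assumptions, not a proposed implementation: `SavepointUndo`, `ABORTED_XID`, `max_tids` and the dict-based heap are all hypothetical, "clearing up" a tuple is modeled by stamping it with a fixed aborted xid, and only inserted row versions (xmin) are handled, not deletions (xmax).

```python
ABORTED_XID = 1  # hypothetical fixed "aborted" transaction id


class SavepointUndo:
    """Private, per-backend record of rows modified since a savepoint."""

    def __init__(self, first_cid, max_tids=1000):
        self.first_cid = first_cid  # command counter at the savepoint
        self.max_tids = max_tids    # beyond this, keep only relids
        self.tids = set()           # exact (relid, tid) pairs
        self.relids = set()         # every relation touched
        self.overflowed = False

    def remember(self, relid, tid):
        self.relids.add(relid)
        if self.overflowed:
            return
        self.tids.add((relid, tid))
        if len(self.tids) > self.max_tids:
            self.tids.clear()       # too many: discard the saved tids
            self.overflowed = True

    def rollback(self, heap, my_xid):
        """heap: relid -> {tid: {"xmin": xid, "cmin": command id}}."""
        if self.overflowed:
            # Fall back to a sequential scan of each touched relation.
            targets = [(r, t) for r in self.relids for t in heap[r]]
        else:
            targets = list(self.tids)
        undone = 0
        for relid, tid in targets:
            row = heap[relid][tid]
            if row["xmin"] == my_xid and row["cmin"] >= self.first_cid:
                row["xmin"] = ABORTED_XID
                undone += 1
        return undone


heap = {"t1": {0: {"xmin": 7, "cmin": 0},
               1: {"xmin": 7, "cmin": 3},
               2: {"xmin": 7, "cmin": 4}}}
sp = SavepointUndo(first_cid=3, max_tids=1)
sp.remember("t1", 1)
sp.remember("t1", 2)  # exceeds max_tids, so only the relid is kept
print(sp.overflowed, sp.rollback(heap, my_xid=7))  # True 2
```

The row written before the savepoint (cmin 0) is left alone in both modes, because the command counter check filters it out even during the full scan.
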
It seems marking undo savepoint rows with a fixed aborted transaction id
would be the easiest solution.

Of course, we only remember modified rows when we are in savepoints, and
only undo them when we roll back to a savepoint. Transaction processing
remains the same.

There is no reason for other backends to be able to see savepoint undo
information, and keeping it private greatly simplifies the
implementation.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026