Remove TODO.detail directory. All URLs now in TODO file as references.
This commit is contained in:
parent
883b6a8a51
commit
fa4f87769f
@ -1,3 +0,0 @@
|
||||
These files are in standard Unix mailbox format, and contain detailed
|
||||
information related to the TODO list.
|
||||
|
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
@ -1,827 +0,0 @@
|
||||
From pgsql-admin-owner+M15281=pgman=candle.pha.pa.us@postgresql.org Thu Oct 21 18:57:36 2004
|
||||
Return-path: <pgsql-admin-owner+M15281=pgman=candle.pha.pa.us@postgresql.org>
|
||||
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9LLvYf17059
|
||||
for <pgman@candle.pha.pa.us>; Thu, 21 Oct 2004 17:57:34 -0400 (EDT)
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 79D9132A71A
|
||||
for <pgman@candle.pha.pa.us>; Thu, 21 Oct 2004 22:57:29 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 80515-02 for <pgman@candle.pha.pa.us>;
|
||||
Thu, 21 Oct 2004 21:57:26 +0000 (GMT)
|
||||
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 1209432A70E
|
||||
for <pgman@candle.pha.pa.us>; Thu, 21 Oct 2004 22:57:29 +0100 (BST)
|
||||
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 4B39932A6C3
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Thu, 21 Oct 2004 22:51:01 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 78125-02
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
|
||||
Thu, 21 Oct 2004 21:50:48 +0000 (GMT)
|
||||
Received: from news.hub.org (news.hub.org [200.46.204.72])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 27F0632A6C2
|
||||
for <pgsql-admin@postgresql.org>; Thu, 21 Oct 2004 22:50:49 +0100 (BST)
|
||||
Received: from news.hub.org (news.hub.org [200.46.204.72])
|
||||
by news.hub.org (8.12.9/8.12.9) with ESMTP id i9LLojJ7079086
|
||||
for <pgsql-admin@postgresql.org>; Thu, 21 Oct 2004 21:50:45 GMT
|
||||
(envelope-from news@news.hub.org)
|
||||
Received: (from news@localhost)
|
||||
by news.hub.org (8.12.9/8.12.9/Submit) id i9LLnd7p078783
|
||||
for pgsql-admin@postgresql.org; Thu, 21 Oct 2004 21:49:39 GMT
|
||||
From: Gaetano Mendola <mendola@bigfoot.com>
|
||||
X-Newsgroups: comp.databases.postgresql.admin
|
||||
Subject: Re: [ADMIN] replication using WAL archives
|
||||
Date: Thu, 21 Oct 2004 23:49:35 +0200
|
||||
Organization: PYRENET Midi-pyrenees Provider
|
||||
Lines: 216
|
||||
Message-ID: <41782EEF.5040708@bigfoot.com>
|
||||
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb> <1098384082.15573.14.camel@camel>
|
||||
MIME-Version: 1.0
|
||||
Content-Type: multipart/mixed;
|
||||
boundary="------------060900090803090101060101"
|
||||
X-Complaints-To: abuse@pyrenet.fr
|
||||
cc: iain@mst.co.jp
|
||||
To: Robert Treat <xzilla@users.sourceforge.net>
|
||||
User-Agent: Mozilla Thunderbird 0.8 (Windows/20040913)
|
||||
X-Accept-Language: en-us, en
|
||||
In-Reply-To: <1098384082.15573.14.camel@camel>
|
||||
X-Enigmail-Version: 0.86.1.0
|
||||
X-Enigmail-Supports: pgp-inline, pgp-mime
|
||||
To: pgsql-admin@postgresql.org
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Mailing-List: pgsql-admin
|
||||
Precedence: bulk
|
||||
Sender: pgsql-admin-owner@postgresql.org
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on
|
||||
candle.pha.pa.us
|
||||
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham
|
||||
version=2.61
|
||||
Status: OR
|
||||
|
||||
This is a multi-part message in MIME format.
|
||||
--------------060900090803090101060101
|
||||
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
||||
Content-Transfer-Encoding: 7bit
|
||||
|
||||
Robert Treat wrote:
|
||||
> On Thu, 2004-10-21 at 02:44, Iain wrote:
|
||||
>
|
||||
>>Hi,
|
||||
>>
|
||||
>>I thought I read something about this in relation to v8, but I can't
|
||||
>>find any reference to it now... is it (or will it be) possible to do
|
||||
>>master-slave style database replication by transmitting log files to the
|
||||
>>standby server and having it process them?
|
||||
>>
|
||||
>
|
||||
>
|
||||
> I'm not certain if this is 8.0, but some folks have created a working
|
||||
> version against the 8.0 code that will do something like this. Search
|
||||
> the pgsql-hacker mail list archives for more information on it.
|
||||
|
||||
I sent a post on hackers, I put it here:
|
||||
|
||||
=======================================================================
|
||||
Hi all,
|
||||
I seen that Eric Kerin did the work suggested by Tom about
|
||||
how to use the PITR in order to have an hot spare postgres,
|
||||
writing a C program.
|
||||
|
||||
I did the same writing 2 shell scripts, one of them perform
|
||||
the restore the other one deliver the partial filled wal and
|
||||
check if the postmaster is alive ( check if the pid process
|
||||
still exist ).
|
||||
|
||||
With these two scripts I'm able to have an hot spare installation,
|
||||
and the spare one go alive when the first postmaster dies.
|
||||
|
||||
How test it:
|
||||
|
||||
1) Master node:
|
||||
modify postgresql.conf using:
|
||||
|
||||
~ archive_command = 'cp %p /mnt/server/archivedir/%f'
|
||||
|
||||
~ launch postgres and perform a backup as doc
|
||||
|
||||
~ http://developer.postgresql.org/docs/postgres/backup-online.html
|
||||
|
||||
suggest to do
|
||||
|
||||
launch the script:
|
||||
|
||||
partial_wal_deliver.sh <PID> /mnt/server/partialdir <pg_xlog path>
|
||||
|
||||
~ this script will delivery each 10 seconds the "current" wal file,
|
||||
~ and touch the "alive" file in order to notify the spare node that
|
||||
~ the master node is up and running
|
||||
|
||||
|
||||
2) Spare node:
|
||||
create a recovery.conf with the line:
|
||||
|
||||
~ restore_command = 'restore.sh /mnt/server/archivedir/%f %p /mnt/server/partialdir'
|
||||
|
||||
~ replace the content of data directory with the backup performed at point 1,
|
||||
~ remove any file present in the pg_xlog directory ( leaving there the archive_status
|
||||
~ directory ) and remove the postmaster.pid file ( this is necessary if you are running
|
||||
~ the spare postgres on the same hw ).
|
||||
|
||||
~ launch the postmaster, the restore will continue till the "alive" file present in the
|
||||
~ /mnt/server/partialdir directory is not updated for 60 seconds ( you can modify this
|
||||
~ values inside the restore.sh script ).
|
||||
|
||||
Be sure that restore.sh and all directories involved are accessible
|
||||
|
||||
Let me know.
|
||||
|
||||
|
||||
This is a first step, of course, as Eric Kerin did, is better port these script
|
||||
in C and make it more robust.
|
||||
|
||||
Postgres can help this process, as suggested by Tom creating a pg_current_wal()
|
||||
or even better having two new GUC parameters: archive_current_wal_command and
|
||||
archive_current_wal_delay.
|
||||
|
||||
I problem I discover during the tests is that if you shut down the spare node
|
||||
and the restore_command is still waiting for a file then the postmaster will never
|
||||
exit :-(
|
||||
==========================================================================
|
||||
|
||||
I hope that is clear.
|
||||
|
||||
|
||||
|
||||
Regards
|
||||
Gaetano Mendola
|
||||
|
||||
|
||||
|
||||
--------------060900090803090101060101
|
||||
Content-Type: text/plain;
|
||||
name="restore.sh"
|
||||
Content-Transfer-Encoding: 7bit
|
||||
Content-Disposition: inline;
|
||||
filename="restore.sh"
|
||||
|
||||
#!/bin/bash
|
||||
|
||||
|
||||
SOURCE=$1
|
||||
TARGET=$2
|
||||
PARTIAL=$3
|
||||
|
||||
SIZE_EXPECTED=16777216 #bytes 16 MB
|
||||
DIED_TIME=60 #seconds
|
||||
|
||||
function test_existence
|
||||
{
|
||||
if [ -f ${SOURCE} ]
|
||||
then
|
||||
COUNTER=0
|
||||
|
||||
#I have to check if the file is begin copied
|
||||
#I assume that it will reach the right
|
||||
#size in a few seconds
|
||||
|
||||
while [ $(stat -c '%s' ${SOURCE} ) -lt $SIZE_EXPECTED ]
|
||||
do
|
||||
sleep 1
|
||||
let COUNTER+=1
|
||||
if [ 20 -lt $COUNTER ]
|
||||
then
|
||||
exit 1 # BAILING OUT
|
||||
fi
|
||||
done
|
||||
|
||||
cp $SOURCE $TARGET
|
||||
exit 0
|
||||
fi
|
||||
echo ${SOURCE}"> not found"
|
||||
|
||||
#if is looking for a history file and not exist
|
||||
#I have suddenly exit
|
||||
echo $SOURCE | grep history > /dev/null 2>&1 && exit 1
|
||||
}
|
||||
|
||||
|
||||
while [ 1 ]
|
||||
do
|
||||
|
||||
test_existence
|
||||
|
||||
#CHECK IF THE MASTER IS ALIVE
|
||||
DELTA_TIME=$(( $( date +'%s' ) - $( stat -c '%Z' ${PARTIAL}/alive ) ))
|
||||
if [ $DIED_TIME -lt $DELTA_TIME ]
|
||||
then
|
||||
echo "Master is dead..."
|
||||
# Master is dead
|
||||
CURRENT_WAL=$( basename $SOURCE )
|
||||
echo "Partial: " ${PARTIAL}
|
||||
echo "Current wal: " ${CURRENT_WAL}
|
||||
echo "Target: " ${TARGET}
|
||||
cp ${PARTIAL}/${CURRENT_WAL}.partial ${TARGET} > /dev/null 2>&1 && exit 0
|
||||
exit 1
|
||||
fi
|
||||
|
||||
sleep 1
|
||||
|
||||
done
|
||||
|
||||
--------------060900090803090101060101
|
||||
Content-Type: text/plain;
|
||||
name="partial_wal_deliver.sh"
|
||||
Content-Transfer-Encoding: 7bit
|
||||
Content-Disposition: inline;
|
||||
filename="partial_wal_deliver.sh"
|
||||
|
||||
#!/bin/bash
|
||||
|
||||
PID=$1
|
||||
PARTIAL=$2
|
||||
PGXLOG=$3
|
||||
|
||||
function copy_last_wal
|
||||
{
|
||||
FILE=$( ls -t1p $PGXLOG | grep -v / | head -1 )
|
||||
|
||||
echo "Last Wal> " $FILE
|
||||
|
||||
cp ${PGXLOG}/${FILE} ${PARTIAL}/${FILE}.tmp
|
||||
mv ${PARTIAL}/${FILE}.tmp ${PARTIAL}/${FILE}.partial
|
||||
find ${PARTIAL} -name *.partial | grep -v ${FILE} | xargs -i rm -fr {}
|
||||
}
|
||||
|
||||
|
||||
while [ 1 ]
|
||||
do
|
||||
ps --pid $PID > /dev/null 2>&1
|
||||
ALIVE=$?
|
||||
|
||||
if [ "${ALIVE}" == "1" ]
|
||||
then
|
||||
#The process is dead
|
||||
echo "Process dead"
|
||||
copy_last_wal
|
||||
exit 1
|
||||
fi
|
||||
|
||||
#The process still exist
|
||||
touch ${PARTIAL}/alive
|
||||
copy_last_wal
|
||||
|
||||
sleep 10
|
||||
done
|
||||
|
||||
--------------060900090803090101060101
|
||||
Content-Type: text/plain
|
||||
Content-Disposition: inline
|
||||
Content-Transfer-Encoding: 8bit
|
||||
MIME-Version: 1.0
|
||||
|
||||
|
||||
---------------------------(end of broadcast)---------------------------
|
||||
TIP 5: Have you checked our extensive FAQ?
|
||||
|
||||
http://www.postgresql.org/docs/faqs/FAQ.html
|
||||
|
||||
--------------060900090803090101060101--
|
||||
|
||||
From pgsql-admin-owner+M15295=pgman=candle.pha.pa.us@postgresql.org Fri Oct 22 06:32:38 2004
|
||||
Return-path: <pgsql-admin-owner+M15295=pgman=candle.pha.pa.us@postgresql.org>
|
||||
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9M9Waf18397
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 05:32:36 -0400 (EDT)
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 9C9A532AC61
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 10:32:32 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 53654-01 for <pgman@candle.pha.pa.us>;
|
||||
Fri, 22 Oct 2004 09:32:26 +0000 (GMT)
|
||||
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 3132D32AC53
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 10:32:32 +0100 (BST)
|
||||
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id DC46E32A095
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Fri, 22 Oct 2004 10:23:07 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 49812-03
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
|
||||
Fri, 22 Oct 2004 09:22:52 +0000 (GMT)
|
||||
Received: from cmailg3.svr.pol.co.uk (cmailg3.svr.pol.co.uk [195.92.195.173])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 5AEA6329F2F
|
||||
for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 10:22:57 +0100 (BST)
|
||||
Received: from modem-21.monkey.dialup.pol.co.uk ([217.135.208.21] helo=Nightingale)
|
||||
by cmailg3.svr.pol.co.uk with smtp (Exim 4.41)
|
||||
id 1CKvdM-0005eh-NO; Fri, 22 Oct 2004 10:22:53 +0100
|
||||
Message-ID: <011a01c4b818$b7370a20$06e887d9@Nightingale>
|
||||
From: "Simon Riggs" <simon@2ndquadrant.com>
|
||||
To: "Gaetano Mendola" <mendola@bigfoot.com>,
|
||||
"Robert Treat" <xzilla@users.sourceforge.net>, <pgsql-admin@postgresql.org>
|
||||
cc: <iain@mst.co.jp>
|
||||
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb> <1098384082.15573.14.camel@camel> <41782EEF.5040708@bigfoot.com>
|
||||
Subject: Re: [ADMIN] replication using WAL archives
|
||||
Date: Fri, 22 Oct 2004 10:22:54 +0100
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain;
|
||||
charset="iso-8859-1"
|
||||
Content-Transfer-Encoding: 7bit
|
||||
X-Priority: 3
|
||||
X-MSMail-Priority: Normal
|
||||
X-Mailer: Microsoft Outlook Express 6.00.2800.1409
|
||||
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Mailing-List: pgsql-admin
|
||||
Precedence: bulk
|
||||
Sender: pgsql-admin-owner@postgresql.org
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on
|
||||
candle.pha.pa.us
|
||||
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham
|
||||
version=2.61
|
||||
Status: OR
|
||||
|
||||
> Gaetano Mendola wrote
|
||||
> Postgres can help this process, as suggested by Tom creating a
|
||||
pg_current_wal()
|
||||
> or even better having two new GUC parameters: archive_current_wal_command
|
||||
and
|
||||
> archive_current_wal_delay.
|
||||
|
||||
OK, we can modify the archiver to do this as well as the archive-when-full
|
||||
functionality. I'd already agreed to do something similar for 8.1
|
||||
|
||||
PROPOSAL:
|
||||
By default, archive_max_delay would be 10 seconds.
|
||||
By default, archive_current_wal_command is not set.
|
||||
If archive_current_wal_command is not set, the archiver will archive a file
|
||||
using archive_command only when the file is full.
|
||||
If archive_current_wal_command is set, the archiver would archive a file
|
||||
whichever of these occurs first...
|
||||
- it is full
|
||||
- the archive_max_delay timeout occurs (default: disabled)
|
||||
...as you can see I've renamed archive_current_wal_delay to reflect the fact
|
||||
that there is an interaction between the current mechanism (only when full)
|
||||
and this additional mechanism (no longer than X secs between log files).
|
||||
With that design, if the logs are being created quickly enough, then a
|
||||
partial log file is never created, only full ones.
|
||||
|
||||
When an xlog file is archived because it is full, then it is sent to both
|
||||
archive_current_wal_command and archive_command (in that order). When the
|
||||
timeout occurs and we have a partial xlog file, it would only be sent to
|
||||
archive_current_wal_command. It may also be desirable to not use
|
||||
archive_command at all, only to use archive_current_wal_command. That's not
|
||||
currently possible because archive_command is the switch by which all of the
|
||||
archive functioanlity is enabled, so you can't actually turn this off.
|
||||
|
||||
There is already a timeout feature designed into archiver for safety...so I
|
||||
can make that read the GUCs, above and act accordingly.
|
||||
|
||||
There is an unresolved resilience issue: if the archiver goes down (or
|
||||
whatever does the partial_wal copy functionality) then it it is possible
|
||||
that users will continue writing to the database and creating xlog records.
|
||||
It would be up to the user to define how to handle records that had been
|
||||
committed to the first database in the interim before cutover. It would also
|
||||
be up to the user to shut down the first node from the second - Shoot the
|
||||
Other Node in the Head, as its known. All of that is up to the second node,
|
||||
and as Tom says, is "the hard part"....I'm not proposing to do anything
|
||||
about that at this stage, since it is implementation dependant.
|
||||
|
||||
I was thinking perhaps to move to having variable size xlog files, since
|
||||
their contents are now variable - no padded records at EOF. If we did that,
|
||||
then the archiver could simply issue a "switch logfile" and then the
|
||||
archiver would cut in anyway to copy away the xlog. Having said that it is
|
||||
lots easier just to put a blind timeout in the archiver and copy the file -
|
||||
though I'm fairly uneasy about the point that we'd be ignoring the fact that
|
||||
many people are still writing to it. But I propose doing the easy way....
|
||||
|
||||
Thoughts?
|
||||
|
||||
= - = - =
|
||||
|
||||
Gaetano - skim-reading your script, how do you handle the situation when a
|
||||
new xlog file has been written within 10 seconds? That way the current file
|
||||
number will have jumped by 2, so when your script looks for the "Last wal"
|
||||
using head -1 it will find the N+2 and the intermediate file will never be
|
||||
copied. Looks like a problem to me...
|
||||
|
||||
> I problem I discover during the tests is that if you shut down the spare
|
||||
node
|
||||
> and the restore_command is still waiting for a file then the postmaster
|
||||
will never
|
||||
> exit :-(
|
||||
|
||||
Hmm....Are you reporting this as a bug for 8.0? It's not on the bug list...
|
||||
|
||||
Do we consider that to be desirable or not?
|
||||
|
||||
Best Regards, Simon Riggs
|
||||
|
||||
|
||||
|
||||
---------------------------(end of broadcast)---------------------------
|
||||
TIP 2: you can get off all lists at once with the unregister command
|
||||
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
|
||||
|
||||
From pgsql-admin-owner+M15302=pgman=candle.pha.pa.us@postgresql.org Fri Oct 22 13:56:14 2004
|
||||
Return-path: <pgsql-admin-owner+M15302=pgman=candle.pha.pa.us@postgresql.org>
|
||||
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9MGuCf28637
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 12:56:13 -0400 (EDT)
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 54E77EAEDAA
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 17:55:51 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 86116-09 for <pgman@candle.pha.pa.us>;
|
||||
Fri, 22 Oct 2004 16:55:57 +0000 (GMT)
|
||||
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 0EC98EAEDA7
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 17:55:51 +0100 (BST)
|
||||
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 5DB98EAEDBE
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Fri, 22 Oct 2004 17:45:13 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 82473-08
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
|
||||
Fri, 22 Oct 2004 16:45:11 +0000 (GMT)
|
||||
Received: from pns.mm.eutelsat.org (pns.mm.eutelsat.org [194.214.173.227])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id E49F0EAEDB5
|
||||
for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 17:45:00 +0100 (BST)
|
||||
Received: from nts-03.mm.eutelsat.org (localhost [127.0.0.1])
|
||||
by pns.mm.eutelsat.org (8.11.6/linuxconf) with ESMTP id i9MGh0U26124;
|
||||
Fri, 22 Oct 2004 18:43:01 +0200
|
||||
Received: from [127.0.0.1] (accesspoint.mm.eutelsat.org [194.214.173.4])
|
||||
by nts-03.mm.eutelsat.org (8.11.6/linuxconf) with ESMTP id i9MGj5f09681;
|
||||
Fri, 22 Oct 2004 18:45:05 +0200
|
||||
Message-ID: <4179390B.10700@bigfoot.com>
|
||||
Date: Fri, 22 Oct 2004 18:44:59 +0200
|
||||
From: Gaetano Mendola <mendola@bigfoot.com>
|
||||
User-Agent: Mozilla Thunderbird 0.8 (Windows/20040913)
|
||||
X-Accept-Language: en-us, en
|
||||
MIME-Version: 1.0
|
||||
To: Simon Riggs <simon@2ndquadrant.com>
|
||||
cc: Robert Treat <xzilla@users.sourceforge.net>, pgsql-admin@postgresql.org,
|
||||
iain@mst.co.jp
|
||||
Subject: Re: [ADMIN] replication using WAL archives
|
||||
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb> <1098384082.15573.14.camel@camel> <41782EEF.5040708@bigfoot.com> <011a01c4b818$b7370a20$06e887d9@Nightingale>
|
||||
In-Reply-To: <011a01c4b818$b7370a20$06e887d9@Nightingale>
|
||||
X-Enigmail-Version: 0.86.1.0
|
||||
X-Enigmail-Supports: pgp-inline, pgp-mime
|
||||
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
||||
Content-Transfer-Encoding: 7bit
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Mailing-List: pgsql-admin
|
||||
Precedence: bulk
|
||||
Sender: pgsql-admin-owner@postgresql.org
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on
|
||||
candle.pha.pa.us
|
||||
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham
|
||||
version=2.61
|
||||
Status: OR
|
||||
|
||||
-----BEGIN PGP SIGNED MESSAGE-----
|
||||
Hash: SHA1
|
||||
|
||||
Simon Riggs wrote:
|
||||
|>Gaetano Mendola wrote
|
||||
|>Postgres can help this process, as suggested by Tom creating a
|
||||
|
|
||||
| pg_current_wal()
|
||||
|
|
||||
|>or even better having two new GUC parameters: archive_current_wal_command
|
||||
|
|
||||
| and
|
||||
|
|
||||
|>archive_current_wal_delay.
|
||||
|
|
||||
|
|
||||
| OK, we can modify the archiver to do this as well as the archive-when-full
|
||||
| functionality. I'd already agreed to do something similar for 8.1
|
||||
|
|
||||
| PROPOSAL:
|
||||
| By default, archive_max_delay would be 10 seconds.
|
||||
| By default, archive_current_wal_command is not set.
|
||||
| If archive_current_wal_command is not set, the archiver will archive a file
|
||||
| using archive_command only when the file is full.
|
||||
| If archive_current_wal_command is set, the archiver would archive a file
|
||||
| whichever of these occurs first...
|
||||
| - it is full
|
||||
| - the archive_max_delay timeout occurs (default: disabled)
|
||||
| ...as you can see I've renamed archive_current_wal_delay to reflect the fact
|
||||
| that there is an interaction between the current mechanism (only when full)
|
||||
| and this additional mechanism (no longer than X secs between log files).
|
||||
| With that design, if the logs are being created quickly enough, then a
|
||||
| partial log file is never created, only full ones.
|
||||
|
|
||||
| When an xlog file is archived because it is full, then it is sent to both
|
||||
| archive_current_wal_command and archive_command (in that order). When the
|
||||
| timeout occurs and we have a partial xlog file, it would only be sent to
|
||||
| archive_current_wal_command. It may also be desirable to not use
|
||||
| archive_command at all, only to use archive_current_wal_command. That's not
|
||||
| currently possible because archive_command is the switch by which all of the
|
||||
| archive functioanlity is enabled, so you can't actually turn this off.
|
||||
|
||||
It seems good to me, the script behind archive command can be a nop if someone
|
||||
want use the archive_current_wal_command
|
||||
|
||||
|
||||
| = - = - =
|
||||
|
|
||||
| Gaetano - skim-reading your script, how do you handle the situation when a
|
||||
| new xlog file has been written within 10 seconds? That way the current file
|
||||
| number will have jumped by 2, so when your script looks for the "Last wal"
|
||||
| using head -1 it will find the N+2 and the intermediate file will never be
|
||||
| copied. Looks like a problem to me...
|
||||
|
||||
|
||||
Yes, the only window failure I seen ( but I don't know if it's possible )
|
||||
|
||||
Master:
|
||||
~ log N created
|
||||
log N filled
|
||||
archive log N
|
||||
log N+1 created
|
||||
log N+1 filled
|
||||
~ log N+2 created
|
||||
~ <---- the master die here before to archive the log N+1
|
||||
~ archive log N+1
|
||||
|
||||
|
||||
in this case as you underline tha last log archived is the N and the N+2
|
||||
partial wal is added to archived wal collection and in the recovery fase
|
||||
the recovery stop after processing the log N.
|
||||
|
||||
Is it possible that the postmaster create the N+2 file without finish to archive
|
||||
the N+1 ? ( I suspect yes :-( )
|
||||
|
||||
The only cure I see here is to look for not archived WAL ( if possible ).
|
||||
|
||||
|
||||
|>I problem I discover during the tests is that if you shut down the spare
|
||||
|>node and the restore_command is still waiting for a file then the postmaster
|
||||
|>will never exit :-(
|
||||
|
|
||||
|
|
||||
| Hmm....Are you reporting this as a bug for 8.0? It's not on the bug list...
|
||||
|
||||
For me is a behave to avoid.
|
||||
|
||||
|
||||
|
||||
Regards
|
||||
Gaetano Mendola
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
-----BEGIN PGP SIGNATURE-----
|
||||
Version: GnuPG v1.2.5 (MingW32)
|
||||
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
|
||||
|
||||
iD8DBQFBeTkJ7UpzwH2SGd4RAsMxAKCbV7W+wrGBocf2Ftlthm0egAlIWACgp87L
|
||||
KU/YusyHuvT7jSFwZVKpP3M=
|
||||
=rWZx
|
||||
-----END PGP SIGNATURE-----
|
||||
|
||||
|
||||
---------------------------(end of broadcast)---------------------------
|
||||
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
|
||||
|
||||
From pgsql-admin-owner+M15303=pgman=candle.pha.pa.us@postgresql.org Fri Oct 22 14:43:36 2004
|
||||
Return-path: <pgsql-admin-owner+M15303=pgman=candle.pha.pa.us@postgresql.org>
|
||||
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9MHhZf06453
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 13:43:35 -0400 (EDT)
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 01DD2EADBB7
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 18:43:13 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 01872-03 for <pgman@candle.pha.pa.us>;
|
||||
Fri, 22 Oct 2004 17:43:19 +0000 (GMT)
|
||||
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 9E633EADAD4
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 18:43:12 +0100 (BST)
|
||||
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id C1133EAED89
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Fri, 22 Oct 2004 18:31:20 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 97130-03
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
|
||||
Fri, 22 Oct 2004 17:31:17 +0000 (GMT)
|
||||
Received: from cmailm2.svr.pol.co.uk (cmailm2.svr.pol.co.uk [195.92.193.210])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 276CAEADBBD
|
||||
for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 18:31:07 +0100 (BST)
|
||||
Received: from modem-558.snake.dialup.pol.co.uk ([62.137.114.46] helo=[192.168.0.102])
|
||||
by cmailm2.svr.pol.co.uk with esmtp (Exim 4.41)
|
||||
id 1CL3G3-0001Tx-K5; Fri, 22 Oct 2004 18:31:20 +0100
|
||||
Subject: Re: [ADMIN] replication using WAL archives
|
||||
From: Simon Riggs <simon@2ndquadrant.com>
|
||||
To: Gaetano Mendola <mendola@bigfoot.com>
|
||||
cc: Robert Treat <xzilla@users.sourceforge.net>, pgsql-admin@postgresql.org,
|
||||
iain@mst.co.jp
|
||||
In-Reply-To: <4179390B.10700@bigfoot.com>
|
||||
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb>
|
||||
<1098384082.15573.14.camel@camel> <41782EEF.5040708@bigfoot.com>
|
||||
<011a01c4b818$b7370a20$06e887d9@Nightingale> <4179390B.10700@bigfoot.com>
|
||||
Content-Type: text/plain
|
||||
Organization: 2nd Quadrant
|
||||
Message-ID: <1098466150.20926.13.camel@localhost.localdomain>
|
||||
MIME-Version: 1.0
|
||||
X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2)
|
||||
Date: Fri, 22 Oct 2004 18:29:10 +0100
|
||||
Content-Transfer-Encoding: 7bit
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Mailing-List: pgsql-admin
|
||||
Precedence: bulk
|
||||
Sender: pgsql-admin-owner@postgresql.org
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on
|
||||
candle.pha.pa.us
|
||||
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham
|
||||
version=2.61
|
||||
Status: OR
|
||||
|
||||
On Fri, 2004-10-22 at 17:44, Gaetano Mendola wrote:
|
||||
> | Gaetano - skim-reading your script, how do you handle the situation when a
|
||||
> | new xlog file has been written within 10 seconds? That way the current file
|
||||
> | number will have jumped by 2, so when your script looks for the "Last wal"
|
||||
> | using head -1 it will find the N+2 and the intermediate file will never be
|
||||
> | copied. Looks like a problem to me...
|
||||
>
|
||||
>
|
||||
> Yes, the only window failure I seen ( but I don't know if it's possible )
|
||||
>
|
||||
> Master:
|
||||
> ~ log N created
|
||||
> log N filled
|
||||
> archive log N
|
||||
> log N+1 created
|
||||
> log N+1 filled
|
||||
> ~ log N+2 created
|
||||
> ~ <---- the master die here before to archive the log N+1
|
||||
> ~ archive log N+1
|
||||
>
|
||||
>
|
||||
> in this case as you underline tha last log archived is the N and the N+2
|
||||
> partial wal is added to archived wal collection and in the recovery fase
|
||||
> the recovery stop after processing the log N.
|
||||
>
|
||||
> Is it possible that the postmaster create the N+2 file without finish to archive
|
||||
> the N+1 ? ( I suspect yes :-( )
|
||||
>
|
||||
> The only cure I see here is to look for not archived WAL ( if possible ).
|
||||
>
|
||||
|
||||
Hmm...well you aren't looking for archived wal, you're just looking at
|
||||
wal...which is a different thing...
|
||||
|
||||
Situation I thought I saw was:
|
||||
|
||||
- copy away current partial filled xlog N
|
||||
- xlog N fills, N+1 starts
|
||||
- xlog N+1 fills, N+2 starts
|
||||
- copy away current partial filled xlog: N+2 (+10 secs later)
|
||||
|
||||
i.e. if time to fill xlog (is ever) < time to copy away current xlog,
|
||||
then you miss one.
|
||||
|
||||
So problem: you can miss one and never know you've missed one until the
|
||||
recovery can't find it, which it never returns from...so it just hangs.
|
||||
|
||||
[Just so we're all clear: we're talking about Gaetano's script, not the
|
||||
PostgreSQL archver. The postgresql archiver doesn't do it that way, so
|
||||
it never misses one.]
|
||||
|
||||
--
|
||||
Best Regards, Simon Riggs
|
||||
|
||||
|
||||
---------------------------(end of broadcast)---------------------------
|
||||
TIP 9: the planner will ignore your desire to choose an index scan if your
|
||||
joining column's datatypes do not match
|
||||
|
||||
From pgsql-admin-owner+M15306=pgman=candle.pha.pa.us@postgresql.org Fri Oct 22 17:56:07 2004
|
||||
Return-path: <pgsql-admin-owner+M15306=pgman=candle.pha.pa.us@postgresql.org>
|
||||
Received: from svr1.postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by candle.pha.pa.us (8.11.6/8.11.6) with ESMTP id i9MKu6f05264
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 16:56:06 -0400 (EDT)
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 4F4C2EAE4AE
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 21:55:41 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 62857-05 for <pgman@candle.pha.pa.us>;
|
||||
Fri, 22 Oct 2004 20:55:48 +0000 (GMT)
|
||||
Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 095CEEAE4AC
|
||||
for <pgman@candle.pha.pa.us>; Fri, 22 Oct 2004 21:55:41 +0100 (BST)
|
||||
X-Original-To: pgsql-admin-postgresql.org@localhost.postgresql.org
|
||||
Received: from localhost (unknown [200.46.204.144])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 3FC9BEAE486
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>; Fri, 22 Oct 2004 21:50:48 +0100 (BST)
|
||||
Received: from svr1.postgresql.org ([200.46.204.71])
|
||||
by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
|
||||
with ESMTP id 62565-02
|
||||
for <pgsql-admin-postgresql.org@localhost.postgresql.org>;
|
||||
Fri, 22 Oct 2004 20:50:48 +0000 (GMT)
|
||||
Received: from news.hub.org (news.hub.org [200.46.204.72])
|
||||
by svr1.postgresql.org (Postfix) with ESMTP id 06C49EAE46B
|
||||
for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 21:50:40 +0100 (BST)
|
||||
Received: from news.hub.org (news.hub.org [200.46.204.72])
|
||||
by news.hub.org (8.12.9/8.12.9) with ESMTP id i9MKolJB062812
|
||||
for <pgsql-admin@postgresql.org>; Fri, 22 Oct 2004 20:50:48 GMT
|
||||
(envelope-from news@news.hub.org)
|
||||
Received: (from news@localhost)
|
||||
by news.hub.org (8.12.9/8.12.9/Submit) id i9MKoRHh062731
|
||||
for pgsql-admin@postgresql.org; Fri, 22 Oct 2004 20:50:27 GMT
|
||||
From: Gaetano Mendola <mendola@bigfoot.com>
|
||||
X-Newsgroups: comp.databases.postgresql.admin
|
||||
Subject: Re: [ADMIN] replication using WAL archives
|
||||
Date: Fri, 22 Oct 2004 22:50:34 +0200
|
||||
Organization: PYRENET Midi-pyrenees Provider
|
||||
Lines: 39
|
||||
Message-ID: <4179729A.5020401@bigfoot.com>
|
||||
References: <002801c4b739$68450870$7201a8c0@mst1x5r347kymb> <1098384082.15573.14.camel@camel> <41782EEF.5040708@bigfoot.com> <011a01c4b818$b7370a20$06e887d9@Nightingale> <4179390B.10700@bigfoot.com> <1098466150.20926.13.camel@localhost.localdomain>
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
||||
Content-Transfer-Encoding: 7bit
|
||||
X-Complaints-To: abuse@pyrenet.fr
|
||||
To: Simon Riggs <simon@2ndquadrant.com>
|
||||
User-Agent: Mozilla Thunderbird 0.8 (Windows/20040913)
|
||||
X-Accept-Language: en-us, en
|
||||
In-Reply-To: <1098466150.20926.13.camel@localhost.localdomain>
|
||||
X-Enigmail-Version: 0.86.1.0
|
||||
X-Enigmail-Supports: pgp-inline, pgp-mime
|
||||
To: pgsql-admin@postgresql.org
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Mailing-List: pgsql-admin
|
||||
Precedence: bulk
|
||||
Sender: pgsql-admin-owner@postgresql.org
|
||||
X-Virus-Scanned: by amavisd-new at hub.org
|
||||
X-Spam-Checker-Version: SpamAssassin 2.61 (1.212.2.1-2003-12-09-exp) on
|
||||
candle.pha.pa.us
|
||||
X-Spam-Status: No, hits=-4.9 required=5.0 tests=BAYES_00 autolearn=ham
|
||||
version=2.61
|
||||
Status: OR
|
||||
|
||||
Simon Riggs wrote:
|
||||
|
||||
> Situation I thought I saw was:
|
||||
>
|
||||
> - copy away current partial filled xlog N
|
||||
> - xlog N fills, N+1 starts
|
||||
> - xlog N+1 fills, N+2 starts
|
||||
> - copy away current partial filled xlog: N+2 (+10 secs later)
|
||||
>
|
||||
> i.e. if time to fill xlog (is ever) < time to copy away current xlog,
|
||||
> then you miss one.
|
||||
>
|
||||
> So problem: you can miss one and never know you've missed one until the
|
||||
> recovery can't find it, which it never returns from...so it just hangs.
|
||||
|
||||
No. The restore.sh is not smart enough to know the last wal that must be
|
||||
replayed, the only "smart thing" is to copy the supposed "current wal" in the
|
||||
archive directory.
|
||||
|
||||
The script hang (and is a feature not a bug) if and only if the master is alive
|
||||
( at least I'm not seeing any other hang ).
|
||||
|
||||
In your example in the archived directory will be present the files until logN
|
||||
and logN+2 ( the current wal ) is in the partial directory, if the master die,
|
||||
the restore.sh will copy logN+2 in the archived directory, the spare node will
|
||||
execute restore.sh with file logN+1 as argument and if is not found then the
|
||||
restore.sh will exit.
|
||||
|
||||
|
||||
Regards
|
||||
Gaetano Mendola
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
---------------------------(end of broadcast)---------------------------
|
||||
TIP 8: explain analyze is your friend
|
||||
|
1725
doc/TODO.detail/psql
1725
doc/TODO.detail/psql
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
2791
doc/TODO.detail/wal
2791
doc/TODO.detail/wal
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
Loading…
Reference in New Issue
Block a user