Document Warm Standby for High Availability
Includes sample standby script. Simon Riggs
parent 075c0caa90
commit 5e550acbc4

@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.84 2006/09/15 21:55:07 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.85 2006/09/15 22:02:21 momjian Exp $ -->

<chapter id="backup">
<title>Backup and Restore</title>

@@ -1203,6 +1203,312 @@ restore_command = 'copy /mnt/server/archivedir/%f "%p"' # Windows
</sect2>
</sect1>

<sect1 id="warm-standby">
<title>Warm Standby Servers for High Availability</title>

<indexterm zone="backup">
 <primary>Warm Standby</primary>
</indexterm>

<indexterm zone="backup">
 <primary>PITR Standby</primary>
</indexterm>

<indexterm zone="backup">
 <primary>Standby Server</primary>
</indexterm>

<indexterm zone="backup">
 <primary>Log Shipping</primary>
</indexterm>

<indexterm zone="backup">
 <primary>Witness Server</primary>
</indexterm>

<indexterm zone="backup">
 <primary>STONITH</primary>
</indexterm>

<indexterm zone="backup">
 <primary>High Availability</primary>
</indexterm>

<para>
Continuous Archiving can be used to create a High Availability (HA)
cluster configuration with one or more Standby Servers ready to take
over operations in the case that the Primary Server fails. This
capability is more widely known as Warm Standby Log Shipping.
</para>

<para>
The Primary and Standby Servers work together to provide this capability,
though the servers are only loosely coupled. The Primary Server operates
in Continuous Archiving mode, while the Standby Server operates in
continuous Recovery mode, reading the WAL files from the Primary. No
changes to the database tables are required to enable this capability,
so it offers a low administration overhead in comparison with other
replication approaches. This configuration also has a very low
performance impact on the Primary Server.
</para>

<para>
Directly moving WAL or "log" records from one database server to another
is typically described as Log Shipping. PostgreSQL implements file-based
Log Shipping, meaning WAL records are batched one file at a time. WAL
files can be shipped easily and cheaply over any distance, whether it be
to an adjacent system, another system on the same site, or another system
on the far side of the globe. The bandwidth required for this technique
varies according to the transaction rate of the Primary Server.
Record-based Log Shipping is also possible with custom-developed
procedures, as discussed in a later section. Future developments are
likely to include options for synchronous and/or integrated record-based
log shipping.
</para>

<para>
It should be noted that the log shipping is asynchronous, i.e. the WAL
records are shipped after transaction commit. As a result there can be a
small window of data loss, should the Primary Server suffer a
catastrophic failure. The window of data loss is minimised by the use of
the archive_timeout parameter, which can be set as low as a few seconds
if required. A very low setting can increase the bandwidth requirements
for file shipping.
</para>

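<para>
As an illustrative sketch only (the archive path shown would be a mount
point that reaches the archive directory on the Standby Server, and the
timeout value is an example), the Primary might be configured with:
<programlisting>
# postgresql.conf on the Primary (illustrative values)
archive_command = 'cp -i %p /mnt/standby/archivedir/%f </dev/null'
archive_timeout = 30    # force a switch to a new WAL file after 30 seconds
</programlisting>
</para>
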
<para>
The Standby server is not available for access, since it is continually
performing recovery processing. Recovery performance is sufficiently
good that the Standby will typically be only minutes away from full
availability once it has been activated. As a result, we refer to this
capability as a Warm Standby configuration that offers High
Availability. Restoring a server from an archived base backup and
rollforward can take considerably longer and so that technique only
really offers a solution for Disaster Recovery, not HA.
</para>

<para>
Other mechanisms for High Availability replication are available, both
commercially and as open-source software.
</para>

<para>
In general, log shipping between servers running different release
levels will not be possible. It is the policy of the PostgreSQL Global
Development Group not to make changes to disk formats during minor release
upgrades, so it is likely that running different minor release levels
on Primary and Standby servers will work successfully. However, no
formal support for that is offered and you are advised not to allow this
to occur over long periods.
</para>

<sect2 id="warm-standby-planning">
<title>Planning</title>

<para>
On the Standby server all tablespaces and paths will refer to similarly
named mount points, so it is important to create the Primary and Standby
servers so that they are as similar as possible, at least from the
perspective of the database server. Furthermore, any CREATE TABLESPACE
commands will be passed across as-is, so any new mount points must be
created on both servers before they are used on the Primary. Hardware
need not be the same, but experience shows that maintaining two
identical systems is easier than maintaining two dissimilar ones over
the whole lifetime of the application and system.
</para>

<para>
There is no special mode required to enable a Standby server. The
operations that occur on both Primary and Standby servers are entirely
normal continuous archiving and recovery tasks. The primary point of
contact between the two database servers is the archive of WAL files
that both share: Primary writing to the archive, Standby reading from
the archive. Care must be taken to ensure that WAL archives for separate
servers do not become mixed together or confused.
</para>

<para>
The magic that makes the two loosely coupled servers work together is
simply a restore_command that waits for the next WAL file to be archived
from the Primary. The restore_command is specified in the recovery.conf
file on the Standby Server. Normal recovery processing would request a
file from the WAL archive, causing an error if the file was unavailable.
For Standby processing it is normal for the next file to be unavailable,
so we must be patient and wait for it to appear. A waiting
restore_command can be written as a custom script that loops, polling
for the existence of the next WAL file. There must also be some
way to trigger failover, which should interrupt the restore_command,
break the loop and return a file-not-found error to the Standby Server.
This ends recovery and the Standby will then come up as a normal
server.
</para>

<para>
Sample code for the C version of the restore_command would be:
<programlisting>
/*
 * NextWALFileReady(), CheckForExternalTrigger() and
 * CopyWALFileForRecovery() are placeholders for site-specific code.
 */
triggered = false;
while (!NextWALFileReady() && !triggered)
{
    usleep(100000L);    /* wait ~0.1 sec */
    if (CheckForExternalTrigger())
        triggered = true;
}
if (!triggered)
    CopyWALFileForRecovery();
</programlisting>
</para>

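<para>
The same logic can be expressed as a shell script. The following is a
minimal sketch only: the script name restore_wait.sh, the archive
location and the trigger file path are all hypothetical and must be
adapted to the local installation. A production script would also need
to guard against acting on a WAL file that is still being shipped.
<programlisting>
#!/bin/sh
# Hypothetical waiting restore_command, used in recovery.conf as:
#   restore_command = 'restore_wait.sh /mnt/standby/archivedir %f %p'
ARCHIVE="$1"                      # directory receiving shipped WAL files
WALFILE="$2"                      # file requested by recovery (%f)
DEST="$3"                         # destination path (%p)
TRIGGER=/tmp/pgsql.trigger.5432   # agreed trigger-file location

# Timeline history and backup label files are requested once at the
# start of recovery and may legitimately not exist; never wait for them.
case "$WALFILE" in
    *.history|*.backup)
        test -f "$ARCHIVE/$WALFILE" && cp "$ARCHIVE/$WALFILE" "$DEST"
        exit $?
        ;;
esac

while true
do
    # Explicit failover trigger: return failure so that recovery ends
    # and the Standby comes up as a normal server.
    if test -f "$TRIGGER"
    then
        exit 1
    fi

    # The next WAL file has arrived: hand it to the Standby.
    if test -f "$ARCHIVE/$WALFILE"
    then
        cp "$ARCHIVE/$WALFILE" "$DEST"
        exit $?
    fi

    sleep 1                       # pause before polling again
done
</programlisting>
</para>
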
<para>
PostgreSQL does not provide the system software required to identify a
failure on the Primary and notify the Standby database server. Many
such tools exist and are well integrated with other aspects of a system
failover, such as IP address migration.
</para>

<para>
Triggering failover is an important part of planning and design. The
restore_command is executed in full once for each WAL file. The process
running the restore_command is therefore created and dies for each file,
so there is no daemon or server process and so we cannot use signals and
a signal handler. A more permanent notification is required to trigger
the failover. It is possible to use a simple timeout facility,
especially if used in conjunction with a known archive_timeout setting
on the Primary. This is somewhat error prone since a network problem or
a busy Primary server might be sufficient to initiate failover. A
notification mechanism such as the explicit creation of a trigger file
is less error prone, if this can be arranged.
</para>

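<para>
For example, with the hypothetical script sketched above, failover
would be triggered by creating the agreed trigger file on the Standby
system:
<programlisting>
$ touch /tmp/pgsql.trigger.5432
</programlisting>
</para>
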
</sect2>

<sect2 id="warm-standby-config">
<title>Implementation</title>

<para>
The short procedure for configuring a Standby Server is as follows. For
full details of each step, refer to previous sections as noted.
<orderedlist>
<listitem>
<para>
Set up Primary and Standby systems as near identically as possible,
including two identical copies of PostgreSQL at the same release level.
</para>
</listitem>
<listitem>
<para>
Set up Continuous Archiving from the Primary to a WAL archive located
in a directory on the Standby Server. Ensure that both <xref
linkend="guc-archive-command"> and <xref linkend="guc-archive-timeout">
are set. (See <xref linkend="backup-archiving-wal">)
</para>
</listitem>
<listitem>
<para>
Make a Base Backup of the Primary Server. (See <xref
linkend="backup-base-backup">)
</para>
</listitem>
<listitem>
<para>
Begin recovery on the Standby Server from the local WAL archive,
using a recovery.conf that specifies a restore_command that waits as
described previously; a minimal sketch appears after this list. (See
<xref linkend="backup-pitr-recovery">)
</para>
</listitem>
</orderedlist>
</para>

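<para>
Putting the pieces together, a minimal recovery.conf for the final step
needs little more than the waiting restore_command; the script name and
archive path below are the hypothetical ones sketched earlier.
<programlisting>
# recovery.conf on the Standby (illustrative)
restore_command = 'restore_wait.sh /mnt/standby/archivedir %f %p'
</programlisting>
</para>
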
<para>
Recovery treats the WAL Archive as read-only, so once a WAL file has
been copied to the Standby system it can be copied to tape at the same
time as it is being used by the Standby database server to recover.
Thus, running a Standby Server for High Availability can be performed at
the same time as files are stored for longer term Disaster Recovery
purposes.
</para>

<para>
For testing purposes, it is possible to run both Primary and Standby
servers on the same system. This does not provide any worthwhile
improvement in server robustness, nor would it be described as HA.
</para>
</sect2>

<sect2 id="warm-standby-failover">
<title>Failover</title>

<para>
If the Primary Server fails then the Standby Server should begin
failover procedures.
</para>

<para>
If the Standby Server fails then no failover need take place. If the
Standby Server can be restarted, then the recovery process can also be
immediately restarted, taking advantage of Restartable Recovery.
</para>

<para>
If the Primary Server fails and then immediately restarts, you must have
a mechanism for informing it that it is no longer the Primary. This is
sometimes known as STONITH (Shoot The Other Node In The Head), which is
necessary to avoid situations where both systems think they are the
Primary, which can lead to confusion and ultimately data loss.
</para>

<para>
Many failover systems use just two systems, the Primary and the Standby,
connected by some kind of heartbeat mechanism to continually verify the
connectivity between the two and the viability of the Primary. It is
also possible to use a third system, known as a Witness Server, to avoid
some problems of inappropriate failover, but the additional complexity
may not be worthwhile unless it is set up with sufficient care and
rigorous testing.
</para>

<para>
At the instant that failover takes place to the Standby, we have only a
single server in operation. This is known as a degenerate state.
The former Standby is now the Primary, but the former Primary is down
and may stay down. We must now fully re-create a Standby server,
either on the former Primary system when it comes up, or on a third,
possibly new, system. Once complete, the Primary and Standby can be
considered to have switched roles. Some people choose to use a third
server to provide additional protection across the failover interval,
though clearly this complicates the system configuration and
operational processes (and such a server can also act as a Witness
Server).
</para>

<para>
So, switching from Primary to Standby Server can be fast, but requires
some time to re-prepare the failover cluster. Regular switching from
Primary to Standby is encouraged, since it allows the regular downtime
on each system required to maintain HA. This also acts as a test of the
failover mechanism, ensuring that it really will work when you need it.
Written administration procedures are advised.
</para>
</sect2>

<sect2 id="warm-standby-record">
<title>Implementing Record-based Log Shipping</title>

<para>
The main features for Log Shipping in this release are based around the
file-based Log Shipping described above. It is also possible to
implement record-based Log Shipping using the pg_xlogfile_name_offset()
function, though this requires custom development.
</para>

<para>
An external program can call pg_xlogfile_name_offset() to find out the
filename and the exact byte offset within it of the latest WAL pointer.
If the external program regularly polls the server it can find out how
far forward the pointer has moved. It can then access the WAL file
directly and copy those bytes across to a less up-to-date copy on a
Standby Server.
</para>

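<para>
As a sketch, such a program could poll the server for the current end
of WAL like this; everything beyond these two built-in functions, such
as reading and shipping the new bytes, is left to custom development.
<programlisting>
$ psql -At -c "SELECT file_name, file_offset
                 FROM pg_xlogfile_name_offset(pg_current_xlog_location())"
</programlisting>
</para>
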
</sect2>
</sect1>

<sect1 id="migration">
<title>Migration Between Releases</title>