Reconfigure failover/replication doc items to be varlist entries, rather than new sections, so they appear all on the same web page.
Bruce Momjian 2006-11-16 21:43:33 +00:00
parent c7a6046a59
commit a1e5b5c832


@ -1,4 +1,4 @@
<!-- $PostgreSQL: pgsql/doc/src/sgml/failover.sgml,v 1.7 2006/11/16 18:25:58 momjian Exp $ --> <!-- $PostgreSQL: pgsql/doc/src/sgml/failover.sgml,v 1.8 2006/11/16 21:43:33 momjian Exp $ -->
<chapter id="failover"> <chapter id="failover">
<title>Failover, Replication, Load Balancing, and Clustering Options</title> <title>Failover, Replication, Load Balancing, and Clustering Options</title>
@ -76,167 +76,186 @@
and load balancing solutions. and load balancing solutions.
</para> </para>
<sect1 id="shared-disk-failover"> <variablelist>
<title>Shared Disk Failover</title>
<para> <varlistentry>
Shared disk failover avoids synchronization overhead by having only one <term>Shared Disk Failover</term>
copy of the database. It uses a single disk array that is shared by <listitem>
multiple servers. If the main database server fails, the backup server
is able to mount and start the database as though it was recovering from
a database crash. This allows rapid failover with no data loss.
</para>
<para> <para>
Shared hardware functionality is common in network storage devices. One Shared disk failover avoids synchronization overhead by having only one
significant limitation of this method is that if the shared disk array copy of the database. It uses a single disk array that is shared by
fails or becomes corrupt, the primary and backup servers are both multiple servers. If the main database server fails, the backup server
nonfunctional. is able to mount and start the database as though it was recovering from
</para> a database crash. This allows rapid failover with no data loss.
</sect1> </para>
<sect1 id="warm-standby-using-point-in-time-recovery"> <para>
<title>Warm Standby Using Point-In-Time Recovery</title> Shared hardware functionality is common in network storage devices. One
significant limitation of this method is that if the shared disk array
fails or becomes corrupt, the primary and backup servers are both
nonfunctional.
</para>
</listitem>
</varlistentry>
<para> <varlistentry>
A warm standby server (see <xref linkend="warm-standby">) can <term>Warm Standby Using Point-In-Time Recovery</term>
be kept current by reading a stream of write-ahead log (WAL) <listitem>
records. If the main server fails, the warm standby contains
almost all of the data of the main server, and can be quickly
made the new master database server. This is asynchronous and
can only be done for the entire database server.
</para>
</sect1>
<sect1 id="continuously-running-replication-server"> <para>
<title>Continuously Running Replication Server</title> A warm standby server (see <xref linkend="warm-standby">) can
be kept current by reading a stream of write-ahead log (WAL)
records. If the main server fails, the warm standby contains
almost all of the data of the main server, and can be quickly
made the new master database server. This is asynchronous and
can only be done for the entire database server.
</para>
</listitem>
</varlistentry>
<para> <varlistentry>
A continuously running replication server allows the backup server to <term>Continuously Running Replication Server</term>
answer read-only queries while the master server is running. It <listitem>
receives a continuous stream of write activity from the master server.
Because the backup server can be used for read-only database requests,
it is ideal for data warehouse queries.
</para>
<para> <para>
Slony-I is an example of this type of replication, with per-table A continuously running replication server allows the backup server to
granularity. It updates the backup server in batches, so the replication answer read-only queries while the master server is running. It
is asynchronous and might lose data during a failover. receives a continuous stream of write activity from the master server.
</para> Because the backup server can be used for read-only database requests,
</sect1> it is ideal for data warehouse queries.
</para>
<sect1 id="data-partitioning"> <para>
<title>Data Partitioning</title> Slony-I is an example of this type of replication, with per-table
granularity. It updates the backup server in batches, so the replication
is asynchronous and might lose data during a failover.
</para>
</listitem>
</varlistentry>
<para> <varlistentry>
Data partitioning splits tables into data sets. Each set can <term>Data Partitioning</term>
be modified by only one server. For example, data can be <listitem>
partitioned by offices, e.g. London and Paris. While London
and Paris servers have all data records, only London can modify
London records, and Paris can only modify Paris records. This
is similar to section <xref
linkend="continuously-running-replication-server"> above, except
that instead of having a read/write server and a read-only server,
each server has a read/write data set and a read-only data
set.
</para>
<para> <para>
Such partitioning provides both failover and load balancing. Failover Data partitioning splits tables into data sets. Each set can
is achieved because the data resides on both servers, and this is an be modified by only one server. For example, data can be
ideal way to enable failover if the servers share a slow communication partitioned by offices, e.g. London and Paris. While London
channel. Load balancing is possible because read requests can go to any and Paris servers have all data records, only London can modify
of the servers, and write requests are split among the servers. Of London records, and Paris can only modify Paris records. This
course, the communication to keep all the servers up-to-date adds is similar to the "Continuously Running Replication Server"
overhead, so ideally the write load should be low, or localized as in item above, except that instead of having a read/write server
the London/Paris example above. and a read-only server, each server has a read/write data set
</para> and a read-only data set.
</para>
<para> <para>
Data partitioning is usually handled by application code, though rules Such partitioning provides both failover and load balancing. Failover
and triggers can be used to keep the read-only data sets current. Slony-I is achieved because the data resides on both servers, and this is an
can also be used in such a setup. While Slony-I replicates only entire ideal way to enable failover if the servers share a slow communication
tables, London and Paris can be placed in separate tables, and channel. Load balancing is possible because read requests can go to any
inheritance can be used to access both tables using a single table name. of the servers, and write requests are split among the servers. Of
</para> course, the communication to keep all the servers up-to-date adds
</sect1> overhead, so ideally the write load should be low, or localized as in
the London/Paris example above.
</para>
<sect1 id="query-broadcast-load-balancing"> <para>
<title>Query Broadcast Load Balancing</title> Data partitioning is usually handled by application code, though rules
and triggers can be used to keep the read-only data sets current. Slony-I
can also be used in such a setup. While Slony-I replicates only entire
tables, London and Paris can be placed in separate tables, and
inheritance can be used to access both tables using a single table name.
</para>
</listitem>
</varlistentry>
<para> <varlistentry>
Query broadcast load balancing is accomplished by having a <term>Query Broadcast Load Balancing</term>
program intercept every SQL query and send it to all servers. <listitem>
This is unique because most replication solutions have the write
server propagate its changes to the other servers. With query
broadcasting, each server operates independently. Read-only
queries can be sent to a single server because there is no need
for all servers to process it.
</para>
<para> <para>
One limitation of this solution is that functions like Query broadcast load balancing is accomplished by having a
<function>random()</>, <function>CURRENT_TIMESTAMP</>, and program intercept every SQL query and send it to all servers.
sequences can have different values on different servers. This This is unique because most replication solutions have the write
is because each server operates independently, and because SQL server propagate its changes to the other servers. With query
queries are broadcast (and not actual modified rows). If this broadcasting, each server operates independently. Read-only
is unacceptable, applications must query such values from a queries can be sent to a single server because there is no need
single server and then use those values in write queries. Also, for all servers to process it.
care must be taken that all transactions either commit or abort </para>
on all servers. Pgpool is an example of this type of replication.
</para>
</sect1>
<sect1 id="clustering-for-load-balancing"> <para>
<title>Clustering For Load Balancing</title> One limitation of this solution is that functions like
<function>random()</>, <function>CURRENT_TIMESTAMP</>, and
sequences can have different values on different servers. This
is because each server operates independently, and because SQL
queries are broadcast (and not actual modified rows). If this
is unacceptable, applications must query such values from a
single server and then use those values in write queries. Also,
care must be taken that all transactions either commit or abort
on all servers. Pgpool is an example of this type of replication.
</para>
</listitem>
</varlistentry>
<para> <varlistentry>
In clustering, each server can accept write requests, and modified <term>Clustering For Load Balancing</term>
data is transmitted from the original server to every other <listitem>
server before each transaction commits. Heavy write activity
can cause excessive locking, leading to poor performance. In
fact, write performance is often worse than that of a single
server. Read requests can be sent to any server. Clustering
is best for mostly read workloads, though its big advantage is
that any server can accept write requests &mdash; there is no need
to partition workloads between read/write and read-only servers.
</para>
<para> <para>
Clustering is implemented by <productname>Oracle</> in their In clustering, each server can accept write requests, and modified
<productname><acronym>RAC</></> product. <productname>PostgreSQL</> data is transmitted from the original server to every other
does not offer this type of load balancing, though server before each transaction commits. Heavy write activity
<productname>PostgreSQL</> two-phase commit (<xref can cause excessive locking, leading to poor performance. In
linkend="sql-prepare-transaction" fact, write performance is often worse than that of a single
endterm="sql-prepare-transaction-title"> and <xref server. Read requests can be sent to any server. Clustering
linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">) is best for mostly read workloads, though its big advantage is
can be used to implement this in application code or middleware. that any server can accept write requests &mdash; there is no need
</para> to partition workloads between read/write and read-only servers.
</sect1> </para>
<sect1 id="clustering-for-parallel-query-execution"> <para>
<title>Clustering For Parallel Query Execution</title> Clustering is implemented by <productname>Oracle</> in their
<productname><acronym>RAC</></> product. <productname>PostgreSQL</>
does not offer this type of load balancing, though
<productname>PostgreSQL</> two-phase commit (<xref
linkend="sql-prepare-transaction"
endterm="sql-prepare-transaction-title"> and <xref
linkend="sql-commit-prepared" endterm="sql-commit-prepared-title">)
can be used to implement this in application code or middleware.
</para>
</listitem>
</varlistentry>
<para> <varlistentry>
This allows multiple servers to work concurrently on a single <term>Clustering For Parallel Query Execution</term>
query. One possible way this could work is for the data to be <listitem>
split among servers and for each server to execute its part of
the query and results sent to a central server to be combined
and returned to the user. There currently is no
<productname>PostgreSQL</> open source solution for this.
</para>
</sect1>
<sect1 id="commercial-solutions"> <para>
<title>Commercial Solutions</title> This allows multiple servers to work concurrently on a single
query. One possible way this could work is for the data to be
split among servers and for each server to execute its part of
the query and results sent to a central server to be combined
and returned to the user. There currently is no
<productname>PostgreSQL</> open source solution for this.
</para>
</listitem>
</varlistentry>
<para> <varlistentry>
Because <productname>PostgreSQL</> is open source and easily <term>Commercial Solutions</term>
extended, a number of companies have taken <productname>PostgreSQL</> <listitem>
and created commercial closed-source solutions with unique
failover, replication, and load balancing capabilities. <para>
</para> Because <productname>PostgreSQL</> is open source and easily
</sect1> extended, a number of companies have taken <productname>PostgreSQL</>
and created commercial closed-source solutions with unique
failover, replication, and load balancing capabilities.
</para>
</listitem>
</varlistentry>
</variablelist>
</chapter> </chapter>
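
Illustrative note, not part of the commit: the "Query Broadcast Load Balancing" entry can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how Pgpool or any real middleware is implemented. The names `Server`, `BroadcastRouter`, `generate_values`, `write`, and `read_target` are all hypothetical. It shows writes being sent to every server, reads going to a single server, and non-deterministic values (a random number and a timestamp) being obtained from one server first and then reused in the broadcast write, as the entry recommends. It deliberately ignores the requirement that transactions commit or abort on all servers.

```python
import random
from datetime import datetime, timezone


class Server:
    """Hypothetical stand-in for one database server; it only records writes."""

    def __init__(self, name):
        self.name = name
        self.rows = []

    def generate_values(self):
        # Values that would differ if every server computed them independently.
        return {"token": random.random(), "created_at": datetime.now(timezone.utc)}

    def execute_write(self, row):
        self.rows.append(row)


class BroadcastRouter:
    """Hypothetical middleware: broadcasts writes, sends each read to one server."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self._next_reader = 0

    def write(self, payload):
        # Ask a single server for the non-deterministic values, then reuse
        # them in the write so that every server stores an identical row.
        row = dict(self.servers[0].generate_values(), payload=payload)
        for server in self.servers:
            server.execute_write(row)
        return row

    def read_target(self):
        # Read-only queries need only one server; rotate round-robin.
        server = self.servers[self._next_reader % len(self.servers)]
        self._next_reader += 1
        return server
```

For example, after `router = BroadcastRouter([Server("a"), Server("b")])` and `router.write("hello")`, both servers hold the same row, and successive `router.read_target()` calls alternate between the two servers.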