Improvements to the backup & restore documentation.
This commit is contained in:
parent e3391133ae
commit 2ff4e44043
doc/src/sgml/backup.sgml

@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.38 2004/03/09 16:57:46 neilc Exp $
+$PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.39 2004/04/22 07:02:35 neilc Exp $
 -->
 <chapter id="backup">
 <title>Backup and Restore</title>
@@ -30,7 +30,7 @@ $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.38 2004/03/09 16:57:46 neilc Exp
 commands that, when fed back to the server, will recreate the
 database in the same state as it was at the time of the dump.
 <productname>PostgreSQL</> provides the utility program
-<application>pg_dump</> for this purpose. The basic usage of this
+<xref linkend="app-pgdump"> for this purpose. The basic usage of this
 command is:
 <synopsis>
 pg_dump <replaceable class="parameter">dbname</replaceable> > <replaceable class="parameter">outfile</replaceable>
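For orientation, the dump-and-restore round trip this hunk documents looks roughly like the following from the shell; the database name mydb and the file name mydb.sql are placeholders, not taken from the patch:

pg_dump mydb > mydb.sql        # on the source server
createdb mydb                  # on the target, if the database does not exist yet
psql mydb < mydb.sql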
@@ -126,10 +126,11 @@ psql <replaceable class="parameter">dbname</replaceable> < <replaceable class
 </para>

 <para>
-Once restored, it is wise to run <command>ANALYZE</> on each
-database so the optimizer has useful statistics. You
-can also run <command>vacuumdb -a -z</> to <command>ANALYZE</> all
-databases.
+Once restored, it is wise to run <xref linkend="sql-analyze"
+endterm="sql-analyze-title"> on each database so the optimizer has
+useful statistics. You can also run <command>vacuumdb -a -z</> to
+<command>VACUUM ANALYZE</> all databases; this is equivalent to
+running <command>VACUUM ANALYZE</command> manually.
 </para>

 <para>
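As a concrete follow-up to a restore, statistics can be refreshed for every database in the cluster, or for a single database (mydb is a placeholder name):

vacuumdb -a -z
psql -c "ANALYZE;" mydb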
@@ -153,13 +154,11 @@ pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>h
 </para>
 </important>

-<tip>
-<para>
-Restore performance can be improved by increasing the
-configuration parameter <xref
-linkend="guc-maintenance-work-mem">.
-</para>
-</tip>
+<para>
+For advice on how to load large amounts of data into
+<productname>PostgreSQL</productname> efficiently, refer to <xref
+linkend="populate">.
+</para>
 </sect2>

 <sect2 id="backup-dump-all">
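The pg_dump | psql pipeline referenced in the hunk header can copy a database straight from one server to another without an intermediate file; host1, host2, and mydb below are placeholder names:

pg_dump -h host1 mydb | psql -h host2 mydb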
@@ -167,12 +166,11 @@ pg_dump -h <replaceable>host1</> <replaceable>dbname</> | psql -h <replaceable>h

 <para>
 The above mechanism is cumbersome and inappropriate when backing
-up an entire database cluster. For this reason the
-<application>pg_dumpall</> program is provided.
+up an entire database cluster. For this reason the <xref
+linkend="app-pg-dumpall"> program is provided.
 <application>pg_dumpall</> backs up each database in a given
-cluster, and also preserves cluster-wide data such as
-users and groups. The call sequence for
-<application>pg_dumpall</> is simply
+cluster, and also preserves cluster-wide data such as users and
+groups. The basic usage of this command is:
 <synopsis>
 pg_dumpall > <replaceable>outfile</>
 </synopsis>
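Following the synopsis above, a whole-cluster backup and its restore might look like this; cluster.sql is a placeholder file name, and the restore connects to template1 as the surrounding text suggests:

pg_dumpall > cluster.sql
psql -f cluster.sql template1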
@@ -195,7 +193,7 @@ psql template1 < <replaceable class="parameter">infile</replaceable>
 Since <productname>PostgreSQL</productname> allows tables larger
 than the maximum file size on your system, it can be problematic
 to dump such a table to a file, since the resulting file will likely
-be larger than the maximum size allowed by your system. As
+be larger than the maximum size allowed by your system. Since
 <application>pg_dump</> can write to the standard output, you can
 just use standard Unix tools to work around this possible problem.
 </para>
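The "standard Unix tools" idea, sketched with placeholder names: compress the dump on the fly, or split it into pieces smaller than the file-size limit:

pg_dump mydb | gzip > mydb.sql.gz
gunzip -c mydb.sql.gz | psql mydb

pg_dump mydb | split -b 1m - mydb.sql.
cat mydb.sql.* | psql mydb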
@@ -274,7 +272,7 @@ pg_dump -Fc <replaceable class="parameter">dbname</replaceable> > <replaceable c
 For reasons of backward compatibility, <application>pg_dump</>
 does not dump large objects by default.<indexterm><primary>large
 object</primary><secondary>backup</secondary></indexterm> To dump
-large objects you must use either the custom or the TAR output
+large objects you must use either the custom or the tar output
 format, and use the <option>-b</> option in
 <application>pg_dump</>. See the reference pages for details. The
 directory <filename>contrib/pg_dumplo</> of the
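A possible invocation for a custom-format dump that includes large objects, restored with pg_restore; mydb and mydb.dump are placeholder names:

pg_dump -Fc -b mydb > mydb.dump
pg_restore -d mydb mydb.dump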
@@ -315,11 +313,12 @@ tar -cf backup.tar /usr/local/pgsql/data
 <para>
 The database server <emphasis>must</> be shut down in order to
 get a usable backup. Half-way measures such as disallowing all
-connections will not work as there is always some buffering
-going on. Information about stopping the server can be
-found in <xref linkend="postmaster-shutdown">. Needless to say
-that you also need to shut down the server before restoring the
-data.
+connections will <emphasis>not</emphasis> work
+(<command>tar</command> and similar tools do not take an atomic
+snapshot of the state of the filesystem at a point in
+time). Information about stopping the server can be found in
+<xref linkend="postmaster-shutdown">. Needless to say that you
+also need to shut down the server before restoring the data.
 </para>
 </listitem>

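An offline file-system-level backup consistent with the paragraph above, assuming the data directory shown in the hunk header and pg_ctl on the PATH:

pg_ctl stop -D /usr/local/pgsql/data
tar -cf backup.tar /usr/local/pgsql/data
pg_ctl start -D /usr/local/pgsql/data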
doc/src/sgml/perform.sgml

@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.43 2004/03/25 18:57:57 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.44 2004/04/22 07:02:36 neilc Exp $
 -->

 <chapter id="performance-tips">
@@ -28,8 +28,8 @@ $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.43 2004/03/25 18:57:57 tgl Exp
 plan</firstterm> for each query it is given. Choosing the right
 plan to match the query structure and the properties of the data
 is absolutely critical for good performance. You can use the
-<command>EXPLAIN</command> command to see what query plan the system
-creates for any query.
+<xref linkend="sql-explain" endterm="sql-explain-title"> command
+to see what query plan the system creates for any query.
 Plan-reading is an art that deserves an extensive tutorial, which
 this is not; but here is some basic information.
 </para>
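For instance, a plan can be inspected directly from the shell with psql; the table and predicate here are made up for illustration:

psql -c "EXPLAIN SELECT * FROM orders WHERE customer_id = 42;" mydb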
@@ -638,30 +638,51 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 </indexterm>

 <para>
-Turn off autocommit and just do one commit at
-the end. (In plain SQL, this means issuing <command>BEGIN</command>
-at the start and <command>COMMIT</command> at the end. Some client
-libraries may do this behind your back, in which case you need to
-make sure the library does it when you want it done.)
-If you allow each insertion to be committed separately,
-<productname>PostgreSQL</productname> is doing a lot of work for each
-row that is added.
-An additional benefit of doing all insertions in one transaction
-is that if the insertion of one row were to fail then the
-insertion of all rows inserted up to that point would be rolled
-back, so you won't be stuck with partially loaded data.
+Turn off autocommit and just do one commit at the end. (In plain
+SQL, this means issuing <command>BEGIN</command> at the start and
+<command>COMMIT</command> at the end. Some client libraries may
+do this behind your back, in which case you need to make sure the
+library does it when you want it done.) If you allow each
+insertion to be committed separately,
+<productname>PostgreSQL</productname> is doing a lot of work for
+each row that is added. An additional benefit of doing all
+insertions in one transaction is that if the insertion of one row
+were to fail then the insertion of all rows inserted up to that
+point would be rolled back, so you won't be stuck with partially
+loaded data.
 </para>
+
+<para>
+If you are issuing a large sequence of <command>INSERT</command>
+commands to bulk load some data, also consider using <xref
+linkend="sql-prepare" endterm="sql-prepare-title"> to create a
+prepared <command>INSERT</command> statement. Since you are
+executing the same command multiple times, it is more efficient to
+prepare the command once and then use <command>EXECUTE</command>
+as many times as required.
+</para>
 </sect2>

 <sect2 id="populate-copy-from">
-<title>Use <command>COPY FROM</command></title>
+<title>Use <command>COPY</command></title>

 <para>
-Use <command>COPY FROM STDIN</command> to load all the rows in one
-command, instead of using a series of <command>INSERT</command>
-commands. This reduces parsing, planning, etc. overhead a great
-deal. If you do this then it is not necessary to turn off
-autocommit, since it is only one command anyway.
+Use <xref linkend="sql-copy" endterm="sql-copy-title"> to load
+all the rows in one command, instead of using a series of
+<command>INSERT</command> commands. The <command>COPY</command>
+command is optimized for loading large numbers of rows; it is less
+flexible than <command>INSERT</command>, but incurs significantly
+less overhead for large data loads. Since <command>COPY</command>
+is a single command, there is no need to disable autocommit if you
+use this method to populate a table.
 </para>
+
+<para>
+Note that loading a large number of rows using
+<command>COPY</command> is almost always faster than using
+<command>INSERT</command>, even if multiple
+<command>INSERT</command> commands are batched into a single
+transaction.
+</para>
 </sect2>

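A small sketch of both techniques driven from the shell; the items table, its columns, and the data are hypothetical and assumed to already exist:

psql mydb <<'EOF'
BEGIN;
INSERT INTO items VALUES (1, 'one');
INSERT INTO items VALUES (2, 'two');
COMMIT;
COPY items FROM STDIN WITH DELIMITER '|';
3|three
4|four
\.
EOF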
@@ -678,11 +699,12 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

 <para>
 If you are augmenting an existing table, you can drop the index,
-load the table, then recreate the index. Of
-course, the database performance for other users may be adversely
-affected during the time that the index is missing. One should also
-think twice before dropping unique indexes, since the error checking
-afforded by the unique constraint will be lost while the index is missing.
+load the table, and then recreate the index. Of course, the
+database performance for other users may be adversely affected
+during the time that the index is missing. One should also think
+twice before dropping unique indexes, since the error checking
+afforded by the unique constraint will be lost while the index is
+missing.
 </para>
 </sect2>

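The drop-load-recreate pattern from the hunk above, with made-up table, index, and file names; \copy is used so the data file is read on the client side:

psql mydb <<'EOF'
DROP INDEX items_name_idx;
\copy items from items.dat
CREATE INDEX items_name_idx ON items (name);
EOF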
@@ -701,16 +723,39 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
 </para>
 </sect2>

+<sect2 id="populate-checkpoint-segments">
+<title>Increase <varname>checkpoint_segments</varname></title>
+
+<para>
+Temporarily increasing the <xref
+linkend="guc-checkpoint-segments"> configuration variable can also
+make large data loads faster. This is because loading a large
+amount of data into <productname>PostgreSQL</productname> can
+cause checkpoints to occur more often than the normal checkpoint
+frequency (specified by the <varname>checkpoint_timeout</varname>
+configuration variable). Whenever a checkpoint occurs, all dirty
+pages must be flushed to disk. By increasing
+<varname>checkpoint_segments</varname> temporarily during bulk
+data loads, the number of checkpoints that are required can be
+reduced.
+</para>
+</sect2>
+
 <sect2 id="populate-analyze">
 <title>Run <command>ANALYZE</command> Afterwards</title>

 <para>
-It's a good idea to run <command>ANALYZE</command> or <command>VACUUM
-ANALYZE</command> anytime you've added or updated a lot of data,
-including just after initially populating a table. This ensures that
-the planner has up-to-date statistics about the table. With no statistics
-or obsolete statistics, the planner may make poor choices of query plans,
-leading to bad performance on queries that use your table.
+Whenever you have significantly altered the distribution of data
+within a table, running <xref linkend="sql-analyze"
+endterm="sql-analyze-title"> is strongly recommended. This
+includes when bulk loading large amounts of data into
+<productname>PostgreSQL</productname>. Running
+<command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
+ensures that the planner has up-to-date statistics about the
+table. With no statistics or obsolete statistics, the planner may
+make poor decisions during query planning, leading to poor
+performance on any tables with inaccurate or nonexistent
+statistics.
 </para>
 </sect2>
 </sect1>
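A rough outline of the workflow described by the two new sections, with an arbitrary value of 30 and placeholder database and table names; checkpoint_segments is raised in postgresql.conf for the duration of the load, after which the server is signalled to reload its configuration (or simply restarted), and the table is analyzed once the load completes:

# postgresql.conf, temporarily raised for the bulk load:
#   checkpoint_segments = 30
pg_ctl reload -D /usr/local/pgsql/data
psql -c "ANALYZE items;" mydb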