Use CREATE DATABASE ... STRATEGY = FILE_COPY in pg_upgrade.

While this strategy is ordinarily quite costly because it requires
performing two checkpoints, testing shows that it tends to be a
faster choice than WAL_LOG during pg_upgrade, presumably because
fsync is turned off.  Furthermore, we can skip the checkpoints
altogether because the problems they are intended to prevent don't
apply to pg_upgrade.  Instead, we just need to CHECKPOINT once in
the new cluster after making any changes to template0 and before
restoring the rest of the databases.  This ensures that said
template0 changes are written out to disk prior to creating the
databases via FILE_COPY.

Co-authored-by: Matthias van de Meent
Reviewed-by: Ranier Vilela, Dilip Kumar, Robert Haas, Michael Paquier
Discussion: https://postgr.es/m/Zl9ta3FtgdjizkJ5%40nathan
This commit is contained in:
Nathan Bossart 2024-07-08 16:18:00 -05:00
parent 4b4b931bcd
commit 64f34eb2e2
3 changed files with 34 additions and 4 deletions

View File

@ -563,9 +563,14 @@ CreateDatabaseUsingFileCopy(Oid src_dboid, Oid dst_dboid, Oid src_tsid,
* happened while we're copying files, a file might be deleted just when
* we're about to copy it, causing the lstat() call in copydir() to fail
* with ENOENT.
*
* In binary upgrade mode, we can skip this checkpoint because pg_upgrade
* is careful to ensure that template0 is fully written to disk prior to
* any CREATE DATABASE commands.
*/
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE |
CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL);
if (!IsBinaryUpgrade)
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE |
CHECKPOINT_WAIT | CHECKPOINT_FLUSH_ALL);
/*
* Iterate through all tablespaces of the template database, and copy each
@ -657,10 +662,17 @@ CreateDatabaseUsingFileCopy(Oid src_dboid, Oid dst_dboid, Oid src_tsid,
* seem to be much we can do about that except document it as a
* limitation.
*
* In binary upgrade mode, we can skip this checkpoint because neither of
* these problems applies: we don't ever replay the WAL generated during
* pg_upgrade, and we don't support taking base backups during pg_upgrade
* (not to mention that we don't concurrently modify template0, either).
*
* See CreateDatabaseUsingWalLog() for a less cheesy CREATE DATABASE
* strategy that avoids these problems.
*/
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
if (!IsBinaryUpgrade)
RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE |
CHECKPOINT_WAIT);
}
/*

View File

@ -3144,10 +3144,16 @@ dumpDatabase(Archive *fout)
* since those can't be altered later. Other DB properties are left to
* the DATABASE PROPERTIES entry, so that they can be applied after
* reconnecting to the target DB.
*
* For binary upgrade, we use the FILE_COPY strategy because testing has
* shown it to be faster. When the server is in binary upgrade mode, it
* will also skip the checkpoints this strategy ordinarily performs.
*/
if (dopt->binary_upgrade)
{
appendPQExpBuffer(creaQry, "CREATE DATABASE %s WITH TEMPLATE = template0 OID = %u",
appendPQExpBuffer(creaQry,
"CREATE DATABASE %s WITH TEMPLATE = template0 "
"OID = %u STRATEGY = FILE_COPY",
qdatname, dbCatId.oid);
}
else

View File

@ -534,9 +534,21 @@ static void
create_new_objects(void)
{
int dbnum;
PGconn *conn_new_template1;
prep_status_progress("Restoring database schemas in the new cluster");
/*
* Ensure that any changes to template0 are fully written out to disk
* prior to restoring the databases. This is necessary because we use the
* FILE_COPY strategy to create the databases (which testing has shown to
* be faster), and when the server is in binary upgrade mode, it skips the
* checkpoints this strategy ordinarily performs.
*/
conn_new_template1 = connectToServer(&new_cluster, "template1");
PQclear(executeQueryOrDie(conn_new_template1, "CHECKPOINT"));
PQfinish(conn_new_template1);
/*
* We cannot process the template1 database concurrently with others,
* because when it's transiently dropped, connection attempts would fail.