Fix GetStrictOldestNonRemovableTransactionId() on standby

e85662df44 implemented GetStrictOldestNonRemovableTransactionId() function
for computation of xid horizon that avoid reporting of false errors.
However, GetStrictOldestNonRemovableTransactionId() uses
GetRunningTransactionData() even on standby leading to an assertion failure.

Given that we decided to ignore KnownAssignedXids and standby can't have
own running xids, we switch to use TransamVariables->nextXid as a xid horizon.

Also, revise the comment regarding ignoring KnownAssignedXids with more
detailed reasoning provided by Heikki.

Reported-by: Heikki Linnakangas
Discussion: https://postgr.es/m/42218c4f-2c8d-40a3-8743-4d34dd0e4cce%40iki.fi
Reviewed-by: Heikki Linnakangas
This commit is contained in:
Alexander Korotkov 2024-08-16 00:17:59 +03:00
parent 9e9a2b7031
commit e2ed7e3227
2 changed files with 40 additions and 5 deletions

View File

@ -546,11 +546,21 @@ collect_visibility_data(Oid relid, bool include_pd)
*
* 1. Ignore processes xmin's, because they consider connection to other
* databases that were ignored before.
* 2. Ignore KnownAssignedXids, because they are not database-aware. At the
* same time, the primary could compute its horizons database-aware.
* 2. Ignore KnownAssignedXids, as they are not database-aware. Although we
* now perform minimal checking on a standby by always using nextXid, this
* approach is better than nothing and will at least catch extremely broken
* cases where a xid is in the future.
* 3. Ignore walsender xmin, because it could go backward if some replication
* connections don't use replication slots.
*
* While it might seem like we could use KnownAssignedXids for shared
* catalogs, since shared catalogs rely on a global horizon rather than a
* database-specific one - there are potential edge cases. For example, a
* transaction may crash on the primary without writing a commit/abort record.
* This would lead to a situation where it appears to still be running on the
* standby, even though it has already ended on the primary. For this reason,
* it's safer to ignore KnownAssignedXids, even for shared catalogs.
*
* As a result, we're using only currently running xids to compute the horizon.
* Surely these would significantly sacrifice accuracy. But we have to do so
* to avoid reporting false errors.
@ -560,7 +570,17 @@ GetStrictOldestNonRemovableTransactionId(Relation rel)
{
RunningTransactions runningTransactions;
if (rel == NULL || rel->rd_rel->relisshared || RecoveryInProgress())
if (RecoveryInProgress())
{
TransactionId result;
/* As we ignore KnownAssignedXids on standby, just pick nextXid */
LWLockAcquire(XidGenLock, LW_SHARED);
result = XidFromFullTransactionId(TransamVariables->nextXid);
LWLockRelease(XidGenLock);
return result;
}
else if (rel == NULL || rel->rd_rel->relisshared)
{
/* Shared relation: take into account all running xids */
runningTransactions = GetRunningTransactionData();

View File

@ -10,11 +10,18 @@ use PostgreSQL::Test::Utils;
use Test::More;
# Initialize the primary node
my $node = PostgreSQL::Test::Cluster->new('main');
$node->init;
$node->init(allows_streaming => 1);
$node->start;
# Initialize the streaming standby
my $backup_name = 'my_backup';
$node->backup($backup_name);
my $standby = PostgreSQL::Test::Cluster->new('standby');
$standby->init_from_backup($node, $backup_name, has_streaming => 1);
$standby->start;
# Setup another database
$node->safe_psql("postgres", "CREATE DATABASE other_database;\n");
my $bsession = $node->background_psql('other_database');
@ -39,9 +46,17 @@ my $result = $node->safe_psql("postgres",
# There should be no false negatives
ok($result eq "", "pg_check_visible() detects no errors");
# Run pg_check_visible() on standby
$result = $standby->safe_psql("postgres",
"SELECT * FROM pg_check_visible('vacuum_test');");
# There should be no false negatives either
ok($result eq "", "pg_check_visible() detects no errors");
# Shutdown
$bsession->query_safe("COMMIT;");
$bsession->quit;
$node->stop;
$standby->stop;
done_testing();