There are two Asserts in nodeMergejoin.c that are reachable if
the input data is not in the expected order. This seems way too
fragile. Alexander Lakhin reported a case where the assertions
could be triggered with misconfigured foreign-table partitions,
and bitter experience with unstable operating system collation
definitions suggests another easy route to hitting them. Both
Asserts are in places where we can afford one more test-and-branch,
so replace them with plain test-and-elog logic.
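For illustration, the change has roughly this shape (a sketch, not the
exact committed code):

    /* Before: order violations were caught only in assert-enabled builds */
    Assert(compareResult < 0);

    /* After: a cheap runtime check that fails cleanly in production too */
    if (compareResult >= 0)
        elog(ERROR, "mergejoin input data is out of order");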
Per bug #17395. While the reported symptom is relatively recent,
collation changes could happen anytime, so back-patch to all
supported branches.
Discussion: https://postgr.es/m/17395-8c326292078d1a57@postgresql.org
We assume that direct-modify ForeignScan nodes cannot be re-evaluated
during EvalPlanQual processing, but the rework for inherited
UPDATE/DELETE in commit 86dc90056 changed things without considering
that assumption, so that such ForeignScan nodes get called as part of
the EvalPlanQual subtree when an inherited UPDATE/DELETE has foreign
target relations in its inheritance set. To avoid re-evaluating such
ForeignScan nodes during
EvalPlanQual processing, commit c3928b467 modified nodeForeignscan.c,
but the assumption made there that ExecForeignScan() should never be
called for such ForeignScan nodes during EvalPlanQual processing turned
out to be wrong in some cases, leading to a segmentation fault or a
"cannot re-evaluate a Foreign Update or Delete during EvalPlanQual"
error.
Fix by modifying nodeForeignscan.c further to avoid re-evaluating such
ForeignScan nodes even in ExecForeignScan()/ExecReScanForeignScan()
during EvalPlanQual processing. Since this makes the test-and-elog
added to ForeignNext() by commit c3928b467, which produced the
aforesaid error, unreachable, convert it to an Assert.
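A sketch of the new guard's shape in nodeForeignscan.c (field names as
in v14, but treat the details as assumptions):

    static TupleTableSlot *
    ExecForeignScan(PlanState *pstate)
    {
        ForeignScanState *node = castNode(ForeignScanState, pstate);
        EState     *estate = node->ss.ps.state;

        /*
         * Ignore direct modifications when EvalPlanQual is active: they
         * are irrelevant for EvalPlanQual rechecking and must not be
         * re-evaluated here.
         */
        if (estate->es_epq_active != NULL && node->resultRelInfo != NULL)
            return NULL;

        return ExecScan(&node->ss,
                        (ExecScanAccessMtd) ForeignNext,
                        (ExecScanRecheckMtd) ForeignRecheck);
    }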
Per bug #17355 from Alexander Lakhin. Back-patch to v14 where both
commits came in.
Patch by me, reviewed and tested by Alexander Lakhin and Amit Langote.
Discussion: https://postgr.es/m/17355-de8e362eb7001a96@postgresql.org
Also apply the changes suggested by running
    perl ppport.h --compat-version=5.8.0
And remove some no-longer-required NEED_foo declarations.
Dagfinn Ilmari Mannsåker
Back-patch of commit 05798c9f7 into all supported branches.
At the time we thought this update was mostly cosmetic, but the
lack of it has caused trouble, while the patch itself hasn't.
Discussion: https://postgr.es/m/87y278s6iq.fsf@wibble.ilmari.org
Discussion: https://postgr.es/m/20220131015130.shn6wr2fzuymerf6@alap3.anarazel.de
ppport.h was only updated in 05798c9f7f0 (master). Unfortunately my
commit c89f409749c uses PERL_VERSION_LT, which came in with that
update, breaking most buildfarm animals.
I should have noticed that...
We might want to backpatch the ppport update instead, but for now let's
get the buildfarm green again.
Discussion: https://postgr.es/m/20220131015130.shn6wr2fzuymerf6@alap3.anarazel.de
Backpatch: 10-14, master doesn't need it
For older Perl versions we need our own copy of perl's setlocale(),
because it was not exposed (why we need the setlocale in the first
place is explained in plperl_init_interp). The copy stopped working in
5.28, as some of the macros it uses are no longer public. But
Perl_setlocale is available in 5.28, so use that.
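The cutover looks roughly like this (a sketch; the real code also has
platform-specific conditions):

    #if PERL_VERSION_LT(5, 28, 0)
        /* Perl doesn't expose setlocale() here; use our own copy of it */
        setlocale_perl(LC_ALL, NULL);
    #else
        /* from 5.28 on, Perl_setlocale is public, so use that instead */
        Perl_setlocale(LC_ALL, NULL);
    #endif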
Author: Victor Wagner <vitus@wagner.pp.ru>
Reviewed-By: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Discussion: https://postgr.es/m/20200501134711.08750c5f@antares.wagner.home
Backpatch: all versions
Although select_common_type() has a failure-return convention, an
apparent successful return just provides a type OID that *might* work
as a common supertype; we've not validated that the required casts
actually exist. In the mainstream use-cases that doesn't matter,
because we'll proceed to invoke coerce_to_common_type() on each input,
which will fail appropriately if the proposed common type doesn't
actually work. However, a few callers didn't read the (nonexistent)
fine print, and thought that if they got back a nonzero OID then the
coercions were sure to work.
This affects in particular the recently-added "anycompatible"
polymorphic types; we might think that a function/operator using
such types matches cases it really doesn't. A likely end result
of that is unexpected "ambiguous operator" errors, as for example
in bug #17387 from James Inform. Another, much older, case is that
the parser might try to transform an "x IN (list)" construct to
a ScalarArrayOpExpr even when the list elements don't actually have
a common supertype.
It doesn't seem desirable to add more checking to select_common_type
itself, as that'd just slow down the mainstream use-cases. Instead,
write a separate function verify_common_type that performs the
missing checks, and add a call to that where necessary. Likewise add
verify_common_type_from_oids to go with select_common_type_from_oids.
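The added validation amounts to something like this (close to, though
not necessarily identical with, the committed function):

    static bool
    verify_common_type(Oid common_type, List *exprs)
    {
        ListCell   *lc;

        foreach(lc, exprs)
        {
            Node       *expr = (Node *) lfirst(lc);
            Oid         exprtype = exprType(expr);

            /* fail if the required implicit coercion doesn't exist */
            if (!can_coerce_type(1, &exprtype, &common_type,
                                 COERCION_IMPLICIT))
                return false;
        }
        return true;
    }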
Back-patch to v13 where the "anycompatible" types came in. (The
symptom complained of in bug #17387 doesn't appear till v14, but
that's just because we didn't get around to converting || to use
anycompatible till then.) In principle the "x IN (list)" fix could
go back all the way, but I'm not currently convinced that it makes
much difference in real-world cases, so I won't bother for now.
Discussion: https://postgr.es/m/17387-5dfe54b988444963@postgresql.org
c532d15 has split the logic of COPY commands into multiple files, one
change being to move the internals of BeginCopy() to BeginCopyTo().
Originally the code switched back and forth between the current
execution memory context and the dedicated memory context for the COPY
command, and this refactoring introduced an extra switch back to the
current memory context from the COPY context once BeginCopyTo() was
done with the logic inherited from BeginCopy().
The code was correctly doing the analyze, rewrite and planning phases in
the COPY context, but it was not assigning "copy_file" (the FILE* used
when copying to a file) and "filename" in the COPY context, making the
COPY status data inconsistent.
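The fix amounts to doing those assignments inside the COPY context,
along these lines (simplified sketch):

    /* allocate COPY state in the long-lived COPY context, not the caller's */
    oldcontext = MemoryContextSwitchTo(cstate->copycontext);

    cstate->filename = pstrdup(filename);
    if (!pipe)
        cstate->copy_file = AllocateFile(cstate->filename, PG_BINARY_W);

    MemoryContextSwitchTo(oldcontext);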
Author: Bharath Rupireddy
Reviewed-by: Japin Li
Discussion: https://postgr.es/m/CALj2ACWvVa69foi9jhHFY=2BuHxAoYboyE+vXQTARwxZcJnVrQ@mail.gmail.com
Backpatch-through: 14
When pg_log_backend_memory_contexts() is executed, the target backend
should use LOG_SERVER_ONLY to log its memory contexts, to prevent them
from being sent to its connected client regardless of client_min_messages.
But previously the backend unexpectedly used LOG for the message
"logging memory contexts of PID %d", so that message could be sent to
the client. This is a bug in memory context logging.
To fix it, this commit changes that message so that it's logged with
LOG_SERVER_ONLY.
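That is, the message is now emitted along these lines (simplified):

    ereport(LOG_SERVER_ONLY,
            (errmsg("logging memory contexts of PID %d", MyProcPid)));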
Back-patch to v14 where pg_log_backend_memory_contexts() was added.
Author: Fujii Masao
Reviewed-by: Bharath Rupireddy, Atsushi Torikoshi
Discussion: https://postgr.es/m/82c12f36-86f7-5e72-79af-7f5c37f6cad7@oss.nttdata.com
Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs
before adding them to KnownAssignedXids. But the XIDs are sorted using
xidComparator, which compares the XIDs simply as uint32 values, not
logically. KnownAssignedXidsAdd() however expects XIDs in logical order,
and calls TransactionIdFollowsOrEquals() to enforce that. If there are
XIDs for which the two orderings disagree, an error is raised and the
recovery fails/restarts.
Hitting this issue is fairly easy - you just need two transactions, one
started before the 4B limit (e.g. XID 4294967290), the other sometime
after it (e.g. XID 1000). Logically 4294967290 precedes 1000 (thanks to
XID wraparound arithmetic), but when compared using xidComparator we
try to add them in the opposite order. That makes
KnownAssignedXidsAdd() fail with an error like this:
    ERROR: out-of-order XID insertion in KnownAssignedXids
This only happens during replica startup, while processing RUNNING_XACTS
records to build the snapshot. Once we reach STANDBY_SNAPSHOT_READY, we
skip these records. So this does not affect already running replicas,
but if you restart (or create) a replica while there are transactions
with XIDs for which the two orderings disagree, you may hit this.
Long-running transactions and frequent replica restarts increase the
likelihood of hitting this issue. Once the replica gets into this state,
it can't be started (even if the old transactions are terminated).
Fixed by sorting the XIDs logically - this is fine because we're dealing
with normal XIDs (because it's XIDs assigned to backends) and from the
same wraparound epoch (otherwise the backends could not be running at
the same time on the primary node). So there are no problems with the
triangle inequality, which is why xidComparator compares raw values.
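The logical ordering is obtained with a comparator along these lines
(essentially what the fix adds):

    static int
    xidLogicalComparator(const void *arg1, const void *arg2)
    {
        TransactionId xid1 = *(const TransactionId *) arg1;
        TransactionId xid2 = *(const TransactionId *) arg2;

        /* use the wraparound-aware ordering, not raw uint32 comparison */
        if (TransactionIdPrecedes(xid1, xid2))
            return -1;
        if (TransactionIdPrecedes(xid2, xid1))
            return 1;
        return 0;
    }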
Investigation and root cause analysis by Abhijit Menon-Sen. Patch by me.
This issue is present in all releases since 9.4; however, releases up
to 9.6 are already EOL, so backpatch to 10 only.
Reviewed-by: Abhijit Menon-Sen
Reviewed-by: Alvaro Herrera
Backpatch-through: 10
Discussion: https://postgr.es/m/36b8a501-5d73-277c-4972-f58a4dce088a%40enterprisedb.com
Perl instances on some msys toolchains (e.g. UCRT64) have their
configured osname set to 'MSWin32' rather than 'msys'. The test for
the msys2 platform is adjusted accordingly.
Backpatch to release 14.
Buildfarm members kittiwake, tadarida and snapper began to fail
frequently when commits 3cd9c3b921977272e6650a5efbeade4203c4bca2 and
f47ed79cc8a0cfa154dc7f01faaf59822552363f added tests of concurrency, but
the problem was reachable before those commits. Back-patch to v10 (all
supported versions).
Discussion: https://postgr.es/m/20220116210241.GC756210@rfd.leadboat.com
For authentication method cert, clientcert=verify-full is implied. But
the pg_hba_file_rules entry would incorrectly show clientcert=verify-ca.
Per bug #17354.
Reported-By: Feike Steenbergen
Reviewed-By: Jonathan Katz
Backpatch-through: 12
This reverts commits 6051857fc and ed52c3707, but only in the back
branches. Further testing has shown that while those changes do fix
some things, they also break others; in particular, it looks like
walreceivers fail to detect walsender-initiated connection close
reliably if the walsender shuts down this way. We'll keep trying to
improve matters in HEAD, but it now seems unwise to push these changes
into stable releases.
Discussion: https://postgr.es/m/CA+hUKG+OeoETZQ=Qw5Ub5h3tmwQhBmDA=nuNO3KG=zWfUypFAw@mail.gmail.com
8edd0e794 added some code to remove Append and MergeAppend nodes when
they contained a single child node. As it turned out, this was unsafe
to do when the Append/MergeAppend was parallel_aware and the child node
was not: removing the Append/MergeAppend in that case could lead to the
child plan being called multiple times by parallel workers.
Here we fix this by just not removing the Append/MergeAppend when the
parallel_aware flags of the parent and child nodes don't match.
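Schematically, the single-child elision is now gated like this (a
sketch; variable names assumed):

    Plan   *child = (Plan *) linitial(aplan->appendplans);

    if (list_length(aplan->appendplans) == 1 &&
        child->parallel_aware == aplan->plan.parallel_aware)
    {
        /* safe: replace the Append with its lone child */
    }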
Reported-by: Yura Sokolov
Bug: #17335
Discussion: https://postgr.es/m/b59605fecb20ba9ea94e70ab60098c237c870628.camel%40postgrespro.ru
Backpatch-through: 12, where 8edd0e794 was first introduced
In logical replication mode, a WalSender is supposed to be able
to execute any regular SQL command, as well as the special
replication commands. Poor design of the replication-command
parser caused it to fail in various cases, notably:
* semicolons embedded in a command, or multiple SQL commands
sent in a single message;
* dollar-quoted literals containing odd numbers of single
or double quote marks;
* commands starting with a comment.
The basic problem here is that we're trying to run repl_scanner.l
across the entire input string even when it's not a replication
command. Since repl_scanner.l does not understand all of the
token types known to the core lexer, this is doomed to have
failure modes.
We certainly don't want to make repl_scanner.l as big as scan.l,
so instead rejigger stuff so that we only lex the first token of
a non-replication command. That will usually look like an IDENT
to repl_scanner.l, though a comment would end up getting reported
as a '-' or '/' single-character token. If the token is a replication
command keyword, we push it back and proceed normally with repl_gram.y
parsing. Otherwise, we can drop out of exec_replication_command()
without examining the rest of the string.
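A sketch of the resulting control flow in exec_replication_command()
(function names believed to match this commit, but unverified):

    replication_scanner_init(cmd_string);

    /* Lex only the first token to see whether this is a replication command */
    if (!replication_scanner_is_replication_command())
    {
        /* Not one of ours: fall through to the regular SQL code path */
        replication_scanner_finish();
        return false;
    }

    /* otherwise the keyword was pushed back; parse with repl_gram.y */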
(It's still theoretically possible for repl_scanner.l to fail on
the first token; but that could only happen if it's an unterminated
single- or double-quoted string, in which case you'd have gotten
largely the same error from the core lexer too.)
In this way, repl_gram.y isn't involved at all in handling general
SQL commands, so we can get rid of the SQLCmd node type. (In
the back branches, we can't remove it because renumbering enum
NodeTag would be an ABI break; so just leave it sit there unused.)
I failed to resist the temptation to clean up some other sloppy
coding in repl_scanner.l while at it. The only externally-visible
behavior change from that is it now accepts \r and \f as whitespace,
same as the core lexer.
Per bug #17379 from Greg Rychlewski. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org
Without this, we get odd behavior when the previous cycle of
lexing exited in a non-default exclusive state. Every other
copy of this code is aware that it has to do BEGIN(INITIAL),
but repl_scanner.l did not get that memo.
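The fix is a one-liner in the scanner's init function (sketch):

    void
    replication_scanner_init(const char *str)
    {
        /* ... existing buffer setup ... */

        /* Make sure we start in the proper start condition */
        BEGIN(INITIAL);
    }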
The real-world impact of this is probably limited, since most
replication clients would abandon their connection after getting
a syntax error. Still, it's a bug.
This mistake is old, so back-patch to all supported branches.
Discussion: https://postgr.es/m/1874781.1643035952@sss.pgh.pa.us
Commit 6df7a9698 wrote appendPQExpBuffer where it should have
written printfPQExpBuffer. This resulted in re-issuing the
previous query along with the desired one, which very accidentally
had no negative consequences except for some wasted cycles.
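To illustrate the difference between the two pqexpbuffer calls:

    #include "pqexpbuffer.h"

    PQExpBufferData buf;

    initPQExpBuffer(&buf);
    appendPQExpBuffer(&buf, "SELECT %d;", 1);
    appendPQExpBuffer(&buf, "SELECT %d;", 2);  /* buffer holds both queries */
    printfPQExpBuffer(&buf, "SELECT %d;", 3);  /* buffer reset: only this one */
    termPQExpBuffer(&buf);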
Back-patch to v14 where that came in.
Discussion: https://postgr.es/m/1714711.1642962663@sss.pgh.pa.us
In the normal configuration where GEQO_DEBUG isn't defined,
recent clang versions have started to complain that geqo_main.c
accumulates the edge_failures count but never does anything
with it. As a minimal back-patchable fix, insert a void cast
to silence this warning. (I'd speculated about ripping out the
GEQO_DEBUG logic altogether, but I don't think we'd wish to
back-patch that.)
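The shape of the fix (surrounding #ifdef structure assumed):

    #ifdef GEQO_DEBUG
        if (edge_failures > 0)
            elog(LOG, "[GEQO] failures: %d", edge_failures);
    #else
        (void) edge_failures;   /* touch the count to silence the warning */
    #endif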
Per recently-established project policy, this is a candidate
for back-patching into out-of-support branches: it suppresses
an annoying compiler warning but changes no behavior. Hence,
back-patch all the way to 9.2.
Discussion: https://postgr.es/m/CA+hUKGLTSZQwES8VNPmWO9AO0wSeLt36OCPDAZTccT1h7Q7kTQ@mail.gmail.com
In sort_inner_and_outer we iterate over a list of PathKey elements, but
the loop variable is declared as (List *). This mistake is benign,
because we only pass the pointer to lcons() and never dereference it.
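The declaration fix, roughly (variable name per my reading of the code):

    /* before: wrong pointer type, harmless only by accident */
    List       *front_pathkey = (List *) lfirst(l);

    /* after: the list element really is a PathKey */
    PathKey    *front_pathkey = (PathKey *) lfirst(l);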
This has existed since ~2004, but it's confusing, so fix it and
backpatch to all supported branches.
Backpatch-through: 10
Discussion: https://postgr.es/m/bf3a6ea1-a7d8-7211-0669-189d5c169374%40enterprisedb.com
The syscache lookup may return NULL even for a valid OID, for example
due to a concurrent DROP STATISTICS, so a HeapTupleIsValid check is
necessary. Without one, this may fail with a segfault.
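The added check has the usual shape (a sketch; error wording assumed):

    HeapTuple   tup;

    tup = SearchSysCache1(STATEXTOID, ObjectIdGetDatum(stxoid));
    if (!HeapTupleIsValid(tup))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("statistics object with OID %u does not exist",
                        stxoid)));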
Reported by Alexander Lakhin, patch by me. Backpatch to 13, where ALTER
STATISTICS ... SET STATISTICS was introduced.
Backpatch-through: 13
Discussion: https://postgr.es/m/17372-bf3b6e947e35ae77%40postgresql.org
Previously, unless we had to add a NOT NULL constraint to the column,
this command resulted in updating only the index's relcache entry.
That's problematic when replication behavior is being driven off the
existence of a primary key: other sessions (and ours too for that
matter) fail to recalculate their opinion of whether the table can
be replicated. Add a relcache invalidation to fix it.
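The fix boils down to one extra call once the index has been marked as
the primary key (sketch):

    /* let all sessions know the table's replication status may have changed */
    CacheInvalidateRelcache(rel);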
This has been broken since pg_class.relhaspkey was removed in v11.
Before that, updating the table's relhaspkey value sufficed to cause
a cache flush. Hence, backpatch to v11.
Report and patch by Hou Zhijie
Discussion: https://postgr.es/m/OS0PR01MB5716EBE01F112C62F8F9B786947B9@OS0PR01MB5716.jpnprd01.prod.outlook.com
In libpq and ecpglib, multiple threads can concurrently enter the
initialization logic for message localization. Since we set the
is-done flag before actually doing the work, it'd be possible
for some threads to reach gettext() before anyone has called
bindtextdomain(). Barring bugs in libintl itself, this would not
result in anything worse than failure to localize some early
messages. Nonetheless, it's a bug, and an easy one to fix.
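The corrected ordering looks like this (a hypothetical sketch with
invented names; the real code must also worry about locking):

    static volatile bool message_init_done = false;

    static void
    libpq_gettext_init(void)
    {
        if (!message_init_done)
        {
            bindtextdomain(PG_TEXTDOMAIN("libpq"), LOCALEDIR);
            /* set the flag only after the work has actually been done */
            message_init_done = true;
        }
    }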
Noted while investigating bug #17299 from Clemens Zeidler
(much thanks to Liam Bowen for followup investigation on that).
It currently appears that that actually *is* a bug in libintl itself,
but that doesn't let us off the hook for this bit.
Back-patch to all supported versions.
Discussion: https://postgr.es/m/17299-7270741958c0b1ab@postgresql.org
Discussion: https://postgr.es/m/CAE7q7Eit4Eq2=bxce=Fm8HAStECjaXUE=WBQc-sDDcgJQ7s7eg@mail.gmail.com
While individual logical rewrite files were synced to disk, the
directory was not. On some filesystems that could lead to losing
directory entries after a crash.
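The fix is essentially one extra call during checkpointing (directory
path per the logical rewrite code):

    /* persist the directory entries of the just-synced mapping files */
    fsync_fname("pg_logical/mappings", true);   /* true = is a directory */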
Reported-By: Tom Lane <tgl@sss.pgh.pa.us>
Author: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/867F2E29-2782-4869-970E-B984C6D35A8F@amazon.com
Backpatch: 10-
The logic in charge of writing commit timestamps (enabled with
track_commit_timestamp) for subtransactions had an off-by-one bug,
making it possible for commit timestamps to go missing for the
last subtransaction committed.
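Schematically, the bug was of this classic form (illustrative only, not
the actual loop):

    /* buggy: stops one element short, skipping the last subtransaction */
    for (i = 0; i < nsubxids - 1; i++)
        write_commit_ts(subxids[i], commit_time);

    /* fixed: cover every subtransaction */
    for (i = 0; i < nsubxids; i++)
        write_commit_ts(subxids[i], commit_time);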
While on it, simplify the iteration logic a bit in the loop writing the
commit timestamps, as per suggestions from Kyotaro Horiguchi and Tom
Lane, so that some variable initializations are no longer part of the
loop itself.
Issue introduced in 73c986a.
Analyzed-by: Alex Kingsborough
Author: Alex Kingsborough, Kyotaro Horiguchi
Discussion: https://postgr.es/m/73A66172-4050-4F2A-B7F1-13508EDA2144@amazon.com
Backpatch-through: 10
Commits 6c4a8903b et al. had a couple of deficiencies:
* The logic I added to Cluster::start to see if a PID file is present
could be fooled by a stale PID file left over from a previous
postmaster. To fix, if we're not sure whether we expect to find a
running postmaster or not, validate the PID using "kill 0".
* 017_shm.pl has a loop in which it just issues repeated Cluster::start
calls; this will fail if some invocation fails but leaves self->_pid
set. Per buildfarm results, the above fix is not enough to make this
safe: we might have "validated" a PID for a postmaster that exits
immediately after we look. Hence, match each failed start call with
a stop call that will get us back to the self->_pid == undef state.
Add a fail_ok option to Cluster::stop to make this work.
Discussion: https://postgr.es/m/CA+hUKGKV6fOHvfiPt8=dOKzvswjAyLoFoJF1iQXMNpi7+hD1JQ@mail.gmail.com
Since the test requires reproducible behavior from VACUUM, and since
DISABLE_PAGE_SKIPPING doesn't actually disable all forms of page
skipping, let's use a temporary table to avoid contention.
Back-patch to 12, like commit 3414099c.
Discussion: https://postgr.es/m/20220120052404.sonrhq3f3qgplpzj%40alap3.anarazel.de
"pg_ctl start" might start a new postmaster and then return failure
anyway, for example if PGCTLTIMEOUT is exceeded. If there is a
postmaster there, it's still incumbent on us to shut it down at
script end, so check for the PID file even though we are about
to fail.
This has been broken all along, so back-patch to all supported branches.
Discussion: https://postgr.es/m/647439.1642622744@sss.pgh.pa.us
Where we test vacuum_truncate's effects, truncation sometimes fails to
happen as expected on the buildfarm. That could be explained by page
skipping, so disable it explicitly, on the theory that commit fe246d1c
didn't go far enough.
Back-patch to 12, where the vacuum_truncate tests were added.
Discussion: https://postgr.es/m/CA%2BhUKGLT2UL5_JhmBzUgkdyKfc%3D5J-gJSQJLysMs4rqLUKLAzw%40mail.gmail.com
The original coding (from c33869cc3) failed with "more than one row
returned by a subquery used as an expression" if there were unrelated
triggers of the same tgname on parent partitioned tables. (That's
possible because statement-level triggers don't get inherited.) Fix
by applying LIMIT 1 after sorting the candidates by inheritance level.
Also, wrap the subquery in a CASE so that we don't have to execute it at
all when the trigger is visibly non-inherited. Aside from saving some
cycles, this avoids the need for a confusing and undocumented NULLIF().
While here, tweak the format of the emitted query to look a bit
nicer for "psql -E", and add some explanation of this subquery,
because it badly needs it.
Report and patch by Justin Pryzby (with some editing by me).
Back-patch to v13 where the faulty code came in.
Discussion: https://postgr.es/m/20211217154356.GJ17618@telsasoft.com
It seems highly unlikely that gettext() can be relied on to be
async-signal-safe. psql used to understand that, but someone got
it wrong long ago in the src/bin/scripts/ version of handle_sigint,
and then the bad idea was perpetuated when those two versions were
unified into src/fe_utils/cancel.c.
I'm unsure why there have not been field complaints about this
... maybe gettext() is signal-safe once it's translated at least
one message? But we have no business assuming any such thing.
In cancel.c (v13 and up), I preserved our ability to localize
"Cancel request sent" messages by invoking gettext() before
the signal handler is set up. In earlier branches I just made
src/bin/scripts/ not localize those messages, as psql did then.
(Just for extra unsafety, the src/bin/scripts/ version was
invoking fprintf() from a signal handler. Sigh.)
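The v13+ arrangement is roughly this (a sketch with invented names):

    static const char *cancel_sent_msg;     /* translated ahead of time */

    static void
    handle_sigint(SIGNAL_ARGS)
    {
        /* ... issue the cancel request ... */

        /* write() is async-signal-safe; gettext() is not */
        write(fileno(stderr), cancel_sent_msg, strlen(cancel_sent_msg));
    }

    void
    setup_cancel_handler(void)
    {
        /* run gettext() now, before the handler can ever be invoked */
        cancel_sent_msg = _("Cancel request sent\n");
        pqsignal(SIGINT, handle_sigint);
    }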
Noted while fixing signal-safety issues in PQcancel() itself.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/2937814.1641960929@sss.pgh.pa.us
PQcancel() is supposed to be safe to call from a signal handler,
and indeed psql uses it that way. All of the library functions
it uses are specified to be async-signal-safe by POSIX ...
except for strerror. Neither plain strerror nor strerror_r
is considered safe. When this code was written, back in the
dark ages, we probably figured "oh, strerror will just index
into a constant array of strings" ... but in any locale except C,
that's unlikely to be true. Probably the reason we've not heard
complaints is that (a) this error-handling code is unlikely to be
reached in normal use, and (b) in many scenarios, localized error
strings would already have been loaded, after which maybe it's
safe to call strerror here. Still, this is clearly unacceptable.
The best we can do without relying on strerror is to print the
decimal value of errno, so make it do that instead. (This is
probably not much loss of user-friendliness, given that it is
hard to get a failure here.)
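Formatting the decimal value has to be done by hand, since even
snprintf() is not promised to be async-signal-safe; a sketch:

    char        buf[32];
    char       *p = buf + sizeof(buf) - 1;
    int         val = SOCK_ERRNO;

    *p = '\0';
    do
    {
        *(--p) = (val % 10) + '0';
        val /= 10;
    } while (val > 0);

    strlcpy(errbuf, "PQcancel() -- connect() failed: error code ", maxlen);
    strlcat(errbuf, p, maxlen);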
Back-patch to all supported branches.
Discussion: https://postgr.es/m/2937814.1641960929@sss.pgh.pa.us
The need for this was foreseen long ago, but when record_eq
actually became hashable (in commit 01e658fa7), we missed updating
this spot.
Per bug #17363 from Elvis Pranskevichus. Back-patch to v14 where
the faulty commit came in.
Discussion: https://postgr.es/m/17363-f6d42fd0d726be02@postgresql.org
Since enum labels have to be single-quoted, this part of the
tab completion machinery got side-swiped by commit cd69ec66c.
A side-effect of that commit is that (at least with some versions
of Readline) the text string passed for completion will omit the
leading quote mark of the enum label literal. Libedit still acts
the same as before, though, so adapt COMPLETE_WITH_ENUM_VALUE so
that it can cope with either convention.
Also, when we fail to find any valid completion, set
rl_completion_suppress_quote = 1. Otherwise readline will
go ahead and append a closing quote, which is unwanted.
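That is, on the no-match path we now do something like (sketch; any
configure guard omitted):

    if (matches == NULL)
    {
        /* keep readline from helpfully appending a closing quote */
        rl_completion_suppress_quote = 1;
    }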
Per report from Peter Eisentraut. Back-patch to v13 where
cd69ec66c came in.
Discussion: https://postgr.es/m/8ca82d89-ec3d-8b28-8291-500efaf23b25@enterprisedb.com
Commit 859b3003de disabled building of extended stats for inheritance
trees, to prevent updating the same catalog row twice. While that
resolved the issue, it also means there are no extended stats for
declaratively partitioned tables, because there are no data in the
non-leaf relations.
That also means declaratively partitioned tables were not affected by
the issue 859b3003de addressed, so this is a regression affecting
queries that calculate estimates for the inheritance tree as a whole
(which includes e.g. GROUP BY queries).
But because partitioned tables are empty, we can invert the condition
and build statistics only for the case with inheritance, without losing
anything. And we can consider them when calculating estimates.
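A hypothetical sketch of the inverted condition in ANALYZE (names
assumed):

    bool        build_ext_stats;

    if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
        build_ext_stats = inh;      /* only the inheritance pass sees rows */
    else
        build_ext_stats = !inh;     /* avoid updating the catalog row twice */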
It may be necessary to run ANALYZE on partitioned tables, to collect
proper statistics. For declarative partitioning there should be no
prior statistics, and it might take time before autoanalyze is
triggered. For tables partitioned by inheritance the statistics may
include data from child relations (if built before 859b3003de),
contradicting the current code.
Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).
Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
Since commit 859b3003de we only build extended statistics for individual
relations, ignoring the child relations. This resolved the issue with
updating catalog tuple twice, but we still tried to use the statistics
when calculating estimates for the whole inheritance tree. When the
relations contain very distinct data, it may produce bogus estimates.
This is roughly the same issue 427c6b5b9 addressed ~15 years ago, and we
fix it the same way - by ignoring extended statistics when calculating
estimates for the inheritance tree as a whole. We still consider
extended statistics when calculating estimates for individual child
relations, of course.
This may result in plan changes due to different estimates, but if the
old statistics were not describing the inheritance tree particularly
well it's quite likely the new plans are actually better.
Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).
Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
Since dc7420c2c92 the horizon used for pruning is determined "lazily". A more
accurate horizon is built on-demand, rather than in GetSnapshotData(). If a
horizon computation is triggered between two HeapTupleSatisfiesVacuum() calls
for the same tuple, the result can change from RECENTLY_DEAD to DEAD.
heap_page_prune() can process the same tid multiple times (once following an
update chain, once "directly"). When the result of HeapTupleSatisfiesVacuum()
of a tuple changes from RECENTLY_DEAD during the first access, to DEAD in the
second, the "tuple is DEAD and doesn't chain to anything else" path in
heap_prune_chain() can end up marking the target of a LP_REDIRECT ItemId
unused.
Initially the corruption is not easily visible. But once the target of
a LP_REDIRECT ItemId is marked unused, a new tuple version can reuse
it. At that point the corruption may become visible, as index entries
pointing to the "original" redirect item now point to an unrelated
tuple.
To fix, compute HTSV for all tuples on a page only once. This fixes the entire
class of problems of HTSV changing inside heap_page_prune(). However,
visibility changes can obviously still occur between HTSV checks inside
heap_page_prune() and outside (e.g. in lazy_scan_prune()).
The computation of HTSV is now done in bulk, in heap_page_prune(), rather than
on-demand in heap_prune_chain(). Besides being a bit simpler, it also is
faster: Memory accesses can happen sequentially, rather than in the order of
HOT chains.
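A sketch of the batched pass (simplified from the pruning code; the
array name is an assumption):

    for (offnum = FirstOffsetNumber;
         offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);
        HeapTupleData tup;

        if (!ItemIdIsNormal(itemid))
        {
            prstate.htsv[offnum] = -1;  /* no tuple to check here */
            continue;
        }

        tup.t_data = (HeapTupleHeader) PageGetItem(page, itemid);
        tup.t_len = ItemIdGetLength(itemid);

        prstate.htsv[offnum] =
            heap_prune_satisfies_vacuum(&prstate, &tup, buffer);
    }

    /*
     * heap_prune_chain() then consults prstate.htsv[] instead of
     * re-invoking the visibility check.
     */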
There are other causes of HeapTupleSatisfiesVacuum() results changing between
two visibility checks for the same tuple, even before dc7420c2c92. E.g.
HEAPTUPLE_INSERT_IN_PROGRESS can change to HEAPTUPLE_DEAD when a transaction
aborts between the two checks. None of these other visibility status
changes are known to cause corruption, but heap_page_prune()'s approach
makes it hard to be confident.
A patch implementing a more fundamental redesign of heap_page_prune(), which
fixes this bug and simplifies pruning substantially, has been proposed by
Peter Geoghegan in
https://postgr.es/m/CAH2-WzmNk6V6tqzuuabxoxM8HJRaWU6h12toaS-bqYcLiht16A@mail.gmail.com
However, that redesign is a larger change than desirable for
backpatching. As the new design still benefits from the batched
visibility determination introduced in this commit, it makes sense to
commit this narrower fix to 14 and master, and then commit Peter's
improvement in master.
The precise sequence required to trigger the bug is complicated and
hard to exercise in an isolation test (until we have wait points). Due
to that the
isolation test initially posted at
https://postgr.es/m/20211119003623.d3jusiytzjqwb62p%40alap3.anarazel.de
and updated in
https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb%40alap3.anarazel.de
isn't committable.
A followup commit will introduce additional assertions, to detect problems
like this more easily.
Bug: #17255
Reported-By: Alexander Lakhin <exclusion@gmail.com>
Debugged-By: Andres Freund <andres@anarazel.de>
Debugged-By: Peter Geoghegan <pg@bowt.ie>
Author: Andres Freund <andres@anarazel.de>
Reviewed-By: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/20211122175914.ayk6gg6nvdwuhrzb@alap3.anarazel.de
Backpatch: 14-, the oldest branch containing dc7420c2c92
This reverts commits ab27df2, af8d530 and 3a0cced, that introduced
pg_cryptohash_error(). In order to make the core code able to pass down
the new error types that this introduced, some of the MD5-related
routines had to be reworked, causing an ABI breakage, but we found that
some external extensions rely on them. Maintaining compatibility
outweighs the error-report benefits, so just revert the change in v14.
Reported-by: Laurenz Albe
Discussion: https://postgr.es/m/9f0c0a96d28cf14fc87296bbe67061c14eb53ae8.camel@cybertec.at
Commit 7745bc352 intended to ensure that whole-row Vars would be
printed with "::type" decoration in all contexts where plain
"var.*" notation would result in star-expansion, notably in
ROW() and VALUES() constructs. However, it missed the case of
INSERT with a single-row VALUES, as reported by Timur Khanjanov.
Nosing around ruleutils.c, I found a second oversight: the
code for RowCompareExpr generates ROW() notation without benefit
of an actual RowExpr, and naturally it wasn't in sync :-(.
(The code for FieldStore also does this, but we don't expect that
to generate strictly parsable SQL anyway, so I left it alone.)
Back-patch to all supported branches.
Discussion: https://postgr.es/m/efaba6f9-4190-56be-8ff2-7a1674f9194f@intrans.baku.az
Commit 9dc718bd added a "logically unchanged by UPDATE" hinting
mechanism, which is currently used within nbtree indexes only (see
commit d168b666). This mechanism determined whether or not the incoming
item is a logically unchanged duplicate (a duplicate needed only for
MVCC versioning purposes) once per row updated per non-HOT update. This
approach led to memory leaks which were noticeable with an UPDATE
statement that updated sufficiently many rows, at least on tables that
happen to have an expression index.
On HEAD, fix the issue by adding a cache to the executor's per-index
IndexInfo struct.
Take a different approach on Postgres 14 to avoid an ABI break: simply
pass down the hint to all indexes unconditionally with non-HOT UPDATEs.
This is deemed acceptable because the hint is currently interpreted
within btinsert() as "perform a bottom-up index deletion pass if and
when the only alternative is splitting the leaf page -- prefer to delete
any LP_DEAD-set items first". nbtree must always treat the hint as a
noisy signal about what might work, as a strategy of last resort, with
costs imposed on non-HOT updaters. (The same thing might not be true
within another index AM that applies the hint, which is why the original
behavior is preserved on HEAD.)
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Klaudie Willis <Klaudie.Willis@protonmail.com>
Diagnosed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/261065.1639497535@sss.pgh.pa.us
Backpatch: 14-, where the hinting mechanism was added.
Experimenting with FIPS mode enabled, I saw

    regression=# \password joe
    Enter new password for user "joe":
    Enter it again:
    could not encrypt password: disabled for FIPS
    out of memory

because PQencryptPasswordConn was still of the opinion that "out of
memory" is always appropriate to print.
Minor oversight in b69aba745. Like that one, back-patch to v14.
The existing cryptohash facility was causing problems in some code paths
related to MD5 (frontend and backend) that relied on the fact that the
only type of error that could happen would be an OOM, as the MD5
implementation used in PostgreSQL ~13 (the in-core implementation is
used when compiling with or without OpenSSL in those older versions)
could fail only under that circumstance.
The new cryptohash facilities can fail for reasons other than OOMs, like
attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to
1.0.2, Fedora and Photon patch OpenSSL 1.1.1 to allow that), so this
would cause incorrect reports to show up.
This commit extends the cryptohash APIs so that callers of those
routines can fetch more context when an error happens, by using a new
routine called pg_cryptohash_error(). The error states are stored
within each implementation's internal context data, so it is possible
to extend the logic depending on what's suited for an implementation.
The default implementation requires few error states, but OpenSSL could
report various issues depending on its internal state, so more is
needed in cryptohash_openssl.c, and the code is shaped so that we are
always able to grab the necessary information.
The core code is changed to adapt to the new error routine, painting
more "const" across the call stack where the static errors are stored,
particularly in authentication code paths on variables that provide
log details. This way, any future changes would warn if attempting to
free these strings. The MD5 authentication code was also a bit blurry
about the handling of "logdetail" (LOG sent to the postmaster), so
improve the related comments while at it.
The origin of the problem is 87ae969, which introduced the centralized
cryptohash facility. Extra changes are done for pgcrypto in v14 for the
non-OpenSSL code path to cope with the improvements done by this
commit.
Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
This addresses some problems in the describe queries used for extended
statistics:
- Two schema qualifications for the text type were missing for \dX.
- The list of extended statistics listed for a table through \d was
ordered based on the object OIDs, but it is more consistent with the
other commands to order by namespace and then by object name.
- A couple of aliases were not used in \d. These are removed.
This is similar to commits 1f092a3 and 07f8a9e.
Author: Justin Pryzby
Discussion: https://postgr.es/m/20220107022235.GA14051@telsasoft.com
Backpatch-through: 14