With its current design, a careless use of pg_cryptohash_final() could
result in an out-of-bounds write in memory, as the size of the
destination buffer used to store the result digest is not known to the
cryptohash internals, and the caller would get no warning about it.
This commit adds a new argument to pg_cryptohash_final() to allow such
sanity checks, and implements such defenses.
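As a minimal sketch of the hardened call pattern (the buffer name and
error handling here are illustrative, not copied from the patch):

    uint8       digest[MD5_DIGEST_LENGTH];

    /* The destination size is now passed down so the internals can
     * verify that the digest fits into the caller's buffer. */
    if (pg_cryptohash_final(ctx, digest, sizeof(digest)) < 0)
        elog(ERROR, "could not finalize cryptohash");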
The HMAC internals of SCRAM could be tightened a bit more, but as
everything is based on SCRAM_KEY_LEN, with uses specific to this code,
there is no need to complicate its interface more than necessary; that
question comes back to the refactoring of HMAC in core. Apart from that,
this minimizes the uses of the existing DIGEST_LENGTH variables, relying
instead on sizeof() for the result sizes. In ossp-uuid, this also makes
the code more defensive, as it already relied on dce_uuid_t being at
least the size of an MD5 digest.
This is in philosophy similar to cfc40d3 for base64.c and aef8948 for
hex.c.
Reported-by: Ranier Vilela
Author: Michael Paquier, Ranier Vilela
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/CAEudQAoqEGmcff3J4sTSV-R_16Monuz-UpJFbf_dnVH=APr02Q@mail.gmail.com
When REG_DEBUG is defined, ensure that an un-filled "struct cnfa"
is all-zeroes, not just that it has nstates == 0. This is mainly
so that looking at "struct subre" structs in gdb doesn't distract
one with a lot of garbage fields during regex compilation.
Adjust some places that print debug output to have suitable fflush
calls afterwards.
In passing, correct an erroneous ancient comment: the concatenation
subre-s created by parsebranch() have op == '.' not ','.
Noted while fooling around with some regex performance improvements.
Given a regex pattern with a very long fixed prefix (approaching 500
characters), the result of pow(FIXED_CHAR_SEL, fixed_prefix_len) can
underflow to zero. Typically the preceding selectivity calculation
would have underflowed as well, so that we compute 0/0 and get NaN.
In released branches this leads to an assertion failure later on.
That doesn't happen in HEAD, for reasons I've not explored yet,
but it's surely still a bug.
To fix, just skip the division when the pow() result is zero, so
that we'll (most likely) return a zero selectivity estimate. In
the edge cases where "sel" didn't yet underflow, perhaps this
isn't desirable, but I'm not sure that the case is worth spending
a lot of effort on. The results of regex_selectivity_sub() are
barely worth the electrons they're written on anyway :-(
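In sketch form, the guard is simply this (variable names follow the
discussion above, not necessarily the committed code):

    double      prefixsel = pow(FIXED_CHAR_SEL, fixed_prefix_len);

    /* pow() can underflow to zero for very long prefixes; dividing by
     * zero here would produce NaN, so just skip the division. */
    if (prefixsel > 0.0)
        sel /= prefixsel;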
Per report from Alexander Lakhin. Back-patch to all supported versions.
Discussion: https://postgr.es/m/6de0a0c3-ada9-cd0c-3e4e-2fa9964b41e3@gmail.com
Modern gcc and clang compilers offer alignment sanitizers, which help to detect
pointer misalignment. However, our codebase already contains x86-specific
crc32 computation code, which uses unaligned access. Thankfully, those
compilers also support an attribute that disables alignment sanitizers at
the function level. This commit adds pg_attribute_no_sanitize_alignment(),
which wraps this attribute, and applies it to the pg_comp_crc32c_sse42()
function.
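A sketch of what the wrapper can look like (the committed macro may test
compiler support differently):

    #ifdef __has_attribute
    #if __has_attribute(no_sanitize)
    #define pg_attribute_no_sanitize_alignment() \
        __attribute__((no_sanitize("alignment")))
    #endif
    #endif
    #ifndef pg_attribute_no_sanitize_alignment
    #define pg_attribute_no_sanitize_alignment()
    #endif

The macro is then placed on the definition of pg_comp_crc32c_sse42() so
that the sanitizer is disabled for that function only.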
Discussion: https://postgr.es/m/CAPpHfdsne3%3DT%3DfMNU45PtxdhSL_J2PjLTeS8rwKnJzUR4YNd4w%40mail.gmail.com
Discussion: https://postgr.es/m/475514.1612745257%40sss.pgh.pa.us
Author: Alexander Korotkov, revised by Tom Lane
Reviewed-by: Tom Lane
We want to test the variants of ALTER SUBSCRIPTION that are not allowed in
a transaction block, but for that we don't need to create a subscription
that tries to connect to the publisher. There is no problem with the test
as such, but it is good to allow such tests to run with
wal_level = minimal and max_wal_senders = 0, so as to keep them consistent
with other tests.
Reported by buildfarm.
Author: Amit Kapila
Reviewed-by: Ajin Cherian
Discussion: https://postgr.es/m/CAA4eK1Lw0V+e1JPGHDq=+hVACv=14H8sR+2eJ1k3PEgwKmU-jQ@mail.gmail.com
For the initial table data synchronization in logical replication, we use
a single transaction to copy the entire table and then synchronize the
position in the stream with the main apply worker.
There are multiple downsides of this approach: (a) We have to perform the
entire copy operation again if there is any error (network breakdown,
error in the database operation, etc.) while we synchronize the WAL
position between tablesync worker and apply worker; this will be onerous
especially for large copies, (b) Using a single transaction in the
synchronization phase (where we can receive WAL from multiple
transactions) runs the risk of exceeding the CID limit, (c) The slot
will hold the WAL till the entire sync is complete because we never commit
till the end.
This patch solves all the above downsides by allowing multiple
transactions during the tablesync phase. The initial copy is done in a
single transaction and, after that, we commit each transaction as we
receive it. To allow recovery after any error or crash, we use a permanent
slot and origin to track the progress. The slot and origin will be removed
once we finish the synchronization of the table. We also remove the slot
and origin of tablesync workers if the user performs DROP SUBSCRIPTION ...
or ALTER SUBSCRIPTION ... REFRESH while some of the table syncs are still
unfinished.
The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and
ALTER SUBSCRIPTION ... SET PUBLICATION ... with the refresh option as true
cannot be executed inside a transaction block because they can now drop
the slots, for which we have no provision to roll back.
This will also open up the path for logical replication of 2PC
transactions on the subscriber side. Previously, we couldn't do that
because of the requirement of maintaining a single transaction in
tablesync workers.
Bump catalog version due to change of state in the catalog
(pg_subscription_rel).
Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
The pages_removed field is no longer used for anything. It hasn't been
possible for an index to physically shrink since old-style VACUUM FULL
was removed by commit 0a469c87.
The stanza in ECPGconnect() that intended to allow specification of a
Unix socket directory path in place of a port has never executed since
it was committed, nearly two decades ago; the preceding strrchr()
already found the last colon so there cannot be another one. The lack
of complaints about that is doubtless related to the fact that no
user-facing documentation suggested it was possible.
Rather than try to fix that up, let's just remove the unreachable
code, and instead document the way that does work to write a socket
directory path, namely specifying it as a "host" option.
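For example, a connection target along these lines should work (this
exact string illustrates the "host" option approach and is an assumption
on my part, not copied from the docs patch):

    EXEC SQL CONNECT TO 'unix:postgresql://localhost/mydb?host=/var/run/postgresql';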
In support of that, make another pass at clarifying the syntax
documentation for ECPG connection targets, particularly documenting
which things are parsed as identifiers and where to use double quotes.
Rearrange some things that seemed poorly ordered, and fix a couple of
minor doc errors.
Kyotaro Horiguchi, per gripe from Shenhao Wang
(docs changes mostly by me)
Discussion: https://postgr.es/m/ae52a416bbbf459c96bab30b3038e06c@G08CNEXMBPEKD06.g08.fujitsu.local
Explicitly testing for INT_MIN and INT_MAX isn't particularly good
style; it's tedious and may draw useless compiler warnings on
machines where int and long are the same width. We invented
strtoint() precisely for this usage, so use that instead.
While here, remove gratuitous variations in the way the tests for
did-strtoint-succeed were spelled. Also, avoid attempting to
negate INT_MIN; that would probably work given that the result
is implicitly cast to uint32, but I think it's nominally undefined
behavior.
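The resulting pattern is roughly this (a sketch; "str" and the error
label are illustrative):

    char       *endptr;
    int         val;

    errno = 0;
    val = strtoint(str, &endptr, 10);
    if (errno == ERANGE || endptr == str || *endptr != '\0')
        goto invalid;       /* out of int range, or not a valid integer */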
Per gripe from Ranier Vilela, though this isn't his proposed patch.
Discussion: https://postgr.es/m/CAEudQAqge3QfzoBRhe59QrB_5g+NmQUj2QpzqZ9Nc7QepXGAEw@mail.gmail.com
In the wake of c028faf2a, this is no longer needed. I left it
out of that patch since the API change would be undesirable in
a released branch; but there's no reason not to do it in HEAD.
This commit makes more generic some comments and code related to the
compilation with OpenSSL and SSL in general to ease the addition of more
SSL implementations in the future. In libpq, some OpenSSL-only code is
moved under USE_OPENSSL and not USE_SSL.
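Schematically, the rule is now (a sketch, not literal code):

    #ifdef USE_SSL              /* some SSL implementation is built in */
        /* implementation-agnostic SSL support code */
    #ifdef USE_OPENSSL          /* OpenSSL specifically */
        /* direct calls into the OpenSSL APIs */
    #endif
    #endif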
While at it, make a comment more consistent in libpq-fe.h.
Author: Daniel Gustafsson
Discussion: https://postgr.es/m/5382CB4A-9CF3-4145-BA46-C802615935E0@yesql.se
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS. This data was lost on the new index after REINDEX
CONCURRENTLY.
The update of this field is done when the old and new indexes are
swapped, to make the fix back-patchable. Another approach we could
consider in the long term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage, it can be done only on HEAD.
Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
Currently, we get the origin id from the name and then drop the origin by
taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent
sessions can get the id from the name at the same time, and then when they
try to drop the origin, one of the sessions will get either
"tuple concurrently deleted" or "cache lookup failed for replication
origin ..".
To prevent this race condition we do the entire operation under lock. This
obviates the need for the replorigin_drop() API, and we have removed it;
any extension authors using it need to use replorigin_drop_by_name()
instead. See its usage in pg_replication_origin_drop().
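For extension authors, the migration is roughly the following (a sketch;
check replication/origin.h for the actual prototype):

    /* Before: resolve the name, then drop by id (racy) */
    RepOriginId originid = replorigin_by_name(name, false);
    replorigin_drop(originid, true);

    /* After: resolve and drop in one call, under lock */
    replorigin_drop_by_name(name, false /* missing_ok */, true /* nowait */);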
Author: Peter Smith
Reviewed-by: Amit Kapila, Euler Taveira, Petr Jelinek, and Alvaro
Herrera
Discussion: https://www.postgresql.org/message-id/CAHut%2BPuW8DWV5fskkMWWMqzt-x7RPcNQOtJQBp6SdwyRghCk7A%40mail.gmail.com
This commit adds a new column, "waitstart", to the pg_locks view. This
column reports the time when the server process started waiting for the
lock, if the lock is not yet held. This information is useful, for
example, for computing how long a process has had to wait on a lock, by
subtracting "waitstart" in pg_locks from the current time, and thereby
identifying the locks that processes have been waiting on for a very long
time.
This feature uses the current time obtained for the deadlock timeout
timer as "waitstart" (i.e., the time when this process started waiting
for the lock). Since obtaining the current time afresh would add
overhead, we reuse the already-obtained time to avoid it.
Note that "waitstart" is updated without holding the lock table's
partition lock, to avoid the overhead by additional lock acquisition.
This can cause "waitstart" in pg_locks to become NULL for a very short
period of time after the wait started even though "granted" is false.
This is OK in practice because we can assume that users are likely to
look at "waitstart" when waiting for the lock for a long time.
Bump catalog version.
Author: Atsushi Torikoshi
Reviewed-by: Ian Lawrence Barwick, Robert Haas, Justin Pryzby, Fujii Masao
Discussion: https://postgr.es/m/a96013dc51cdc56b2a2b84fa8a16a993@oss.nttdata.com
This option controls whether toast tables associated with a relation are
vacuumed when running a manual VACUUM. It was already possible
to trigger a manual VACUUM on a toast relation without processing its
main relation, but a manual vacuum on a main relation always forced a
vacuum on its toast table. This is useful in scenarios where the level
of bloat or transaction age of the main and toast relations differs a
lot.
This option is an extension of the existing VACOPT_SKIPTOAST that was
used by autovacuum to control whether toast relations should be skipped.
This internal flag is renamed to VACOPT_PROCESS_TOAST for
consistency with the new option.
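Conceptually, vacuum now resolves the toast relation like this (a sketch,
not the committed code):

    Oid         toast_relid;

    /* Look up the toast relation only when toast processing is wanted */
    if (params->options & VACOPT_PROCESS_TOAST)
        toast_relid = rel->rd_rel->reltoastrelid;
    else
        toast_relid = InvalidOid;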
A new option switch, called --no-process-toast, is added to vacuumdb.
Author: Nathan Bossart
Reviewed-by: Kirk Jamison, Michael Paquier, Justin Pryzby
Discussion: https://postgr.es/m/BA8951E9-1524-48C5-94AF-73B1F0D7857F@amazon.com
scanNSItemForColumn, expandNSItemAttrs, and ExpandSingleTable would
pass the wrong RTE to markVarForSelectPriv when dealing with a join
ParseNamespaceItem: they'd pass the join RTE, when what we need to
mark is the base table that the join column came from. The end
result was to not fill the base table's selectedCols bitmap correctly,
resulting in an understatement of the set of columns that are read
by the query. The executor would still insist on there being at
least one selectable column; but with a correctly crafted query,
a user having SELECT privilege on just one column of a table would
nonetheless be allowed to read all its columns.
To fix, make markRTEForSelectPriv fetch the correct RTE for itself,
ignoring the possibly-mismatched RTE passed by the caller. Later,
we'll get rid of some now-unused RTE arguments, but that risks
API breaks so we won't do it in released branches.
This problem was introduced by commit 9ce77d75c, so back-patch
to v13 where that came in. Thanks to Sven Klemm for reporting
the problem.
Security: CVE-2021-20229
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.
The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.
ExecUpdateLockMode() had a similar issue. That led to confusion about
which columns are key columns, and hence to the wrong tuple lock being
taken on tables referenced by foreign keys, when a row is updated with
INSERT ON CONFLICT UPDATE. A new isolation test is added for that corner
case.
With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.
NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seems like the least bad option. I don't think these fields are widely
used in extensions; I don't think there are FDWs out there that use the
FDW "direct update" API, other than postgres_fdw. If there are, they will
get a compilation error, so hopefully the breakage is caught quickly.
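For affected extension code, the adjustment is roughly (a sketch):

    /* Before: Relation root = resultRelInfo->ri_PartitionRoot; */
    ResultRelInfo *rootrel = resultRelInfo->ri_RootResultRelInfo;
    Relation    root = rootrel ? rootrel->ri_RelationDesc : NULL;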
Backpatch to 11, where support for both cross-partition UPDATEs and
unique indexes on partitioned tables was added.
Reviewed-by: Amit Langote
Security: CVE-2021-3393
GlobalVisIsRemovableFullXid() is now GlobalVisCheckRemovableFullXid().
This is consistent with the general convention for FullTransactionId
equivalents of functions that deal with TransactionId values. It now
matches the nearby GlobalVisCheckRemovableXid() function, which performs
the same check for callers that use TransactionId values.
Oversight in commit dc7420c2c92.
Discussion: https://postgr.es/m/CAH2-Wzmes12jFNDcVgpU89Vp=r6uLFrE-MT0fjSWGsE70UiNaA@mail.gmail.com
This reverts commit ed290896335414c6c069b9ccae1f3dcdd2fac6ba and
equivalent back-branch commits. The issue is subtler than I thought,
and it's far from new, so just before a release deadline is no time
to be fooling with it. We'll consider what to do at a bit more
leisure.
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
The manual did not mention whether the function's return value is (first
arg - second arg) or (second arg - first arg). The order matters because
the result is signed. Fix the manual so that it mentions that the
function returns (first arg - second arg).
Patch reviewed by Tom Lane.
Back-patch through v13. In older versions, the doc format makes it
difficult to add more description.
Discussion: https://postgr.es/m/flat/20210206.151125.960423226279810864.t-ishii%40sraoss.co.jp
rewriteRuleAction() neglected this step, although it was careful to
propagate other similar flags such as hasSubLinks or hasRowSecurity.
Omitting to transfer hasRecursive is just cosmetic at the moment,
but omitting hasModifyingCTE is a live bug, since the executor
certainly looks at that.
The proposed test case only fails back to v10, but since the executor
examines hasModifyingCTE in 9.x as well, I suspect that a test case
could be devised that fails in older branches. Given the nearness
of the release deadline, though, I'm not going to spend time looking
for a better test.
Report and patch by Greg Nancarrow, cosmetic changes by me
Discussion: https://postgr.es/m/CAJcOf-fAdj=nDKMsRhQzndm-O13NY4dL6xGcEvdX5Xvbbi0V7g@mail.gmail.com
Generally, members of inheritance trees must be plain tables (or,
in more recent versions, foreign tables). ALTER TABLE INHERIT
rejects creating an inheritance relationship that has a view at
either end. When DefineQueryRewrite attempts to convert a relation
to a view, it already had checks prohibiting doing so for partitioning
parents or children as well as traditional-inheritance parents ...
but it neglected to check that a traditional-inheritance child wasn't
being converted. Since the planner assumes that any inheritance
child is a table, this led to making plans that tried to do a physical
scan on a view, causing failures (or even crashes, in recent versions).
One could imagine trying to support such a case by expanding the view
normally, but since the rewriter runs before the planner does
inheritance expansion, it would take some very fundamental refactoring
to make that possible. There are probably a lot of other parts of the
system that don't cope well with such a situation, too. For now,
just forbid it.
Per bug #16856 from Yang Lin. Back-patch to all supported branches.
(In versions before v10, this includes back-patching the portion of
commit 501ed02cf that added has_superclass(). Perhaps the lack of
that infrastructure partially explains the missing check.)
Discussion: https://postgr.es/m/16856-0363e05c6e1612fd@postgresql.org
The parallel slots infrastructure (which implements client-side
multiplexing of server connections doing similar things, not
threading or multiple processes or anything like that) is moved from
src/bin/scripts/scripts_parallel.c to src/fe_utils/parallel_slot.c.
The functions consumeQueryResult() and processQueryResult() which were
previously part of src/bin/scripts/common.c are now moved into that
file as well, becoming static helper functions. This might need to be
changed in the future, but currently they're not used for anything
else.
Some other functions from src/bin/scripts/common.c are moved to
src/fe_utils and split up among several files. connectDatabase(),
connectMaintenanceDatabase(), and disconnectDatabase() are moved to
connect_utils.c. executeQuery(), executeCommand(), and
executeMaintenanceCommand() are moved to query_utils.c.
handle_help_version_opts() is moved to option_utils.c.
Mark Dilger, reviewed by me. The larger patch series of which this is
a part has also had review from Peter Geoghegan, Andres Freund, Álvaro
Herrera, Michael Paquier, and Amul Sul, but I don't know whether any
of them have reviewed this bit specifically.
Discussion: http://postgr.es/m/12ED3DA8-25F0-4B68-937D-D907CFBF08E7@enterprisedb.com
Discussion: http://postgr.es/m/5F743835-3399-419C-8324-2D424237E999@enterprisedb.com
Discussion: http://postgr.es/m/70655DF3-33CE-4527-9A4D-DDEB582B6BA0@enterprisedb.com
If a multi-byte character is escaped with a backslash in TEXT mode input,
and the encoding is one of the client-only encodings where the bytes after
the first one can have an ASCII byte "embedded" in the char, we didn't
skip the character correctly. After a backslash, we only skipped the first
byte of the next character, so if it was a multi-byte character, we would
try to process its second byte as if it was a separate character. If it
was one of the characters with special meaning, like '\n', '\r', or
another '\\', that would cause trouble.
One such example is the byte sequence '\x5ca45c2e666f6f' in Big5 encoding.
That's supposed to be [backslash][two-byte character][.][f][o][o], but
because the second byte of the two-byte character is 0x5c, we incorrectly
treat it as another backslash. And because the next character is a dot, we
parse it as end-of-copy marker, and throw an "end-of-copy marker corrupt"
error.
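The correct behavior is to consume the whole character after a backslash,
conceptually like this (a sketch using the usual mblen machinery, not the
literal patch):

    /* Skip all bytes of the escaped character, not just the first one. */
    ptr += pg_encoding_mblen(file_encoding, ptr);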
Backpatch to all supported versions.
Reviewed-by: John Naylor, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/a897f84f-8dca-8798-3139-07da5bb38728%40iki.fi
Commit 08d2d58a2 added an assertion assuming that the retrieved_rows
estimate for a foreign relation, which is re-used to cost pre-sorted
foreign paths with local stats, is set to at least one row in
estimate_path_cost_size(), which isn't correct because if the relation
is a foreign table with tuples=0, the estimate would be set to 0 there
when not using remote estimates.
Per bug #16807 from Alexander Lakhin. Back-patch to v13 where the
aforementioned commit went in.
Author: Etsuro Fujita
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/16807-9fe4e08fbaa5c7ce%40postgresql.org
Commit 230230223 taught nodeAgg.c that, when spilling tuples from
memory in an oversized hash aggregation, it only needed to spill
input columns referenced in the node's tlist and quals. Unfortunately,
that's wrong: we also have to save the grouping columns. The error
is masked in common cases because the grouping columns also appear
in the tlist, but that's not necessarily true. The main category
of plans where it's not true seems to come from semijoins ("WHERE
outercol IN (SELECT innercol FROM innertable)") where the innercol
needs an implicit promotion to make it comparable to the outercol.
The grouping column will be "innercol::promotedtype", but that
expression appears nowhere in the Agg node's own tlist and quals;
only the bare "innercol" is found in the tlist.
I spent quite a bit of time looking for a suitable regression test
case for this, without much success. If the number of distinct
values of the innercol is large enough to make spilling happen,
the planner tends to prefer a non-HashAgg plan, at least for
problem sizes that are reasonable to use in the regression tests.
So, no new regression test. However, this patch does demonstrably
fix the originally-reported test case.
Per report from s.p.e (at) gmx-topmail.de. Backpatch to v13
where the troublesome code came in.
Discussion: https://postgr.es/m/trinity-1c565d44-159f-488b-a518-caf13883134f-1611835701633@3c-app-gmx-bap78
switchToPresortedPrefixMode() did the wrong thing if it detected
a batch boundary just at the last tuple of a fullsort group.
The initially-reported symptom was a "retrieved too many tuples in a
bounded sort" error, but the test case added here just silently gives
the wrong answer without this patch.
I (tgl) am not really happy about committing this patch without review
from the incremental-sort authors, but they seem AWOL and we are hard
against a release deadline. This does demonstrably make some cases
better, anyway.
Per bug #16846 from Yoran Heling. Back-patch to v13 where incremental
sort was introduced.
Neil Chen
Discussion: https://postgr.es/m/16846-ae49f51ac379a4cb@postgresql.org
Add some additional defensive checks in the second phase of index
deletion to detect and report index corruption during VACUUM, and to
avoid having VACUUM become stuck in more cases. The code is still not
robust in the presence of a circular chain of sibling links, though it's
not clear whether that really matters. This is follow-up work to commit
3a01f68e.
The new defensive checks rely on the assumption that there can be no
more than one VACUUM operation running for an index at any given time.
Remove an old comment suggesting that multiple concurrent VACUUMs need
to be considered here. This concern now seems highly unlikely to have
any real validity, since we clearly rely on the same assumption in
several other places. For example, there are much more recent comments
that appear in the same function (added by commit efada2b8e92) that make
the same assumption.
Also add a CHECK_FOR_INTERRUPTS() to the relevant code path. Contrary
to comments added by commit 3a01f68e, it is actually possible to handle
interrupts here, at least in the common case where processing takes
place at the leaf level. We only hold a pin on leafbuf/target page when
stepping right at the leaf level.
No backpatch due to the lack of complaints following hardening added to
the same area by commit 3a01f68e.
The number of bytes processed was accumulated slightly incorrectly. After
loading more data into the input buffer, we added the number of bytes in
the buffer to the sum. But in case of multi-byte characters or escapes,
there can be a few unprocessed bytes left over from the previous load in
the buffer. Those bytes got counted twice.
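Conceptually, the fix is to count only what is newly read (a sketch;
names are illustrative):

    /* Bytes left over from the previous load were already counted,
     * so add only the newly read bytes to the running total. */
    cstate->bytes_processed += bytes_read;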
The error messages here refer to the user right "Lock pages in memory",
which is a term from the Windows OS, so it should be translated in
accordance with the OS localization. Refactor the error messages so this
is easier and clearer. Also fix the capitalization to match the existing
capitalization in the OS.
This patch adds the possibility to move indexes to a new tablespace
while rebuilding them. Both the concurrent and the non-concurrent cases
are supported, and the following restrictions apply:
- When using TABLESPACE with a REINDEX command that targets a
partitioned table or index, all the indexes of the leaf partitions are
moved to the new tablespace. The tablespace references of the non-leaf,
partitioned tables in pg_class.reltablespace are not changed. This
requires an extra ALTER TABLE SET TABLESPACE.
- Any index on a toast table rebuilt as part of a parent table is kept
in its original tablespace.
- The operation is forbidden on system catalogs, including trying to
directly move a toast relation with REINDEX. This results in an error
if doing REINDEX on a single object. REINDEX SCHEMA, DATABASE and
SYSTEM skip system relations when TABLESPACE is used.
Author: Alexey Kondratov, Michael Paquier, Justin Pryzby
Reviewed-by: Álvaro Herrera, Michael Paquier
Discussion: https://postgr.es/m/8a8f5f73-00d3-55f8-7583-1375ca8f6a91@postgrespro.ru
If a portal is used to run a prepared CALL or DO statement that
contains a ROLLBACK, PortalRunMulti fails because the portal's
statement list gets cleared by the rollback. (Since the grammar
doesn't allow CALL/DO in PREPARE, the only easy way to get to this is
via extended query protocol, which treats all inputs as prepared
statements.) It's difficult to avoid resetting the portal early
because of resource-management issues, so work around this by teaching
PortalRunMulti to be wary of portal->stmts having suddenly become NIL.
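In sketch form, the workaround inside the portal's statement loop is:

    /* A ROLLBACK inside the portal may have cleared its statement
     * list under us; if so, stop instead of dereferencing it. */
    if (portal->stmts == NIL)
        break;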
The crash has only been seen to occur in v13 and HEAD (as a
consequence of commit 1cff1b95a having added an extra touch of
portal->stmts). But even before that, the code involved touching a
List that the portal no longer has any claim on. In the test case at
hand, the List will still exist because of another refcount on the
cached plan; but I'm far from convinced that it's impossible for the
cached plan to have been dropped by the time control gets back to
PortalRunMulti. Hence, backpatch to v11 where nested transactions
were added.
Thomas Munro and Tom Lane, per bug #16811 from James Inform
Discussion: https://postgr.es/m/16811-c1b599b2c6c2d622@postgresql.org