I happened to notice that if compiled --with-gssapi, 9.6's
contrib/pgcrypto tests report memory stomps for some SHA operations.
Both MEMORY_CONTEXT_CHECKING and valgrind agree there's a problem,
though nothing crashes; it appears that the buffer overrun
only extends into alignment padding, at least on 64-bit hardware.
Investigation found that pgcrypto's references to SHA224_Init
et al were being captured by the system OpenSSL library, which
of course has slightly incompatible definitions of those functions.
We long ago noticed this problem with respect to the sibling
functions SHA256_Init and so on, and commit 56f44784f introduced
renaming macros to dodge the problem for those. However, it didn't
cover the SHA224 family because we didn't use that at the time.
When commit 1abf76e82 added those awhile later, it neglected to add
a similar renaming macro. Better late than never, so do so now.
This appears to affect all branches 8.2 - 9.6, so it's surprising
nobody noticed before now. Maybe the effect is somehow specific
to the way RHEL8 intertwines its GSS and SSL libraries? Anyway,
we refactored all this stuff in v10, so newer branches don't have
the problem.
For no obvious reason, isolationtester has always insisted that
session and step names be written with double quotes. This is
fairly tedious and does little for test readability, especially
since the names that people actually choose almost always look
like normal identifiers. Hence, let's tweak the lexer to allow
SQL-like identifiers not only double-quoted strings.
(They're SQL-like, not exactly SQL, because I didn't add any
case-folding logic. Also there's no provision for U&"..." names,
not that anyone's likely to care.)
There is one incompatibility introduced by this change: if you write
"foo""bar" with no space, that used to be taken as two identifiers,
but now it's just one identifier with an embedded quote mark.
I converted all the src/test/isolation/ specfiles to remove
unnecessary double quotes, but stopped there because my
eyes were glazing over already.
Like 741d7f104, back-patch to all supported branches, so that this
isn't a stumbling block for back-patching isolation test changes.
Discussion: https://postgr.es/m/759113.1623861959@sss.pgh.pa.us
The syntax summaries for CREATE FUNCTION and allied commands
made it look like LEAKPROOF is an alternative to
IMMUTABLE/STABLE/VOLATILE, when of course it is an orthogonal
option. Improve that.
Per gripe from aazamrafeeque0. Thanks to David Johnston for
suggestions.
Discussion: https://postgr.es/m/162444349581.694.5818572718530259025@wrigleys.postgresql.org
Our uses of gss_display_status() and gss_display_name() assumed
that the gss_buffer_desc strings returned by those functions are
null-terminated. It appears that they generally are, given the
lack of field complaints up to now. However, the available
documentation does not promise this, and some man pages
for gss_display_status() show examples that rely on the
gss_buffer_desc.length field instead of expecting null
termination. Also, we now have a report that on some
implementations, clang's address sanitizer is of the opinion
that the byte after the specified length is undefined.
Hence, change the code to rely on the length field instead.
This might well be cosmetic rather than fixing any real bug, but
it's hard to be sure, so back-patch to all supported branches.
While here, also back-patch the v12 changes that made pg_GSS_error
deal honestly with multiple messages available from
gss_display_status.
Per report from Sudheer H R.
Discussion: https://postgr.es/m/5372B6D4-8276-42C0-B8FB-BD0918826FC3@tekenlight.com
Previously, isolationtester displayed SQL query results using some
ad-hoc code that clearly hadn't had much effort expended on it.
Field values longer than 14 characters weren't separated from
the next field, and usually caused misalignment of the columns
too. Also there was no visual separation of a query's result
from subsequent isolationtester output. This made test result
files confusing and hard to read.
To improve matters, let's use libpq's PQprint() function. Although
that's long since unused by psql, it's still plenty good enough
for the purpose here.
Like 741d7f104, back-patch to all supported branches, so that this
isn't a stumbling block for back-patching isolation test changes.
Discussion: https://postgr.es/m/582362.1623798221@sss.pgh.pa.us
We've long contended with isolation test results that aren't entirely
stable. Some test scripts insert long delays to try to force stable
results, which is not terribly desirable; but other erratic failure
modes remain, causing unrepeatable buildfarm failures. I've spent a
fair amount of time trying to solve this by improving the server-side
support code, without much success: that way is fundamentally unable
to cope with diffs that stem from chance ordering of arrival of
messages from different server processes.
We can improve matters on the client side, however, by annotating
the test scripts themselves to show the desired reporting order
of events that might occur in different orders. This patch adds
three types of annotations to deal with (a) test steps that might or
might not complete their waits before the isolationtester can see them
waiting; (b) test steps in different sessions that can legitimately
complete in either order; and (c) NOTIFY messages that might arrive
before or after the completion of a step in another session. We might
need more annotation types later, but this seems to be enough to deal
with the instabilities we've seen in the buildfarm. It also lets us
get rid of all the long delays that were previously used, cutting more
than a minute off the runtime of the isolation tests.
Back-patch to all supported branches, because the buildfarm
instabilities affect all the branches, and because it seems desirable
to keep isolationtester's capabilities the same across all branches
to simplify possible future back-patching of tests.
Discussion: https://postgr.es/m/327948.1623725828@sss.pgh.pa.us
Ordinarily, a pg_policy.polroles array wouldn't list the same role
more than once; but CREATE POLICY does not prevent that. If we
perform DROP OWNED BY on a role that is listed more than once,
RemoveRoleFromObjectPolicy either suffered an assertion failure
or encountered a tuple-updated-by-self error. Rewrite it to cope
correctly with duplicate entries, and add a CommandCounterIncrement
call to prevent the other problem.
Per discussion, there's other cleanup that ought to happen here,
but this seems like the minimum essential fix.
Per bug #17062 from Alexander Lakhin. It's been broken all along,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/17062-11f471ae3199ca23@postgresql.org
This works fine in the "simple Query" code path; but if the
statement is in the plan cache then it's corrupted for future
re-execution. Apply copyObject() to protect the original
tree from modification, as we've done elsewhere.
This narrow fix is applied only to the back branches. In HEAD,
the problem was fixed more generally by commit 7c337b6b5; but
that changed ProcessUtility's API, so it's infeasible to
back-patch.
Per bug #17053 from Charles Samborski.
Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us
Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
The original patch only targeted Python 2.6 and newer, since that is
what we have supported in PostgreSQL 13 and newer. For older
branches, we need to fix it up for older Python versions.
One of the error paths left *members uninitialized. That's not a live
bug, because most callers don't look at *members when the function
returns -1, but let's be tidy. One caller, in heap_lock_tuple(), does
"if (members != NULL) pfree(members)", but AFAICS it never passes an
invalid 'multi' value so it should not reach that error case.
The callers are also a bit inconsistent in their expectations.
heap_lock_tuple() pfrees the 'members' array if it's not-NULL, others
pfree() it if "nmembers >= 0", and others if "nmembers > 0". That's
not a live bug either, because the function should never return 0, but
add an Assert for that to make it more clear. I left the callers alone
for now.
I also moved the line where we set *nmembers. It wasn't wrong before,
but I like to do that right next to the 'return' statement, to make it
clear that it's always set on return.
Also remove one unreachable return statement after ereport(ERROR), for
brevity and for consistency with the similar if-block right after it.
Author: Greg Nancarrow with the additional changes by me
Backpatch-through: 9.6, all supported versions
In a synchronous logical setup, locking [user] catalog tables can cause
deadlock. This is because logical decoding of transactions can lock
catalog tables to access them so exclusively locking those in transactions
can lead to deadlock. To avoid this users must refrain from having
exclusive locks on catalog tables.
Author: Takamichi Osumi
Reviewed-by: Vignesh C, Amit Kapila
Backpatch-through: 9.6
Discussion: https://www.postgresql.org/message-id/20210222222847.tpnb6eg3yiykzpky%40alap3.anarazel.de
This is useful for developers to find out if an isolation spec is
over-engineered or if it needs more work by warning at the end of a
test run if a step is not used, generating a failure with extra diffs.
While on it, clean up all the specs which include steps not used in any
permutations to simplify them.
This is a backpatch of 989d23b and 06fdc4e, as it is becoming useful to
make all the branches consistent for an upcoming patch that will improve
the output generated by isolationtester.
Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz
Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us
Backpatch-through: 9.6
The original purpose of the dry-run mode is to be able to print all the
possible permutations from a spec file, but it has become less useful
since isolation tests have improved regarding deadlock detection as one
step not wanted by the author could block indefinitely now (originally
the step blocked would have been detected rather quickly). Per
discussion, let's remove it.
This is a backpatch of 9903338 for 9.6~12. It is proving to become
useful to have on those branches so as the code gets consistent across
all supported versions, as a matter of improving the output generated by
isolationtester.
Author: Michael Paquier
Reviewed-by: Asim Praveen, Melanie Plageman
Discussion: https://postgr.es/m/20190819080820.GG18166@paquier.xyz
Discussion: https://postgr.es/m/794820.1623872009@sss.pgh.pa.us
Backpatch-through: 9.6
When stuffing a plan from the plancache into a Portal, one is
not supposed to risk throwing an error between GetCachedPlan and
PortalDefineQuery; if that happens, the plan refcount incremented
by GetCachedPlan will be leaked. I managed to break this rule
while refactoring code in 9dbf2b7d7. There is no visible
consequence other than some memory leakage, and since nobody is
very likely to trigger the relevant error conditions many times
in a row, it's not surprising we haven't noticed. Nonetheless,
it's a bug, so rearrange the order of operations to remove the
hazard.
Noted on the way to looking for a better fix for bug #17053.
This mistake is pretty old, so back-patch to all supported
branches.
TestLib::perl2host can take a file argument as well as a directory
argument, so that code becomes substantially simpler. Also add comments
on why we're using forward slashes, and why we're setting
PERL_BADLANG=0.
Discussion: https://postgr.es/m/e9947bcd-20ee-027c-f0fe-01f736b7e345@dunslane.net
During decoding for speculative inserts, we were relying for cleaning
toast hash on confirmation records or next change records. But that
could lead to multiple problems (a) memory leak if there is neither a
confirmation record nor any other record after toast insertion for a
speculative insert in the transaction, (b) error and assertion failures
if the next operation is not an insert/update on the same table.
The fix is to start queuing spec abort change and clean up toast hash
and change record during its processing. Currently, we are queuing the
spec aborts for both toast and main table even though we perform cleanup
while processing the main table's spec abort record. Later, if we have a
way to distinguish between the spec abort record of toast and the main
table, we can avoid queuing the change for spec aborts of toast tables.
Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
Recent glibc versions have made mktime() fail if tm_isdst is
inconsistent with the prevailing timezone; in particular it fails for
tm_isdst = 1 when the zone is UTC. (This seems wildly inconsistent
with the POSIX-mandated treatment of "incorrect" values for the other
fields of struct tm, so if you ask me it's a bug, but I bet they'll
say it's intentional.) This has been observed to cause cosmetic
problems when pg_restore'ing an archive created in a different
timezone.
To fix, do mktime() using the field values from the archive, and if
that fails try again with tm_isdst = -1. This will give a result
that's off by the UTC-offset difference from the original zone, but
that was true before, too. It's not terribly critical since we don't
do anything with the result except possibly print it. (Someday we
should flush this entire bit of logic and record a standard-format
timestamp in the archive instead. That's not okay for a back-patched
bug fix, though.)
Also, guard our only other use of mktime() by having initdb's
build_time_t() set tm_isdst = -1 not 0. This case could only have
an issue in zones that are DST year-round; but I think some do exist,
or could in future.
Per report from Wells Oliver. Back-patch to all supported
versions, since any of them might need to run with a newer glibc.
Discussion: https://postgr.es/m/CAOC+FBWDhDHO7G-i1_n_hjRzCnUeFO+H-Czi1y10mFhRWpBrew@mail.gmail.com
Translate path slashes on target directory path. This was confusing old
branches, but is applied to all branches for the sake of uniformity.
Perl is perfectly able to understand paths with forward slashes.
Along the way, restore the previous archive_wait query, for the sake of
uniformity with other tests, per gripe from Tom Lane.
This is similar to the work done in 8279f68 for TestLib.pm, where
environment variables set may cause unwanted failures if using a
temporary installation with pg_regress. The list of variables reset is
adjusted in each stable branch depending on what is supported.
Comments are added to remember that the lists in TestLib.pm and
pg_regress.c had better be kept in sync.
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/YMNR9GYDn+fHlMta@paquier.xyz
Backpatch-through: 9.6
Previously, a zero value for the relfilenode resulted in
a confusing error message about "unexpected duplicate".
This function returns NULL for other invalid relfilenode
values, so zero should be treated likewise.
It's been like this all along, so back-patch to all supported
branches.
Justin Pryzby
Discussion: https://postgr.es/m/20210612023324.GT16435@telsasoft.com
Commit caba8f0d43 wasn't quite right for msys, as demonstrated by
several buildfarm animals, including jacana and fairywren. We need to
use the msys perl in the archive command, but call it in such a way that
Windows will understand the path. Furthermore, inside the copy script we
need to convert a Windows path to an msys path.
This variable was present in the list added by 9d660670, but it is not
supported by this branch. Issue noticed while diving into a similar
change for pg_regress.c.
Backpatch-through: 9.6
This only happens if (1) the new standby has no WAL available locally,
(2) the new standby is starting from the old timeline, (3) the promotion
happened in the WAL segment from which the new standby is starting,
(4) the timeline history file for the new timeline is available from
the archive but the WAL files for are not (i.e. this is a race),
(5) the WAL files for the new timeline are available via streaming,
and (6) recovery_target_timeline='latest'.
Commit ee994272ca introduced this
logic and was an improvement over the previous code, but it mishandled
this case. If recovery_target_timeline='latest' and restore_command is
set, validateRecoveryParameters() can change recoveryTargetTLI to be
different from receiveTLI. If streaming is then tried afterward,
expectedTLEs gets initialized with the history of the wrong timeline.
It's supposed to be a list of entries explaining how to get to the
target timeline, but in this case it ends up with a list of entries
explaining how to get to the new standby's original timeline, which
isn't right.
Dilip Kumar and Robert Haas, reviewed by Kyotaro Horiguchi.
Discussion: http://postgr.es/m/CAFiTN-sE-jr=LB8jQuxeqikd-Ux+jHiXyh4YDiZMPedgQKup0g@mail.gmail.com
The 'lsn' and 'wait_for_catchup' methods only exist in v10 and
higher, but are needed in order to support a test planned test
case for a bug that exists all the way back to v9.6. To minimize
cross-branch differences in the test case, back-port these
methods.
Discussion: http://postgr.es/m/CA+TgmoaG5dmA_8Xc1WvbvftPjtwx5uzkGEHxE7MiJ+im9jynmw@mail.gmail.com
The set of subcommands supported by \dAp, \do and \dy was described
incorrectly in psql's --help. The documentation was already consistent
with the code.
Reported-by: inoas, from IRC
Author: Matthijs van der Vleuten
Reviewed-by: Neil Chen
Discussion: https://postgr.es/m/6a984e24-2171-4039-9050-92d55e7b23fe@www.fastmail.com
Backpatch-through: 9.6
An incorrectly-encoded multibyte character near the end of a string
could cause various processing loops to run past the string's
terminating NUL, with results ranging from no detectable issue to
a program crash, depending on what happens to be in the following
memory.
This isn't an issue in the server, because we take care to verify
the encoding of strings before doing any interesting processing
on them. However, that lack of care leaked into client-side code
which shouldn't assume that anyone has validated the encoding of
its input.
Although this is certainly a bug worth fixing, the PG security team
elected not to regard it as a security issue, primarily because
any untrusted text should be sanitized by PQescapeLiteral or
the like before being incorporated into a SQL or psql command.
(If an app fails to do so, the same technique can be used to
cause SQL injection, with probably much more dire consequences
than a mere client-program crash.) Those functions were already
made proof against this class of problem, cf CVE-2006-2313.
To fix, invent PQmblenBounded() which is like PQmblen() except it
won't return more than the number of bytes remaining in the string.
In HEAD we can make this a new libpq function, as PQmblen() is.
It seems imprudent to change libpq's API in stable branches though,
so in the back branches define PQmblenBounded as a macro in the files
that need it. (Note that just changing PQmblen's behavior would not
be a good idea; notably, it would completely break the escaping
functions' defense against this exact problem. So we just want a
version for those callers that don't have any better way of handling
this issue.)
Per private report from houjingyi. Back-patch to all supported branches.
Back-patch a minimal subset of commits fffd651e8 and 46912d9b1,
to support strnlen() on all platforms without adding any callers.
This will be needed by a following bug fix.
The Msys shell mangles certain patterns in its command line, so avoid
handing arbitrary SQL to psql on the command line and instead use
IPC::Run's redirection facility for stdin. This pattern is already
mostly whats used, but query_poll_until() was not doing the right thing.
Problem discovered on the buildfarm when a new TAP test failed on msys.
The internal SQL queries used by REFRESH MATERIALIZED VIEW CONCURRENTLY
include some aliases for its diff and temporary relations with
rather-generic names: diff, newdata, newdata2 and mv. Depending on the
queries used for the materialized view, using CONCURRENTLY could lead to
some internal failures if the query and those internal aliases conflict.
Those names have been chosen in 841c29c8. This commit switches instead
to a naming pattern which is less likely going to cause conflicts, based
on an idea from Thomas Munro, by appending _$ to those aliases. This is
not perfect as those new names could still conflict, but at least it has
the advantage to keep the code readable and simple while reducing the
likelihood of conflicts to be close to zero.
Reported-by: Mathis Rudolf
Author: Bharath Rupireddy
Reviewed-by: Bernd Helmle, Thomas Munro, Michael Paquier
Discussion: https://postgr.es/m/109c267a-10d2-3c53-b60e-720fcf44d9e8@credativ.de
Backpatch-through: 9.6
Various environment variables were not getting reset in the TAP tests,
which would cause failures depending on the tests or the environment
variables involved. For example, PGSSL{MAX,MIN}PROTOCOLVERSION could
cause failures in the SSL tests. Even worse, a junk value of
PGCLIENTENCODING makes a server startup fail. The list of variables
reset is adjusted in each stable branch depending on what is supported.
While on it, simplify a bit the code per a suggestion from Andrew
Dunstan, using a list of variables instead of doing single deletions.
Reviewed-by: Andrew Dunstan, Daniel Gustafsson
Discussion: https://postgr.es/m/YLbjjRpucIeZ78VQ@paquier.xyz
Backpatch-through: 9.6
This case should be disallowed, just as FOR UPDATE with a plain
GROUP BY is disallowed; FOR UPDATE only makes sense when each row
of the query result can be identified with a single table row.
However, we missed teaching CheckSelectLocking() to check
groupingSets as well as groupClause, so that it would allow
degenerate grouping sets. That resulted in a bad plan and
a null-pointer dereference in the executor.
Looking around for other instances of the same bug, the only one
I found was in examine_simple_variable(). That'd just lead to
silly estimates, but it should be fixed too.
Per private report from Yaoguang Chen.
Back-patch to all supported branches.
The deliverables of upstream Kerberos on Windows are installed with
paths that do not match our MSVC scripts. First, the include folder was
named "inc/" in our scripts, but the upstream MSIs use "include/".
Second, the build would fail with 64-bit environments as the libraries
are named differently.
This commit adjusts the MSVC scripts to be compatible with the latest
installations of upstream, and I have checked that the compilation was
able to work with the 32-bit and 64-bit installations.
Special thanks to Kondo Yuta for the help in investigating the situation
in hamerkop, which had an incorrect configuration for the GSS
compilation.
Reported-by: Brian Ye
Discussion: https://postgr.es/m/162128202219.27274.12616756784952017465@wrigleys.postgresql.org
Backpatch-through: 9.6
The following parameters have been imprecise, or incorrect, about their
description (PGC_POSTMASTER or PGC_SIGHUP):
- autovacuum_work_mem (docs, as of 9.6~)
- huge_page_size (docs, as of 14~)
- max_logical_replication_workers (docs, as of 10~)
- max_sync_workers_per_subscription (docs, as of 10~)
- min_dynamic_shared_memory (docs, as of 14~)
- recovery_init_sync_method (postgresql.conf.sample, as of 14~)
- remove_temp_files_after_crash (docs, as of 14~)
- restart_after_crash (docs, as of 9.6~)
- ssl_min_protocol_version (docs, as of 12~)
- ssl_max_protocol_version (docs, as of 12~)
This commit adjusts the description of all these parameters to be more
consistent with the practice used for the others.
Revewed-by: Justin Pryzby
Discussion: https://postgr.es/m/YK2ltuLpe+FbRXzA@paquier.xyz
Backpatch-through: 9.6
SSL renegotiation is already disabled as of 48d23c72, however this does
not prevent the server to comply with a client willing to use
renegotiation. In the last couple of years, renegotiation had its set
of security issues and flaws (like the recent CVE-2021-3449), and it
could be possible to crash the backend with a client attempting
renegotiation.
This commit takes one extra step by disabling renegotiation in the
backend in the same way as SSL compression (f9264d15) or tickets
(97d3a0b0). OpenSSL 1.1.0h has added an option named
SSL_OP_NO_RENEGOTIATION able to achieve that. In older versions
there is an option called SSL3_FLAGS_NO_RENEGOTIATE_CIPHERS that
was undocumented, and could be set within the SSL object created when
the TLS connection opens, but I have decided not to use it, as it feels
trickier to rely on, and it is not official. Note that this option is
not usable in OpenSSL < 1.1.0h as the internal contents of the *SSL
object are hidden to applications.
SSL renegotiation concerns protocols up to TLSv1.2.
Per original report from Robert Haas, with a patch based on a suggestion
by Andres Freund.
Author: Michael Paquier
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/YKZBXx7RhU74FlTE@paquier.xyz
Backpatch-through: 9.6
"typename" is a C++ keyword, so pg_upgrade.h fails to compile in C++.
Fortunately, there seems no likely reason for somebody to need to
do that. Nonetheless, it's project policy that all .h files should
pass cpluspluscheck, so rename the argument to fix that.
Oversight in 57c081de0; back-patch as that was. (The policy requiring
pg_upgrade.h to pass cpluspluscheck only goes back to v12, but it
seems best to keep this code looking the same in all branches.)
ForgetBackgroundWorker lacked any memory barrier at all, while
BackgroundWorkerStateChange had one but unaccountably did
additional manipulation of the slot after the barrier. AFAICS,
the rule must be that the barrier is immediately before setting
or clearing slot->in_use.
It looks like back in 9.6 when ForgetBackgroundWorker was first
written, there might have been some case for not needing a
barrier there, but I'm not very convinced of that --- the fact
that the load of bgw_notify_pid is in the caller doesn't seem
to guarantee no memory ordering problem. So patch 9.6 too.
It's likely that this doesn't fix any observable bug on Intel
hardware, but machines with weaker memory ordering rules could
have problems here.
Discussion: https://postgr.es/m/4046084.1620244003@sss.pgh.pa.us
Formerly we just relied on operator classes that assert longValuesOK
to eventually shorten the leaf value enough to fit on an index page.
That fails since the introduction of INCLUDE-column support (commit
09c1c6ab4), because the INCLUDE columns might alone take up more
than a page, meaning no amount of leaf-datum compaction will get
the job done. At least with spgtextproc.c, that leads to an infinite
loop, since spgtextproc.c won't throw an error for not being able
to shorten the leaf datum anymore.
To fix without breaking cases that would otherwise work, add logic
to spgdoinsert() to verify that the leaf tuple size is decreasing
after each "choose" step. Some opclasses might not decrease the
size on every single cycle, and in any case, alignment roundoff
of the tuple size could obscure small gains. Therefore, allow
up to 10 cycles without additional savings before throwing an
error. (Perhaps this number will need adjustment, but it seems
quite generous right now.)
As long as we've developed this logic, let's back-patch it.
The back branches don't have INCLUDE columns to worry about, but
this seems like a good defense against possible bugs in operator
classes. We already know that an infinite loop here is pretty
unpleasant, so having a defense seems to outweigh the risk of
breaking things. (Note that spgtextproc.c is actually the only
known opclass with longValuesOK support, so that this is all moot
for known non-core opclasses anyway.)
Per report from Dilip Kumar.
Discussion: https://postgr.es/m/CAFiTN-uxP_soPhVG840tRMQTBmtA_f_Y8N51G7DKYYqDh7XN-A@mail.gmail.com
Knowing that a buggy opclass could cause an infinite insertion loop,
spgdoinsert() intended to allow its loop to be interrupted by query
cancel. However, that never actually worked, because in iterations
after the first, we'd be holding buffer lock(s) which would cause
InterruptHoldoffCount to be positive, preventing servicing of the
interrupt.
To fix, check if an interrupt is pending, and if so fall out of
the insertion loop and service the interrupt after we've released
the buffers. If it was indeed a query cancel, that's the end of
the matter. If it was a non-canceling interrupt reason, make use
of the existing provision to retry the whole insertion. (This isn't
as wasteful as it might seem, since any upper-level index tuples we
already created should be usable in the next attempt.)
While there's no known instance of such a bug in existing release
branches, it still seems like a good idea to back-patch this to
all supported branches, since the behavior is fairly nasty if a
loop does happen --- not only is it uncancelable, but it will
quickly consume memory to the point of an OOM failure. In any
case, this code is certainly not working as intended.
Per report from Dilip Kumar.
Discussion: https://postgr.es/m/CAFiTN-uxP_soPhVG840tRMQTBmtA_f_Y8N51G7DKYYqDh7XN-A@mail.gmail.com
Split up CHECK_FOR_INTERRUPTS() to provide an additional macro
INTERRUPTS_PENDING_CONDITION(), which just tests whether an
interrupt is pending without attempting to service it. This is
useful in situations where the caller knows that interrupts are
blocked, and would like to find out if it's worth the trouble
to unblock them.
Also add INTERRUPTS_CAN_BE_PROCESSED(), which indicates whether
CHECK_FOR_INTERRUPTS() can be relied on to clear the pending interrupt.
This commit doesn't actually add any uses of the new macros,
but a follow-on bug fix will do so. Back-patch to all supported
branches to provide infrastructure for that fix.
Alvaro Herrera and Tom Lane
Discussion: https://postgr.es/m/20210513155351.GA7848@alvherre.pgsql
It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE
list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present.
If it happens, the ON CONFLICT UPDATE code path would end up storing
tuples that include the values of the extra resjunk columns. That's
fairly harmless in the short run, but if new columns are added to
the table then the values would become accessible, possibly leading
to malfunctions if they don't match the datatypes of the new columns.
This had escaped notice through a confluence of missing sanity checks,
including
* There's no cross-check that a tuple presented to heap_insert or
heap_update matches the table rowtype. While it's difficult to
check that fully at reasonable cost, we can easily add assertions
that there aren't too many columns.
* The output-column-assignment cases in execExprInterp.c lacked
any sanity checks on the output column numbers, which seems like
an oversight considering there are plenty of assertion checks on
input column numbers. Add assertions there too.
* We failed to apply nodeModifyTable's ExecCheckPlanOutput() to
the ON CONFLICT UPDATE tlist. That wouldn't have caught this
specific error, since that function is chartered to ignore resjunk
columns; but it sure seems like a bad omission now that we've seen
this bug.
In HEAD, the right way to fix this is to make the processing of
ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists
now do, that is don't add "SET x = x" entries, and use
ExecBuildUpdateProjection to evaluate the tlist and combine it with
old values of the not-set columns. This adds a little complication
to ExecBuildUpdateProjection, but allows removal of a comparable
amount of now-dead code from the planner.
In the back branches, the most expedient solution seems to be to
(a) use an output slot for the ON CONFLICT UPDATE projection that
actually matches the target table, and then (b) invent a variant of
ExecBuildProjectionInfo that can be told to not store values resulting
from resjunk columns, so it doesn't try to store into nonexistent
columns of the output slot. (We can't simply ignore the resjunk columns
altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.)
This works back to v10. In 9.6, projections work much differently and
we can't cheaply give them such an option. The 9.6 version of this
patch works by inserting a JunkFilter when it's necessary to get rid
of resjunk columns.
In addition, v11 and up have the reverse problem when trying to
perform ON CONFLICT UPDATE on a partitioned table. Through a
further oversight, adjust_partition_tlist() discarded resjunk columns
when re-ordering the ON CONFLICT UPDATE tlist to match a partition.
This accidentally prevented the storing-bogus-tuples problem, but
at the cost that MULTIEXPR_SUBLINK cases didn't work, typically
crashing if more than one row has to be updated. Fix by preserving
resjunk columns in that routine. (I failed to resist the temptation
to add more assertions there too, and to do some minor code
beautification.)
Per report from Andres Freund. Back-patch to all supported branches.
Security: CVE-2021-32028
While we were (mostly) careful about ensuring that the dimensions of
arrays aren't large enough to cause integer overflow, the lower bound
values were generally not checked. This allows situations where
lower_bound + dimension overflows an integer. It seems that that's
harmless so far as array reading is concerned, except that array
elements with subscripts notionally exceeding INT_MAX are inaccessible.
However, it confuses various array-assignment logic, resulting in a
potential for memory stomps.
Fix by adding checks that array lower bounds aren't large enough to
cause lower_bound + dimension to overflow. (Note: this results in
disallowing cases where the last subscript position would be exactly
INT_MAX. In principle we could probably allow that, but there's a lot
of code that computes lower_bound + dimension and would need adjustment.
It seems doubtful that it's worth the trouble/risk to allow it.)
Somewhat independently of that, array_set_element() was careless
about possible overflow when checking the subscript of a fixed-length
array, creating a different route to memory stomps. Fix that too.
Security: CVE-2021-32027