When working with an online source cluster, pg_rewind gets a list of all
the files in the source data directory using a WITH RECURSIVE query,
returning a NULL result for a file's metadata if it gets removed between
the moment it is listed in a directory and the moment its metadata is
obtained with pg_stat_file() (say a recycled WAL segment). The query
result was processed in such a way that for each tuple we checked only
that the first file's metadata was NULL. This could have two
consequences, both resulting in a failure of the rewind:
- If the first tuple referred to a removed file, all files from the
source would be ignored.
- Any file actually missing would not be considered as such.
While on it, rework slightly the code so as no values are saved if we
know that a file is going to be skipped.
Issue introduced by b36805f, so backpatch down to 9.5.
Author: Justin Pryzby, Michael Paquier
Reviewed-by: Daniel Gustafsson, Masahiko Sawada
Discussion: https://postgr.es/m/20200713061010.GC23581@telsasoft.com
Backpatch-through: 9.5
An ALTER TABLE to validate a foreign key in which another subcommand
already caused a pending table rewrite could fail due to ALTER TABLE
attempting to validate the foreign key before the actual table rewrite
takes place. This situation could result in an error such as:
ERROR: could not read block 0 in file "base/nnnnn/nnnnn": read only 0 of 8192 bytes
The failure here was due to the SPI call which validates the foreign key
trying to access an index which is yet to be rebuilt.
Similarly, we also incorrectly tried to validate CHECK constraints before
the heap had been rewritten.
The fix for both is to delay constraint validation until phase 3, after
the table has been rewritten. For CHECK constraints this means a slight
behavioral change. Previously ALTER TABLE VALIDATE CONSTRAINT on
inheritance tables would be validated from the bottom up. This was
different from the order of evaluation when a new CHECK constraint was
added. The changes made here aligns the VALIDATE CONSTRAINT evaluation
order for inheritance tables to be the same as ADD CONSTRAINT, which is
generally top-down.
Reported-by: Nazli Ugur Koyluoglu, using SQLancer
Discussion: https://postgr.es/m/CAApHDvp%3DZXv8wiRyk_0rWr00skhGkt8vXDrHJYXRMft3TjkxCA%40mail.gmail.com
Backpatch-through: 9.5 (all supported versions)
The qual pushdown logic assumed that all Vars in a restriction clause
must be Vars referencing subquery outputs; but since we introduced
LATERAL, it's possible for such a Var to be a lateral reference instead.
This led to an assertion failure in debug builds. In a non-debug
build, there might be no ill effects (if qual_is_pushdown_safe decided
the qual was unsafe anyway), or we could get failures later due to
construction of an invalid plan. I've not gone to much length to
characterize the possible failures, but at least segfaults in the
executor have been observed.
Given that this has been busted since 9.3 and it took this long for
anybody to notice, I judge that the case isn't worth going to great
lengths to optimize. Hence, fix by just teaching qual_is_pushdown_safe
that such quals are unsafe to push down, matching the previous behavior
when it accidentally didn't fail.
Per report from Tom Ellis. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20200713175124.GQ8220@cloudinit-builder
Parallel pg_restore has always supposed that ACL items for different
objects are independent and can be restored in parallel without
conflicts. However, there is one case where this fails: because
REVOKE on a table is defined to also revoke the privilege(s) at
column level, we can't restore per-column ACLs till after we restore
any table-level privileges on their table. Failure to honor this
restriction can lead to "tuple concurrently updated" errors during
parallel restore, or even to the per-column ACLs silently disappearing
because the table-level REVOKE is executed afterwards.
To fix, add a dependency from each column-level ACL item to its table's
ACL item, if there is one. Note that this doesn't fix the hazard
for pre-existing archive files, only for ones made with a corrected
pg_dump. Given that the bug's been there quite awhile without
field reports, I think this is acceptable.
This requires changing the API of pg_dump's dumpACL() function.
To keep its argument list from getting even longer, I removed the
"CatalogId objCatId" argument, which has been unused for ages.
Per report from Justin Pryzby. Back-patch to all supported branches.
Discussion: https://postgr.es/m/20200706050129.GW4107@telsasoft.com
Re-point comp.ai.genetic FAQ link to a more stable address.
Remove stale links to AIX documentation; we don't really need to
tell AIX users how to use their systems.
Remove stale links to HP documentation about SSL. We've had to
update those twice before, making it increasingly obvious that
HP does not intend them to be stable landing points. They're
not particularly authoritative, either. (This change effectively
reverts bbd3bdba3.)
Daniel Gustafsson and Álvaro Herrera, per a gripe from
Kyotaro Horiguchi. Back-patch, since these links are
just as dead in the back branches.
Discussion: https://postgr.es/m/20200709.161226.204639179120026914.horikyota.ntt@gmail.com
After running GetForeignRelSize for a foreign table, adjust rel->tuples
to be at least as large as rel->rows. This prevents bizarre behavior
in estimate_num_groups() and perhaps other places, especially in the
scenario where rel->tuples is zero because pg_class.reltuples is
(suggesting that ANALYZE has never been run for the table). As things
stood, we'd end up estimating one group out of any GROUP BY on such a
table, whereas the default group-count estimate is more likely to result
in a sane plan.
Also, clarify in the documentation that GetForeignRelSize has the option
to override the rel->tuples value if it has a better idea of what to use
than what is in pg_class.reltuples.
Per report from Jeff Janes. Back-patch to all supported branches.
Patch by me; thanks to Etsuro Fujita for review
Discussion: https://postgr.es/m/CAMkU=1xNo9cnan+Npxgz0eK7394xmjmKg-QEm8wYG9P5-CcaqQ@mail.gmail.com
Commit ecd9e9f0b fixed the problem in the wrong place, causing unwanted
side-effects on the behavior of GetNextTempTableSpace(). Instead,
let's make SharedFileSetInit() responsible for subbing in the value
of MyDatabaseTableSpace when the default tablespace is called for.
The convention about what is in the tempTableSpaces[] array is
evidently insufficiently documented, so try to improve that.
It also looks like SharedFileSetInit() is doing the wrong thing in the
case where temp_tablespaces is empty. It was hard-wiring use of the
pg_default tablespace, but it seems like using MyDatabaseTableSpace
is more consistent with what happens for other temp files.
Back-patch the reversion of PrepareTempTablespaces()'s behavior to
9.5, as ecd9e9f0b was. The changes in SharedFileSetInit() go back
to v11 where that was introduced. (Note there is net zero code change
before v11 from these two patch sets, so nothing to release-note.)
Magnus Hagander and Tom Lane
Discussion: https://postgr.es/m/CABUevExg5YEsOvqMxrjoNvb3ApVyH+9jggWGKwTDFyFCVWczGQ@mail.gmail.com
A likely copy/paste error in 98e8b48053 from back in 2004 would
cause temp tablespace to be reset to InvalidOid if temp_tablespaces
was set to the same value as the primary tablespace in the database.
This would cause shared filesets (such as for parallel hash joins)
to ignore them, putting the temporary files in the default tablespace
instead of the configured one. The bug is in the old code, but it
appears to have been exposed only once we had shared filesets.
Reviewed-By: Daniel Gustafsson
Discussion: https://postgr.es/m/CABUevExg5YEsOvqMxrjoNvb3ApVyH+9jggWGKwTDFyFCVWczGQ@mail.gmail.com
Backpatch-through: 9.5
Warnings start 10M transactions before xidStopLimit, which is 11M
transactions before wraparound. The sample WARNING output showed a
value greater than 11M, and its HINT message predated commit
25ec228ef7. Hence, the sample was
impossible. Back-patch to 9.5 (all supported versions).
I forgot that INT64_FORMAT can't be used with sscanf on Windows.
Use the same trick of sscanf'ing into a temp variable as we do in
some other places in zic.c.
The upstream IANA code avoids the portability problem by relying on
<inttypes.h>'s SCNdFAST64 macro. Once we're requiring C99 in all
branches, we should do likewise and drop this set of diffs from
upstream. For now, though, a hack seems fine, since we do not
actually care about leapseconds anyway.
Discussion: https://postgr.es/m/4e5d1a5b-143e-e70e-a99d-a3b01c1ae7c3@2ndquadrant.com
explain_get_index_name() applied quote_identifier() to the index name.
This is fine for text output, but the non-text output formats all have
their own quoting conventions and would much rather start from the
actual index name. For example in JSON you'd get something like
"Index Name": "\"My Index\"",
which is surely not desirable, especially when the same does not
happen for table names. Hence, move the responsibility for applying
quoting out to the callers, where it can go into already-existing
special code paths for text format.
This changes the API spec for users of explain_get_index_name_hook:
before, they were supposed to apply quote_identifier() if necessary,
now they should not. Research suggests that the only publicly
available user of the hook is hypopg, and it actually forgot to
apply quoting anyway, so it's fine. (In any case, there's no
behavioral change for the output of a hook as seen in non-text
EXPLAIN formats, so this won't break any case that programs should
be relying on.)
Digging in the commit logs, it appears that quoting was included in
explain_get_index_name's duties when commit 604ffd280 invented it;
and that was fine at the time because we only had text output format.
This should have been rethought when non-text formats were invented,
but it wasn't.
This is a fairly clear bug for users of non-text EXPLAIN formats,
so back-patch to all supported branches.
Per bug #16502 from Maciek Sakrejda. Patch by me (based on
investigation by Euler Taveira); thanks to Julien Rouhaud for review.
Discussion: https://postgr.es/m/16502-57bd1c9f913ed1d1@postgresql.org
A few places calling fwrite and gzwrite were not setting errno to ENOSPC
when reporting errors, as is customary; this led to some failures being
reported as
"could not write file: Success"
which makes us look silly. Make a few of these places in pg_dump and
pg_basebackup use our customary pattern.
Backpatch-to: 9.5
Author: Justin Pryzby <pryzby@telsasoft.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Discussion: https://postgr.es/m/20200611153753.GU14879@telsasoft.com
The IANA time zone folk have deprecated use of a "posixrules" file in
the tz database. While for now it's our choice whether to keep
supplying one in our own builds, installations built with
--with-system-tzdata will soon be needing to cope with that file not
being present, at least on some platforms.
This causes a problem for the horology test, which expected the
nonstandard POSIX zone spec "CST7CDT" to apply pre-2007 US daylight
savings rules. That does happen if the posixrules file supplies such
information, but otherwise the test produces undesired results.
To fix, add an explicit transition date rule that matches 2005 practice.
(We could alternatively have switched the test to use some real time
zone, but it seems useful to have coverage of this type of zone spec.)
While at it, update a documentation example that also relied on
"CST7CDT"; use a real-world zone name instead. Also, document why
the zone names EST5EDT, CST6CDT, MST7MDT, PST8PDT aren't subject to
similar failures when "posixrules" is missing.
Back-patch to all supported branches, since the hazard is the same
for all.
Discussion: https://postgr.es/m/1665379.1592581287@sss.pgh.pa.us
This was a danger only for --disable-spinlocks in combination with
atomic operations unsupported by the current platform.
While atomics.c was careful to signal that a separate semaphore ought
to be used when spinlock emulation is active, spin.c didn't actually
implement that mechanism. That's my (Andres') fault, it seems to have
gotten lost during the development of the atomic operations support.
Fix that issue and add test for nesting atomic operations inside a
spinlock.
Author: Andres Freund
Discussion: https://postgr.es/m/20200605023302.g6v3ydozy5txifji@alap3.anarazel.de
Backpatch: 9.5-
As s_lock_test, the already existing test for spinlocks, isn't run in
an automated fashion (and doesn't test a normal backend environment),
adding tests that are run as part of a normal regression run is a good
idea. Particularly in light of several recent and upcoming spinlock
related fixes.
Currently the new tests are run as part of the pre-existing
test_atomic_ops() test. That perhaps can be quibbled about, but for
now seems ok.
The only operations that s_lock_test tests but the new tests don't are
the detection of a stuck spinlock and S_LOCK_FREE (which is otherwise
unused, not implemented on all platforms, and will be removed).
This currently contains a test for more than INT_MAX spinlocks (only
run with --disable-spinlocks), to ensure the recent commit fixing a
bug with more than INT_MAX spinlock initializations is correct. That
test is somewhat slow, so we might want to disable it after a few
days.
It might be worth retiring s_lock_test after this. The added coverage
of a stuck spinlock probably isn't worth the added complexity?
Author: Andres Freund
Discussion: https://postgr.es/m/20200606023103.avzrctgv7476xj7i@alap3.anarazel.de
We'd glossed over most of this complexity for years, but it's hard
to avoid writing it all down now, so that we can explain what happens
when there's no "posixrules" file in the IANA time zone database.
That was at best a tiny minority situation till now, but it's likely
to become quite common in the future, so we'd better explain it.
Nonetheless, we don't really encourage people to use POSIX zone specs;
picking a named zone is almost always what you really want, unless
perhaps you're stuck with an out-of-date zone database. Therefore,
let's shove all this detail into an appendix.
Patch by me; thanks to Robert Haas for help with some awkward wording.
Discussion: https://postgr.es/m/1390.1562258309@sss.pgh.pa.us
This absorbs a leap-second-related bug fix in localtime.c, and
teaches zic to handle an expiration marker in the leapseconds file.
Neither are of any interest to us (for the foreseeable future
anyway), but we need to stay more or less in sync with upstream.
Also adjust some over-eager changes in the README from commit 957338418.
I have no intention of making changes that require C99 in this code,
until such time as all the live back branches require C99. Otherwise
back-patching will get too exciting.
For the same reason, absorb assorted whitespace and other cosmetic
changes from HEAD into the back branches; mostly this reflects use of
improved versions of pgindent.
All in all then, quite a boring update. But I figured I'd get it
done while I was looking at this code.
Convert buffile.c error handling to use ereport. This fixes cases where
I/O errors were indistinguishable from EOF or not reported. Also remove
"%m" from error messages where errno would be bogus. While we're
modifying those strings, add block numbers and short read byte counts
where appropriate.
Back-patch to all supported releases.
Reported-by: Amit Khandekar <amitdkhan.pg@gmail.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reviewed-by: Ibrar Ahmed <ibrar.ahmad@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKGJE04G%3D8TLK0DLypT_27D9dR8F1RQgNp0jK6qR0tZGWOw%40mail.gmail.com
Non-zero vacuum_defer_cleanup_age values cause pg_upgrade freezing of
the system catalogs to be incomplete, or do nothing. This will cause
the upgrade to fail in confusing ways.
Reported-by: Laurenz Albe
Discussion: https://postgr.es/m/7d6f6c22ba05ce0c526e9e8b7bfa8105e7da45e6.camel@cybertec.at
Backpatch-through: 9.5
Our documentation failed to point out that REPEATABLE READ is really
snapshot isolation, which might be important to some users. Point to
the standard reference paper for this complicated topic.
Likewise, add a reference to the VLDB paper about PostgreSQL SSI, for
technical information about our SSI implementation and how it compares
to S2PL.
While here, add a note about catalog access using a lower isolation
level, per recent user complaint.
Back-patch to all releases.
Reported-by: Kyle Kingsbury <aphyr@jepsen.io>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Tatsuo Ishii <ishii@sraoss.co.jp>
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443%40jepsen.io
Discussion: https://postgr.es/m/16454-9408996bb1750faf%40postgresql.org
SSI's HeapCheckForSerializableConflictOut() test failed to correctly
handle conditions involving a concurrently inserted tuple which is later
concurrently updated by a separate transaction . A SELECT statement
that called HeapCheckForSerializableConflictOut() could end up using the
same XID (updater's XID) for both the original tuple, and the successor
tuple, missing the XID of the xact that created the original tuple
entirely. This only happened when neither tuple from the chain was
visible to the transaction's MVCC snapshot.
The observable symptoms of this bug were subtle. A pair of transactions
could commit, with the later transaction failing to observe the effects
of the earlier transaction (because of the confusion created by the
update to the non-visible row). This bug dates all the way back to
commit dafaa3ef, which added SSI.
To fix, make sure that we check the xmin of concurrently inserted tuples
that happen to also have been updated concurrently.
Author: Peter Geoghegan
Reported-By: Kyle Kingsbury
Reviewed-By: Thomas Munro
Discussion: https://postgr.es/m/db7b729d-0226-d162-a126-8a8ab2dc4443@jepsen.io
Backpatch: All supported versions
Previously we used pg_atomic_write_64_impl inside
pg_atomic_init_u64. That works correctly, but on platforms without
64bit single copy atomicity it could trigger spurious valgrind errors
about uninitialized memory, because we use compare_and_swap for atomic
writes on such platforms.
I previously suppressed one instance of this problem (6c878edc1d),
but as Tom reports that wasn't enough. As the atomic variable cannot
yet be concurrently accessible during initialization, it seems better
to have pg_atomic_init_64_impl set the value directly.
Change pg_atomic_init_u32_impl for symmetry.
Reported-By: Tom Lane
Author: Andres Freund
Discussion: https://postgr.es/m/1714601.1591503815@sss.pgh.pa.us
Backpatch: 9.5-
The redo routines for XLOG_CHECKPOINT_{ONLINE,SHUTDOWN} must acquire
ControlFileLock before modifying ControlFile->checkPointCopy, or the
checkpointer could write out a control file with a bad checksum.
Likewise, XLogReportParameters() must acquire ControlFileLock before
modifying ControlFile and calling UpdateControlFile().
Back-patch to all supported releases.
Author: Nathan Bossart <bossartn@amazon.com>
Author: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/70BF24D6-DC51-443F-B55A-95735803842A%40amazon.com
Commit 7be5d8df1f surfaced the logic
error, which had no functional implications, by adding "use warnings".
The buildfarm always customizes PROVE_FLAGS, so the warning did not
appear there. Back-patch to 9.5 (all supported versions).
This issue has been present since the introduction of this code as of
a3519a2 from 2002, and has been found by buildfarm member prion that
uses RELCACHE_FORCE_RELEASE via the tests introduced recently in
e786be5.
Discussion: https://postgr.es/m/20200601022055.GB4121@paquier.xyz
Backpatch-through: 9.5
The repeat() function loops for potentially a long time without
ever checking for interrupts. This prevents, for example, a query
cancel from interrupting until the work is all done. Fix by
inserting a CHECK_FOR_INTERRUPTS() into the loop.
Backpatch to all supported versions.
Discussion: https://www.postgresql.org/message-id/flat/8692553c-7fe8-17d9-cbc1-7cddb758f4c6%40joeconway.com
The target failed, tested $PATH binaries, or tested a stale temporary
installation. Commit c66b438db6 missed
this. Back-patch to 9.5 (all supported versions).
When installing binaries and libraries using the MSVC installation
routines, the operation gets done after moving to the root folder, whose
location is detected by checking if "configure" exists two times in a
row. So, calling the installation script from src/tools/msvc/ with an
extra "configure" file four levels up the root path of the code tree
causes the execution to go further up, leading to a failure in finding
the builds. This commit fixes the issue by moving to the root folder of
the code tree only once, when necessary.
Author: Arnold Müller
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/16343-f638f67e7e52b86c@postgresql.org
Backpatch-through: 9.5
The description missed a comma and lacked an explanation of what happens
with REPLICA IDENTITY USING INDEX when the dependent index is dropped.
Author: Marina Polyakova
Reviewed-by: Daniel Gustafsson, Michael Paquier
Discussion: https://postgr.es/m/ad1a0badc32658b1bbb07aa312346a1d@postgrespro.ru
Backpatch-through: 9.5
The previous coding zeroed out offsetof(ReplicationStateCtl, states)
more bytes than it was entitled to, as a consequence of starting the
zeroing from the wrong pointer (or, if you prefer, using the wrong
calculation of how much to zero).
It's unsurprising that this has not caused any reported problems,
since it can be expected that the newly-allocated block is at the end
of what we've used in shared memory, and we always make the shmem
block substantially bigger than minimally necessary. Nonetheless,
this is wrong and it could bite us someday; plus it's a dangerous
model for somebody to copy.
This dates back to the introduction of this code (commit 5aa235042),
so back-patch to all supported branches.
_bt_killitems marks btree items dead when a scan leaves the page where
they live, but it does so with only share lock (to improve concurrency).
This was historicall okay, since killing a dead item has no
consequences. However, with the advent of data checksums and
wal_log_hints, this action incurs a WAL full-page-image record of the
page. Multiple concurrent processes would write the same page several
times, leading to WAL bloat. The probability of this happening can be
reduced by only killing items if they're not already dead, so change the
code to do that.
The problem could eliminated completely by having _bt_killitems upgrade
to exclusive lock upon seeing a killable item, but that would reduce
concurrency so it's considered a cure worse than the disease.
Backpatch all the way back to 9.5, since wal_log_hints was introduced in
9.4.
Author: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>
Discussion: https://postgr.es/m/CA+fd4k6PeRj2CkzapWNrERkja5G0-6D-YQiKfbukJV+qZGFZ_Q@mail.gmail.com