Depending on the system used, t/*.pl may not be expanded into a list of
tests which can be consumed by prove when attempting to run TAP tests on
a given path. Fix that by using glob() directly in the script, to make
sure that a complete list of tests is provided. This has not proved to
be an issue with MSVC as the list was properly expanded, but it is on
Linux with perl's system().
This is extracted from a larger patch.
Author: Tom Lane
Discussion: https://postgr.es/m/6628.1567958876@sss.pgh.pa.us
Backpatch-through: 9.4
For its entire existence, isolationtester.c has forced client_min_messages
to WARNING, but that seems like a very poor choice of test design. It
should be up to individual test scripts to manage whether they emit notices
and to ensure that the results are stable. (There were no NOTICE messages
in the original set of isolation tests, so this was certainly dead code
when committed, but perhaps it was needed at some earlier point.)
It's possible that the original motivation was due to platform-dependent
variations in the timing of stdout vs. stderr output. That should be
moot since commits 73bcb76b7/6eda3e9c2, but just in case, adjust
isotesterNoticeProcessor to print to stdout not stderr. (stderr seems
like the wrong thing anyway: it should be for error printouts not expected
test output.)
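A notice processor along these lines makes the point (a minimal sketch
using libpq's standard PQsetNoticeProcessor hook, not the committed code):

    #include <stdio.h>
    #include "libpq-fe.h"

    /* Sketch: send NOTICE text to stdout, where expected test output
     * belongs; libpq passes the message with its trailing newline. */
    static void
    isotesterNoticeProcessor(void *arg, const char *message)
    {
        printf("%s", message);
    }

    /* installed once per test connection, e.g.:
     *   PQsetNoticeProcessor(conn, isotesterNoticeProcessor, NULL);
     */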
Back-patch of commit ebd499282 into v12. I'll separately push this
into older branches, but this is as much change as v12 needs.
Discussion: https://postgr.es/m/14616.1564251339@sss.pgh.pa.us
Discussion: https://postgr.es/m/E1i7IqC-0000Uc-5H@gemulon.postgresql.org
If a test sends a notice just before blocking, it's possible on
slow machines for isolationtester to detect the blocked state before
it's consumed the notice. (For this to happen, the notice would have
to arrive after isolationtester has waited for data for 10ms, so on
fast/lightly-loaded machines it's hard to reproduce the failure.)
But, if we have seen the backend as blocked, it's certainly already
sent any notices it's going to send. Therefore, one more round of
PQconsumeInput and PQisBusy should be enough to collect and process
any such notices.
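In libpq terms, that extra round looks roughly like this (a hedged
sketch, not the committed code), run once after the blocked state is
detected:

    /* Sketch: absorb any notice sent just before the backend blocked.
     * PQconsumeInput() reads whatever data is pending; PQisBusy()
     * parses it, invoking the notice processor for queued notices. */
    if (!PQconsumeInput(conn))
        fprintf(stderr, "%s", PQerrorMessage(conn));
    (void) PQisBusy(conn);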
Back-patch of 30717637c into v12. We're still discussing whether
to back-patch this further and/or back-patch some other recent
isolationtester fixes, but this much is provably necessary to
make the test cases added by 27cc7cd2b stable in v12.
Discussion: https://postgr.es/m/14616.1564251339@sss.pgh.pa.us
Discussion: https://postgr.es/m/E1i7IqC-0000Uc-5H@gemulon.postgresql.org
Don't just assume that the next port is free; it might not be, or
if we're really unlucky it might even be out of the TCP range.
Do it honestly with two get_free_port() calls instead.
This is surely a pretty low-probability problem, but I think it
explains a buildfarm failure seen today, so let's fix it.
Back-patch to v11 where this script was added.
Discussion: https://postgr.es/m/25124.1568052346@sss.pgh.pa.us
Modern versions of msys2 have changed the treatment of "cmd /c" so that
the runtime will try to convert the switch to a native file path. This
patch adds a setting to inhibit that behaviour.
Discussion: https://postgr.es/m/3227042f-cfcc-745a-57dd-fb8c471f8ddf@2ndQuadrant.com
Backpatch to all live branches.
In ad0bda5d24ea I changed the EvalPlanQual machinery to store
substitution tuples in slots, instead of using plain HeapTuples. The
main motivation for that was that using HeapTuples would be inefficient
for future tableams. But it turns out that that conversion was buggy
for non-locking rowmarks - the wrong tuple descriptor was used to
create the slot.
As a secondary issue 5db6df0c0 changed ExecLockRows() to begin EPQ
earlier, to allow fetching the locked rows directly into the EPQ
slots, instead of having to copy tuples around. Unfortunately, as Tom
complained, that forces some expensive initialization to happen
earlier.
As a third issue, the test coverage for EPQ was clearly insufficient.
Fixing the first issue is unfortunately not trivial: Non-locked row
marks were fetched at the start of EPQ, and we don't have the type
information for the rowmarks available at that point. While we could
change that, it's not easy. It might be worthwhile to change that at
some point, but to fix this bug it seems better to delay fetching
non-locking rowmarks until they're actually needed, rather than
fetching them eagerly. They're referenced at most once, and in cases
where EPQ fails, might never be referenced. Fetching them when needed
also increases locality a bit.
To be able to fetch rowmarks during execution, rather than
initialization, we need to be able to access the active EPQState, as
that contains the necessary data. To do so, move EPQ-related data from
EState to EPQState, and, only for EStates created as part of EPQ,
reference the associated EPQState from EState.
To fix the second issue, change EPQ initialization to allow
EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but
obviously still requiring EvalPlanQualInit() to have been done).
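In terms of call order, the intent is roughly the following (a hedged
sketch against the post-commit API, not verbatim source):

    EvalPlanQualInit(&epqstate, estate, subplan, auxrowmarks, epqParam);

    /* slots may now be requested before EvalPlanQualBegin()... */
    slot = EvalPlanQualSlot(&epqstate, relation, rti);

    /* ...with the expensive setup deferred until EPQ actually fires */
    EvalPlanQualBegin(&epqstate);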
As these changes made struct EState harder to understand, e.g. by
adding multiple EStates, significantly reorder the members, and add a
lot more comments.
Also add a few more EPQ tests, including one that fails for the first
issue above. More is needed.
Reported-By: yi huang
Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com
Discussion: https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
Backpatch: 12-, where the EPQ changes were introduced
f2e40380 introduced support for non-key attributes in GiST indexes. If
get_index_column_opclass() is then asked by gistproperty() for the opclass of a
non-key column, it returns garbage past the end of the oidvector value. This
commit fixes that by making get_index_column_opclass() return InvalidOid in
this case.
Discussion: https://postgr.es/m/20190902231948.GA5343%40alvherre.pgsql
Author: Nikita Glukhov, Alexander Korotkov
Backpatch-through: 12
Some of these are quite old, but that doesn't make them not bugs.
We'd rather report a failure via elog than SIGSEGV.
While at it, uniformly spell the error check as !RelationIsValid(rel)
rather than a bare rel == NULL test. The machine code is the same
but it seems better to be consistent.
Coverity complained about this today, not sure why, because the
mistake is in fact old.
In order to implement NULLS LAST semantics, GiST previously assumed the
distance to a NULL value to be Inf. However, our distance functions can return
Inf and NaN for non-null values, and in such cases the NULLS LAST semantics
break. This commit fixes that by introducing a separate array of null flags for
distances.
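Conceptually, each distance is now paired with an explicit null flag,
along these lines (a hedged sketch of the idea, not the exact committed
struct):

    /* Sketch: NULL ordering is decided by the flag, so Inf or NaN
     * distances from non-null values no longer collide with NULLs. */
    typedef struct
    {
        double      value;      /* distance; meaningful only if !isnull */
        bool        isnull;     /* true => sorts after every non-null */
    } OrderByDistance;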
Backpatch to all supported versions.
Discussion: https://postgr.es/m/CAPpHfdsNvNdA0DBS%2BwMpFrgwT6C3-q50sFVGLSiuWnV3FqOJuQ%40mail.gmail.com
Author: Alexander Korotkov
Backpatch-through: 9.4
Previously, plain float comparison was used in the GiST pairing heap. Such
comparison doesn't provide a proper ordering for value sets containing Inf and
NaN values. This commit fixes that by using float8_cmp_internal(). Note that
there is a remaining problem with NULL distances, which are represented as Inf
in the pairing heap; it will be fixed in a subsequent commit.
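The comparator now funnels through float8_cmp_internal(), which defines
a total order with NaN sorting above everything else, roughly (a hedged
fragment, not the committed code):

    /* Sketch: float8_cmp_internal() yields a total order in which
     * NaN compares greater than any other value, including Inf. */
    int         cmp = float8_cmp_internal(a->distance, b->distance);

    if (cmp != 0)
        return cmp;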
Backpatch to all supported versions.
Reported-by: Andrey Borodin
Discussion: https://postgr.es/m/CAPpHfdsNvNdA0DBS%2BwMpFrgwT6C3-q50sFVGLSiuWnV3FqOJuQ%40mail.gmail.com
Author: Alexander Korotkov
Reviewed-by: Heikki Linnakangas
Backpatch-through: 9.4
When using COMMIT AND CHAIN or ROLLBACK AND CHAIN outside an explicit
transaction block, the previous implementation would leave a
transaction block active in the ROLLBACK case but not the COMMIT case.
To fix for now, error out when these commands are used outside an
explicit transaction block. This restriction could be lifted if a
sensible definition and implementation are found.
Bug: #15977
Author: fn ln <emuser20140816@gmail.com>
Reviewed-by: Fabien COELHO <coelho@cri.ensmp.fr>
Commit 6f6b99d13 stuck an INFO message into the fast path for
checking partition constraints, for no very good reason except
that it made it easy for the regression tests to verify that
that path was taken. Assorted later patches did likewise,
increasing the unsuppressable-chatter level from ALTER TABLE
even more. This isn't good for the user experience, so let's
drop these messages down to DEBUG1 where they belong. So as
not to have a loss of test coverage, create a TAP test that
runs the relevant queries with client_min_messages = DEBUG1
and greps for the expected messages.
This testing method is a bit brute-force --- in particular,
it duplicates the execution of a fair amount of the core
create_table and alter_table tests. We experimented with
other solutions, but running any significant amount of
standard testing with client_min_messages = DEBUG1 seems
to have a lot of output-stability pitfalls, cf commits
bbb96c370 and 5655565c0. Possibly at some point we'll look
into whether we can reduce the amount of test duplication.
Backpatch into v12, because some of these messages are new
in v12 and we don't really want to ship it that way.
Sergei Kornilov
Discussion: https://postgr.es/m/81911511895540@web58j.yandex.ru
Discussion: https://postgr.es/m/4859321552643736@myt5-02b80404fd9e.qloud-c.yandex.net
Copy-edit or rewrite some items, add some items that I thought
deserved documenting, remove some others that didn't (notably,
I'm not on board with documenting documentation changes), fix
some poorly-pointed links, move some items to other sections,
etc etc.
I still want to review the rest of the notes, but this seems like the
minimum work required to prepare for beta4.
Major-enhancements text, and a couple of other fixes, from Jonathan Katz
(with minor copy-editing by me).
The old code didn't differentiate between a read error and a
concurrent truncation. fread reports both of these by returning 0;
you have to use feof() or ferror() to distinguish between them,
which this code did not do.
It might be a better idea to use read() rather than fread() here,
so that we can display a less-generic error message, but I'm not
sure that would qualify as a back-patchable bug fix, so just do
this much for now.
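The distinction looks like this (a minimal sketch, assuming a stdio
stream fp and a buffer buf):

    size_t      nread = fread(buf, 1, sizeof(buf), fp);

    if (nread == 0)
    {
        if (ferror(fp))
            fprintf(stderr, "read error\n");        /* genuine I/O failure */
        else if (feof(fp))
            fprintf(stderr, "unexpected EOF\n");    /* concurrent truncation */
    }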
Jeevan Chalke, reviewed by Jeevan Ladhe and by me.
Discussion: http://postgr.es/m/CA+TgmobG4ywMzL5oQq2a8YKp8x2p3p1LOMMcGqpS7aekT9+ETA@mail.gmail.com
Previously, even if the postmaster died and WaitLatch() woke up with that
event while pg_promote() was waiting for the standby promotion to finish,
pg_promote() did nothing special and kept waiting until the timeout occurred.
This could cause a busy loop.
This patch makes pg_promote() return false immediately when the postmaster
dies, to avoid such a busy loop.
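The wait loop now acts on the postmaster-death event explicitly, roughly
as follows (a hedged sketch against the v12 latch API):

    rc = WaitLatch(MyLatch,
                   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                   1000L /* ms */, WAIT_EVENT_PROMOTE);
    ResetLatch(MyLatch);

    /* give up instead of busy-looping until the timeout */
    if (rc & WL_POSTMASTER_DEATH)
        PG_RETURN_BOOL(false);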
Back-patch to v12 where pg_promote() was added.
Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAHGQGwEs9ROgSp+QF+YdDU+xP8W=CY1k-_Ov-d_Z3JY+to3eXA@mail.gmail.com
The logic ending progress reporting for a backend entry, introduced by
b6fb647, causes callers of pgstat_progress_end_command() to do some extra
work when track_activities is enabled: the progress fields are reset in
the backend entry even if no command was started for reporting.
This commit resets the fields only if a command was registered for progress
reporting, and only if track_activities is enabled.
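In outline, the end-of-command path now bails out early, something like
this (a hedged sketch, not the committed diff):

    void
    pgstat_progress_end_command(void)
    {
        volatile PgBackendStatus *beentry = MyBEEntry;

        if (!beentry || !pgstat_track_activities)
            return;
        /* nothing was registered for reporting: skip the reset */
        if (beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
            return;

        /* ... reset the progress fields ... */
    }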
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoCry_vJ0E-m5oxJXGL3pnos-xYGCzF95rK5Bbi3Uf-rpA@mail.gmail.com
Backpatch-through: 9.6
Since the addition of fsync requests in bc34223 to make base backup data
consistent on disk once pg_basebackup finishes, each tablespace tar file
is individually flushed once completed, with an additional flush of the
parent directory when the base backup finishes. While holding a
connection to the server, an fsync request taking a long time may cause a
failure of the base backup, which is annoying for any integration. A
recent example of breakage involves tcp_user_timeout, but
wal_sender_timeout can cause similar problems.
Code review turned up a second issue: too many fsync requests were
issued for the same WAL data. As recursive fsyncs are done at the end
of the backup for both the plain and tar formats, from the base target
directory where everything is written, it is fine to disable fsyncs
when fetching or streaming WAL.
Reported-by: Ryohei Takahashi
Author: Michael Paquier
Reviewed-by: Ryohei Takahashi
Discussion: https://postgr.es/m/OSBPR01MB4550DAE2F8C9502894A45AAB82BE0@OSBPR01MB4550.jpnprd01.prod.outlook.com
Backpatch-through: 10
Clarify in the help output and documentation that -n, -t etc. take a
"pattern" rather than a "schema" or "table" etc. This was especially
confusing now that the new pg_dumpall --exclude-database option was
documented with "pattern" and the others not, even though they all
behave the same.
Discussion: https://www.postgresql.org/message-id/flat/b85f3fa1-b350-38d1-1893-4f7911bd7310%402ndquadrant.com
The leak happens in str_tolower, str_toupper and str_initcap, which are
used in several places including their equivalent SQL-level functions,
and can only be triggered when converting an input string with an
ICU-provided collation.
b615920 fixed a similar leak. Backpatch down to 10, where ICU collations
were introduced.
Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/94c0ad0a-cbc2-e4a3-7829-2bdeaf9146db@postgrespro.ru
Backpatch-through: 10
In what seems like a fit of misplaced optimization,
ExtractReplicaIdentity() accessed the relation's replica-identity
index without taking any lock on it. Usually, the surrounding query
already holds some lock so this is safe enough ... but in the case
of a previously-planned delete, there might be no existing lock.
Given a suitable test case, this is exposed in v12 and HEAD by an
assertion added by commit b04aeb0a0.
The whole thing's rather poorly thought out anyway; rather than
looking directly at the index, we should use the index-attributes
bitmap that's held by the parent table's relcache entry, as the
caller functions do. This is more consistent and likely a bit
faster, since it avoids a cache lookup. Hence, change to doing it
that way.
While at it, rather than blithely assuming that the identity
columns are non-null (with catastrophic results if that's wrong),
add assertion checks that they aren't null. Possibly those should
be actual test-and-elog, but I'll leave it like this for now.
In principle, this is a bug that's been there since this code was
introduced (in 9.4). In practice, the risk seems quite low, since
we do have a lock on the index's parent table, so concurrent
changes to the index's catalog entries seem unlikely. Given the
precedent that commit 9c703c169 wasn't back-patched, I won't risk
back-patching this further than v12.
Per report from Hadi Moshayedi.
Discussion: https://postgr.es/m/CAK=1=Wrek44Ese1V7LjKiQS-Nd-5LgLi_5_CskGbpggKEf3tKQ@mail.gmail.com
After an unexpected connection loss and successful reconnection,
psql neglected to resynchronize its internal state about the server,
such as server version. Ordinarily we'd be reconnecting to the same
server and so this isn't really necessary, but there are scenarios
where we do need to update --- one example is where we have a list
of possible connection targets and they're not all alike.
Define "resynchronize" as including connection_warnings(), so that
this case acts the same as \connect. This seems useful; for example,
if the server version did change, the user might wish to know that.
An attuned user might also notice that the new connection isn't
SSL-encrypted, for example, though this approach isn't especially
in-your-face about such changes. Although this part is a behavioral
change, it only affects interactive sessions, so it should not break
any applications.
Also, in do_connect, make sure that we desynchronize correctly when
abandoning an old connection in non-interactive mode.
These problems evidently are the result of people patching only one
of the two places where psql deals with connection changes, so insert
some cross-referencing comments in hopes of forestalling future bugs
of the same ilk.
Lastly, in Windows builds, issue codepage mismatch warnings only at
startup, not during reconnections. psql's codepage can't change
during a reconnect, so complaining about it again seems like useless
noise.
Peter Billen and Tom Lane. Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAMTXbE8e6U=EBQfNSe01Ej17CBStGiudMAGSOPaw-ALxM-5jXg@mail.gmail.com
Section 16.2 pointed to platform-specific FAQ files that we removed
way back in 8.4. Section 16.7 contained a bunch of information about
AIX and HPUX bugs that were squashed decades ago, plus discussions of
old compiler versions that are certainly moot now that we require C99
support. Since we're obviously not maintaining this stuff carefully,
just remove it. The HPUX sub-section seems like it can go away
entirely, since everything it said that was still applicable was
redundant with material elsewhere in the chapter.
In passing, I couldn't resist the temptation to do a small amount
of copy-editing on nearby text.
Back-patch to v12, since this stuff is surely obsolete in any
branch that requires C99.
Discussion: https://postgr.es/m/15538.1567042743@sss.pgh.pa.us
The comment did not match what the code actually did for integers with
the 43rd bit set. You get an integer like that if you have a posting
list with two adjacent TIDs that are more than 2^31 blocks apart.
According to the comment, we would store that in 6 bytes, with no
continuation bit on the 6th byte, but in reality the code encodes it
using 7 bytes, with a continuation bit on the 6th byte as normal.
The decoding routine also handled these 7-byte integers correctly, except
for an overflow check that assumed that one integer needs at most 6 bytes.
Fix the overflow check, and fix the comment to match what the code
actually does. Also fix the comment that claimed that there are 17 unused
bits in the 64-bit representation of an item pointer. In reality, there
are 64-32-11=21.
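For reference, the varbyte scheme stores 7 data bits per byte with the
high bit as a continuation marker, so a 43-bit delta can need 7 bytes; a
hedged sketch of such an encoder (not the ginpostinglist.c source):

    #include <stdint.h>

    /* Sketch: 7 data bits per byte, high bit = "more bytes follow".
     * A 43-bit value needs ceil(43/7) = 7 bytes, not 6. */
    static int
    encode_varbyte(uint64_t val, unsigned char *buf)
    {
        int         len = 0;

        do
        {
            unsigned char b = val & 0x7F;

            val >>= 7;
            if (val > 0)
                b |= 0x80;
            buf[len++] = b;
        } while (val > 0);

        return len;
    }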
Fitting any item pointer into max 6 bytes was an important property when
this was written, because in the old pre-9.4 format, item pointers were
stored as plain arrays, with 6 bytes for every item pointer. The maximum
of 6 bytes per integer in the new format guaranteed that we could convert
any page from the old format to the new format after upgrade, so that the
new format was never larger than the old format. But we hardly need to
worry about that anymore, and running into that problem during upgrade,
where an item pointer is expanded from 6 to 7 bytes such that the data
doesn't fit on a page anymore, is implausible in practice anyway.
Backpatch to all supported versions.
This also includes a little test module to test these large distances
between item pointers, without requiring a 16 TB table. It is not
backpatched; I'm including it more for the benefit of future development
of new posting list formats.
Discussion: https://www.postgresql.org/message-id/33bfc20a-5c86-f50c-f5a5-58e9925d05ff%40iki.fi
Reviewed-by: Masahiko Sawada, Alexander Korotkov
RelationAllowsEarlyPruning() performed a catalog scan, but is used
in two contexts where that was a bad idea:
1. In heap_page_prune_opt(), which runs very frequently in some large
scans. This caused major performance problems in a field report
that was easy to reproduce.
2. In TestForOldSnapshot(), which runs while we hold a buffer content
lock. It's not clear if this was guaranteed to be free of buffer
deadlock risk.
The check was introduced in commit 2cc41acd8 and defended against a
real problem: 9.6's hash indexes have no page LSN and so we can't
allow early pruning (ie the snapshot-too-old feature). We can remove
the check from all later releases though: hash indexes are now logged,
and there is no way to create UNLOGGED indexes on regular logged
tables.
If a future release allows such a combination, it might need to put
a similar check in place, but it'll need some more thought.
Back-patch to 10.
Author: Thomas Munro
Reviewed-by: Tom Lane, who spotted the second problem
Discussion: https://postgr.es/m/CA%2BhUKGKT8oTkp5jw_U4p0S-7UG9zsvtw_M47Y285bER6a2gD%2Bg%40mail.gmail.com
Discussion: https://postgr.es/m/CAA4eK1%2BWy%2BN4eE5zPm765h68LrkWc3Biu_8rzzi%2BOYX4j%2BiHRw%40mail.gmail.com
In this case, the transfer uses a libpq connection, which is subject to
the timeout parameters set at the system level, and this can cause the
rewind operation to be canceled unexpectedly, which is not good for
automation. One workaround for such issues would be to use PGOPTIONS to
enforce the wanted timeout parameters, but that's annoying; pg_dump,
for example, which can run potentially long-running queries, disables
all types of timeouts.
lock_timeout and statement_timeout are the ones which can cause problems
now. Note that pg_rewind does not use transactions, so disabling
idle_in_transaction_session_timeout is optional, but it feels safer to
do so for the future.
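Concretely, the fix amounts to clearing those settings on pg_rewind's
own session, along these lines (a hedged sketch; disable_timeouts is a
hypothetical helper, not the committed code):

    #include <stdio.h>
    #include "libpq-fe.h"

    /* Sketch: neutralize server-side timeouts for this session only */
    static void
    disable_timeouts(PGconn *conn)
    {
        PGresult   *res = PQexec(conn,
                                 "SET lock_timeout = 0;"
                                 "SET statement_timeout = 0;"
                                 "SET idle_in_transaction_session_timeout = 0;");

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "could not disable timeouts: %s",
                    PQerrorMessage(conn));
        PQclear(res);
    }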
This is back-patched down to 9.5. idle_in_transaction_session_timeout
is only present since 9.6.
Author: Alexander Kukushkin
Discussion: https://postgr.es/m/CAFh8B=krcVXksxiwVQh1SoY+ziJ-JC=6FcuoBL3yce_40Es5_g@mail.gmail.com
Backpatch-through: 9.5
Give it an explanatory para like the other default roles have.
Don't imply that it can send any signal whatever.
In passing, reorder the table entries and explanatory paras
for the default roles into some semblance of consistency.
Ian Barwick, tweaked a bit by me.
Discussion: https://postgr.es/m/89907e32-76f3-7282-a89c-ea19c722fe5d@2ndquadrant.com
Turns out that returning "unrecognized signal" is confusing.
Make it explicit that the platform lacks any support for signal names.
(Of the machines in the buildfarm, at least, only HPUX lacks it.)
Back-patch to v12 where we invented this function.
Discussion: https://postgr.es/m/3067.1566870481@sss.pgh.pa.us
Section 4.2.7 says that unless otherwise specified, built-in
aggregates ignore rows in which any input is null. This is
not true of the JSON aggregates, but it wasn't documented.
Fix that.
Of the other entries in table 9.55, some were explicit about
ignoring nulls, and some weren't; for consistency and
self-contained-ness, make them all say it explicitly.
Per bug #15884 from Tim Möhlmann. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/15884-c32d848f787fcae3@postgresql.org
An empty file name or subdirectory name leads join_path_components() to
just produce the parent directory name, which leads to weird failures or
recursive inclusions. Let's throw a specific error for that. It takes
only slightly more code to detect all-blank names, so do so.
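Detecting an all-blank name takes only a few lines, e.g. (a hedged
sketch, not the guc-file.l code):

    #include <ctype.h>
    #include <stdbool.h>

    /* Sketch: true if an include name is empty or contains only blanks */
    static bool
    is_blank_name(const char *name)
    {
        for (; *name; name++)
        {
            if (!isspace((unsigned char) *name))
                return false;
        }
        return true;
    }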
Also, detect direct recursion, ie a file calling itself. As coded
this will also detect recursion via "include_dir '.'", which is
perhaps more likely than explicitly including the file itself.
Detecting indirect recursion would require API changes for guc-file.l
functions, which seems not worth it since extensions might call them.
The nesting depth limit will catch such cases eventually, just not
with such an on-point error message.
In passing, adjust the example usages in postgresql.conf.sample
to perhaps eliminate the problem at the source: there's no reason
for the examples to suggest that an empty value is valid.
Per a trouble report from Brent Bates. Back-patch to 9.5; the
issue is old, but the code in 9.4 is enough different that the
patch doesn't apply easily, and it doesn't seem worth the trouble
to fix there.
Ian Barwick and Tom Lane
Discussion: https://postgr.es/m/8c8bcbca-3bd9-dc6e-8986-04a5abdef142@2ndquadrant.com
FD_SETSIZE needs to be defined before including winsock2.h, or it is
possible to run into buffer overflow issues when using --jobs. This is
similar to the solution done for pgbench in a23c641.
The problem was introduced by 71d84ef; older versions have been using
the default value of FD_SETSIZE, which is 64.
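The fix is ordering-sensitive: the macro has to be visible before the
Windows socket header is pulled in (a hedged sketch; 1024 is just an
illustrative value):

    /* Sketch: FD_SETSIZE takes effect only if defined before
     * winsock2.h; otherwise fd_set keeps the default capacity of 64. */
    #ifdef WIN32
    #define FD_SETSIZE 1024     /* illustrative value */
    #include <winsock2.h>
    #endif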
Per buildfarm member jacana, but this impacts all Windows animals
running the TAP tests. I have reproduced the failure locally to check
the patch.
Author: Michael Paquier
Reviewed-by: Andrew Dunstan
Discussion: https://postgr.es/m/20190826054000.GE7005@paquier.xyz
Backpatch-through: 9.5
Previously 'uname -r' on Msys2 reported a kernel release starting with
2. The latest version starts with 3. In commit 1638623f we specifically
looked for one starting with 2. This is now changed to look for any
digit between 2 and 9.
Backpatch to release 10.
On msys2, 'uname -s' reports a string starting with MSYS instead of
MINGW as happens on msys1. Treat these both the same way. This reverts
608a710195a4b in favor of a more general solution.
Backpatch to all live branches.
When trying to use a high number of jobs, vacuumdb only checked for a
maximum number of jobs, causing confusing failures when the jobs opened
connections to Postgres and ran out of file descriptors. This commit
changes the error handling so that we no longer check the option value
against a maximum number of allowed jobs based on FD_SETSIZE when
parsing it, but instead check whether each file descriptor is within
the supported range when opening the connections for the jobs, so the
problem is detected at the earliest time possible.
Also, improve the error message to give a hint about the recommended
number of jobs, using wording given by the reviewers of the patch.
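The check moves to connection time, roughly as follows (a hedged sketch
using libpq's PQsocket(); connectDatabase stands in for the real
connection helper):

    /* Sketch: reject a connection whose socket number cannot be
     * represented in fd_set, instead of capping --jobs at parse time. */
    conn = connectDatabase(dbname, host, port, user);   /* helper */
    if (PQsocket(conn) >= FD_SETSIZE)
    {
        fprintf(stderr, "too many jobs for this platform\n");
        exit(1);
    }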
Reported-by: Andres Freund
Author: Michael Paquier
Reviewed-by: Andres Freund, Álvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/20190818001858.ho3ev4z57fqhs7a5@alap3.anarazel.de
Backpatch-through: 9.5