I still want to review the rest of the notes, but this seems like the
minimum work required to prepare for beta4.
Major-enhancements text, and a couple of other fixes, from Jonathan Katz
(with minor copy-editing by me).
The old code didn't differentiate between a read error and a
concurrent truncation. fread reports both of these by returning 0;
you have to use feof() or ferror() to distinguish between them,
which this code did not do.
It might be a better idea to use read() rather than fread() here,
so that we can display a less-generic error message, but I'm not
sure that would qualify as a back-patchable bug fix, so just do
this much for now.
Jeevan Chalke, reviewed by Jeevan Ladhe and by me.
Discussion: http://postgr.es/m/CA+TgmobG4ywMzL5oQq2a8YKp8x2p3p1LOMMcGqpS7aekT9+ETA@mail.gmail.com
Previously even if postmaster died and WaitLatch() woke up with that event
while pg_promote() was waiting for the standby promotion to finish,
pg_promote() did nothing special and kept waiting until timeout occurred.
This could cause a busy loop.
This patch make pg_promote() return false immediately when postmaster
dies, to avoid such a busy loop.
Back-patch to v12 where pg_promote() was added.
Author: Fujii Masao
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAHGQGwEs9ROgSp+QF+YdDU+xP8W=CY1k-_Ov-d_Z3JY+to3eXA@mail.gmail.com
The logic ending progress reporting for a backend entry introduced by
b6fb647 causes callers of pgstat_progress_end_command() to do some extra
work when track_activities is enabled as the process fields are reset in
the backend entry even if no command were started for reporting.
This resets the fields only if a command is registered for progress
reporting, and only if track_activities is enabled.
Author: Masahiho Sawada
Discussion: https://postgr.es/m/CAD21AoCry_vJ0E-m5oxJXGL3pnos-xYGCzF95rK5Bbi3Uf-rpA@mail.gmail.com
Backpatch-through: 9.6
Since the addition of fsync requests in bc34223 to make base backup data
consistent on disk once pg_basebackup finishes, each tablespace tar file
is individually flushed once completed, with an additional flush of the
parent directory when the base backup finishes. While holding a
connection to the server, a fsync request taking a long time may cause a
failure of the base backup, which is annoying for any integration. A
recent example of breakage can involve tcp_user_timeout, but
wal_sender_timeout can cause similar problems.
While reviewing the code, there was a second issue causing too many
fsync requests to be done for the same WAL data. As recursive fsyncs
are done at the end of the backup for both the plain and tar formats
from the base target directory where everything is written, it is fine
to disable fsyncs when fetching or streaming WAL.
Reported-by: Ryohei Takahashi
Author: Michael Paquier
Reviewed-by: Ryohei Takahashi
Discussion: https://postgr.es/m/OSBPR01MB4550DAE2F8C9502894A45AAB82BE0@OSBPR01MB4550.jpnprd01.prod.outlook.com
Backpatch-through: 10
Clarify in the help output and documentation that -n, -t etc. take a
"pattern" rather than a "schema" or "table" etc. This was especially
confusing now that the new pg_dumpall --exclude-database option was
documented with "pattern" and the others not, even though they all
behave the same.
Discussion: https://www.postgresql.org/message-id/flat/b85f3fa1-b350-38d1-1893-4f7911bd7310%402ndquadrant.com
The leak happens in str_tolower, str_toupper and str_initcap, which are
used in several places including their equivalent SQL-level functions,
and can only be triggered when using an ICU-provided collation when
converting the input string.
b615920 fixed a similar leak. Backpatch down 10 where ICU collations
have been introduced.
Author: Konstantin Knizhnik
Discussion: https://postgr.es/m/94c0ad0a-cbc2-e4a3-7829-2bdeaf9146db@postgrespro.ru
Backpatch-through: 10
In what seems like a fit of misplaced optimization,
ExtractReplicaIdentity() accessed the relation's replica-identity
index without taking any lock on it. Usually, the surrounding query
already holds some lock so this is safe enough ... but in the case
of a previously-planned delete, there might be no existing lock.
Given a suitable test case, this is exposed in v12 and HEAD by an
assertion added by commit b04aeb0a0.
The whole thing's rather poorly thought out anyway; rather than
looking directly at the index, we should use the index-attributes
bitmap that's held by the parent table's relcache entry, as the
caller functions do. This is more consistent and likely a bit
faster, since it avoids a cache lookup. Hence, change to doing it
that way.
While at it, rather than blithely assuming that the identity
columns are non-null (with catastrophic results if that's wrong),
add assertion checks that they aren't null. Possibly those should
be actual test-and-elog, but I'll leave it like this for now.
In principle, this is a bug that's been there since this code was
introduced (in 9.4). In practice, the risk seems quite low, since
we do have a lock on the index's parent table, so concurrent
changes to the index's catalog entries seem unlikely. Given the
precedent that commit 9c703c169 wasn't back-patched, I won't risk
back-patching this further than v12.
Per report from Hadi Moshayedi.
Discussion: https://postgr.es/m/CAK=1=Wrek44Ese1V7LjKiQS-Nd-5LgLi_5_CskGbpggKEf3tKQ@mail.gmail.com
After an unexpected connection loss and successful reconnection,
psql neglected to resynchronize its internal state about the server,
such as server version. Ordinarily we'd be reconnecting to the same
server and so this isn't really necessary, but there are scenarios
where we do need to update --- one example is where we have a list
of possible connection targets and they're not all alike.
Define "resynchronize" as including connection_warnings(), so that
this case acts the same as \connect. This seems useful; for example,
if the server version did change, the user might wish to know that.
An attuned user might also notice that the new connection isn't
SSL-encrypted, for example, though this approach isn't especially
in-your-face about such changes. Although this part is a behavioral
change, it only affects interactive sessions, so it should not break
any applications.
Also, in do_connect, make sure that we desynchronize correctly when
abandoning an old connection in non-interactive mode.
These problems evidently are the result of people patching only one
of the two places where psql deals with connection changes, so insert
some cross-referencing comments in hopes of forestalling future bugs
of the same ilk.
Lastly, in Windows builds, issue codepage mismatch warnings only at
startup, not during reconnections. psql's codepage can't change
during a reconnect, so complaining about it again seems like useless
noise.
Peter Billen and Tom Lane. Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAMTXbE8e6U=EBQfNSe01Ej17CBStGiudMAGSOPaw-ALxM-5jXg@mail.gmail.com
Section 16.2 pointed to platform-specific FAQ files that we removed
way back in 8.4. Section 16.7 contained a bunch of information about
AIX and HPUX bugs that were squashed decades ago, plus discussions of
old compiler versions that are certainly moot now that we require C99
support. Since we're obviously not maintaining this stuff carefully,
just remove it. The HPUX sub-section seems like it can go away
entirely, since everything it said that was still applicable was
redundant with material elsewhere in the chapter.
In passing, I couldn't resist the temptation to do a small amount
of copy-editing on nearby text.
Back-patch to v12, since this stuff is surely obsolete in any
branch that requires C99.
Discussion: https://postgr.es/m/15538.1567042743@sss.pgh.pa.us
The comment did not match what the code actually did for integers with
the 43rd bit set. You get an integer like that, if you have a posting
list with two adjacent TIDs that are more than 2^31 blocks apart.
According to the comment, we would store that in 6 bytes, with no
continuation bit on the 6th byte, but in reality, the code encodes it
using 7 bytes, with a continuation bit on the 6th byte as normal.
The decoding routine also handled these 7-byte integers correctly, except
for an overflow check that assumed that one integer needs at most 6 bytes.
Fix the overflow check, and fix the comment to match what the code
actually does. Also fix the comment that claimed that there are 17 unused
bits in the 64-bit representation of an item pointer. In reality, there
are 64-32-11=21.
Fitting any item pointer into max 6 bytes was an important property when
this was written, because in the old pre-9.4 format, item pointers were
stored as plain arrays, with 6 bytes for every item pointer. The maximum
of 6 bytes per integer in the new format guaranteed that we could convert
any page from the old format to the new format after upgrade, so that the
new format was never larger than the old format. But we hardly need to
worry about that anymore, and running into that problem during upgrade,
where an item pointer is expanded from 6 to 7 bytes such that the data
doesn't fit on a page anymore, is implausible in practice anyway.
Backpatch to all supported versions.
This also includes a little test module to test these large distances
between item pointers, without requiring a 16 TB table. It is not
backpatched, I'm including it more for the benefit of future development
of new posting list formats.
Discussion: https://www.postgresql.org/message-id/33bfc20a-5c86-f50c-f5a5-58e9925d05ff%40iki.fi
Reviewed-by: Masahiko Sawada, Alexander Korotkov
RelationAllowsEarlyPruning() performed a catalog scan, but is used
in two contexts where that was a bad idea:
1. In heap_page_prune_opt(), which runs very frequently in some large
scans. This caused major performance problems in a field report
that was easy to reproduce.
2. In TestForOldSnapshot(), which runs while we hold a buffer content
lock. It's not clear if this was guaranteed to be free of buffer
deadlock risk.
The check was introduced in commit 2cc41acd8 and defended against a
real problem: 9.6's hash indexes have no page LSN and so we can't
allow early pruning (ie the snapshot-too-old feature). We can remove
the check from all later releases though: hash indexes are now logged,
and there is no way to create UNLOGGED indexes on regular logged
tables.
If a future release allows such a combination, it might need to put
a similar check in place, but it'll need some more thought.
Back-patch to 10.
Author: Thomas Munro
Reviewed-by: Tom Lane, who spotted the second problem
Discussion: https://postgr.es/m/CA%2BhUKGKT8oTkp5jw_U4p0S-7UG9zsvtw_M47Y285bER6a2gD%2Bg%40mail.gmail.com
Discussion: https://postgr.es/m/CAA4eK1%2BWy%2BN4eE5zPm765h68LrkWc3Biu_8rzzi%2BOYX4j%2BiHRw%40mail.gmail.com
In this case, the transfer uses a libpq connection, which is subject to
the timeout parameters set at system level, and this can make the rewind
operation suddenly canceled which is not good for automation. One
workaround to such issues would be to use PGOPTIONS to enforce the
wanted timeout parameters, but that's annoying, and for example pg_dump,
which can run potentially long-running queries disables all types of
timeouts.
lock_timeout and statement_timeout are the ones which can cause problems
now. Note that pg_rewind does not use transactions, so disabling
idle_in_transaction_session_timeout is optional, but it feels safer to
do so for the future.
This is back-patched down to 9.5. idle_in_transaction_session_timeout
is only present since 9.6.
Author: Alexander Kukushkin
Discussion: https://postgr.es/m/CAFh8B=krcVXksxiwVQh1SoY+ziJ-JC=6FcuoBL3yce_40Es5_g@mail.gmail.com
Backpatch-through: 9.5
Give it an explanatory para like the other default roles have.
Don't imply that it can send any signal whatever.
In passing, reorder the table entries and explanatory paras
for the default roles into some semblance of consistency.
Ian Barwick, tweaked a bit by me.
Discussion: https://postgr.es/m/89907e32-76f3-7282-a89c-ea19c722fe5d@2ndquadrant.com
Turns out that returning "unrecognized signal" is confusing.
Make it explicit that the platform lacks any support for signal names.
(At least of the machines in the buildfarm, only HPUX lacks it.)
Back-patch to v12 where we invented this function.
Discussion: https://postgr.es/m/3067.1566870481@sss.pgh.pa.us
Section 4.2.7 says that unless otherwise specified, built-in
aggregates ignore rows in which any input is null. This is
not true of the JSON aggregates, but it wasn't documented.
Fix that.
Of the other entries in table 9.55, some were explicit about
ignoring nulls, and some weren't; for consistency and
self-contained-ness, make them all say it explicitly.
Per bug #15884 from Tim Möhlmann. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/15884-c32d848f787fcae3@postgresql.org
An empty file name or subdirectory name leads join_path_components() to
just produce the parent directory name, which leads to weird failures or
recursive inclusions. Let's throw a specific error for that. It takes
only slightly more code to detect all-blank names, so do so.
Also, detect direct recursion, ie a file calling itself. As coded
this will also detect recursion via "include_dir '.'", which is
perhaps more likely than explicitly including the file itself.
Detecting indirect recursion would require API changes for guc-file.l
functions, which seems not worth it since extensions might call them.
The nesting depth limit will catch such cases eventually, just not
with such an on-point error message.
In passing, adjust the example usages in postgresql.conf.sample
to perhaps eliminate the problem at the source: there's no reason
for the examples to suggest that an empty value is valid.
Per a trouble report from Brent Bates. Back-patch to 9.5; the
issue is old, but the code in 9.4 is enough different that the
patch doesn't apply easily, and it doesn't seem worth the trouble
to fix there.
Ian Barwick and Tom Lane
Discussion: https://postgr.es/m/8c8bcbca-3bd9-dc6e-8986-04a5abdef142@2ndquadrant.com
FD_SETSIZE needs to be declared before winsock2.h, or it is possible to
run into buffer overflow issues when using --jobs. This is similar to
pgbench's solution done in a23c641.
This has been introduced by 71d84ef, and older versions have been using
the default value of FD_SETSIZE, defined at 64.
Per buildfarm member jacana, but this impacts all Windows animals
running the TAP tests. I have reproduced the failure locally to check
the patch.
Author: Michael Paquier
Reviewed-by: Andrew Dunstan
Discussion: https://postgr.es/m/20190826054000.GE7005@paquier.xyz
Backpatch-through: 9.5
Previously 'uname -r' on Msys2 reported a kernele release starting with
2. The latest version starts with 3. In commit 1638623f we specifically
looked for one starting with 2. This is now changed to look for any
digit between 2 and 9.
backpatch to release 10.
On msys2, 'uname -s' reports a string starting MSYS instead on MINGW
as happens on msys1. Treat these both the same way. This reverts
608a710195a4b in favor of a more general solution.
Backpatch to all live branches.
When trying to use a high number of jobs, vacuumdb has only checked for
a maximum number of jobs used, causing confusing failures when running
out of file descriptors when the jobs open connections to Postgres.
This commit changes the error handling so as we do not check anymore for
a maximum number of allowed jobs when parsing the option value with
FD_SETSIZE, but check instead if a file descriptor is within the
supported range when opening the connections for the jobs so as this is
detected at the earliest time possible.
Also, improve the error message to give a hint about the number of jobs
recommended, using a wording given by the reviewers of the patch.
Reported-by: Andres Freund
Author: Michael Paquier
Reviewed-by: Andres Freund, Álvaro Herrera, Tom Lane
Discussion: https://postgr.es/m/20190818001858.ho3ev4z57fqhs7a5@alap3.anarazel.de
Backpatch-through: 9.5
POSIX permits getopt() to advance optind beyond argc when the last
argv entry is an option that requires an argument and hasn't got one.
It seems that no major platforms actually do that, but musl does,
so that something like "psql -f" would crash with that libc.
Add a check that optind is in range before trying to look at the
possibly-bogus option.
Report and fix by Quentin Rameau. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/20190825100617.GA6087@fifth.space
We were setting extra_float_digits = 0 to avoid platform-dependent
output in this test, but that's still able to expose platform-specific
roundoff behavior in some new test cases added by commit a3d284485,
as reported by Peter Eisentraut. Reduce it to -1 to hide that.
(Over in geometry.sql, we're using -3, which is an ancient decision
dating to 337f73b1b. I wonder whether that's overkill now. But
there's probably little value in trying to change it.)
Back-patch to v12 where a3d284485 came in; there's no evidence that
we have any platform-dependent issues here before that.
Discussion: https://postgr.es/m/15551268-e224-aa46-084a-124b64095ee3@2ndquadrant.com
Bleeding-edge LLVM has stopped supplying replacements for various
C++14 library features, for people on older C++ versions. Since we're
not ready to require C++14 yet, just use plain old new instead of
make_unique. As revealed by buildfarm animal seawasp.
Back-patch to 11.
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CA%2BhUKGJWG7unNqmkxg7nC5o3o-0p2XP6co4r%3D9epqYMm8UY4Mw%40mail.gmail.com
This adds a section for heap-related functions. These were previously
mixed with functions having a more general purpose, leading to
confusion. While on it, add a query example for fsm_page_contents.
Backpatch down to 10, where b5e3942 introduced the subsections for
function types in pageinspect documentation.
Author: Masahiko Sawada
Discussion: https://postgr.es/m/CAD21AoDyM7E1+cK3-aWejxKTGC-wVVP2B+RnJhN6inXyeRmqzw@mail.gmail.com
Backpatch-through: 10
This contains all individuals mentioned in the commit messages during
PostgreSQL 12 development.
current through 068bc300c6ed5333d9561e55cb5f45d17de88422
In early development patches, "replication origins" were called "identifiers";
almost everything was renamed, but these references to the old terminology
went unnoticed.
Reported-by: Craig Ringer
If the record argument is NULL and has no declared type more concrete
than RECORD, we can't extract useful information about the desired
rowtype from it. In this case, see if we're in FROM with an AS clause,
and if so extract the needed rowtype info from AS.
It worked like this before v11, but commit 37a795a60 removed the
behavior, reasoning that it was undocumented, inefficient, and utterly
not self-consistent. If you want to take type info from an AS clause,
you should be using the json_to_record() family of functions not the
json_populate_record() family. Also, it was already the case that
the "populate" functions would fail for a null-valued RECORD input
(with an unfriendly "record type has not been registered" error)
when there wasn't an AS clause at hand, and it wasn't obvious that
that behavior wasn't OK when there was one. However, it emerges
that some people were depending on this to work, and indeed the
rather off-point error message you got if you left off AS encouraged
slapping on AS without switching to the json_to_record() family.
Hence, put back the fallback behavior of looking for AS. While at it,
improve the run-time error you get when there's no place to obtain type
info; we can do a lot better than "record type has not been registered".
(We can't, unfortunately, easily improve the parse-time error message
that leads people down this path in the first place.)
While at it, I refactored the code a bit to avoid duplicating the
same logic in several different places.
Per bug #15940 from Jaroslav Sivy. Back-patch to v11 where the
current coding came in. (The pre-v11 deficiencies in this area
aren't regressions, so we'll leave those branches alone.)
Patch by me, based on preliminary analysis by Dmitry Dolgov.
Discussion: https://postgr.es/m/15940-2ab76dc58ffb85b6@postgresql.org
If a table inherits from multiple unrelated parents, we must disallow
changing the type of a column inherited from multiple such parents, else
it would be out of step with the other parents. However, it's possible
for the column to ultimately be inherited from just one common ancestor,
in which case a change starting from that ancestor should still be
allowed. (I would not be excited about preserving that option, were
it not that we have regression test cases exercising it already ...)
It's slightly annoying that this patch looks different from the logic
with the same end goal in renameatt(), and more annoying that it
requires an extra syscache lookup to make the test. However, the
recursion logic is quite different in the two functions, and a
back-patched bug fix is no place to be trying to unify them.
Per report from Manuel Rigger. Back-patch to 9.5. The bug exists in
9.4 too (and doubtless much further back); but the way the recursion
is done in 9.4 is a good bit different, so that substantial refactoring
would be needed to fix it in 9.4. I'm disinclined to do that, or risk
introducing new bugs, for a bug that has escaped notice for this long.
Discussion: https://postgr.es/m/CA+u7OA4qogDv9rz1HAb-ADxttXYPqQdUdPY_yd4kCzywNxRQXA@mail.gmail.com
This is a variant of the problem fixed in commit 25b692568, which
unfortunately we failed to detect at the time. If an update trigger
returns the "old" tuple, as it's entitled to do, then a subsequent
iteration of the loop in ExecBRUpdateTriggers would have "oldtuple"
equal to "trigtuple" and would fail to notice that it shouldn't
free that.
In addition to fixing the code, extend the test case added by
25b692568 so that it covers multiple-trigger-iterations cases.
This problem does not manifest in v12/HEAD, as a result of the
relevant code having been largely rewritten for slotification.
However, include the test case into v12/HEAD anyway, since this
is clearly an area that someone could break again in future.
Per report from Piotr Gabriel Kosinski. Back-patch into all
supported branches, since the bug seems quite old.
Diagnosis and code fix by Thomas Munro, test case by me.
Discussion: https://postgr.es/m/CAFMLSdP0rd7LqC3j-H6Fh51FYSt5A10DDh-3=W4PPc4LLUQ8YQ@mail.gmail.com
Commit 4b93f5799 rearranged things in plpgsql to make it cope better with
composite types changing underneath it intra-session. However, I failed to
consider the case of a composite type being dropped and recreated entirely.
In my defense, the previous coding didn't consider that possibility at all
either --- but it would accidentally work so long as you didn't change the
type's field list, because the built-at-compile-time list of component
variables would then still match the type's new definition. The new
coding, however, occasionally tries to re-look-up the type by OID, and
then fails to find the dropped type.
To fix this, we need to save the TypeName struct, and then redo the type
OID lookup from that. Of course that's expensive, so we don't want to do
it every time we need the type OID. This can be fixed in the same way that
4b93f5799 dealt with changes to composite types' definitions: keep an eye
on the type's typcache entry to see if its tupledesc has been invalidated.
(Perhaps, at some point, this mechanism should be generalized so it can
work for non-composite types too; but for now, plpgsql only tries to
cope with intra-session redefinitions of composites.)
I'm slightly hesitant to back-patch this into v11, because it changes
the contents of struct PLpgSQL_type as well as the signature of
plpgsql_build_datatype(), so in principle it could break code that is
poking into the innards of plpgsql. However, the only popular extension
of that ilk is pldebugger, and it doesn't seem to be affected. Since
this is a regression for people who were relying on the old behavior,
it seems worth taking the small risk of causing compatibility issues.
Per bug #15913 from Daniel Fiori. Back-patch to v11 where 4b93f5799
came in.
Discussion: https://postgr.es/m/15913-a7e112e16dedcffc@postgresql.org