Much of the code in process_pm_child_exit() to launch replacement
processes when one exits or when progressing to the next postmaster state
was unnecessary, because the ServerLoop will launch any missing
background processes anyway. Remove the redundant code and let
ServerLoop handle it.
In ServerLoop, move the code to launch all the processes to a new
subroutine, to group it all together.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4@iki.fi
libpq must not use palloc/pfree. It's not allowed to exit on allocation
failure, and mixing the frontend pfree with malloc is architecturally
unsound.
Remove fe_memutils from the shlib build entirely, to keep devs from
accidentally depending on it in the future.
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+=pg=W5L1h=3MEP_EB24jaBu2FyATrLXqQHGe7cpuvwyg@mail.gmail.com
The now preferred way to call realpath() is to pass NULL as the
second argument and get a malloc'ed result. We still supported the
old way of providing our own buffer as a second argument, for some
platforms that didn't support the new way yet. Those were only
Solaris less than version 11 and some older AIX versions (7.1 and
newer appear to support the new variant). We don't support those
platform versions anymore, so we can remove this extra code.
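For illustration only, a minimal sketch of the NULL-buffer calling
convention described above. resolve_path() is a hypothetical helper, not a
function in the tree, and the sketch assumes a POSIX.1-2008 realpath().

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700       /* ask for the POSIX.1-2008 realpath() */
#endif

#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not PostgreSQL code): resolve "path" to an absolute
 * path with symlinks expanded.  Passing NULL as the second argument makes
 * realpath() malloc() a buffer of the right size, so the caller does not
 * have to supply a PATH_MAX-sized buffer of its own.  Returns NULL on
 * failure with errno set; the caller must free() a non-NULL result.
 */
char *
resolve_path(const char *path)
{
    assert(path != NULL);
    return realpath(path, NULL);
}
```

A caller would check the result for NULL and free() it when done.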
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/9e638b49-5c3f-470f-a392-2cbedb2f7855%40eisentraut.org
32d3ed816 added the "path" column to pg_backend_memory_contexts to allow
a stable method of obtaining the parent MemoryContext of a given row in
the view. Using the "path" column is now the preferred method of
obtaining the parent row.
Previously, any queries which were self-joining to this view using the
"name" and "parent" columns could get incorrect results due to the fact
that names are not unique. Here we aim to explicitly break such queries
so that they can be corrected and use the "path" column instead.
It is possible that there are more innocent users of the parent column
that just need an indication of the parent and having to write out a
self-joining CTE may be an unnecessary hassle for those cases. Let's
remove the column for now and see if anyone comes back with any
complaints. This does seem like a good time to attempt to get rid of
the column as we still have around 1 year to revert this if someone comes
back with a valid complaint. Plus this view is new to v14 and is quite
niche, so perhaps not many people will be affected.
Author: Melih Mutlu <m.melihmutlu@gmail.com>
Discussion: https://postgr.es/m/CAGPVpCT7NOe4fZXRL8XaoxHpSXYTu6GTpULT_3E-HT9hzjoFRA@mail.gmail.com
Teach nbtree backwards scans to avoid relocking a just-read leaf page to
read its current left sibling link when it isn't truly necessary. This
happened inside _bt_readnextpage whenever _bt_readpage had already
determined that there'll be no further matches to the left (or at least
none for the current primitive index scan, for a scan with array keys).
A new precheck inside _bt_readnextpage is all that we need to avoid
these useless lock acquisitions. Arguably, using a precheck like this
was a missed opportunity for commit 2ed5b87f96, which taught nbtree to
drop leaf page pins early to avoid blocking cleanup by VACUUM. Forwards
scans already managed to avoid relocking the page like this.
The optimization added by this commit is particularly helpful with
backwards scans that use array keys where the scan must perform multiple
primitive index scans. Such backwards scans will now avoid a useless
leaf page re-lock at the end of each primitive index scan.
Note that this commit does not attempt to avoid needlessly re-locking a
leaf page that was just read when the scan must follow the leaf page's
left link. That more ambitious optimization could work by stashing the
left link when the page is first read by a backwards scan, allowing the
subsequent _bt_readnextpage call to optimistically skip re-reading the
original page just to get a new copy of its left link. For now we only
address cases where we don't care about our original page's left link.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz=xgs7PojG=EUvhgadwENzu_mY_riNh-w9wFPsaS717ew@mail.gmail.com
Coverity complained that the hash_create() call might access
hash_table_ctl->hctl. That's a false alarm, hash_create() only
accesses that field when passed the HASH_SHARED_MEM flag. Try to
silence it by using a plain local variable instead of a const. That's
how the HASHCTL is initialized in all the other hash_create() calls.
Coverity thinks dpns->plan could be null at these points. That
shouldn't really be possible, but it's easy enough to modify the
Asserts so they'd not core-dump if it were true.
These are new in b919a97a6. Back-patch to v13; the v12 version
of the patch didn't have these Asserts.
The code intends to allow GUCs to be set within parallel workers
via function SET clauses, but not otherwise. However, doing so fails
for "session_authorization" and "role", because the assign hooks for
those attempt to set the subsidiary "is_superuser" GUC, and that call
falls foul of the "not otherwise" prohibition. We can't switch to
using GUC_ACTION_SAVE for this, so instead add a new GUC variable
flag GUC_ALLOW_IN_PARALLEL to mark is_superuser as being safe to set
anyway. (This is okay because is_superuser has context PGC_INTERNAL
and thus only hard-wired calls can change it. We'd need more thought
before applying the flag to other GUCs; but maybe there are other
use-cases.) This isn't the prettiest fix perhaps, but other
alternatives we thought of would be much more invasive.
While here, correct a thinko in commit 059de3ca4: when rejecting
a GUC setting within a parallel worker, we should return 0 not -1
if the ereport doesn't longjmp. (This seems to have no consequences
right now because no caller cares, but it's inconsistent.) Improve
the comments to try to forestall future confusion of the same kind.
Despite the lack of field complaints, this seems worth back-patching.
Thanks to Nathan Bossart for the idea to invent a new flag,
and for review.
Discussion: https://postgr.es/m/2833457.1723229039@sss.pgh.pa.us
pg_wal_replay_wait() is intended to be called on a standby. However, a
standby can be promoted to primary at any moment, even concurrently with
the pg_wal_replay_wait() call. If recovery is no longer in progress, that
doesn't mean the wait was unsuccessful. Thus, we always need to recheck
whether the target LSN has been replayed.
Reported-by: Kevin Hale Boyes
Discussion: https://postgr.es/m/CAPpHfdu5QN%2BZGACS%2B7foxmr8_nekgA2PA%2B-G3BuOUrdBLBFb6Q%40mail.gmail.com
Author: Alexander Korotkov
Since the introduction of TID store, vacuum uses far less memory in
the common case than in versions 16 and earlier. Invoking multiple
rounds of index vacuuming in turn requires a much larger table. It'd
be a good idea anyway to cover this case in regression testing, and a
lower limit is less painful for slow buildfarm animals. The reason to
do it now is to re-enable coverage of the bugfix in commit 83c39a1f7f.
For consistency, give autovacuum_work_mem the same treatment.
Suggested by Andres Freund
Tested by Melanie Plageman
Backpatch to v17, where TID store was introduced
Discussion: https://postgr.es/m/20240516205458.ohvlzis5b5tvejru@awork3.anarazel.de
Discussion: https://postgr.es/m/20240722164745.fvaoh6g6zprisqgp%40awork3.anarazel.de
Some code using atol() would not work correctly if sizeof(long)==4:
- src/bin/pg_basebackup/pg_basebackup.c: Would miscount size of a
tablespace over 2 TB.
- src/bin/pg_basebackup/streamutil.c: Would truncate a timeline ID
beyond INT32_MAX.
- src/bin/pg_rewind/libpq_source.c: Would miscount size of files
larger than 2 GB (but this currently cannot happen).
Replace these with atoll().
In one case, the use of atol() did not result in incorrect behavior
but seems inconsistent with related code:
- src/interfaces/ecpg/ecpglib/execute.c: Gratuitous, since it
processes a value from pg_type.typlen, which is int16.
Replace this with atoi().
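As a rough sketch of why the wider conversion matters (parse_size() is a
hypothetical helper, not one of the functions touched here): atoll()
returns long long, which is at least 64 bits everywhere, while the long
returned by atol() is only 32 bits wide where sizeof(long)==4.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal size string into a 64-bit value.
 * Using atoll() keeps values above INT32_MAX intact even on platforms
 * where long is 32 bits wide.
 */
int64_t
parse_size(const char *str)
{
    assert(str != NULL);
    return (int64_t) atoll(str);
}
```

With this, a 3 TB tablespace size such as "3298534883328" is parsed into
the same value on every platform.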
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/a52738ad-06bc-4d45-b59f-b38a8a89de49%40eisentraut.org
libpq tracing via PQtrace would uselessly print the wrong thing for
these types of messages. With this commit, their type and contents
are correctly listed. (This can be verified with PQconnectStart(),
but we don't use that in libpq_pipeline, so I (Álvaro) haven't bothered
to add any tests.)
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/CAGECzQSoPHtZ4xe0raJ6FYSEiPPS+YWXBhOGo+Y1YecLgknF3g@mail.gmail.com
All child processes except the syslogger are killed on a restart. The
archiver might be already running though, if it was started during
recovery.
The split in the comments between "other special children" and the
first group of "background tasks" seemed really arbitrary, so I just
merged them all into one group.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/8f2118b9-79e3-4af7-b2c9-bd5818193ca4@iki.fi
Currently, when a child process exits, the postmaster first scans
through BackgroundWorkerList, to see if the child process was a
background worker. If not found, then it scans through BackendList to
see if it was a regular backend. That leads to some duplication
between the bgworker and regular backend cleanup code, as both have an
entry in the BackendList that needs to be cleaned up in the same way.
Refactor that so that we scan just the BackendList to find the child
process, and if it was a background worker, do the additional
bgworker-specific cleanup in addition to the normal Backend cleanup.
Change HandleChildCrash so that it doesn't try to handle the cleanup
of the process that already exited, only the signaling of all the
other processes. When called for any of the aux processes, the caller
had already cleared the *PID global variable, so the code in
HandleChildCrash() to do that was unused.
On Windows, if a child process exits with ERROR_WAIT_NO_CHILDREN, it's
now logged with that exit code, instead of 0. Also, if a bgworker
exits with ERROR_WAIT_NO_CHILDREN, it's now treated as crashed and is
restarted. Previously it was treated as a normal exit.
If a child process is not found in the BackendList, the log message
now calls it "untracked child process" rather than "server process".
Arguably that should be a PANIC, because we do track all the child
processes in the list, so failing to find a child process is highly
unexpected. But if we want to change that, let's discuss and do that
as a separate commit.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
This allows ForgetBackgroundWorker() and ReportBackgroundWorkerExit()
to take a RegisteredBgWorker pointer as argument, rather than a list
iterator. That feels a little more natural. But more importantly, this
paves the way for more refactoring in the next commit.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/835232c0-a5f7-4f20-b95b-5b56ba57d741@iki.fi
Presently, we inconsistently use dashes in references to these
algorithms (e.g., CRC32C versus CRC-32C). Some popular web sources
appear to prefer dashes, and with this commit, we will, too.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/ZrUFpLP-w2zTAHqq%40nathan
This section claims we use CRC-32 for WAL records and two-phase
state files, but we've actually used CRC-32C since v9.5 (commit
5028f22f6e). Fix that.
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/ZrUFpLP-w2zTAHqq%40nathan
Backpatch-through: 12
To deparse a reference to a field of a RECORD-type output of a
subquery, EXPLAIN normally digs down into the subquery's plan to try
to discover exactly which anonymous RECORD type is meant. However,
this can fail if the subquery has been optimized out of the plan
altogether on the grounds that no rows could pass the WHERE quals,
which has been possible at least since 3fc6e2d7f. There isn't
anything remaining in the plan tree that would help us, so fall back
to printing the field name as "fN" for the N'th column of the record.
(This will actually be the right thing some of the time, since it
matches the column names we assign to RowExprs.)
In passing, fix a comment typo in create_projection_plan, which
I noticed while experimenting with an alternative fix for this.
Per bug #18576 from Vasya B. Back-patch to all supported branches.
Richard Guo and Tom Lane
Discussion: https://postgr.es/m/18576-9feac34e132fea9e@postgresql.org
This used to be part of CREATE OPERATOR CLASS and ALTER OPERATOR
FAMILY, but it has done nothing (except issue a NOTICE) since
PostgreSQL 8.4. Commit 30e7c175b8 removed support for dumping from
pre-9.2 servers, so this no longer serves any need.
This now removes it completely, and you'd get a normal parse error if
you used it.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/113ef2d2-3657-4353-be97-f28fceddbca1%40eisentraut.org
The apply worker was using XactLastCommitEnd as the local end_lsn when
applying prepare and rollback_prepare. The XactLastCommitEnd value is the
end LSN of the last commit applied before the prepared transaction, which makes no
sense. This LSN is used to decide whether we can send the acknowledgment
of the corresponding remote LSN to the server.
It is okay not to set the local_end LSN with the actual WAL position for
the prepare because we always flush the prepare record. So, we can send
the acknowledgment of the remote_end LSN as soon as prepare is finished.
The current code is misleading but as such doesn't create any problem, so
we decided not to backpatch.
Author: Hayato Kuroda
Reviewed-by: Shveta Malik, Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB5692FA4926754B91E9D7B5F0F5AA2@TYAPR01MB5692.jpnprd01.prod.outlook.com
In future commits we're going to trace authentication related messages.
Some of these messages contain challenge bytes as part of a
challenge-response flow. Since these bytes are different for every
connection, we want to normalize them when the PQTRACE_REGRESS_MODE
trace flag is set. This commit modifies pqTraceOutputNchar to take a
suppress argument, which makes it possible to do so.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/CAGECzQSoPHtZ4xe0raJ6FYSEiPPS+YWXBhOGo+Y1YecLgknF3g@mail.gmail.com
Trying to attach a table as a partition when that table is already on the
referenced side of a foreign key on the partitioned table it is being
attached to leads to strange behavior: we try to clone the
foreign key from the parent to the partition, but this new FK points to
the partition itself, and the mix of pg_constraint rows and triggers
doesn't behave well.
Rather than trying to untangle the mess (which might be possible given
sufficient time), I opted to forbid the ATTACH. This doesn't seem a
problematic restriction, given that we already fail to create the
foreign key if you do it the other way around, that is, having the
partition first and the FK second.
Backpatch to all supported branches.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/18541-628a61bc267cd2d3@postgresql.org
I also took the liberty of changing
errmsg("COPY DEFAULT only available using COPY FROM")
to
errmsg("COPY %s cannot be used with %s", "DEFAULT", "COPY TO")
because the original wording is unlike all other messages that indicate
option incompatibility. This message was added by commit 9f8377f7a2
(16-era), in whose development thread there was no discussion on this
point.
Backpatch to 17.
getTimelineHistory() is called twice, to read the source and the
target timeline history files. However, the loop to print the file
with the --debug option used the wrong variable when dealing with the
source. As a result, the source's history was always printed as empty.
Spotted while debugging bug #18575, but this does not fix that bug,
just the debugging output. Backpatch to all supported versions.
Discussion: https://www.postgresql.org/message-id/092dd515-b7b4-4fd0-8407-ceca2f02f6ec@iki.fi
If the plancache entry for the CALL statement is already stale,
it's possible for us to fetch an old procedure OID out of it,
and then fail with "cache lookup failed for function NNN".
In ordinary usage this never happens because make_callstmt_target
is called just once immediately after building the plancache
entry. It can be forced however by setting up an erroneous CALL
(that causes make_callstmt_target itself to report an error),
then dropping/recreating the target procedure, then repeating
the erroneous CALL.
To fix, use SPI_plan_get_cached_plan() to fetch the plancache's
plan, rather than assuming we can use SPI_plan_get_plan_sources().
This shouldn't add any noticeable overhead in the normal case,
and in the stale-plan case we'd have had to replan anyway a little
further down.
The other callers of SPI_plan_get_plan_sources() seem OK, because
either they don't need up-to-date plans or they know that the
query was just (re) planned. But add some commentary in hopes
of not falling into this trap again.
Per bug #18574 from Song Hongyu. Back-patch to v14 where this coding
was introduced. (Older branches have comparable code, but it's run
after any required replanning, so there's no issue.)
Discussion: https://postgr.es/m/18574-2ce7ba3249221389@postgresql.org
Make it clear that "astreamer" stands for "archive streamer".
Generalize comments that still believe this code can only be used
by pg_basebackup. Add some comments explaining the asymmetry
between the gzip, lz4, and zstd astreamers, in the hopes of making
life easier for anyone who hacks on this code in the future.
Robert Haas, reviewed by Amul Sul.
Discussion: http://postgr.es/m/CAAJ_b97O2kkKVTWxt8MxDN1o-cDfbgokqtiN2yqFf48=gXpcxQ@mail.gmail.com
Replace a static scratch buffer with a local variable, because a
static buffer makes the function not thread-safe. This function is
used in client code in libpq, so it needs to be thread-safe. It was
thread-safe until commit b67b57a966, which replaced the implementation
with the one from pgcrypto.
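A generic sketch of the pattern, not the function changed by this commit:
sum_bytes() below is hypothetical, and only illustrates why an automatic
scratch buffer is safe to use from several threads while a static one is
not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical example.  Sums the bytes of "data" after staging them
 * through a scratch buffer.  Because "scratch" is an automatic variable,
 * every call (and so every thread) gets its own copy.  Declaring it
 * "static" would make all threads share one buffer, so concurrent calls
 * could overwrite each other's data.
 */
unsigned int
sum_bytes(const unsigned char *data, size_t len)
{
    unsigned char scratch[64];  /* per-call storage, not static */
    unsigned int sum = 0;

    assert(data != NULL || len == 0);

    while (len > 0)
    {
        size_t      chunk = len < sizeof(scratch) ? len : sizeof(scratch);

        memcpy(scratch, data, chunk);
        for (size_t i = 0; i < chunk; i++)
            sum += scratch[i];
        data += chunk;
        len -= chunk;
    }
    return sum;
}
```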
Backpatch to v14, where we switched to the new implementation.
Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://www.postgresql.org/message-id/dfa2015d-ad21-4802-a4cc-3850fc5fff3f@iki.fi
Commit 0b9466fce added a dependency on fe_memutils' pnstrdup() inside
informix.c. This adds an exit() path in a library, which we don't
want. (Unlike libpq, the ecpg libraries don't have an automated check
for that, but it makes sense to keep them to a similar standard.) The
ecpg code can already handle failure results from the *strdup() call
by itself.
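A sketch of the library-safe alternative, under the assumption that plain
malloc() is acceptable; copy_prefix() is a hypothetical name, not the call
used in informix.c. The point is that allocation failure is reported to
the caller rather than ending the process.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical replacement for a pnstrdup()-style call in library code:
 * copy at most "n" bytes of "src" into a new NUL-terminated string.  On
 * allocation failure it returns NULL instead of calling exit(), leaving
 * the decision to the caller.  The caller must free() a non-NULL result.
 */
char *
copy_prefix(const char *src, size_t n)
{
    size_t      len = 0;
    char       *dst;

    assert(src != NULL);

    /* Find the copy length without reading past a terminating NUL. */
    while (len < n && src[len] != '\0')
        len++;

    dst = malloc(len + 1);
    if (dst == NULL)
        return NULL;            /* report failure; never exit() here */

    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```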
Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+=pg=W5L1h=3MEP_EB24jaBu2FyATrLXqQHGe7cpuvwyg@mail.gmail.com
These callbacks receive hash values as arguments, which doesn't allow
direct lookups in AttoptCacheHash and TypeCacheHash. This is why the
callbacks currently iterate over the whole of the corresponding hash table.
This commit avoids full hash iteration in InvalidateAttoptCacheCallback()
and TypeCacheTypCallback(). First, we switch AttoptCacheHash and
TypeCacheHash to use the same hash function as syscache. Second, we
use hash_seq_init_with_hash_value() to iterate over only the hash entries
with a matching hash value.
Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru
Author: Teodor Sigaev
Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov
Reviewed-by: Andrei Lepikhov
This new function iterates over only the hash entries with a given hash
value. It is designed to avoid a full sequential hash search in the
syscache invalidation callbacks.
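To illustrate the idea only: the toy chained hash table below is an
assumption made for this sketch and is not dynahash. Table, Entry and both
functions are hypothetical names. Visiting the entries with one hash value
requires walking a single bucket chain rather than every bucket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8              /* assumed to be a power of two */

typedef struct Entry
{
    struct Entry *next;         /* next entry in the same bucket chain */
    uint32_t    hashvalue;      /* full hash value stored with the entry */
    int         payload;
} Entry;

typedef struct Table
{
    Entry      *buckets[NBUCKETS];
} Table;

/* Link a caller-allocated entry into the bucket chosen by its hash value. */
void
table_insert(Table *t, Entry *e)
{
    Entry     **head;

    assert(t != NULL && e != NULL);
    head = &t->buckets[e->hashvalue & (NBUCKETS - 1)];
    e->next = *head;
    *head = e;
}

/*
 * Count entries whose stored hash value equals "hashvalue".  Only one
 * bucket chain is walked, instead of all NBUCKETS chains as a full
 * sequential scan would do.
 */
size_t
table_count_with_hash_value(const Table *t, uint32_t hashvalue)
{
    size_t      n = 0;
    const Entry *e;

    assert(t != NULL);
    for (e = t->buckets[hashvalue & (NBUCKETS - 1)]; e != NULL; e = e->next)
    {
        /* Different hash values can share a bucket, so compare in full. */
        if (e->hashvalue == hashvalue)
            n++;
    }
    return n;
}
```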
Discussion: https://postgr.es/m/5812a6e5-68ae-4d84-9d85-b443176966a1%40sigaev.ru
Author: Teodor Sigaev
Reviewed-by: Aleksander Alekseev, Tom Lane, Michael Paquier, Roman Zharkov
Reviewed-by: Andrei Lepikhov
The buffer allocation was correct, but looked archaic and scary:
- It was weird to calculate the buffer size before determining which
format string was used. With the same effort, we could've used the
right-sized buffer for each branch.
- Commit aa0d350456 added one more possible return string ("all true
bits"), but didn't adjust the code at the top of the function to
calculate the returned string's max size. It was not a live bug,
because the new string was smaller than the existing ones, but
seemed wrong in principle.
- Use of sprintf() is generally eyebrow-raising these days.
Switch to psprintf(). psprintf() allocates a larger buffer than what
was allocated before, 128 bytes vs 80 bytes, which is acceptable as
this code is not performance or space critical.
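For readers unfamiliar with the pattern, a rough stand-in for what a
psprintf()-style call buys over a hand-sized sprintf() buffer.
format_alloc() is a hypothetical helper that uses malloc() rather than
palloc(), and it measures the output first, which is an assumption of this
sketch and not how psprintf() itself sizes its initial 128-byte buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for psprintf().  The first vsnprintf() call only
 * measures the formatted length; the second writes into a buffer of
 * exactly that size plus the terminating NUL.  Returns NULL on failure;
 * the caller must free() a non-NULL result.
 */
char *
format_alloc(const char *fmt, ...)
{
    va_list     args;
    int         len;
    char       *buf;

    assert(fmt != NULL);

    va_start(args, fmt);
    len = vsnprintf(NULL, 0, fmt, args);
    va_end(args);
    if (len < 0)
        return NULL;

    buf = malloc((size_t) len + 1);
    if (buf == NULL)
        return NULL;

    va_start(args, fmt);
    vsnprintf(buf, (size_t) len + 1, fmt, args);
    va_end(args);
    return buf;
}
```

Either way, the buffer size no longer has to be kept in sync by hand with
every format string the function can return.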
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
I started by marking VoidString as const, and fixing the fallout by
marking more fields and function arguments as const. It proliferated
quite a lot, but all within spell.c and spell.h.
A narrower patch to get rid of the static VoidString buffer would
be to replace it with '#define VoidString ""', as C99 allows assigning
"" to a non-const pointer, even though you're not allowed to modify
it. But it seems like good hygiene to mark all these as const. In the
structs, the pointers can point to the constant VoidString, or a
buffer allocated with palloc(), or with compact_palloc(), so you
should not modify them.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
There was no need for these to be static buffers; local variables work
just as well. I think they were marked as 'static' to imply that they
are read-only, but 'const' is more appropriate for that, so change
them to const.
To make it possible to mark the variables as 'const', also add 'const'
decorations to the transformRelOptions() signature.
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
pg_strxfrm() takes a pg_locale_t, so it works properly with all
providers. This improves estimates for ICU when performing linear
interpolation within a histogram bin.
Previously, convert_string_datum() always used strxfrm() and relied on
setlocale(). That did not produce good estimates for non-default or
non-libc collations.
Discussion: https://postgr.es/m/89475ee5487d795124f4e25118ea8f1853edb8cb.camel@j-davis.com
Parallel workers failed after a sequence like
BEGIN;
CREATE USER foo;
SET SESSION AUTHORIZATION foo;
because check_session_authorization could not see the uncommitted
pg_authid row for "foo". This is because we ran RestoreGUCState()
in a separate transaction using an ordinary just-created snapshot.
The same disease afflicts any other GUC that requires catalog lookups
and isn't forgiving about the lookups failing.
To fix, postpone RestoreGUCState() into the worker's main transaction
after we've set up a snapshot duplicating the leader's. This affects
check_transaction_isolation and check_transaction_deferrable, which
think they should only run during transaction start. Make them
act like check_transaction_read_only, which already knows it should
silently accept the value when InitializingParallelWorker.
This un-reverts commit f5f30c22e. The original plan was to back-patch
that, but the fact that 0ae5b763e proved to be a pre-requisite shows
that the subtle API change for GUC hooks might actually break some of
them. The problem we're trying to fix seems not worth taking such a
risk for in stable branches.
Per bug #18545 from Andrey Rachitskiy.
Discussion: https://postgr.es/m/18545-feba138862f19aaa@postgresql.org
The previous coding here threw an error from assign_client_encoding
if it was invoked in a parallel worker. That's a very fundamental
violation of the GUC hook API: assign hooks must not throw errors.
The place to complain is in the check hook, so move the test to
there, and use the regular check-hook API (ie return false) to
report it.
The reason this coding is a problem is that it breaks GUC rollback,
which may occur after we leave InitializingParallelWorker state.
That case seems not actually reachable before now, but commit
f5f30c22e made it reachable, so we need to fix this before that
can be un-reverted.
In passing, improve the commentary in ParallelWorkerMain, and
add a check for failure of SetClientEncoding. That's another
case that can't happen now but might become possible after
foreseeable code rearrangements (notably, if the shortcut of
skipping PrepareClientEncoding stops being OK).
Discussion: https://postgr.es/m/18545-feba138862f19aaa@postgresql.org
Prior to commit 0709b7ee72, which changed the spinlock primitives
to function as compiler barriers, access to variables within a
spinlock-protected section required using a volatile pointer, but
that is no longer necessary.
Reviewed-by: Bertrand Drouvot, Michael Paquier
Discussion: https://postgr.es/m/Zqkv9iK7MkNS0KaN%40nathan