This commit adds a new COPY option LOG_VERBOSITY, which controls the
amount of messages emitted during processing. Valid values are
'default' and 'verbose'.
This is currently used in COPY FROM when ON_ERROR option is set to
ignore. If 'verbose' is specified, a NOTICE message is emitted for
each discarded row, providing additional information such as line
number, column name, and the malformed value. This helps users to
identify problematic rows that failed to load.
Author: Bharath Rupireddy
Reviewed-by: Michael Paquier, Atsushi Torikoshi, Masahiko Sawada
Discussion: https://www.postgresql.org/message-id/CALj2ACUk700cYhx1ATRQyRw-fBM%2BaRo6auRAitKGff7XNmYfqQ%40mail.gmail.com
This reverts commit 07f0f6abfc7f6c55cede528d9689dedecefc734a.
This has shown failures on both Valgrind and big-endian machines,
per members skink and pike.
After encountering the NUL terminator, the word-at-a-time loop exits
and we must hash the remaining bytes. Previously we calculated the
terminator's position and re-loaded the remaining bytes from the input
string. We already have all the data we need in a register, so let's
just mask off the bytes we need and hash them immediately. The mask can
be cheaply computed without knowing the terminator's position. We still
need that position for the length calculation, but the CPU can now
do that in parallel with other work, shortening the dependency chain.
Ants Aasma and John Naylor
Discussion: https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com
Previously, the executor did index insert unconditionally after calling
table AM interface methods tuple_insert() and multi_insert(). This commit
introduces the new parameter insert_indexes for these two methods. Setting
'*insert_indexes' to true saves the current logic. Setting it to false
indicates that table AM cares about index inserts itself and doesn't want the
caller to do that.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Pavel Borisov, Matthias van de Meent, Mark Dilger
Currently, there is just one algorithm for sampling tuples from a table written
in acquire_sample_rows(). Custom table AM can just redefine the way to get the
next block/tuple by implementing scan_analyze_next_block() and
scan_analyze_next_tuple() API functions.
This approach doesn't seem general enough. For instance, it's unclear how to
sample this way index-organized tables. This commit allows table AM to
encapsulate the whole sampling algorithm (currently implemented in
acquire_sample_rows()) into the relation_analyze() API function.
Discussion: https://postgr.es/m/CAPpHfdurb9ycV8udYqM%3Do0sPS66PJ4RCBM1g-bBpvzUfogY0EA%40mail.gmail.com
Reviewed-by: Pavel Borisov, Matthias van de Meent
This SQL-callable function behaves much like our internal utility
function getBaseType(), except it returns NULL rather than failing for
an invalid type OID. (That behavior is modeled on our experience with
other catalog-inquiry functions such as the ACL checking functions.)
The key advantage over doing a join to pg_type is that it will loop
as needed to find the bottom base type of a nest of domains.
Steve Chavez, reviewed by jian he and others
Discussion: https://postgr.es/m/CAGRrpzZSX8j=MQcbCSEisFA=ic=K3bknVfnFjAv1diVJxFHJvg@mail.gmail.com
The test fails when RESET statement_timeout takes longer than 10ms.
Avoid the problem by using SET LOCAL instead.
Overall, this test is not ideal: 10ms could be shorter than the time to
have sent the query to the "remote" server, so it's possible that on
some machines this test doesn't actually witness a remote query being
cancelled. We may want to improve on this someday by using some other
testing technique, but for now it's better than nothing. I verified
manually that one round of remote cancellation occurs when this runs on
my machine.
Discussion: https://postgr.es/m/CAGECzQRsdWnj=YaaPCnA8d7E1AdbxRPBYmyBQRMPUijR2MpM_w@mail.gmail.com
This allows MERGE commands to include WHEN NOT MATCHED BY SOURCE
actions, which operate on rows that exist in the target relation, but
not in the data source. These actions can execute UPDATE, DELETE, or
DO NOTHING sub-commands.
This is in contrast to already-supported WHEN NOT MATCHED actions,
which operate on rows that exist in the data source, but not in the
target relation. To make this distinction clearer, such actions may
now be written as WHEN NOT MATCHED BY TARGET.
Writing WHEN NOT MATCHED without specifying BY SOURCE or BY TARGET is
equivalent to writing WHEN NOT MATCHED BY TARGET.
Dean Rasheed, reviewed by Alvaro Herrera, Ted Yu and Vik Fearing.
Discussion: https://postgr.es/m/CAEZATCWqnKGc57Y_JanUBHQXNKcXd7r=0R4NEZUVwP+syRkWbA@mail.gmail.com
This brings the titlecasing implementation for the builtin provider
out of formatting.c and into unicode_case.c, along with
unicode_strlower() and unicode_strupper(). Accepts an arbitrary word
boundary callback.
Simple for now, but can be extended to support the Unicode Default
Case Conversion algorithm with full case mapping.
Discussion: https://postgr.es/m/3bc653b5d562ae9e2838b11cb696816c328a489a.camel@j-davis.com
Reviewed-by: Peter Eisentraut
This recently-added test case checks the plan of an inner join
between two identical tables. It's just chance which join order
the planner will pick, and in the presence of any variation in
the underlying statistics, the displayed plan might change.
Add a WHERE condition to break the cost symmetry and hopefully
stabilize matters.
(We're still trying to understand exactly why the underlying
statistics aren't as stable as intended, but this seems like
a good change anyway, since this test would surely bite us
again in future.)
While here, clean up assorted comment spelling, grammar, and
whitespace problems.
Discussion: https://postgr.es/m/4168116.1711720146@sss.pgh.pa.us
This is marked PGC_SIGHUP, so it can only be set in a configuration
file, not anywhere else; and it is also marked GUC_DISALLOW_IN_AUTO_FILE,
so it can't be set using ALTER SYSTEM. When set to false, the
ALTER SYSTEM command is disallowed.
There was considerable concern that this would be misinterpreted as
a security feature, which it is not, because a determined superuser
has various ways of bypassing it. Hence, a lot of work has gone into
wordsmithing the documentation, in the hopes of avoiding any such
confusion.
Jelte Fennemia-Nio and Gabriele Bartolini, with wording suggestions
for the documentation from many others.
Discussion: http://postgr.es/m/CA%2BVUV5rEKt2%2BCdC_KUaPoihMu%2Bi5ChT4WVNTr4CD5-xXZUfuQw%40mail.gmail.com
Allow use of BeginInternalSubTransaction() in parallel mode, so long
as the subtransaction doesn't attempt to acquire an XID or increment
the command counter. Given those restrictions, the other parallel
processes don't need to know about the subtransaction at all, so
this should be safe. The benefit is that it allows subtransactions
intended for error recovery, such as pl/pgsql exception blocks,
to be used in PARALLEL SAFE functions.
Another reason for doing this is that the API of
BeginInternalSubTransaction() doesn't allow reporting failure.
pl/python for one, and perhaps other PLs, copes very poorly with an
error longjmp out of BeginInternalSubTransaction(). The headline
feature of this patch removes the only easily-triggerable failure
case within that function. There remain some resource-exhaustion
and similar cases, which we now deal with by promoting them to FATAL
errors, so that callers need not try to clean up. (It is likely
that such errors would leave us with corrupted transaction state
inside xact.c, making recovery difficult if not impossible anyway.)
Although this work started because of a report of a pl/python crash,
we're not going to do anything about that in the back branches.
Back-patching this particular fix is obviously not very wise.
While we could contemplate some narrower band-aid, pl/python is
already an untrusted language, so it seems okay to classify this
as a "so don't do that" case.
Patch by me, per report from Hao Zhang. Thanks to Robert Haas for
review.
Discussion: https://postgr.es/m/CALY6Dr-2yLVeVPhNMhuBnRgOZo1UjoTETgtKBx1B2gUi8yy+3g@mail.gmail.com
Avoid setting an access method OID for relation kinds that don't take
one. Code review for new feature added in 374c7a229042.
Author: Justin Pryzby <pryzby@telsasoft.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/e5516ac1-5264-c3c0-d822-9e6f614ea93b@gmail.com
Apparently these markers cause the modules to not link correctly in some
platforms, at least per buildfarm member indri; moreover, this code is
only used in modules that don't have a translation. If we someday add
i18n support to contrib/ it might be worth revisiting this.
Commit 61461a300c1c introduced new functions to libpq for cancelling
queries. This commit introduces a helper function that backend-side
libraries and extensions can use to invoke those. This function takes a
timeout and can itself be interrupted while it is waiting for a cancel
request to be sent and processed, instead of being blocked.
This replaces the usage of the old functions in postgres_fdw and dblink.
Finally, it also adds some test coverage for the cancel support in
postgres_fdw.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Discussion: https://postgr.es/m/CAGECzQT_VgOWWENUqvUV9xQmbaCyXjtRRAYO8W07oqashk_N+g@mail.gmail.com
Previously, the behavior of TidStoreCreate() was inconsistent between
local and shared TidStore instances in terms of memory limitation. For
local TidStore, a memory context was created with initial and maximum
memory block sizes, as well as a minimum memory context size, based on
the specified max_bytes values. However, for shared TidStore, the
provided DSA area was used for TID storage. Although commit bb952c8c8b
allowed specifying the initial and maximum DSA segment sizes, callers
would have needed to clamp their own limits, which was not consistent
and user-friendly.
With this commit, when creating a shared TidStore, a dedicated DSA
area is created for TID storage instead of using a provided DSA
area. The initial and maximum DSA segment sizes are chosen based on
the specified max_bytes. Other processes can attach to the shared
TidStore using the handle of the created DSA returned by the new
TidStoreGetDSA() function and the DSA pointer returned by
TidStoreGetHandle(). The created DSA has the same lifetime as the
shared TidStore and is deleted when all processes detach from it.
To improve clarity, the TidStoreCreate() function has been divided
into two separate functions: TidStoreCreateLocal() and
TidStoreCreateShared().
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/CAD21AoAyc1j%3DBCdUqZfk6qbdjZ68UgRx1Gkpk0oah4K7S0Ri9g%40mail.gmail.com
When a plain aggregate is used as a window function, and the window
frame start is specified as UNBOUNDED PRECEDING, the frame's head
cannot move so we do not need to use moving-aggregate mode. The check
for that was put into initialize_peragg(), failing to notice that
ExecInitWindowAgg() calls that function before it's filled in
winstate->frameOptions. Since makeNode() would have zeroed the field,
this didn't provoke uninitialized-value complaints, nor would the
erroneous decision have resulted in more than a little inefficiency.
Still, it's wrong, so move the initialization of
winstate->frameOptions earlier to make it work properly.
While here, also fix a thinko in a comment. Both errors crept in in
commit a9d9acbf2 which introduced the moving-aggregate mode.
Spotted by Vallimaharajan G. Back-patch to all supported branches.
Discussion: https://postgr.es/m/18e7f2a5167.fe36253866818.977923893562469143@zohocorp.com
Commit 8c16ad3b43 created a new appendix for syncfs(), which is
excessive for such a small amount of content. This commit moves
the description of the caveats to be aware of when using syncfs()
back to the documentation for recovery_init_sync_method. The
documentation for the other utilities with syncfs() support now
directs readers to recovery_init_sync_method for information about
these caveats.
Reported-by: Peter Eisentraut, Robert Haas
Suggested-by: Robert Haas
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/42804669-7063-1320-ed37-3226d5f1067d%40eisentraut.org
Discussion: https://postgr.es/m/CA%2BTgmobUiqKr%2BZMCLc5Qap-sXBnjfGUU%2BZBmzYEjUuWyjsGr1g%40mail.gmail.com
The user-facing name is "Other Platforms and Clients", but the
internal name seems too focused on clients specifically, especially
given the plan to add a new setting to this session that is about
platform or deployment model compatibility rather than client
compatibility.
Jelte Fennema-Nio
Discussion: http://postgr.es/m/CAGECzQTfMbDiM6W3av+3weSnHxJvPmuTEcjxVvSt91sQBdOxuQ@mail.gmail.com
Buildfarm member avocet has shown a plan change by switching the
finalize aggregate stage to use a GroupAggregate rather than a
HashAggregate. This is consistent with autovacuum having triggered on
the table, per analysis by Alexander Lakhin.
Fix this by disabling autovacuum on the table.
Reported-by: Alexander Lakhin
Discussion: https://postgr.es/m/d4493a28-589a-5328-fed5-250f2d7d3e2a@gmail.com
Backpatch-through: 16, where this test was added.
This adds 3 new variants of the random() function:
random(min integer, max integer) returns integer
random(min bigint, max bigint) returns bigint
random(min numeric, max numeric) returns numeric
Each returns a random number x in the range min <= x <= max.
For the numeric function, the number of digits after the decimal point
is equal to the number of digits that "min" or "max" has after the
decimal point, whichever has more.
The main entry points for these functions are in a new C source file.
The existing random(), random_normal(), and setseed() functions are
moved there too, so that they can all share the same PRNG state, which
is kept private to that file.
Dean Rasheed, reviewed by Jian He, David Zhang, Aleksander Alekseev,
and Tomas Vondra.
Discussion: https://postgr.es/m/CAEZATCV89Vxuq93xQdmc0t-0Y2zeeNQTdsjbmV7dyFBPykbV4Q@mail.gmail.com
After the upgrade, the failed test was ensuring that the changes made on
the publisher should be replicated to the subscriber. We missed waiting
for one of the subscriptions to catch up.
Per buildfarm
Author: Vignesh C
Reviewed-by: Kuroda Hayato
Discussion: https://postgr.es/m/CALDaNm0z=fLtio1h50K8WossUGXU+gy0H9y9=RYh1DDZiq2EDw@mail.gmail.com
Previously, the DSA segment size always started with 1MB and grew up
to DSA_MAX_SEGMENT_SIZE. It was inconvenient in certain scenarios,
such as when the caller desired a soft constraint on the total DSA
segment size, limiting it to less than 1MB.
This commit introduces the capability to specify the initial and
maximum DSA segment sizes when creating a DSA area, providing more
flexibility and control over memory usage.
Reviewed-by: John Naylor, Tomas Vondra
Discussion: https://postgr.es/m/CAD21AoAYGGC1ePjVX0H%2Bpp9rH%3D9vuPK19nNOiu12NprdV5TVJA%40mail.gmail.com
The newly-introduced "one_by_one" label produces -Wunused-label
warnings when building without SIMD support. To fix, move the
label into the SIMD section of this function.
Oversight in commit 7644a7340c.
Reported-by: Tom Lane
Discussion: https://postgr.es/m/3189995.1711495704%40sss.pgh.pa.us
Commit e2fa76d80 centralized the responsibility for doing
set_cheapest() for a baserel, but these functions added later
seemingly didn't get the memo. There's no apparent reason why
we need the cheapest path for these relation types to be available
any sooner than it is for other base relation types, so delete the
duplicate calls. Doesn't save much since there's only one path
in these cases, but it might improve clarity.
Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-KFEU_fDuJPNCOkUu3rwvZvKBEytkd9VrM4kH4-2h1CQ@mail.gmail.com
When the list of roles gathered by roles_is_member_of() grows very
large, a Bloom filter is created to help avoid some linear searches
through the list. The threshold for creating the Bloom filter is
set arbitrarily high and may require future adjustment.
Suggested-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAGvXd3OSMbJQwOSc-Tq-Ro1CAz%3DvggErdSG7pv2s6vmmTOLJSg%40mail.gmail.com
Ordinary ALTER TABLE SET SCHEMA will also move any owned sequences
into the new schema. We failed to do likewise for foreign tables,
because AlterTableNamespaceInternal believed that only certain
relkinds could have indexes, owned sequences, or constraints.
We could simply add foreign tables to that relkind list, but it
seems likely that the same oversight could be made again in
future. Instead let's remove the relkind filter altogether.
These functions shouldn't cost much when there are no objects
that they need to process, and surely this isn't an especially
performance-critical case anyway.
Per bug #18407 from Vidushi Gupta. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/18407-4fd07373d252c6a0@postgresql.org
This commit improves the performance of pg_lfind32() in many cases
by modifying it to process the remaining "tail" of elements with
SIMD instructions instead of processing them one-by-one. Since the
SIMD code processes a large block of elements, this means that we
will process a subset of elements more than once, but that won't
affect the correctness of the result, and testing has shown that
this helps more cases than it regresses. With this change, the
standard one-by-one linear search code is only used for small
arrays and for platforms without SIMD support.
Suggested-by: John Naylor
Reviewed-by: John Naylor
Discussion: https://postgr.es/m/20231129171526.GA857928%40nathanxps13
If we know the sort order of a CTE's output, and it is relevant
to the outer query, label the CTE's outer-query access path using
those pathkeys. This may enable optimizations such as avoiding
a sort in the outer query.
The code for hoisting pathkeys into the outer query already exists
for regular RTE_SUBQUERY subqueries, but it wasn't getting used for
CTEs, possibly out of concern for maintaining an optimization fence
between the CTE and the outer query. However, on the same arguments
used for commit f7816aec2, there seems no harm in letting the outer
query know what the inner query decided to do.
In support of this, we now remember the best Path as well as Plan
for each subquery for the rest of the planner run. There may be
future applications for having that at hand, and it surely costs
little to build one more List.
Richard Guo (minor mods by me)
Discussion: https://postgr.es/m/CAMbWs49xYd3f8CrE8-WW3--dV1zH_sDSDn-vs2DzHj81Wcnsew@mail.gmail.com
The musl dynamic linker saves a pointer to the process' environment
value of LD_LIBRARY_PATH very early in startup. When we move/clobber
the environment to make more room for ps status strings, we clobber
that value and thereby prevent libraries from being found via
LD_LIBRARY_PATH, which breaks the use of a temporary installation
for testing purposes. To fix, stop collecting usable space for
ps status if we notice that the variable we are about to clobber
is LD_LIBRARY_PATH. This will result in some reduction in how long
the ps status can be, but it's only likely to occur in temporary
test contexts, so it doesn't seem like a big problem. In any case,
we don't have to do it if we see we are on glibc, which surely is
where the majority of our Linux testing is done.
Thomas Munro, Bruce Momjian, and Tom Lane, per report from Wolfgang
Walther. Back-patch to all supported branches, with the hope that
we'll set up a buildfarm animal to test on this platform.
Discussion: https://postgr.es/m/fddd1cd6-dc16-40a2-9eb5-d7fef2101488@technowledgy.de
ObjectClass is an enum whose values correspond to catalog OIDs. But
the extra layer of redirection, which is used only in small parts of
the code, and the similarity to ObjectType, are confusing and
cumbersome.
One advantage has been that some switches processing the OCLASS enum
don't have "default:" cases. This is so that the compiler tells us
when we fail to add support for some new object class. But you can
also handle that with some assertions and proper test coverage. It's
not even clear how strong this benefit is. For example, in
AlterObjectNamespace_oid(), you could still put a new OCLASS into the
"ignore object types that don't have schema-qualified names" case, and
it might or might not be wrong. Also, there are already various
OCLASS switches that do have a default case, so it's not even clear
what the preferred coding style should be.
Reviewed-by: jian he <jian.universality@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://www.postgresql.org/message-id/flat/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw%40mail.gmail.com
Since 82a4edabd27 we can bulk extend relations. The bulk relation extension
logic has a heuristic component. Normally the heurstic does not trigger in the
occasionally-failing test case, as the relation is only extended once. But
with very small shared_buffers the limits for the number of buffers pinned at
once prevent the extension from happening at once. With the second "bulk"
extension, the heuristic kicks in, and the relation ends up one block bigger.
That's ok from a correctness perspective, but changes the results of the test
query due to one additional block.
We discussed a few more expansive fixes, but for now have decided to avoid
this by making the table a bit smaller.
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by:
Discussion: https://postgr.es/m/29c74104-210b-ef39-2522-27a6aa7a704f@iki.fi
Discussion: https://postgr.es/m/20230916000011.2ugpkkkp7bpp4cfh@awork3.anarazel.de
Backpatch: 16-, where the new relation extension logic was added