We were being careless in some places about the order of -L switches in
link command lines, such that -L switches referring to external directories
could come before those referring to directories within the build tree.
This made it possible to accidentally link a system-supplied library, for
example /usr/lib/libpq.so, in place of the one built in the build tree.
Hilarity ensued, the more so the older the system-supplied library is.
To fix, break LDFLAGS into two parts, a sub-variable LDFLAGS_INTERNAL
and the main LDFLAGS variable, both of which are "recursively expanded"
so that they can be incrementally adjusted by different makefiles.
Establish a policy that -L switches for directories in the build tree
must always be added to LDFLAGS_INTERNAL, while -L switches for external
directories must always be added to LDFLAGS. This is sufficient to
ensure a safe search order. For simplicity, we typically also put -l
switches for the respective libraries into those same variables.
(Traditional make usage would have us put -l switches into LIBS, but
cleaning that up is a project for another day, as there's no clear
need for it.)
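As a minimal sketch of the intended effect, assuming GNU make semantics
(directory names are illustrative, not the actual Makefile.global contents):

    # -L switches for build-tree directories go into LDFLAGS_INTERNAL ...
    LDFLAGS_INTERNAL = -L$(top_builddir)/src/interfaces/libpq -lpq
    # ... while external -L switches go into LDFLAGS, which expands
    # LDFLAGS_INTERNAL first, so build-tree directories always win.
    LDFLAGS = $(LDFLAGS_INTERNAL) -L/usr/local/lib

Because both variables use recursive expansion (= rather than :=),
individual makefiles can keep appending to either one.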
This turns out to also require separating SHLIB_LINK into two variables,
SHLIB_LINK and SHLIB_LINK_INTERNAL, with a similar rule about which
switches go into which variable. And likewise for PG_LIBS.
Although this change might appear to affect external users of pgxs.mk,
I think it doesn't; they shouldn't have any need to touch the _INTERNAL
variables.
In passing, tweak src/common/Makefile so that the value of CPPFLAGS
recorded in pg_config lacks "-DFRONTEND" and the recorded value of
LDFLAGS lacks "-L../../../src/common". Both of those things are
mistakes, apparently introduced during prior code rearrangements,
as old versions of pg_config don't print them. In general we don't
want anything that's specific to the src/common subdirectory to
appear in those outputs.
This is certainly a bug fix, but in view of the lack of field
complaints, I'm unsure whether it's worth the risk of back-patching.
In any case it seems wise to see what the buildfarm makes of it first.
Discussion: https://postgr.es/m/25214.1522604295@sss.pgh.pa.us
Some compilers are evidently pickier than others about whether Perl's
I32 typedef should be considered equivalent to int. Dodge the problem
by using a separate variable; the prior coding was a bit confusing anyway.
Per buildfarm. Note this does nothing to fix the test failures due to
SV_to_JsonbValue not covering enough variable types.
The prefix operator, used together with SP-GiST indexes, can serve as an
alternative to LIKE 'word%' queries, and it does not have the string/prefix
length limitation that B-Tree has.
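For example (a hedged sketch; table and column names are hypothetical):

    CREATE TABLE words (w text);
    CREATE INDEX words_spgist_idx ON words USING spgist (w);
    -- ^@ tests whether w starts with the given prefix,
    -- comparable to w LIKE 'post%'
    SELECT * FROM words WHERE w ^@ 'post';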
Bump catalog version
Author: Ildus Kurbangaliev with some editorialization by me
Review by: Arthur Zakirov, Alexander Korotkov, and me
Discussion: https://www.postgresql.org/message-id/flat/20180202180327.222b04b3@wp.localdomain
Add a new contrib module jsonb_plperl that provides a transform between
jsonb and PL/Perl. jsonb values are converted to appropriate Perl types
such as arrays and hashes, and vice versa.
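A minimal usage sketch (function name illustrative; the TRANSFORM clause
ties the function to the new transform):

    CREATE EXTENSION jsonb_plperl CASCADE;
    CREATE FUNCTION roundtrip(val jsonb) RETURNS jsonb
    LANGUAGE plperl
    TRANSFORM FOR TYPE jsonb
    AS $$ return $_[0]; $$;
    SELECT roundtrip('{"a": [1, 2, 3]}');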
Author: Anthony Bykov <a.bykov@postgrespro.ru>
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Reviewed-by: Aleksander Alekseev <a.alekseev@postgrespro.ru>
Reviewed-by: Nikita Glukhov <n.gluhov@postgrespro.ru>
When base backups are run over the replication protocol (for example
using pg_basebackup), verify the checksums of all data blocks if
checksums are enabled. If checksum failures are encountered, log them
as warnings but don't abort the backup.
This becomes the default behaviour in pg_basebackup (provided checksums
are enabled on the server), so add a switch (-k) to disable the checks
if necessary.
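e.g. (paths illustrative):

    pg_basebackup -D /path/to/backup        # checksums verified by default
    pg_basebackup -D /path/to/backup -k     # verification disabled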
Author: Michael Banck
Reviewed-By: Magnus Hagander, David Steele
Discussion: https://postgr.es/m/20180228180856.GE13784@nighthawk.caipicrew.dd-dns.de
MERGE performs actions that modify rows in the target table
using a source table or query. MERGE provides a single SQL
statement that can conditionally INSERT/UPDATE/DELETE rows,
a task that would otherwise require multiple PL statements.
e.g.
MERGE INTO target AS t
USING source AS s
ON t.tid = s.sid
WHEN MATCHED AND t.balance > s.delta THEN
UPDATE SET balance = t.balance - s.delta
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED AND s.delta > 0 THEN
INSERT VALUES (s.sid, s.delta)
WHEN NOT MATCHED THEN
DO NOTHING;
MERGE works with regular and partitioned tables, including
column and row security enforcement, as well as support for
row, statement and transition triggers.
MERGE is optimized for OLTP and is parameterizable, though
also useful for large scale ETL/ELT. MERGE is not intended
to be used in preference to existing single SQL commands
for INSERT, UPDATE or DELETE since there is some overhead.
MERGE can be used statically from PL/pgSQL.
MERGE does not yet support inheritance, write rules,
RETURNING clauses, updatable views or foreign tables.
MERGE follows the SQL standard, per the most recent SQL:2016.
Includes full tests and documentation, including full
isolation tests to demonstrate the concurrent behavior.
This version written from scratch in 2017 by Simon Riggs,
using docs and tests originally written in 2009. Later work
from Pavan Deolasee has been both complex and deep, leaving
the lead author credit now in his hands.
Extensive discussion of concurrency from Peter Geoghegan,
with thanks for the time and effort contributed.
Various issues reported via sqlsmith by Andreas Seltenreich
Authors: Pavan Deolasee, Simon Riggs
Reviewers: Peter Geoghegan, Amit Langote, Tomas Vondra, Simon Riggs
Discussion:
https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
Coverity complained about possible buffer overrun in two places added by
commit 1eb6d6527, and AFAICS it's reasonable to worry: even granting that
the WAL originator properly truncated the commit GID to GIDSIZE, we should
not really bet our lives on that having the same value as it does in the
current build. Hence, use strlcpy() not strcpy(), and adjust the pointer
advancement logic to be sure we skip over the whole source string even if
strlcpy() truncated it.
These tests don't work reliably with pre-2.6 Python versions, since
Python code like float('inf') was not guaranteed to work before that,
even granting an IEEE-compliant platform.
Since there's no explicit handling of these cases in jsonb_plpython,
we're not adding any real code coverage by testing them, and thus
it doesn't seem to make sense to go to any great lengths to work
around the test instability.
Discussion: https://postgr.es/m/E1f1AMU-00031c-9N@gemulon.postgresql.org
We already do this for Perl and some other interesting tools, so it
seems sensible to do it for Python as well, especially since the
sub-release number is never determinable from other configure output
and even the major/minor numbers may not be clear without excavation
in config.log.
While at it, get rid of the code's assumption that both the major and
minor numbers contain exactly one digit. That will foreseeably be
broken by Python 3.10 in perhaps four or five years. That's far enough
out that we probably don't need to back-patch this.
Discussion: https://postgr.es/m/2186.1522681145@sss.pgh.pa.us
Recent commit 8a3d9425 introduced be-secure-common.c, which is intended
to hold backend-side APIs usable by any SSL implementation. Its purpose
is similar to fe-secure-common.c for the frontend-side APIs.
However, the move omitted check_ssl_key_file_permissions(), which leaves
a double dependency between be-secure.c and be-secure-openssl.c.
Refactor the code in a more logical way. This also brings to light an
API that future SSL implementations can use to check permissions on SSL
key files.
Author: Michael Paquier <michael@paquier.xyz>
Since commit 7012b132d07c2b4ea15b0b3cb1ea9f3278801d98, postgres_fdw
has been able to push down the toplevel aggregation operation to the
remote server. Commit e2f1eb0ee30d144628ab523432320f174a2c8966 made
it possible to break down the toplevel aggregation into one
aggregate per partition. This commit lets postgres_fdw push down
aggregation in that case just as it does at the top level.
In order to make this work, this commit adds an additional argument
to the GetForeignUpperPaths FDW API. A matching argument is added
to the signature for create_upper_paths_hook. Third-party code using
either of these will need to be updated.
Also adjust create_foreignscan_plan() so that it picks up the correct
set of relids in this case.
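A hedged illustration (table names hypothetical; assumes a parent table
partitioned by bid whose partitions are postgres_fdw foreign tables):

    SET enable_partitionwise_aggregate = on;
    -- each partition's aggregate can now be pushed to its remote server
    SELECT bid, count(*) FROM accounts GROUP BY bid;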
Jeevan Chalke, reviewed by Ashutosh Bapat and by me and with some
adjustments by me. The larger patch series of which this patch is a
part was also reviewed and tested by Antonin Houska, Rajkumar
Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal
Legrand, and Rafia Sabih.
Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com
Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
round() is from C99. Use rint() instead. There are behavioral
differences between round() and rint(), but they should not matter to
the Bloom filter optimal_k() function. We already assume POSIX
behavior for rint(), so there is no question of rint() not using
"rounds towards nearest" as its rounding mode.
Cleanup from commit 51bc271790eb234a1ba4d14d3e6530f70de92ab5.
Per buildfarm member thrips.
Author: Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzn76eCGUonARy-wrVtMHsf+4cvbK_oJAWTLfORTU5ki0w@mail.gmail.com
Add a new, optional, capability to bt_index_check() and
bt_index_parent_check(): check that each heap tuple that should have an
index entry does in fact have one. The extra checking is performed at
the end of the existing nbtree checks.
This is implemented by using a Bloom filter data structure. The
implementation performs set membership tests within a callback (the same
type of callback that each index AM registers for CREATE INDEX). The
Bloom filter is populated during the initial index verification scan.
Reusing the CREATE INDEX infrastructure allows the new verification
option to automatically benefit from the heap consistency checks that
CREATE INDEX already performs. CREATE INDEX does thorough sanity
checking of HOT chains, so the new check actually manages to detect
problems in heap-only tuples.
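A minimal usage sketch (index name hypothetical; assumes the amcheck
extension is installed):

    CREATE EXTENSION IF NOT EXISTS amcheck;
    -- passing true as the second argument enables the new
    -- heapallindexed verification
    SELECT bt_index_check('some_btree_index'::regclass, true);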
Author: Peter Geoghegan
Reviewed-By: Pavan Deolasee, Andres Freund
Discussion: https://postgr.es/m/CAH2-Wzm5VmG7cu1N-H=nnS57wZThoSDQU+F5dewx3o84M+jY=g@mail.gmail.com
A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership. Callers will sometimes incur false
positives, but never false negatives. The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.
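For reference, standard Bloom filter theory (not specific to this
implementation) gives the false positive rate for m bits, k hash
functions, and n inserted elements as:

    p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
    k_{\mathrm{opt}} = \frac{m}{n}\ln 2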
Two classic applications of Bloom filters are cache filtering, and data
synchronization testing. Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.
This commit adds a test harness extension module, test_bloomfilter. It
can be used to get a sense of how the Bloom filter implementation
performs under varying conditions.
This is infrastructure for the upcoming "heapallindexed" amcheck patch,
which verifies the consistency of a heap relation against one of its
indexes.
Author: Peter Geoghegan
Reviewed-By: Andrey Borodin, Michael Paquier, Thomas Munro, Andres Freund
Discussion: https://postgr.es/m/CAH2-Wzm5VmG7cu1N-H=nnS57wZThoSDQU+F5dewx3o84M+jY=g@mail.gmail.com
Avoid storing the result of PQsocket() in a pgsocket variable; it's
declared as int, and the no-socket test is properly written as "x < 0"
not "x == PGINVALID_SOCKET". This accidentally had no bad effect
because we never got to init_slot() with a bad connection, but it's
still wrong.
Actually, it seems like we should avoid storing the result for a long
period at all. The function's not so expensive that it's worth avoiding,
and the existing coding technique here would fail if anyone tried to
PQreset the connection during the life of the program. Hence, just
re-call PQsocket every time we construct a select(2) mask.
Speaking of select(), GetIdleSlot imagined that it could compute the
select mask once and continue to use it over multiple calls to
select_loop(), which is pretty bogus since that would stomp on the
mask on return. This could only matter if the function's outer loop
iterated more than once, which is unlikely (it'd take some connection
receiving data, but not enough to complete its command). But if it
did happen, we'd acquire "tunnel vision" and stop watching the other
connections for query termination, with the effect of losing parallelism.
Another way in which GetIdleSlot could lose parallelism is that once
PQisBusy returns false, it would lock in on that connection and do
PQgetResult until that returns NULL; in some cases that could result
in blocking. (Perhaps this can never happen in vacuumdb due to the
limited set of commands that it can issue, but I'm not quite sure
of that, and even if true today it's not a future-proof assumption.)
Refactor the code to do that properly, so that it risks blocking in
PQgetResult only in cases where we need to wait anyway.
Another loss-of-parallelism problem, which *is* easily demonstrable,
is that any setup queries issued during prepare_vacuum_command() were
always issued on the last-to-be-created connection, whether or not
that was idle. Long-running operations on that connection thus
prevented issuance of additional operations on the other ones, except
in the limited cases where no preparatory query was needed. Instead,
wait till we've identified a free connection and use that one.
Also, avoid core dump due to undersized malloc request in the case
that no tables are identified to be vacuumed.
The bogus no-socket test was noted by CharSyam, the other problems
identified in my own code review. Back-patch to 9.5 where parallel
vacuumdb was introduced.
Discussion: https://postgr.es/m/CAMrLSE6etb33-192DTEUGkV-TsvEcxtBDxGWG1tgNOMnQHwgDA@mail.gmail.com
Compilation failed for lack of an #ifdef on builds without
pg_strong_random(). Also fix relevant error messages to meet
project style guidelines.
Fabien Coelho, further adjusted by me
Discussion: https://postgr.es/m/32390.1522464534@sss.pgh.pa.us
So far as I can find, NI_MAXHOST isn't actually required anywhere by
POSIX. Nonetheless, commit 9a895462d supposed that it could rely on
having that symbol without any ceremony at all. We do have a hack
for providing it if the platform doesn't, in getaddrinfo.h, so fix
the problem by #including that file. Per buildfarm.
In 9956ddc19164b02dc1925fb389a1af77472eba5e, ten years ago, the
current objfile.txt based linking model was introduced. It's time to
retire the old SUBSYS.o based model.
This is primarily pertinent because the bitcode files for LLVM-based
inlining are not produced when using PARTIAL_LINKING. It does not seem
worth fixing PARTIAL_LINKING to support that.
Author: Andres Freund
Discussion: https://postgr.es/m/20180121204356.d5oeu34jetqhmdv2@alap3.anarazel.de
LockViewRecurse() obtains the view relation using heap_open() and passes
it to get_view_query() to get the view's query. It immediately closes the
relation, then uses the returned view info by calling
LockViewRecurse_walker(). Since get_view_query() returns a pointer
into the relcache, the relcache entry must be kept until
LockViewRecurse_walker() returns; otherwise the pointer could end up
referencing garbage memory.
The fix is to move the heap_close() call after LockViewRecurse_walker().
Problem reported by Tom Lane (buildfarm is unhappy, especially prion
since it enables -DRELCACHE_FORCE_RELEASE cpp flag), fix by me.
A followup patch will add a SKIP_LOCKED option. To avoid introducing
ever more arguments, breaking existing callers each time, introduce a
flags argument. This'll no doubt break a few external users...
Also change the MISSING_OK behaviour so that a DEBUG1 message is
emitted when a relation is not found.
Author: Nathan Bossart
Reviewed-By: Michael Paquier and Andres Freund
Discussion: https://postgr.es/m/20180306005349.b65whmvj7z6hbe2y@alap3.anarazel.de
Previously there was no way on the standby side to find out the host and
port of the sender server that the walreceiver was currently connected to
when multiple hosts and ports were specified in primary_conninfo. For that
purpose, this patch adds sender_host and sender_port columns to the
pg_stat_wal_receiver view. They report the host and port that the active
replication connection currently uses.
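The new columns can be queried directly, e.g.:

    SELECT sender_host, sender_port FROM pg_stat_wal_receiver;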
Bump catalog version.
Author: Haribabu Kommi
Reviewed-by: Michael Paquier and me
Discussion: https://postgr.es/m/CAJrrPGcV_aq8=cdqkFhVDJKEnDQ70yRTTdY9RODzMnXNrCz2Ow@mail.gmail.com
Richard Yen reported that pg_upgrade failed if the target cluster had
force_parallel_mode = on, because binary_upgrade_create_empty_extension()
is marked parallel restricted, allowing it to be executed in parallel
mode, which complains because it tries to acquire an XID.
In general, no function that might try to modify database data should
be considered parallel safe or restricted, since execution of it might
force XID acquisition. We found several other examples of this mistake.
Furthermore, functions that execute user-supplied SQL queries or query
fragments, or pull data from user-supplied cursors, had better be marked
both volatile and parallel unsafe, because we don't know what the supplied
query or cursor might try to do. There were several tsquery and XML
functions that had the wrong proparallel marking for this, and some of
them were even mislabeled as to volatility.
All these bugs are old, dating back to 9.6 for the proparallel mistakes
and much further for the provolatile mistakes. We can't force a
catversion bump in the back branches, but we can at least ensure that
installations initdb'd in future have the right values.
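The markings can be inspected directly in the catalog; for example:

    SELECT proname, provolatile, proparallel
    FROM pg_proc
    WHERE proname = 'binary_upgrade_create_empty_extension';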
Thomas Munro and Tom Lane
Discussion: https://postgr.es/m/CAEepm=2sNDScSLTfyMYu32Q=ob98ZGW-vM_2oLxinzSABGQ6VA@mail.gmail.com
In the previous coding, skipped pages were mostly zeroes, but they still
had valid WAL page headers. That makes them very much less compressible
than an unbroken string of zeroes would be --- about 10X worse for bzip2
compression, for instance. We don't need those headers, so tweak the logic
so that we zero them out.
Chapman Flack, reviewed by Daniel Gustafsson
Discussion: https://postgr.es/m/579297F8.7020107@anastigmatix.net
When SSI was developed, slru.c was limited to segment files with names in
the range 0000-FFFF. This didn't allow enough space for predicate.c to
store every possible XID when spilling old transactions to disk, so it
would wrap around sooner and print warnings. Since commits 638cf09e and
73c986ad increased the number of segment files slru.c could manage, that
behavior is unnecessary. Therefore remove that code.
Also remove the macro OldSerXidSegment, which has been unused since
4cd3fb6e.
Thomas Munro, reviewed by Anastasia Lubennikova
Discussion: https://postgr.es/m/CAEepm=3XfsTSxgEbEOmxu0QDiXy0o18NUg2nC89JZcCGE+XFPA@mail.gmail.com
Add the target context's name to the errdetail field of "out of memory"
errors in mcxt.c. Per discussion, this seems likely to be useful to
help narrow down the cause of a reported failure, and it costs little.
Also, now that context names are required to be compile-time constants
in all cases, there's little reason to be concerned about security
issues from exposing these names to users. (Because of such concerns,
we are *not* including the context "ident" field.)
In passing, add unlikely() markers to the allocation-failed tests,
just to be sure the compiler is on the right page about that.
Also, in palloc and friends, copy CurrentMemoryContext into a local
variable, as that's almost surely cheaper to reference than a global.
Discussion: https://postgr.es/m/1099.1522285628@sss.pgh.pa.us
In btree and SP-GiST indexes, move the responsibility for calling
IndexFreeSpaceMapVacuum from the vacuumcleanup phase to the bulkdelete
phase, and do it if and only if we found some pages that could be put into
FSM. As in commit 851a26e26, the idea is to make free pages visible to FSM
searchers sooner when vacuuming very large tables (large enough to need
multiple bulkdelete scans). This adds more redundant work than that commit
did, since we have to scan the entire index FSM each time rather than being
able to localize what needs to be updated; but it still seems worthwhile.
However, we can buy something back by not touching the FSM at all when
there are no pages that can be put in it. That will result in slower
recovery from corrupt upper FSM pages in such a scenario, but it doesn't
seem like that's a case we need to optimize for.
Hash indexes don't use FSM at all. GIN, GiST, and bloom indexes update
FSM during the vacuumcleanup phase not bulkdelete, so that doing something
comparable to this would be a much more invasive change, and it's not clear
it's worth it. BRIN indexes do things sufficiently differently that this
change doesn't apply to them, either.
Claudio Freire, reviewed by Masahiko Sawada and Jing Wang, some additional
tweaks by me
Discussion: https://postgr.es/m/CAGTBQpYR0uJCNTt3M5GOzBRHo+-GccNO1nCaQ8yEJmZKSW5q1A@mail.gmail.com
Unlike the previous coding, this might result in a Gather per Append
subplan when the target list is parallel-restricted, but such a plan
is probably worth considering in that case, since a single Gather
on top of the entire Append is impossible.
Per Andres Freund and the buildfarm.
Discussion: http://postgr.es/m/20180330050351.bmxx4cdtz67czjda@alap3.anarazel.de
Predicate locks are taken on a per-page basis only if fastupdate = off;
otherwise a predicate lock on the pending list would effectively lock the
whole index, so to reduce locking overhead we just lock the relation in
that case. Entry and posting trees are essentially B-trees, so locks are
acquired on leaf pages only.
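For example, the per-page locking applies to an index declared like this
(names hypothetical):

    CREATE INDEX docs_gin_idx ON docs USING gin (body_tsv)
        WITH (fastupdate = off);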
Author: Shubham Barai with some editorialization by me and Dmitry Ivanov
Review by: Alexander Korotkov, Dmitry Ivanov, Teodor Sigaev
Discussion: https://www.postgresql.org/message-id/flat/CALxAEPt5sWW+EwTaKUGFL5_XFcZ0MuGBcyJ70oqbWqr42YKR8Q@mail.gmail.com