The new partitioned table capability added a new relkind, namely
RELKIND_PARTITIONED_TABLE. Update fireRIRrules() to apply RLS
policies on RELKIND_PARTITIONED_TABLE as it does RELKIND_RELATION.
In addition, add RLS regression test coverage for partitioned tables.
Issue raised by Fakhroutdinov Evgenievich and patch by Mike Palmiotto.
Regression test editorializing by me.
Discussion: https://postgr.es/m/flat/20170601065959.1486.69906@wrigleys.postgresql.org
This reverts commit 56b6ef893fee9e9bf47d927a02f4d1ea911f4d9c and instead
makes vcregress.pl parse out PROVE_FLAGS from a command line argument
when doing a TAP test, thus making it consistent with the makefile
treatment.
Discussion: https://postgr.es/m/c26a7416-2fb9-34ab-7991-618c922f896e%402ndquadrant.com
Backpatch to 9.4 like previous patch.
When a table is removed from a subscription before the tablesync worker
could start, this would previously result in an error when reading
pg_subscription_rel. Now we just ignore this.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
If you accidentally pass a host name in the hostaddr option, e.g.
hostaddr=localhost, you get an error like:
psql: could not translate host name "localhost" to address: Name or service not known
That's a bit confusing, because it implies that we tried to look up
"localhost" in DNS, but it failed. To make it more clear that we tried to
parse "localhost" as a numeric network address, change the message to:
psql: could not parse network address "localhost": Name or service not known
Discussion: https://www.postgresql.org/message-id/10badbc6-4d5a-a769-623a-f7ada43e14dd@iki.fi
Previously the exit handling was only able to exit from within the
main loop, and not from within the backend code it calls. Fix that by
using the standard die() SIGTERM handler, and adding the necessary
CHECK_FOR_INTERRUPTS() call.
This requires adding yet another process-type-specific branch to
ProcessInterrupts(), which hints that we probably should generalize
that handling. But that's work for another day.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/fe072153-babd-3b5d-8052-73527a6eb657@2ndquadrant.com
Since 7c4f52409a8c (merged in v10), a shutdown master is reported as
FATAL: unexpected result after CommandComplete: server closed the connection unexpectedly
by walsender. It used to be
LOG: replication terminated by primary server
FATAL: could not send end-of-streaming message to primary: no COPY in progress
while the old message clearly is not perfect, it's definitely better
than what's reported now.
The change comes from the attempt to handle finished COPYs without
erroring out, needed for the new logical replication, which wasn't
needed before.
There's probably better ways to handle this, but for now just
explicitly check for a closed connection.
Author: Petr Jelinek
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/f7c7dd08-855c-e4ed-41f4-d064a6c0665a@2ndquadrant.com
Backpatch: -
Doing a cross-version upgrade test with test.sh evidently hasn't been
tested since circa 9.2, because the script lacked case branches for
old-version servers newer than 9.1. Future-proof that a bit, and
clean up breakage induced by our recent drop of V0 function call
protocol (namely that oldstyle_length() isn't in the regression
suite anymore).
(This isn't enough to make the test work perfectly cleanly across
versions, but at least it finishes and provides dump files that
you can diff manually. One issue I didn't touch is that we might
want to execute the "reindex_hash.sql" file in the new DB before
dumping it, so that the hash indexes don't vanish from the dump.)
Improve the TESTING doc file: put the tl;dr version at the top not
the bottom, and bring its explanation of how to run a cross-version
test up to speed, since the installcheck target isn't there and won't
be resurrected. Improve the comment in the Makefile about why not.
In passing, teach .gitignore and "make clean" about a couple more
junk output files.
Discussion: https://postgr.es/m/14058.1496892482@sss.pgh.pa.us
Most of the improvements were in the new SCRAM code:
* In SCRAM protocol violation messages, use errdetail to provide the
details.
* If pg_backend_random() fails, throw an ERROR rather than just LOG. We
shouldn't continue authentication if we can't generate a random nonce.
* Use ereport() rather than elog() for the "invalid SCRAM verifier"
messages. They shouldn't happen, if everything works, but it's not
inconceivable that someone would have invalid scram verifiers in
pg_authid, e.g. if a broken client application was used to generate the
verifier.
But this change applied to old code:
* Use ERROR rather than COMMERROR for protocol violation errors. There's
no reason to not tell the client what they did wrong. The client might be
confused already, so that it cannot read and display the error correctly,
but let's at least try. In the "invalid password packet size" case, we
used to actually continue with authentication anyway, but that is now a
hard error.
Patch by Michael Paquier and me. Thanks to Daniel Varrazzo for spotting
the typo in one of the messages that spurred the discussion and these
larger changes.
Discussion: https://www.postgresql.org/message-id/CA%2Bmi_8aZYLhuyQi1Jo0hO19opNZ2OEATEOM5fKApH7P6zTOZGg%40mail.gmail.com
Commit 15ce775 changed tuple-routing constraint checking logic.
This affects the expected output for contrib/sepgsql, because
there's no longer LOG entries reporting allowance of int4eq()
execution. Per buildfarm.
Clarify in the syntax synopsis that partition bound values must be
exactly numeric literals or string literals; previously it
said "bound_literal" which was defined nowhere.
Replace confusing --- and, I think, incorrect in detail --- definition
of how range bounds work with a reference to row-wise comparison plus
a concrete example (which I stole from Robert Haas).
Minor copy-editing in the same area.
Discussion: https://postgr.es/m/30475.1496005465@sss.pgh.pa.us
Discussion: https://postgr.es/m/28106.1496041449@sss.pgh.pa.us
Commit f039eaac7131ef2a4cf63a10cf98486f8bcd09d2, later back-patched
with commit 1b812afb0eafe125b820cc3b95e7ca03821aa675, allowed many of
the queries issued by postgres_fdw to fetch remote data to respond to
cancel interrupts in a timely fashion. However, it didn't do anything
about the transaction control commands, which remained
noninterruptible.
Improve the situation by changing do_sql_command() to retrieve query
results using pgfdw_get_result(), which uses the asynchronous
interface to libpq so that it can check for interrupts every time
libpq returns control. Since this might result in a situation
where we can no longer be sure that the remote transaction state
matches the local transaction state, add a facility to force all
levels of the local transaction to abort if we've lost track of
the remote state; without this, an apparently-successful commit of
the local transaction might fail to commit changes made on the
remote side. Also, add a 60-second timeout for queries issue during
transaction abort; if that expires, give up and mark the state of
the connection as unknown. Drop all such connections when we exit
the local transaction. Together, these changes mean that if we're
aborting the local toplevel transaction anyway, we can just drop the
remote connection in lieu of waiting (possibly for a very long time)
for it to complete an abort.
This still leaves quite a bit of room for improvement. PQcancel()
has no asynchronous interface, so if we get stuck sending the cancel
request we'll still hang. Also, PQsetnonblocking() is not used, which
means we could block uninterruptibly when sending a query. There
might be some other optimizations possible as well. Nonetheless,
this allows us to escape a wait for an unresponsive remote server
quickly in many more cases than previously.
Report by Suraj Kharage. Patch by me and Rafia Sabih. Review
and testing by Amit Kapila and Tushar Ahuja.
Discussion: http://postgr.es/m/CAF1DzPU8Kx+fMXEbFoP289xtm3bz3t+ZfxhmKavr98Bh-C0TqQ@mail.gmail.com
A logical replication worker should not insert new rows into
pg_subscription_rel, only update existing rows, so that there are no
races if a concurrent refresh removes rows. Adjust the API to be able
to choose that behavior.
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
Since tuple-routing implicitly checks the partitioning constraints
at least for the levels of the partitioning hierarchy it traverses,
there's normally no need to revalidate the partitioning constraint
after performing tuple routing. However, if there's a BEFORE trigger
on the target partition, it could modify the tuple, causing the
partitioning constraint to be violated. Catch that case.
Also, instead of checking the root table's partition constraint after
tuple-routing, check it beforehand. Otherwise, the rules for when
the partitioning constraint gets checked get too complicated, because
you sometimes have to check part of the constraint but not all of it.
This effectively reverts commit 39162b2030fb0a35a6bb28dc636b5a71b8df8d1c
in favor of a different approach altogether.
Report by me. Initial debugging by Jeevan Ladhe. Patch by Amit
Langote, reviewed by me.
Discussion: http://postgr.es/m/CA+Tgmoa9DTgeVOqopieV8d1QRpddmP65aCdxyjdYDoEO5pS5KA@mail.gmail.com
If authentication over an SSL connection fails, with sslmode=prefer,
libpq will reconnect without SSL and retry. However, we did not clear
the variables related to GSS, SSPI, and SASL authentication state, when
reconnecting. Because of that, the second authentication attempt would
always fail with a "duplicate GSS/SASL authentication request" error.
pg_SSPI_startup did not check for duplicate authentication requests like
the corresponding GSS and SASL functions, so with SSPI, you would leak
some memory instead.
Another way this could manifest itself, on version 10, is if you list
multiple hostnames in the "host" parameter. If the first server requests
Kerberos or SCRAM authentication, but it fails, the attempts to connect to
the other servers will also fail with "duplicate authentication request"
errors.
To fix, move the clearing of authentication state from closePGconn to
pgDropConnection, so that it is cleared also when re-connecting.
Patch by Michael Paquier, with some kibitzing by me.
Backpatch down to 9.3. 9.2 has the same bug, but the code around closing
the connection is somewhat different, so that this patch doesn't apply.
To fix this in 9.2, I think we would need to back-port commit 210eb9b743
first, and then apply this patch. However, given that we only bumped into
this in our own testing, we haven't heard any reports from users about
this, and that 9.2 will be end-of-lifed in a couple of months anyway, it
doesn't seem worth the risk and trouble.
Discussion: https://www.postgresql.org/message-id/CAB7nPqRuOUm0MyJaUy9L3eXYJU3AKCZ-0-03=-aDTZJGV4GyWw@mail.gmail.com
The logic to free the buffer after the gss_init_sec_context() call was
always a bit wonky. Because gss_init_sec_context() sets the GSS context
variable, conn->gctx, we would in fact always attempt to free the buffer.
That only works, because previously conn->ginbuf.value was initialized to
NULL, and free(NULL) is a no-op. Commit 61bf96cab0 refactored things so
that the GSS input token buffer is allocated locally in pg_GSS_continue,
and not held in the PGconn object. After that, the now-local ginbuf.value
variable isn't initialized when it's not used, so we pass a bogus pointer
to free().
To fix, only try to free the input buffer if we allocated it. That was the
intention, certainly after the refactoring, and probably even before that.
But because there's no live bug before the refactoring, I refrained from
backpatching this.
The bug was also independently reported by Graham Dutton, as bug #14690.
Patch reviewed by Michael Paquier.
Discussion: https://www.postgresql.org/message-id/6288d80e-a0bf-d4d3-4e12-7b79c77f1771%40iki.fi
Discussion: https://www.postgresql.org/message-id/20170605130954.1438.90535%40wrigleys.postgresql.org
The logical replication apply worker uses the subscription name as
application name, except for table sync. This was incorrectly set to
use the replication slot name, which might be different, in one case.
Also add a comment why the other case is different.
The larger part of this patch replaces usages of MyProc->procLatch
with MyLatch. The latter works even early during backend startup,
where MyProc->procLatch doesn't yet. While the affected code
shouldn't run in cases where it's not initialized, it might get copied
into places where it might. Using MyLatch is simpler and a bit faster
to boot, so there's little point to stick with the previous coding.
While doing so I noticed some weaknesses around newly introduced uses
of latches that could lead to missed events, and an omitted
CHECK_FOR_INTERRUPTS() call in worker_spi.
As all the actual bugs are in v10 code, there doesn't seem to be
sufficient reason to backpatch this.
Author: Andres Freund
Discussion:
https://postgr.es/m/20170606195321.sjmenrfgl2nu6j63@alap3.anarazel.dehttps://postgr.es/m/20170606210405.sim3yl6vpudhmufo@alap3.anarazel.de
Backpatch: -
Make apply busy wait check the catalog instead of shmem state to ensure
that next transaction will see the expected table synchronization state.
Also make the handover always go through same set of steps to make the
overall process easier to understand and debug.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Mark Kirkwood <mark.kirkwood@catalyst.net.nz>
Tested-by: Erik Rijkers <er@xs4all.nl>
Consistent with what we do for indexes, we shouldn't try to record
dependencies on collation OID 0 or the default collation OID (which
is pinned). Also, the fact that indcollation and partcollation can
contain zero OIDs when the data type is not collatable should be
documented.
Amit Langote, per a complaint from me.
Discussion: http://postgr.es/m/CA+Tgmoba5mtPgM3NKfG06vv8na5gGbVOj0h4zvivXQwLw8wXXQ@mail.gmail.com
This allows to cancel commands run over replication connections. While
it might have some use before v10, it has become important now that
normal SQL commands are allowed in database connected walsender
connections.
Author: Petr Jelinek
Reviewed-By: Andres Freund, Michael Paquier
Discussion: https://postgr.es/m/7966f454-7cd7-2b0c-8b70-cdca9d5a8c97@2ndquadrant.com
Because walsender and normal backends share the same main loop it's
problematic to have two different flag variables, set in signal
handlers, indicating a pending configuration reload. Only certain
walsender commands reach code paths checking for the
variable (START_[LOGICAL_]REPLICATION, CREATE_REPLICATION_SLOT
... LOGICAL, notably not base backups).
This is a bug present since the introduction of walsender, but has
gotten worse in releases since then which allow walsender to do more.
A later patch, not slated for v10, will similarly unify SIGHUP
handling in other types of processes as well.
Author: Petr Jelinek, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170423235941.qosiuoyqprq4nu7v@alap3.anarazel.de
Backpatch: 9.2-, bug is present since 9.0
When the checkpointer writes the shutdown checkpoint, it checks
afterwards whether any WAL has been written since it started and
throws a PANIC if so. At that point, only walsenders are still
active, so one might think this could not happen, but walsenders can
also generate WAL, for instance in BASE_BACKUP and logical decoding
related commands (e.g. via hint bits). So they can trigger this panic
if such a command is run while the shutdown checkpoint is being
written.
To fix this, divide the walsender shutdown into two phases. First,
checkpointer, itself triggered by postmaster, sends a
PROCSIG_WALSND_INIT_STOPPING signal to all walsenders. If the backend
is idle or runs an SQL query this causes the backend to shutdown, if
logical replication is in progress all existing WAL records are
processed followed by a shutdown. Otherwise this causes the walsender
to switch to the "stopping" state. In this state, the walsender will
reject any further replication commands. The checkpointer begins the
shutdown checkpoint once all walsenders are confirmed as
stopping. When the shutdown checkpoint finishes, the postmaster sends
us SIGUSR2. This instructs walsender to send any outstanding WAL,
including the shutdown checkpoint record, wait for it to be replicated
to the standby, and then exit.
Author: Andres Freund, based on an earlier patch by Michael Paquier
Reported-By: Fujii Masao, Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
Backpatch: 9.4, where logical decoding was introduced
The non-participation in procsignal was a problem for both changes in
master, e.g. parallelism not working for normal statements run in
walsender backends, and older branches, e.g. recovery conflicts and
catchup interrupts not working for logical decoding walsenders.
This commit thus replaces the previous WalSndXLogSendHandler with
procsignal_sigusr1_handler. In branches since db0f6cad48 that can
lead to additional SetLatch calls, but that only rarely seems to make
a difference.
Author: Andres Freund
Reviewed-By: Michael Paquier
Discussion: https://postgr.es/m/20170421014030.fdzvvvbrz4nckrow@alap3.anarazel.de
Backpatch: 9.4, earlier commits don't seem to benefit sufficiently
This reverts commit 086221cf6b1727c2baed4703c582f657b7c5350e, which
was made to master only.
The approach implemented in the above commit has some issues. While
those could easily be fixed incrementally, doing so would make
backpatching considerably harder, so instead first revert this patch.
Discussion: https://postgr.es/m/20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
There was a grammar ambiguity between SET PUBLICATION name REFRESH and
SET PUBLICATION SKIP REFRESH, because SKIP is not a reserved word. To
resolve that, fold the refresh choice into the WITH options. Refreshing
is the default now.
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
Otherwise code that uses this will abort with an assertion failure,
because postmaster_alive_fds are not initialized.
Reported-by: tushar <tushar.ahuja@enterprisedb.com>
The current method of computing the record length (excluding the
lenght of full-page images) has been wrong since the WAL format has
been revamped in 2c03216d831160bedd72d45f712601b6f7d03f1c. Only the
main record's length was counted, but that can be significantly too
little if there's data associated with further blocks.
Fix by computing the record length as total_lenght - fpi_length.
Reported-By: Chen Huajun
Bug: #14687
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/20170603165939.1436.58887@wrigleys.postgresql.org
Backpatch: 9.5-
Declare the toc_nentry field as uint32 not Size. Since shm_toc_lookup()
reads the field without any lock, it has to be atomically readable, and
we do not assume that for fields wider than 32 bits. Performance would
be impossibly bad for entry counts approaching 2^32 anyway, so there is
no need to try to preserve maximum width here.
This is probably an academic issue, because even if reading int64 isn't
atomic, the high order half would never change in practice. Still, it's
a coding rule violation, so let's fix it.
Adjust some other not-terribly-well-chosen data types too, and copy-edit
some comments. Make shm_toc_attach's Asserts consistent with
shm_toc_create's.
None of this looks to be a live bug, so no need for back-patch.
Discussion: https://postgr.es/m/16984.1496679541@sss.pgh.pa.us
Some openssl builds put their lib files in a VC subdirectory, others do
not. Cater for both cases.
Backpatch to all live branches.
From an offline discussion with Leonardo Cecchi.
Given the possibility of race conditions and so on, it seems entirely
unsafe to just assume that shm_toc_lookup() always finds the key it's
looking for --- but that was exactly what all but one call site were
doing. To fix, add a "bool noError" argument, similarly to what we
have in many other functions, and throw an error on an unexpected
lookup failure. Remove now-redundant Asserts that a rather random
subset of call sites had.
I doubt this will throw any light on buildfarm member lorikeet's
recent failures, because if an unnoticed lookup failure were involved,
you'd kind of expect a null-pointer-dereference crash rather than the
observed symptom. But you never know ... and this is better coding
practice even if it never catches anything.
Discussion: https://postgr.es/m/9697.1496675981@sss.pgh.pa.us