Allow the cluster to be optionally init'd with read access for the
group.
This means a relatively non-privileged user can perform a backup of the
cluster without requiring write privileges, which enhances security.
The mode of PGDATA is used to determine whether group permissions are
enabled for directory and file creates. This method was chosen as it's
simple and works well for the various utilities that write into PGDATA.
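As a rough illustration of the approach (a Python sketch, not the actual
C implementation), a utility can stat PGDATA and pick its create modes
from the result. The owner-only values are the 0700/0600 defaults; the
group-enabled values 0750/0640 and the function name are assumptions
made for this example.

```python
import os
import stat

# Owner-only defaults; the group-enabled values are assumptions chosen
# for illustration (read/execute for group, nothing for others).
DIR_MODE_OWNER, FILE_MODE_OWNER = 0o700, 0o600
DIR_MODE_GROUP, FILE_MODE_GROUP = 0o750, 0o640


def create_modes_for(pgdata):
    """Return (dir_mode, file_mode) to use for new entries under pgdata."""
    mode = stat.S_IMODE(os.stat(pgdata).st_mode)
    # Group access counts as enabled only if every bit we need is set.
    if (mode & DIR_MODE_GROUP) == DIR_MODE_GROUP:
        return DIR_MODE_GROUP, FILE_MODE_GROUP
    return DIR_MODE_OWNER, FILE_MODE_OWNER
```

Because the decision depends only on the directory's own mode, every
tool that writes into PGDATA can reach the same answer without any
shared configuration.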
Changing the mode of PGDATA manually will not automatically change the
mode of all the files contained therein. If the user would like to
enable group access on an existing cluster then changing the mode of all
the existing files will be required. Note that pg_upgrade will
automatically change the mode of all migrated files if the new cluster
is init'd with the -g option.
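For an existing cluster, the manual change could look roughly like the
following Python sketch. The 0750/0640 modes and the function name are
assumptions for illustration, symbolic links are skipped as a
simplification, and the sketch assumes the server is not running while
the modes are changed.

```python
import os


def enable_group_access(pgdata, dir_mode=0o750, file_mode=0o640):
    """Apply group-readable modes to pgdata and everything beneath it."""
    os.chmod(pgdata, dir_mode)
    for root, dirs, files in os.walk(pgdata):
        for name, mode in [(d, dir_mode) for d in dirs] + \
                          [(f, file_mode) for f in files]:
            path = os.path.join(root, name)
            # Leave symlinks alone rather than chmod'ing their targets.
            if not os.path.islink(path):
                os.chmod(path, mode)
```

Something like enable_group_access("/path/to/pgdata") would then be run
once, with the path being whatever the cluster actually uses.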
Tests are included for the backend and all the utilities which operate
on the PG data directory to ensure that the correct mode is set based on
the data directory permissions.
Author: David Steele <david@pgmasters.net>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
Consolidate directory and file create permissions for tools which work
with the PG data directory by adding a new module (common/file_perm.c)
that contains variables (pg_file_create_mode, pg_dir_create_mode) and
constants to initialize them (0600 for files and 0700 for directories).
Convert mkdir() calls in the backend to MakePGDirectory() if the
original call used default permissions (always the case for regular PG
directories).
Add tests to make sure permissions in PGDATA are set correctly by the
tools which modify the PG data directory.
Authors: David Steele <david@pgmasters.net>,
Adam Brightwell <adam.brightwell@crunchydata.com>
Reviewed-By: Michael Paquier, with discussion amongst many others.
Discussion: https://postgr.es/m/ad346fe6-b23e-59f1-ecb7-0e08390ad629%40pgmasters.net
This was nearly the same code. Extend wait_for_catchup to allow waiting
for pg_current_wal_lsn() and use that in the subscription tests. Also
change one use in the pg_rewind tests to use this.
Also remove some broken code in wait_for_catchup and
wait_for_slot_catchup. The error message in case the waiting failed
wanted to show the current LSN, but the way it was written never
worked. So since nobody ever cared, just remove it.
Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
This is only used in the pg_rewind tests, so only set it there. It's
better if other tests run closer to a default configuration.
Author: Michael Paquier <michael.paquier@gmail.com>
This moves the data directories from using temporary directories with
randomness in the directory name to a static name, to make it easier to
debug. The data directory will be retained if tests fail or the test
code dies/exits with failure, and is automatically removed on the next
make check.
If the environment variable PG_TEST_NOCLEAN is defined, the data
directories will be retained regardless of test or exit status.
Author: Daniel Gustafsson <daniel@yesql.se>
This module becomes much more useful if we allow it to be used as base
class for external projects. To achieve this, change the exported
get_new_node function into a class method instead, and use the standard
Perl idiom of accepting the class as first argument. This method works
as expected for subclasses. The standalone function is kept for
backwards compatibility, though it could be removed in pg11.
Author: Chap Flackman, based on an earlier patch from Craig Ringer
Discussion: https://postgr.es/m/CAMsr+YF8kO+4+K-_U4PtN==2FndJ+5Bn6A19XHhMiBykEwv0wA@mail.gmail.com
By default, Perl's split() function drops trailing empty fields,
which is not what we want here. Oversight in commit fb093e4cb.
We'd managed to miss it thus far thanks to the very limited usage
of this function.
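To make the difference concrete, here is a small Python emulation of
the two behaviors (Perl keeps trailing empty fields when split() is
given a negative limit). The function names are made up for this
example, and edge cases such as an empty input string are ignored.

```python
def split_dropping_trailing(text, sep):
    """Mimic Perl's split() with no limit: trailing empty fields vanish."""
    fields = text.split(sep)
    while fields and fields[-1] == "":
        fields.pop()
    return fields


def split_keeping_trailing(text, sep):
    """Mimic Perl's split() with a negative limit: keep every field."""
    return text.split(sep)
```

For a row such as "a|b||", the first form yields two fields and the
second yields four, so a caller expecting one field per column only
gets the right count from the second form.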
Discussion: https://postgr.es/m/14837.1499029831@sss.pgh.pa.us
Add an optional "expected" argument to override the default assumption
that we're waiting for the query to return "t". This allows replacing
a handwritten polling loop in recovery/t/007_sync_rep.pl with use of
poll_query_until(); AFAICS that's the only remaining ad-hoc polling
loop in our TAP tests.
Change poll_query_until() to probe ten times per second, not once per
second. Like some similar changes I've been making recently, the
one-second interval seems to be rooted in ancient traditions rather
than the actual likely wait duration on modern machines. I'd consider
reducing it further if there were a convenient way to spawn just one
psql for the whole loop rather than one per probe attempt.
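The shape of such a polling helper, sketched in Python rather than
Perl: run_query stands in for whatever executes the query and returns
its output, and the attempt budget is an arbitrary assumption.

```python
import time


def poll_until(run_query, expected="t", attempts=1800, interval=0.1):
    """Call run_query() until it returns expected; True on success."""
    for _ in range(attempts):
        if run_query() == expected:
            return True
        time.sleep(interval)  # 0.1s gives ten probes per second
    return False
```

Callers should check the return value and fail the test when it is
false, since a timeout otherwise goes unnoticed.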
Discussion: https://postgr.es/m/12486.1498938782@sss.pgh.pa.us
Several callers of PostgresNode::poll_query_until() neglected to check
for failure; I do not think that's optional. Also, rewrite one place
that had reinvented poll_query_until() for no very good reason.
By default, wal_retrieve_retry_interval is five seconds, which is far
more than is needed in any of our TAP tests, leaving the test cases
just twiddling their thumbs for significant stretches. Moreover,
because it's so large, we get basically no testing of the retry-before-
master-is-ready code path. Hence, make PostgresNode::init set up
wal_retrieve_retry_interval = '500ms' as part of its customization of
test clusters' postgresql.conf. This shaves quite a few seconds off
the runtime of the recovery TAP tests.
Back-patch into 9.6. We have wal_retrieve_retry_interval in 9.5,
but the test infrastructure isn't there.
Discussion: https://postgr.es/m/31624.1498500416@sss.pgh.pa.us
Per discussion, "location" is a rather vague term that could refer to
multiple concepts. "LSN" is an unambiguous term for WAL locations and
should be preferred. Some function names, view column names, and function
output argument names used "lsn" already, but others used "location",
as well as yet other terms such as "wal_position". Since we've already
renamed a lot of things in this area from "xlog" to "wal" for v10,
we may as well incur a bit more compatibility pain and make these names
all consistent.
David Rowley, minor additional docs hacking by me
Discussion: https://postgr.es/m/CAKJS1f8O0njDKe8ePFQ-LK5-EjwThsDws6ohJ-+c6nWK+oUxtg@mail.gmail.com
archive_command and restore_command need to refer to Windows paths, not
Msys virtual file system paths, as postgres is completely unaware of the
latter, so prefix them with the Windows path to the virtual file system
root. Clean psql and pg_recvlogical output of carriage returns.
PostgresNode blithely ignored the exit status of pg_ctl, and in general
made no effort to be sure that the server was running when it should be.
This caused it to miss server crashes, which is a serious shortcoming
in a test scaffold. Make it complain if pg_ctl fails, and modify the
start and stop logic to complain if the server doesn't start, or doesn't
stop, when expected.
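The core of the fix is simply refusing to ignore a non-zero exit
status. A Python sketch of that idea (the helper name is hypothetical,
and the real code is Perl):

```python
import subprocess


def run_or_die(cmd):
    """Run cmd (a list of arguments) and raise if it exits non-zero."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} failed with exit status {result.returncode}")
```

A start step would then call something like
run_or_die(["pg_ctl", "-D", datadir, "start"]), so a server that fails
to come up aborts the test instead of being silently skipped over.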
Also, have it turn off the "restart_after_crash" configuration parameter
in created clusters, as bitter experience has shown that leaving that on
can mask crashes too.
We might at some point need variant functions that allow for, eg,
server start failure to be expected. But no existing test case appears
to want that, and it surely shouldn't be the default behavior.
Note that this *will* break the buildfarm, as it will expose known
bugs that the previous testing failed to catch. I'm committing it despite
that, to verify that we get the expected failures in the buildfarm
not just in manual testing.
Back-patch into 9.6 where PostgresNode was introduced. (The 9.6
branch is not expected to show any failures.)
Discussion: https://postgr.es/m/21432.1492886428@sss.pgh.pa.us
Although the documentation for append_conf said clearly that it didn't
add a newline, many test authors seem to have forgotten that ... or maybe
they just consulted the example at the top of the POD documentation,
which clearly shows adding a config entry without bothering to add a
trailing newline. The worst part of that is that it works, as long as
you don't do it more than once, since the backend isn't picky about
whether config files end with newlines. So there's not a strong forcing
function reminding test authors not to do it like that. Upshot is that
this is a terribly fragile way to go about things, and there's at least
one existing test case that is demonstrably broken and not testing what
it thinks it is.
Let's just make append_conf append a newline, instead; that is clearly
way safer than the old definition.
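A Python sketch of the new definition (the real method is Perl; this
version takes a file path directly, which is an assumption made to keep
the example self-contained):

```python
def append_conf(path, text):
    """Append text to the config file at path, plus a trailing newline."""
    with open(path, "a") as fh:
        fh.write(text + "\n")
```

With the old behavior, two calls adding "fsync = off" and
"wal_level = replica" would produce the single line
"fsync = offwal_level = replica"; with the newline appended, each call
yields its own line.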
I also cleaned up a few call sites that were unnecessarily ugly.
(I left things alone in places where it's plausible that additional
config lines would need to be added someday.)
Back-patch the change in append_conf itself to 9.6 where it was added,
as having a definitional inconsistency between branches would obviously
be pretty hazardous for back-patching TAP tests. The other changes are
just cosmetic and don't need to be back-patched.
Discussion: https://postgr.es/m/19751.1492892376@sss.pgh.pa.us
The previous default 'pg_log' might have indicated by its "pg_" prefix
that it is an internal system directory. The new default is more in
line with the typical naming of directories with user-facing log files.
Together with the renaming of pg_clog and pg_xlog, this should clear up
that difference.
Author: Andreas Karlsson <andreas@proxel.se>
Fix all perlcritic warnings of severity level 5, except in
src/backend/utils/Gen_dummy_probes.pl, which is automatically generated.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Allows testing of logical decoding using the SQL interface and/or
pg_recvlogical.
Most logical decoding tests are in contrib/test_decoding. This module
is for work that doesn't fit well there, like where server restarts
are required.
Craig Ringer
initdb now initializes a pg_hba.conf that allows replication connections
from the local host, same as it does for regular connections. The
connecting user still needs to have the REPLICATION attribute or be a
superuser.
The intent is to allow pg_basebackup from the local host to succeed
without requiring additional configuration.
Michael Paquier <michael.paquier@gmail.com> and me
Newer Perl or IPC::Run versions default to appending the filename to string
exceptions, e.g. the exception
psql timed out
is thrown as
psql timed out at /usr/share/perl5/vendor_perl/IPC/Run.pm line 2961.
To handle this, match exceptions with !~ rather than ne.
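The same idea in Python terms: match the start of the message with a
regular expression instead of comparing whole strings. The pattern here
is an assumption for illustration, not necessarily the one used in the
Perl code.

```python
import re

TIMEOUT_RE = re.compile(r"^psql timed out")


def is_psql_timeout(message):
    """True if message is the timeout exception, with or without suffix."""
    return TIMEOUT_RE.search(message) is not None
```

Both forms of the message quoted above are recognized this way, whereas
an equality test only accepts the first.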
From: Craig Ringer <craig@2ndquadrant.com>
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Commit f82ec32ac30ae7e3ec7c84067192535b2ff8ec0e renamed the pg_xlog
directory to pg_wal. To make things consistent, and because "xlog" is
terrible terminology for either "transaction log" or "write-ahead log",
rename all SQL-callable functions that contain "xlog" in the name to
instead contain "wal". (Note that this may pose an upgrade hazard for
some users.)
Similarly, rename the xlog_position argument of the functions that
create slots to be called wal_position.
Discussion: https://www.postgresql.org/message-id/CA+Tgmob=YmA=H3DbW1YuOXnFVgBheRmyDkWcD9M8f=5bGWYEoQ@mail.gmail.com
This changes the default values of the following parameters:
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
in order to make it possible to make a backup and set up simple
replication on the default settings, without requiring a system restart.
Discussion: https://postgr.es/m/CABUevEy4PR_EAvZEzsbF5s+V0eEvw7shJ2t-AUwbHOjT+yRb3A@mail.gmail.com
Reviewed by Peter Eisentraut. Benchmark help from Tomas Vondra.
The different actions in pg_ctl had different defaults for -w and -W,
mostly for historical reasons. Most users will want the -w behavior, so
make that the default.
Remove the -w option in most example and test code, to avoid confusion
and reduce verbosity. pg_upgrade is not touched, so it can continue to
work with older installations.
Reviewed-by: Beena Emerson <memissemerson@gmail.com>
Reviewed-by: Ryan Murphy <ryanfmurphy@gmail.com>
Add methods to the core test framework PostgresNode.pm to allow us to
test that standby nodes have caught up with the master, as well as
basic LSN handling. Used in tests recovery/t/001_stream_rep.pl and
recovery/t/004_timeline_switch.pl.
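For reference, the arithmetic behind LSN comparison can be sketched in
Python. An LSN is printed as two hexadecimal numbers separated by a
slash, the high and low 32 bits of a 64-bit position. The function
names are hypothetical and do not mirror the Perl methods.

```python
def lsn_to_int(lsn):
    """Convert an LSN such as '16/B374D848' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def has_caught_up(standby_lsn, target_lsn):
    """True once the standby's position has reached the target."""
    return lsn_to_int(standby_lsn) >= lsn_to_int(target_lsn)
```

Comparing the text forms directly would be wrong: "A/0" sorts after
"10/0" as a string, yet it is the earlier position.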
Craig Ringer, reviewed by Aleksander Alekseev and Simon Riggs
Since streaming is now supported for all output formats, make this the
default as this is what most people want.
To get the old behavior, the parameter -X none can be specified to turn
it off.
This also removes the parameter -x for fetch, now requiring -X fetch to
be specified to use that.
Reviewed by Vladimir Rusinov, Michael Paquier and Simon Riggs
Switch TAP tests to use the new wait mode of pg_ctl promote. This
allows avoiding extra logic with poll_query_until() to be sure that a
promoted standby is ready for read-write queries.
From: Michael Paquier <michael.paquier@gmail.com>
--noclean and --nosync were the only options spelled without a hyphen,
so change this for consistency with other options. The options in
pg_basebackup have not been in a release, so we just rename them. For
initdb, we retain the old variants.
Vik Fearing and me
Before pg_regress runs psql, set the application name to the test name.
Similarly, set the application name to the test file name in the TAP
tests. Also, set a default log_line_prefix that shows the application
name, as well as the PID and a time stamp.
That way, the server log output can be correlated to the test input
files, making debugging a bit easier.
Add tests for consistent support of connection strings in frontend
programs as well as proper handling of unusual characters in database
and user names. These tests were developed for the issues of
CVE-2016-5424.
To allow testing of names with spaces, change the pg_regress
command-line options --create-role and --dbname to split their arguments
by comma only, not space or comma as before. Only commas were actually
used in existing uses.
Noah Misch, Michael Paquier, Peter Eisentraut
These tests are currently only running in buildfarm member hamster,
which is purposefully very slow. This suite has failed a couple of
times recently because of timeouts, so increase the allowed number of
iterations to avoid spurious failures.
Author: Michaël Paquier
Change assorted places in our Perl code that did things like
system("prog $path/file");
to do it more like
system('prog', "$path/file");
which is safe against spaces and other special characters in the path
variable. The latter was already the prevailing style, but a few bits
of code hadn't gotten this memo. Back-patch to 9.4 as relevant.
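The same hazard exists in other languages. A Python sketch of the two
styles, assuming a POSIX shell; the child process just reports how many
arguments it received, and all names are made up for this example:

```python
import shlex
import subprocess
import sys

# Child program that reports how many arguments it received.
COUNT_ARGS = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]


def count_args_list_form(path):
    """Pass path as one list element; no shell ever parses it."""
    out = subprocess.run(COUNT_ARGS + [path],
                         capture_output=True, text=True)
    return int(out.stdout)


def count_args_string_form(path):
    """Interpolate path unquoted into a string that a shell will split."""
    cmd = " ".join(shlex.quote(a) for a in COUNT_ARGS) + " " + path
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return int(out.stdout)
```

For a path like "/tmp/my dir/file", the list form delivers one argument
while the string form delivers two, which is exactly the breakage the
list style avoids.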
Michael Paquier, Kyotaro Horiguchi
Discussion: <20160704.160213.111134711.horiguchi.kyotaro@lab.ntt.co.jp>
Previously, database clusters created by a TAP test were shut down by
DESTROY methods attached to the PostgresNode objects representing them.
The trouble with that is that if the objects survive into the final global
destruction phase (which they do), Perl executes the DESTROY methods in an
unspecified order. Thus, the order of shutdown of multiple clusters was
indeterminate, which might lead to not-very-reproducible errors getting
logged (eg from a slave whose master might or might not get killed first).
Worse, the File::Temp objects representing the temporary PGDATA directories
might get destroyed before the PostgresNode objects, resulting in attempts
to delete PGDATA directories that still have live servers in them. On
Windows, this would lead to directory deletion failures; on Unix, it
usually had no effects worse than erratic "could not open temporary
statistics file "pg_stat/global.tmp": No such file or directory" log
messages.
While none of this would affect the reported result of the TAP test, which
is already determined, it could be very confusing when one is trying to
understand from the logs what went wrong with a failed test.
To fix, do the postmaster shutdowns in an END block rather than at object
destruction time. The END block will execute at a well-defined (and
reasonable) time during script termination, and it will stop the
postmasters in order of PostgresNode object creation. (Perhaps we should
change that to be reverse order of creation, but the main point here is
that we now have control, which we did not have before.) Use "pg_ctl stop", not
an asynchronous kill(SIGQUIT), so that we wait for the postmasters to shut
down before proceeding with directory deletion.
Deletion of temporary directories still happens in an unspecified order
during global destruction, but I can see no reason to care about that
once the postmasters are stopped.
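As an analogy only, the pattern looks like this in Python, with atexit
playing the role of Perl's END block. The Node class is a stand-in and
does not start real servers.

```python
import atexit

_nodes = []  # every node ever created, in creation order


class Node:
    def __init__(self, name):
        self.name = name
        self.running = False
        _nodes.append(self)

    def start(self):
        self.running = True

    def stop(self):
        # A real implementation would run "pg_ctl stop" and wait for it.
        self.running = False


def stop_all_nodes():
    """Stop running nodes in creation order; return the names stopped."""
    stopped = []
    for node in _nodes:
        if node.running:
            node.stop()
            stopped.append(node.name)
    return stopped


# Runs once at interpreter exit, before objects are torn down.
atexit.register(stop_all_nodes)
```

The registry list is what provides the control over ordering: reversing
the iteration would give reverse order of creation with no other
changes.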
Commit fab84c7787f25756 tried to get away without doing an actual bind(),
but buildfarm results show that that doesn't get the job done. So we must
really bind to the target port --- and at least on my Linux box, we need a
listen() as well, or conflicts won't be detected. We rely on SO_REUSEADDR
to prevent problems from starting a postmaster on the socket immediately
after we've bound to it in the test code. (There may be platforms where
that doesn't work too well. But fortunately, we only really care whether
this works on Windows, and there the default behavior should be OK.)
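A Python sketch of the bind-plus-listen probe described here. The
function name and the loopback default are assumptions, and the
platform check reflects the reasoning above rather than the exact Perl
code.

```python
import socket
import sys


def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind to and listen on host:port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Let a server start on this port right after we release it;
        # on Windows the default behavior is left alone.
        if sys.platform != "win32":
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(1)
        return True
    except OSError:
        return False
    finally:
        sock.close()
```

The probe socket is closed before returning, so there remains a small
window in which another process could grab the port.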
Buildfarm members bowerbird and jacana have shown intermittent "could not
bind IPv4 socket" failures in the BinInstallCheck stage since mid-December,
shortly after commits 1caef31d9e550408 and 9821492ee417a591 changed the
logic for selecting which port to use in temporary installations. One
plausible explanation is that we are randomly selecting ports that are
already in use for some non-Postgres purpose. Although the code tried
to defend against already-in-use ports, it used pg_isready to probe
the port, which is quite unhelpful: if some non-Postgres server responds
at the given address, pg_isready will generally say "no response",
leading to exactly the wrong conclusion about whether the port is free.
Instead, let's use a simple TCP connect() call to see if anything answers
without making assumptions about what it is. Note that this means there's
no direct check for a conflicting Unix socket, but that should be okay
because there should be no other Unix sockets in use in the temporary
socket directory created for a test run.
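The connect()-based probe can be sketched in Python as follows. The
function name, the loopback default and the timeout are assumptions for
illustration.

```python
import socket


def something_listens_on(port, host="127.0.0.1", timeout=1.0):
    """Return True if any TCP server accepts a connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused or timed out: nothing we can detect is listening.
        return False
```

Unlike a protocol-aware check, this makes no assumption about what kind
of server answers; any accepted connection marks the port as taken.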
This is only a partial solution for the TCP case, since if the port number
is in use for an outgoing connection rather than a listening socket, we'll
fail to detect that. We could try to bind() to the proposed port as a
means of detecting that case, but that would introduce its own failure
modes, since the system might consider the address to remain reserved for
some period of time after we drop the bound socket. Close study of the
errors returned by bowerbird and jacana suggests that what we're seeing
there may be conflicts with listening, not outgoing, sockets, so let's try
this and see if it improves matters. It's certainly better than what's
there now, in any case.
Michael Paquier, adjusted by me to work on non-Windows as well as Windows