Make every page in a hash index which isn't all-zeroes have a valid
special space, so that tools like pageinspect don't error out.
Also, make pageinspect cope with all-zeroes pages, because
_hash_alloc_buckets can leave behind large numbers of those until
they're consumed by splits.
Ashutosh Sharma and Robert Haas, reviewed by Amit Kapila.
Original trouble report from Jeff Janes.
Discussion: http://postgr.es/m/CAMkU=1y6NjKmqbJ8wLMhr=F74WzcMALYWcVFhEpm7i=mV=XsOg@mail.gmail.com
Since hash indexes typically have very few overflow pages, adding a
new splitpoint essentially doubles the on-disk size of the index,
which can lead to large and abrupt increases in disk usage (and
perhaps long delays on occasion). To mitigate this problem to some
degree, divide larger splitpoints into four equal phases. This means
that, for example, instead of growing from 4GB to 8GB all at once, a
hash index will now grow from 4GB to 5GB to 6GB to 7GB to 8GB, which
is perhaps still not as smooth as we'd like but certainly an
improvement.
This changes the on-disk format of the metapage, so bump HASH_VERSION
from 2 to 3. This will force a REINDEX of all existing hash indexes,
but that's probably a good idea anyway. First, hash indexes from
pre-10 versions of PostgreSQL could easily be corrupted, and we don't
want to confuse corruption carried over from an older release with any
corruption caused despite the new write-ahead logging in v10. Second,
it will let us remove some backward-compatibility code added by commit
293e24e507838733aba4748b514536af2d39d7f2.
Mithun Cy, reviewed by Amit Kapila, Jesper Pedersen and me. Regression
test outputs updated by me.
Discussion: http://postgr.es/m/CAD__OuhG6F1gQLCgMQNnMNgoCvOLQZz9zKYJQNYvYmmJoM42gA@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoYty0jCf-pa+m+vYUJ716+AxM7nv_syvyanyf5O-L_i2A@mail.gmail.com
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros that test to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
A QueryEnvironment concept is added, which allows new types of
objects to be passed into queries from parsing on through
execution. At this point, the only thing implemented is a
collection of EphemeralNamedRelation objects -- relations which
can be referenced by name in queries, but do not exist in the
catalogs. The only type of ENR implemented is NamedTuplestore, but
provision is made to add more types fairly easily.
An ENR can carry its own TupleDesc or reference a relation in the
catalogs by relid.
Although these features can be used without SPI, convenience
functions are added to SPI so that ENRs can easily be used by code
run through SPI.
The initial use of all this is going to be transition tables in
AFTER triggers, but that will be added to each PL as a separate
commit.
An incidental effect of this patch is to produce a more informative
error message if an attempt is made to modify the contents of a CTE
from a referencing DML statement. No tests previously covered that
possibility, so one is added.
Kevin Grittner and Thomas Munro
Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro
with valuable comments and suggestions from many others
Don't import partitions. Do import partitioned tables which are
not themselves partitions.
Report by Stephen Frost. Design and patch by Michael Paquier,
reviewed by Amit Langote. Documentation revised by me.
Discussion: http://postgr.es/m/20170309141531.GD9812@tamriel.snowman.net
Three nologin roles with non-overlapping privs are created by default
* pg_read_all_settings - read all GUCs.
* pg_read_all_stats - pg_stat_*, pg_database_size(), pg_tablespace_size()
* pg_stat_scan_tables - may lock/scan tables
Top level role - pg_monitor includes all of the above by default, plus others
Author: Dave Page
Reviewed-by: Stephen Frost, Robert Haas, Peter Eisentraut, Simon Riggs
The V0 convention is failure prone because we've so far assumed that a
function is V0 if PG_FUNCTION_INFO_V1 is missing, leading to crashes
if a function was coded against the V1 interface. V0 doesn't allow
proper NULL, SRF and toast handling. V0 doesn't offer features that
V1 doesn't.
Thus remove V0 support and obsolete fmgr README contents relating to
it.
Author: Andres Freund, with contributions by Peter Eisentraut & Craig Ringer
Reviewed-By: Peter Eisentraut, Craig Ringer
Discussion: https://postgr.es/m/20161208213441.k3mbno4twhg2qf7g@alap3.anarazel.de
There are no functional changes here; this simply encapsulates knowledge
of the ItemPointerData struct so that a future patch can change things
without more breakage.
All direct users of ip_blkid and ip_posid are changed to use existing
macros ItemPointerGetBlockNumber and ItemPointerGetOffsetNumber
respectively. For callers where that's inappropriate (because they
Assert that the itempointer is is valid-looking), add
ItemPointerGetBlockNumberNoCheck and ItemPointerGetOffsetNumberNoCheck,
which lack the assertion but are otherwise identical.
Author: Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdNnFon4cJiL=h1mZH3bgUeU+sWHuU4Yr8AB=j3A2p1GiA@mail.gmail.com
The conname variable was not initialized in some code paths, resulting
in error reports referring to the "unnamed" connection rather than the
correct connection name.
Author: Rushabh Lathia <rushabh.lathia@gmail.com>
The trouble with the original choice here is that "?" is a valid (and
indeed used) operator name, so that you could end up with ambiguous
statement texts like "SELECT ? ? ?". With this patch, you instead
see "SELECT $1 ? $2", which seems significantly more readable.
The numbers used for this purpose begin after the last actual $N parameter
in the particular query. The conflict with external parameters has its own
potential for confusion of course, but it was agreed to be an improvement
over the previous behavior.
Lukas Fittl
Discussion: https://postgr.es/m/CAP53PkxeaCuwYmF-A4J5z2-qk5fYFo5_NH3gpXGJJBxv1DMwEw@mail.gmail.com
Fix all perlcritic warnings of severity level 5, except in
src/backend/utils/Gen_dummy_probes.pl, which is automatically generated.
Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.
The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used. No SQL-level changes
are required. There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests). The
enable_hashagg option is respected.
Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
This replaces the old, recursive tree-walk based evaluation, with
non-recursive, opcode dispatch based, expression evaluation.
Projection is now implemented as part of expression evaluation.
This both leads to significant performance improvements, and makes
future just-in-time compilation of expressions easier.
The speed gains primarily come from:
- non-recursive implementation reduces stack usage / overhead
- simple sub-expressions are implemented with a single jump, without
function calls
- sharing some state between different sub-expressions
- reduced amount of indirect/hard to predict memory accesses by laying
out operation metadata sequentially; including the avoidance of
nearly all of the previously used linked lists
- more code has been moved to expression initialization, avoiding
constant re-checks at evaluation time
Future just-in-time compilation (JIT) has become easier, as
demonstrated by released patches intended to be merged in a later
release, for primarily two reasons: Firstly, due to a stricter split
between expression initialization and evaluation, less code has to be
handled by the JIT. Secondly, due to the non-recursive nature of the
generated "instructions", less performance-critical code-paths can
easily be shared between interpreted and compiled evaluation.
The new framework allows for significant future optimizations. E.g.:
- basic infrastructure for to later reduce the per executor-startup
overhead of expression evaluation, by caching state in prepared
statements. That'd be helpful in OLTPish scenarios where
initialization overhead is measurable.
- optimizing the generated "code". A number of proposals for potential
work has already been made.
- optimizing the interpreter. Similarly a number of proposals have
been made here too.
The move of logic into the expression initialization step leads to some
backward-incompatible changes:
- Function permission checks are now done during expression
initialization, whereas previously they were done during
execution. In edge cases this can lead to errors being raised that
previously wouldn't have been, e.g. a NULL array being coerced to a
different array type previously didn't perform checks.
- The set of domain constraints to be checked, is now evaluated once
during expression initialization, previously it was re-built
every time a domain check was evaluated. For normal queries this
doesn't change much, but e.g. for plpgsql functions, which caches
ExprStates, the old set could stick around longer. The behavior
around might still change.
Author: Andres Freund, with significant changes by Tom Lane,
changes by Heikki Linnakangas
Reviewed-By: Tom Lane, Heikki Linnakangas
Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
Previously, it was unsafe to execute a plan in parallel if
ExecutorRun() might be called with a non-zero row count. However,
it's quite easy to fix things up so that we can support that case,
provided that it is known that we will never call ExecutorRun() a
second time for the same QueryDesc. Add infrastructure to signal
this, and cross-checks to make sure that a caller who claims this is
true doesn't later reneg.
While that pattern never happens with queries received directly from a
client -- there's no way to know whether multiple Execute messages
will be sent unless the first one requests all the rows -- it's pretty
common for queries originating from procedural languages, which often
limit the result to a single tuple or to a user-specified number of
tuples.
This commit doesn't actually enable parallelism in any additional
cases, because currently none of the places that would be able to
benefit from this infrastructure pass CURSOR_OPT_PARALLEL_OK in the
first place, but it makes it much more palatable to pass
CURSOR_OPT_PARALLEL_OK in places where we currently don't, because it
eliminates some cases where we'd end up having to run the parallel
plan serially.
Patch by me, based on some ideas from Rafia Sabih and corrected by
Rafia Sabih based on feedback from Dilip Kumar and myself.
Discussion: http://postgr.es/m/CA+TgmobXEhvHbJtWDuPZM9bVSLiTj-kShxQJ2uM5GPDze9fRYA@mail.gmail.com
Add functionality for a new subscription to copy the initial data in the
tables and then sync with the ongoing apply process.
For the copying, add a new internal COPY option to have the COPY source
data provided by a callback function. The initial data copy works on
the subscriber by receiving COPY data from the publisher and then
providing it locally into a COPY that writes to the destination table.
A WAL receiver can now execute full SQL commands. This is used here to
obtain information about tables and publications.
Several new options were added to CREATE and ALTER SUBSCRIPTION to
control whether and when initial table syncing happens.
Change pg_dump option --no-create-subscription-slots to
--no-subscription-connect and use the new CREATE SUBSCRIPTION
... NOCONNECT option for that.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
Tested-by: Erik Rijkers <er@xs4all.nl>
This will allow enums to be used in exclusion constraints.
The code uses the new CallerFInfoFunctionCall infrastructure in fmgr,
and the support for it added to btree_gist in commit 393bb504d7.
Reviewed by Tom Lane and Anastasia Lubennikova
Discussion: http://postgr.es/m/56EA8A71.8060107@dunslane.net
None of the existing types actually need to use this mechanism, but this
will allow support for enum types which will need it. A separate patch
will adjust the varlena types support for consistency.
Reviewed by Tom Lane and Anastasia Lubennikova
Discussion: http://postgr.es/m/27220.1478360811@sss.pgh.pa.us
The previous deparsing logic wasn't smart enough to produce subqueries
when deparsing; make it smart enough to do that. However, we only do
it that way when necessary, because it generates more complicated SQL
which will be harder for any humans reading the queries to understand.
Etsuro Fujita, reviewed by Ashutosh Bapat
Discussion: http://postgr.es/m/c449261a-b033-dc02-9254-2fe5b7044795@lab.ntt.co.jp
This adds in support for EUI-64 MAC addresses by adding a new data type
called 'macaddr8' (using our usual convention of indicating the number
of bytes stored).
This was largely a copy-and-paste from the macaddr data type, with
appropriate adjustments for having 8 bytes instead of 6 and adding
support for converting a provided EUI-48 (6 byte format) to the EUI-64
format. Conversion from EUI-48 to EUI-64 inserts FFFE as the 4th and
5th bytes but does not perform the IPv6 modified EUI-64 action of
flipping the 7th bit, but we add a function to perform that specific
action for the user as it may be commonly done by users who wish to
calculate their IPv6 address based on their network prefix and 48-bit
MAC address.
Author: Haribabu Kommi, with a good bit of rework of macaddr8_in by me.
Reviewed by: Vitaly Burovoy, Kuntal Ghosh
Discussion: https://postgr.es/m/CAJrrPGcUi8ZH+KkK+=TctNQ+EfkeCEHtMU_yo1mvX8hsk_ghNQ@mail.gmail.com
Previously if a directory had both isolationtester and plain
regression tests, they couldn't be run in parallel, because they'd
access the same files/directories. That, so far, only affected
contrib/test_decoding.
Rather than fix that locally in contrib/test_decoding, improve
pg_regress_isolation_[install]check to use separate resources from
plain regression tests.
That requires a minor change in pg_regress, namely that the
--outputdir is created if not already existing, that seems like good
idea anyway.
Use the improved helpers even where previously not used.
Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20170311194831.vm5ikpczq52c2drg@alap3.anarazel.de
The previous coding of the test was vulnerable against autovacuum
triggering work on one of the tables in check_btree.sql.
For the purpose of the test it's entirely sufficient to check for
locks taken by the current process, so add an appropriate restriction.
While touching the test, expand it to also check for locks on the
underlying relations, rather than just the indexes.
Reported-By: Tom Lane
Discussion: https://postgr.es/m/30354.1489434301@sss.pgh.pa.us
The warning about hash indexes not being write-ahead logged and their
use being discouraged has been removed. "snapshot too old" is now
supported for tables with hash indexes. Most importantly, barring
bugs, hash indexes will now be crash-safe and usable on standbys.
This commit doesn't yet add WAL consistency checking for hash
indexes, as we now have for other index types; a separate patch has
been submitted to cure that lack.
Amit Kapila, reviewed and slightly modified by me. The larger patch
series of which this is a part has been reviewed and tested by Álvaro
Herrera, Ashutosh Sharma, Mark Kirkwood, Jeff Janes, and Jesper
Pedersen.
Discussion: http://postgr.es/m/CAA4eK1JOBX=YU33631Qh-XivYXtPSALh514+jR8XeD7v+K3r_Q@mail.gmail.com
This makes almost all core code follow the policy introduced in the
previous commit. Specific decisions:
- Text search support functions with char* and length arguments, such as
prsstart and lexize, may receive unaligned strings. I doubt
maintainers of non-core text search code will notice.
- Use plain VARDATA() on values detoasted or synthesized earlier in the
same function. Use VARDATA_ANY() on varlenas sourced outside the
function, even if they happen to always have four-byte headers. As an
exception, retain the universal practice of using VARDATA() on return
values of SendFunctionCall().
- Retain PG_GETARG_BYTEA_P() in pageinspect. (Page images are too large
for a one-byte header, so this misses no optimization.) Sites that do
not call get_page_from_raw() typically need the four-byte alignment.
- For now, do not change btree_gist. Its use of four-byte headers in
memory is partly entangled with storage of 4-byte headers inside
GBT_VARKEY, on disk.
- For now, do not change gtrgm_consistent() or gtrgm_distance(). They
incorporate the varlena header into a cache, and there are multiple
credible implementation strategies to consider.
Detect fclose() failures; given "ln -s /dev/full $PGDATA/devfull",
"pg_file_write('devfull', 'x', true)" now fails as it should. Don't
leak a stream when fwrite() fails. Remove a born-ineffective test that
aimed to skip zero-length writes. Back-patch to 9.2 (all supported
versions).
In functions that issue a deconstruct_array() call, consistently use
plain VARSIZE()/VARDATA() on the array elements. Prior practice was
divided between those and VARSIZE_ANY_EXHDR()/VARDATA_ANY().
Although it's reasonable to expect that most of these constants will
never change, that does not make it good programming style to hard-code
the value rather than using the RELKIND_FOO macros.
I think I've now gotten all the hard-coded references in C code.
Unfortunately there's no equally convenient way to parameterize
SQL files ...
Discussion: https://postgr.es/m/11145.1488931324@sss.pgh.pa.us
This is the beginning of a collection of SQL-callable functions to
verify the integrity of data files. For now it only contains code to
verify B-Tree indexes.
This adds two SQL-callable functions, validating B-Tree consistency to
a varying degree. Check the, extensive, docs for details.
The goal is to later extend the coverage of the module to further
access methods, possibly including the heap. Once checks for
additional access methods exist, we'll likely add some "dispatch"
functions that cover multiple access methods.
Author: Peter Geoghegan, editorialized by Andres Freund
Reviewed-By: Andres Freund, Tomas Vondra, Thomas Munro,
Anastasia Lubennikova, Robert Haas, Amit Langote
Discussion: CAM3SWZQzLMhMwmBqjzK+pRKXrNUZ4w90wYMUWfkeV8mZ3Debvw@mail.gmail.com