Add general purpose chaining hash tables for DSA memory. Unlike
DynaHash in shared memory mode, these hash tables can grow as
required, and cope with being mapped into different addresses in
different backends.
There is a wide range of potential users for such a hash table, though
it's very likely the interface will need to evolve as we come to
understand the needs of different kinds of users. E.g support for
iterators and incremental resizing is planned for later commits and
the details of the callback signatures are likely to change.
Author: Thomas Munro
Reviewed-By: John Gorman, Andres Freund, Dilip Kumar, Robert Haas
Discussion:
https://postgr.es/m/CAEepm=3d8o8XdVwYT6O=bHKsKAM2pu2D6sV1S_=4d+jStVCE7w@mail.gmail.comhttps://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
Previously, tuple descriptors were stored in chains keyed by a fixed size
array of OIDs. That meant there were effectively two levels of collision
chain -- one inside and one outside the hash table. Instead, let dynahash.c
look after conflicts for us by supplying a proper hash and equal function
pair.
This is a nice cleanup on its own, but also simplifies followup
changes allowing blessed TupleDescs to be shared between backends
participating in parallel query.
Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm%3D34GVhOL%2BarUx56yx7OPk7%3DqpGsv3CpO54feqjAwQKm5g%40mail.gmail.com
Users can still create them themselves. Instead, document Unicode TR 35
collation options for ICU, so users can create all this themselves.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Install language+region combinations even if they are not distinct from
the language's base locale. This gives better long-term stability of
the set of predefined locales and makes the predefined locales less
implementation-dependent and more practical for users.
Reviewed-by: Peter Geoghegan <pg@bowt.ie>
Periodically while the server is running, and at shutdown, write out a
list of blocks in shared buffers. When the server reaches consistency
-- unfortunatey, we can't do it before that point without breaking
things -- reload those blocks into any still-unused shared buffers.
Mithun Cy and Robert Haas, reviewed and tested by Beena Emerson,
Amit Kapila, Jim Nasby, and Rafia Sabih.
Discussion: http://postgr.es/m/CAD__OugubOs1Vy7kgF6xTjmEqTR4CrGAv8w+ZbaY_+MZeitukw@mail.gmail.com
It appeared in a conditional that excludes AIX, Cygwin and MinGW. Give
ICU support a chance to work on those platforms. Back-patch to v10,
where ICU support was introduced.
TupleDesc's attributes were already stored in contiguous memory after the
struct. Go one step further and get rid of the array of pointers to
attributes so that they can be stored in shared memory mapped at different
addresses in each backend. This won't work for TupleDescs with contraints
and defaults, since those point to other objects, but for many purposes
only attributes are needed.
Author: Thomas Munro
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
This is a mechanical change in preparation for a later commit that
will change the layout of TupleDesc. Introducing a macro to abstract
the details of where attributes are stored will allow us to change
that in separate step and revise it in future.
Author: Thomas Munro, editorialized by Andres Freund
Reviewed-By: Andres Freund
Discussion: https://postgr.es/m/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
Commit 3eb9a5e7c unintentionally introduced an ordering dependency
into restore_toc_entries_prefork(). The existing coding of
reduce_dependencies() contains a check to skip moving a TOC entry
to the ready_list if it wasn't initially in the pending_list.
This used to suffice to prevent reduce_dependencies() from trying to
move anything into the ready_list during restore_toc_entries_prefork(),
because the pending_list stayed empty throughout that phase; but it no
longer does. The problem doesn't manifest unless the TOC has been
reordered by SortTocFromFile, which is how I missed it in testing.
To fix, just add a test for ready_list == NULL, converting the call
with NULL from a poor man's sanity check into an explicit command
not to touch TOC items' list membership. Clarify some of the comments
around this; in particular, note the primary purpose of the check for
pending_list membership, which is to ensure that we can't try to restore
the same item twice, in case a TOC list forces it to be restored before
its dependency count goes to zero.
Per report from Fabrízio de Royes Mello. Back-patch to 9.3, like the
previous commit.
Discussion: https://postgr.es/m/CAFcNs+pjuv0JL_x4+=71TPUPjdLHOXA4YfT32myj_OrrZb4ohA@mail.gmail.com
Add a new EState member es_leaf_result_relations, so that the trigger
code knows about ResultRelInfos created by tuple routing. Also make
sure ExplainPrintTriggers knows about partition-related
ResultRelInfos.
Etsuro Fujita, reviewed by Amit Langote
Discussion: http://postgr.es/m/57163e18-8e56-da83-337a-22f2c0008051@lab.ntt.co.jp
That code patch was good as far as it went, but the associated test case
has exposed fundamental brain damage in the parallel scan mechanism,
which is going to take nontrivial work to correct. In the interests of
getting the buildfarm back to green so that unrelated work can proceed,
let's temporarily remove the test case.
Instead, lock them in the caller using find_all_inheritors so that
they get locked in the standard order, minimizing deadlock risks.
Also in RelationGetPartitionDispatchInfo, avoid opening tables which
are not partitioned; there's no need.
Amit Langote, reviewed by Ashutosh Bapat and Amit Khandekar
Discussion: http://postgr.es/m/91b36fa1-c197-b72f-ca6e-56c593bae68c@lab.ntt.co.jp
It now emerges that we can only rely on Perl to tell us we must use
-D_USE_32BIT_TIME_T if it's Perl 5.13.4 or later. For older versions,
revert to our previous practice of assuming we need that symbol in
all 32-bit Windows builds. This is not ideal, but inquiring into
which compiler version Perl was built with seems far too fragile.
In any case, we had not previously had complaints about these old
Perl versions, so let's assume this is Good Enough. (It's still
better than the situation ante commit 5a5c2feca, in that at least
the effects are confined to PL/Perl rather than the whole PG build.)
Back-patch to all supported versions, like 5a5c2feca and predecessors.
Discussion: https://postgr.es/m/CANFyU97OVQ3+Mzfmt3MhuUm5NwPU=-FtbNH5Eb7nZL9ua8=rcA@mail.gmail.com
This became possible by commit
6c2003f8a1bbc7c192a2e83ec51581c018aa162f. This just makes pg_dump aware
of it and updates the documentation.
Author: Petr Jelinek <petr.jelinek@2ndquadrant.com>
As Andres pointed out, pg_atomic_init_u64 must be used to initialize an
atomic variable, before it can be accessed with the actual atomic ops.
Trying to use pg_atomic_write_u64 on an uninitialized variable leads to a
failure with the fallback implementation that uses a spinlock.
Discussion: https://www.postgresql.org/message-id/20170816191346.d3ke5tpshhco4bnd%40alap3.anarazel.de
Improve generate_unaccent_rules.py to handle composed characters whose base
is another composed character rather than a plain letter. The net effect
of this is to add a bunch of multi-accented Vietnamese characters to
unaccent.rules.
Original complaint from Kha Nguyen, diagnosis of the script's shortcoming
by Thomas Munro.
Dang Minh Huong and Michael Paquier
Discussion: https://postgr.es/m/CALo3sF6EC8cy1F2JUz=GRf5h4LMUJTaG3qpdoiLrNbWEXL-tRg@mail.gmail.com
Previously, if we had to estimate the number of distinct values in a
VALUES column, we fell back on the default behavior used whenever we lack
statistics, which effectively is that there are Min(# of entries, 200)
distinct values. This can be very badly off with a large VALUES list,
as noted by Jeff Janes.
We could consider actually running an ANALYZE-like scan on the VALUES,
but that seems unduly expensive, and anyway it could not deliver reliable
info if the entries are not all constants. What seems like a better choice
is to assume that the values are all distinct. This will sometimes be just
as wrong as the old code, but it seems more likely to be more nearly right
in many common cases. Also, it is more consistent with what happens in
some related cases, for example WHERE x = ANY(ARRAY[1,2,3,...,n]) and
WHERE x = ANY(VALUES (1),(2),(3),...,(n)) now are estimated similarly.
This was discussed some time ago, but consensus was it'd be better
to slip it in at the start of a development cycle not near the end.
(It should've gone into v10, really, but I forgot about it.)
Discussion: https://postgr.es/m/CAMkU=1xHkyPa8VQgGcCNg3RMFFvVxUdOpus1gKcFuvVi0w6Acg@mail.gmail.com
Previously, if you passed a non-aligned size to shm_toc_create(), the
memory returned by shm_toc_allocate() would be similarly non-aligned.
This was exposed by commit 3cda10f41b, which allocated structs containing
a pg_atomic_uint64 field with shm_toc_allocate(). On systems with
MAXIMUM_ALIGNOF = 4, such structs still need to be 8-bytes aligned, but
the memory returned by shm_toc_allocate() was only 4-bytes aligned.
It's quite bogus that we abuse BUFFERALIGN to align the structs for
pg_atomic_uint64. It doesn't really have anything to do with buffers. But
that's a separate issue.
This ought to fix the buildfarm failures on 32-bit x86 systems.
Discussion: https://www.postgresql.org/message-id/7e0a73a5-0df9-1859-b8ae-9acf122dc38d@iki.fi
Since commit 40dae7ec53, which changed the way b-tree page splitting
works, there has been no difference in the handling of root, and non-root
split WAL records. We don't need to distinguish them anymore
If you're worried about the loss of debugging information, note that
usually a root split record will normally be followed by a WAL record to
create the new root page. The root page will also have the BTP_ROOT flag
set on the page itself, and there is a pointer to it from the metapage.
Author: Aleksander Alekseev
Discussion: https://www.postgresql.org/message-id/20170406122116.GA11081@e733.localdomain
Change to appendStringInfoChar() or appendStringInfoString() where those
can be used.
Author: David Rowley <david.rowley@2ndquadrant.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>
Although not confirmed and probably rare, if the newly allocated memory
is not already zero, this could possibly have caused some problems.
Also reorder the initializations slightly so they match the order of the
struct definition.
Author: Wong, Yi Wen <yiwong@amazon.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Some informational messages showed up even if verbose mode was not
used. Move them to verbose mode.
Author: Michael Paquier <michael.paquier@gmail.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
This appears to have been an omission in the original commit
0d692a0dc9f. All related information_schema views already include
foreign tables.
Reported-by: Nicolas Thauvin <nicolas.thauvin@dalibo.com>
The initial implementation of autovacuum work-items used a dynamic
shared memory area (DSA). However, it's argued that dynamic shared
memory is not portable enough, so we cannot rely on it being supported
everywhere; at the same time, autovacuum work-items are now a critical
part of the server, so it's not acceptable that they don't work in the
cases where dynamic shared memory is disabled. Therefore, let's fall
back to a simpler implementation of work-items that just uses
autovacuum's main shared memory segment for storage.
Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com
Commit 00418c612 expected that the plan generated for a simple-expression
query would always be a plain Result node. However, if force_parallel_mode
is on, the planner might stick a Gather atop that. Cope by looking through
the Gather. For safety, assert that the Gather's tlist is trivial.
Per buildfarm.
Discussion: https://postgr.es/m/23425.1502822098@sss.pgh.pa.us
Since we currently only have one protocol, this doesn't make much of a
difference other than the error message.
Author: Yugo Nagata <nagata@sraoss.co.jp>
Also create PDF bookmarks/ToC entries for subsections of reference
pages. This was a regression from the previous jadetex-based build.
Reported-by: Erik Rijkers <er@xs4all.nl>
The executor is capable of splitting buckets during a hash join if
too much memory is being used by a small number of buckets. However,
this only helps if a bucket's population is actually divisible; if
all the hash keys are alike, the tuples still end up in the same
new bucket. This can result in an OOM failure if there are enough
inner keys with identical hash values. The planner's cost estimates
will bias it against choosing a hash join in such situations, but not
by so much that it will never do so. To mitigate the OOM hazard,
explicitly estimate the hash bucket space needed by just the inner
side's most common value, and if that would exceed work_mem then
add disable_cost to the hash cost estimate.
This approach doesn't account for the possibility that two or more
common values would share the same hash value. On the other hand,
work_mem is normally a fairly conservative bound, so that eating
two or more times that much space is probably not going to kill us.
If we have no stats about the inner side, ignore this consideration.
There was some discussion of making a conservative assumption, but that
would effectively result in disabling hash join whenever we lack stats,
which seems like an overreaction given how seldom the problem manifests
in the field.
Per a complaint from David Hinkle. Although this could be viewed
as a bug fix, the lack of similar complaints weighs against back-
patching; indeed we waited for v11 because it seemed already rather
late in the v10 cycle to be making plan choice changes like this one.
Discussion: https://postgr.es/m/32013.1487271761@sss.pgh.pa.us
The original code (since 00e6a16d01) was assuming aborting the
transaction in autovacuum launcher was sufficient to release all
resources, but in reality the launcher runs quite a lot of code out of
any transactions. Re-introduce individual cleanup calls to make abort
more robust.
Reported-by: Robert Haas
Discussion: https://postgr.es/m/CA+TgmobQVbz4K_+RSmiM9HeRKpy3vS5xnbkL95gSEnWijzprKQ@mail.gmail.com
Instead of duplicating the logic to search for a matching
ParamPathInfo in multiple places, factor it out into a separate
function.
Pass only the relevant bits of the PartitionKey to
partition_bounds_equal instead of the whole thing, because
partition-wise join will want to call this without having a
PartitionKey available.
Adjust allow_star_schema_join and calc_nestloop_required_outer
to take relevant Relids rather than the entire Path, because
partition-wise join will want to call it with the top-parent
relids to determine whether a child join is allowable.
Ashutosh Bapat. Review and testing of the larger patch set of which
this is a part by Amit Langote, Rajkumar Raghuwanshi, Rafia Sabih,
Thomas Munro, Dilip Kumar, and me.
Discussion: http://postgr.es/m/CA+TgmobQK80vtXjAsPZWWXd7c8u13G86gmuLupN+uUJjA+i4nA@mail.gmail.com