This patch introduces the SQL standard IS JSON predicate. It operates
on text and bytea values representing JSON, as well as on the json and
jsonb types. Each test has IS and IS NOT variants and supports a WITH
UNIQUE KEYS flag. The tests are:
IS JSON [VALUE]
IS JSON ARRAY
IS JSON OBJECT
IS JSON SCALAR
These should be self-explanatory.
The WITH UNIQUE KEYS flag makes these return false when duplicate keys
exist in any object within the value, not necessarily directly contained
in the outermost object.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Author: Andrew Dunstan <andrew@dunslane.net>
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
This commit introduces the SQL/JSON standard-conforming constructors for
JSON types:
JSON_ARRAY()
JSON_ARRAYAGG()
JSON_OBJECT()
JSON_OBJECTAGG()
Most of the functionality was already present in PostgreSQL-specific
functions, but these include some new functionality such as the ability
to skip or include NULL values, and to allow duplicate keys or throw
error when they are found, as well as the standard specified syntax to
specify output type and format.
Author: Nikita Glukhov <n.gluhov@postgrespro.ru>
Author: Teodor Sigaev <teodor@sigaev.ru>
Author: Oleg Bartunov <obartunov@gmail.com>
Author: Alexander Korotkov <aekorotkov@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/CAF4Au4w2x-5LTnN_bxky-mq4=WOqsGsxSpENCzHRAzSnEd8+WQ@mail.gmail.com
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de
Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org
When determining whether an index update may be skipped by using HOT, we
can ignore attributes indexed by block summarizing indexes without
references to individual tuples that need to be cleaned up.
A new type TU_UpdateIndexes provides a signal to the executor to
determine which indexes to update - no indexes, all indexes, or only the
summarizing indexes.
This also removes rd_indexattr list, and replaces it with rd_attrsvalid
flag. The list was not used anywhere, and a simple flag is sufficient.
This was originally committed as 5753d4ee32, but then got reverted by
e3fcca0d0d because of correctness issues.
Original patch by Josef Simanek, various fixes and improvements by Tomas
Vondra and me.
Authors: Matthias van de Meent, Josef Simanek, Tomas Vondra
Reviewed-by: Tomas Vondra, Alvaro Herrera
Discussion: https://postgr.es/m/05ebcb44-f383-86e3-4f31-0a97a55634cf@enterprisedb.com
Discussion: https://postgr.es/m/CAFp7QwpMRGcDAQumN7onN9HjrJ3u4X3ZRXdGFT0K5G2JWvnbWg%40mail.gmail.com
Traditionally we used the same Var struct to represent the value
of a table column everywhere in parse and plan trees. This choice
predates our support for SQL outer joins, and it's really a pretty
bad idea with outer joins, because the Var's value can depend on
where it is in the tree: it might go to NULL above an outer join.
So expression nodes that are equal() per equalfuncs.c might not
represent the same value, which is a huge correctness hazard for
the planner.
To improve this, decorate Var nodes with a bitmapset showing
which outer joins (identified by RTE indexes) may have nulled
them at the point in the parse tree where the Var appears.
This allows us to trust that equal() Vars represent the same value.
A certain amount of klugery is still needed to cope with cases
where we re-order two outer joins, but it's possible to make it
work without sacrificing that core principle. PlaceHolderVars
receive similar decoration for the same reason.
In the planner, we include these outer join bitmapsets into the relids
that an expression is considered to depend on, and in consequence also
add outer-join relids to the relids of join RelOptInfos. This allows
us to correctly perceive whether an expression can be calculated above
or below a particular outer join.
This change affects FDWs that want to plan foreign joins. They *must*
follow suit when labeling foreign joins in order to match with the
core planner, but for many purposes (if postgres_fdw is any guide)
they'd prefer to consider only base relations within the join.
To support both requirements, redefine ForeignScan.fs_relids as
base+OJ relids, and add a new field fs_base_relids that's set up by
the core planner.
Large though it is, this commit just does the minimum necessary to
install the new mechanisms and get check-world passing again.
Follow-up patches will perform some cleanup. (The README additions
and comments mention some stuff that will appear in the follow-up.)
Patch by me; thanks to Richard Guo for review.
Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
The reverts the following and makes some associated cleanups:
commit f79b803dc: Common SQL/JSON clauses
commit f4fb45d15: SQL/JSON constructors
commit 5f0adec25: Make STRING an unreserved_keyword.
commit 33a377608: IS JSON predicate
commit 1a36bc9db: SQL/JSON query functions
commit 606948b05: SQL JSON functions
commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR()
commit 4e34747c8: JSON_TABLE
commit fadb48b00: PLAN clauses for JSON_TABLE
commit 2ef6f11b0: Reduce running time of jsonb_sqljson test
commit 14d3f24fa: Further improve jsonb_sqljson parallel test
commit a6baa4bad: Documentation for SQL/JSON features
commit b46bcf7a4: Improve readability of SQL/JSON documentation.
commit 112fdb352: Fix finalization for json_objectagg and friends
commit fcdb35c32: Fix transformJsonBehavior
commit 4cd8717af: Improve a couple of sql/json error messages
commit f7a605f63: Small cleanups in SQL/JSON code
commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug
commit a79153b7a: Claim SQL standard compliance for SQL/JSON features
commit a1e7616d6: Rework SQL/JSON documentation
commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types.
commit 3c633f32b: Only allow returning string types or bytea from json_serialize
commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size.
The release notes are also adjusted.
Backpatch to release 15.
Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
I started out with the intention to rename value_type to item_type to
avoid a collision with a typedef name that appears on some platforms.
Along the way, I noticed that the adjacent field "format" was not being
correctly handled by the backend/nodes/ infrastructure functions:
copyfuncs.c erroneously treated it as a scalar, while equalfuncs,
outfuncs, and readfuncs omitted handling it at all. This looks like
it might be cosmetic at the moment because the field is always NULL
after parse analysis; but that's likely a bug in itself, and the code's
certainly not very future-proof. Let's fix it while we can still do so
without forcing an initdb on beta testers.
Further study found a few other inconsistencies in the backend/nodes/
infrastructure for the recently-added JSON node types, so fix those too.
catversion bumped because of potential change in stored rules.
Discussion: https://postgr.es/m/526703.1652385613@sss.pgh.pa.us
These clauses allow the user to specify how data from nested paths are
joined, allowing considerable freedom in shaping the tabular output of
JSON_TABLE.
PLAN DEFAULT allows the user to specify the global strategies when
dealing with sibling or child nested paths. The is often sufficient to
achieve the necessary goal, and is considerably simpler than the full
PLAN clause, which allows the user to specify the strategy to be used
for each named nested path.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zhihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/7e2cb85d-24cf-4abb-30a5-1a33715959bd@postgrespro.ru
This introduces the SQL/JSON functions for querying JSON data using
jsonpath expressions. The functions are:
JSON_EXISTS()
JSON_QUERY()
JSON_VALUE()
All of these functions only operate on jsonb. The workaround for now is
to cast the argument to jsonb.
JSON_EXISTS() tests if the jsonpath expression applied to the jsonb
value yields any values. JSON_VALUE() must return a single value, and an
error occurs if it tries to return multiple values. JSON_QUERY() must
return a json object or array, and there are various WRAPPER options for
handling scalar or multi-value results. Both these functions have
options for handling EMPTY and ERROR conditions.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
This patch intrdocuces the SQL standard IS JSON predicate. It operates
on text and bytea values representing JSON as well as on the json and
jsonb types. Each test has an IS and IS NOT variant. The tests are:
IS JSON [VALUE]
IS JSON ARRAY
IS JSON OBJECT
IS JSON SCALAR
IS JSON WITH | WITHOUT UNIQUE KEYS
These are mostly self-explanatory, but note that IS JSON WITHOUT UNIQUE
KEYS is true whenever IS JSON is true, and IS JSON WITH UNIQUE KEYS is
true whenever IS JSON is true except it IS JSON OBJECT is true and there
are duplicate keys (which is never the case when applied to jsonb values).
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
This patch introduces the SQL/JSON standard constructors for JSON:
JSON()
JSON_ARRAY()
JSON_ARRAYAGG()
JSON_OBJECT()
JSON_OBJECTAGG()
For the most part these functions provide facilities that mimic
existing json/jsonb functions. However, they also offer some useful
additional functionality. In addition to text input, the JSON() function
accepts bytea input, which it will decode and constuct a json value from.
The other functions provide useful options for handling duplicate keys
and null values.
This series of patches will be followed by a consolidated documentation
patch.
Nikita Glukhov
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
This introduces some of the building blocks used by the SQL/JSON
constructor and query functions. Specifically, it provides node
executor and grammar support for the FORMAT JSON [ENCODING foo]
clause, and values decorated with it, and for the RETURNING clause.
The following SQL/JSON patches will leverage these.
Nikita Glukhov (who probably deserves an award for perseverance).
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu,
Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby.
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
This introduces some of the building blocks used by the SQL/JSON
constructor and query functions. Specifically, it provides node
executor and grammar support for the FORMAT JSON [ENCODING foo]
clause, and values decorated with it, and for the RETURNING clause.
The following SQL/JSON patches will leverage these.
Nikita Glukhov (who probably deserves an award for perseverance).
Reviewers have included (in no particular order) Andres Freund, Alexander
Korotkov, Pavel Stehule, Andrew Alsup. Erik Rijkers, Zihong Yu and
Himanshu Upadhyaya.
Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru
The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not. Different
implementations have different behaviors. In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
This patch adds this option to PostgreSQL. The default behavior
remains UNIQUE NULLS DISTINCT. Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
I named all the internal flags, catalog columns, etc. in the negative
("nulls not distinct") so that the default PostgreSQL behavior is the
default if the flag is false.
Reviewed-by: Maxim Orlov <orlovmg@gmail.com>
Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bde78@enterprisedb.com
Commit 9dc718bd added a "logically unchanged by UPDATE" hinting
mechanism, which is currently used within nbtree indexes only (see
commit d168b666). This mechanism determined whether or not the incoming
item is a logically unchanged duplicate (a duplicate needed only for
MVCC versioning purposes) once per row updated per non-HOT update. This
approach led to memory leaks which were noticeable with an UPDATE
statement that updated sufficiently many rows, at least on tables that
happen to have an expression index.
On HEAD, fix the issue by adding a cache to the executor's per-index
IndexInfo struct.
Take a different approach on Postgres 14 to avoid an ABI break: simply
pass down the hint to all indexes unconditionally with non-HOT UPDATEs.
This is deemed acceptable because the hint is currently interpreted
within btinsert() as "perform a bottom-up index deletion pass if and
when the only alternative is splitting the leaf page -- prefer to delete
any LP_DEAD-set items first". nbtree must always treat the hint as a
noisy signal about what might work, as a strategy of last resort, with
costs imposed on non-HOT updaters. (The same thing might not be true
within another index AM that applies the hint, which is why the original
behavior is preserved on HEAD.)
Author: Peter Geoghegan <pg@bowt.ie>
Reported-By: Klaudie Willis <Klaudie.Willis@protonmail.com>
Diagnosed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/261065.1639497535@sss.pgh.pa.us
Backpatch: 14-, where the hinting mechanism was added.
Up to now the size of a query's rangetable has been limited by the
constants INNER_VAR et al, which mustn't be equal to any real
rangetable index. 65000 doubtless seemed like enough for anybody,
and it still is orders of magnitude larger than the number of joins
we can realistically handle. However, we need a rangetable entry
for each child partition that is (or might be) processed by a query.
Queries with a few thousand partitions are getting more realistic,
so that the day when that limit becomes a problem is in sight,
even if it's not here yet. Hence, let's raise the limit.
Rather than just increase the values of INNER_VAR et al, this patch
adopts the approach of making them small negative values, so that
rangetables could theoretically become as long as INT_MAX.
The bulk of the patch is concerned with changing Var.varno and some
related variables from "Index" (unsigned int) to plain "int". This
is basically cosmetic, with little actual effect other than to help
debuggers print their values nicely. As such, I've only bothered
with changing places that could actually see INNER_VAR et al, which
the parser and most of the planner don't. We do have to be careful
in places that are performing less/greater comparisons on varnos,
but there are very few such places, other than the IS_SPECIAL_VARNO
macro itself.
A notable side effect of this patch is that while it used to be
possible to add INNER_VAR et al to a Bitmapset, that will now
draw an error. I don't see any likelihood that it wouldn't be a
bug to include these fake varnos in a bitmapset of real varnos,
so I think this is all to the good.
Although this touches outfuncs/readfuncs, I don't think a catversion
bump is required, since stored rules would never contain Vars
with these fake varnos.
Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut
Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru
The SQL spec calls out nonstandard syntax for certain function calls,
for example substring() with numeric position info is supposed to be
spelled "SUBSTRING(string FROM start FOR count)". We accept many
of these things, but up to now would not print them in the same format,
instead simplifying down to "substring"(string, start, count).
That's long annoyed me because it creates an interoperability
problem: we're gratuitously injecting Postgres-specific syntax into
what might otherwise be a perfectly spec-compliant view definition.
However, the real reason for addressing it right now is to support
a planned change in the semantics of EXTRACT() a/k/a date_part().
When we switch that to returning numeric, we'll have the parser
translate EXTRACT() to some new function name (might as well be
"extract" if you ask me) and then teach ruleutils.c to reverse-list
that per SQL spec. In this way existing calls to date_part() will
continue to have the old semantics.
To implement this, invent a new CoercionForm value COERCE_SQL_SYNTAX,
and make the parser insert that rather than COERCE_EXPLICIT_CALL when
the input has SQL-spec decoration. (But if the input has the form of
a plain function call, continue to mark it COERCE_EXPLICIT_CALL, even
if it's calling one of these functions.) Then ruleutils.c recognizes
COERCE_SQL_SYNTAX as a cue to emit SQL call syntax. It can know
which decoration to emit using hard-wired knowledge about the
functions that could be called this way. (While this solution isn't
extensible without manual additions, neither is the grammar, so this
doesn't seem unmaintainable.) Notice that this solution will
reverse-list a function call with SQL decoration only if it was
entered that way; so dump-and-reload will not by itself produce any
changes in the appearance of views.
This requires adding a CoercionForm field to struct FuncCall.
(I couldn't resist the temptation to rearrange that struct's
field order a tad while I was at it.) FuncCall doesn't appear
in stored rules, so that change isn't a reason for a catversion
bump, but I did one anyway because the new enum value for
CoercionForm fields could confuse old backend code.
Possible future work:
* Perhaps CoercionForm should now be renamed to DisplayForm,
or something like that, to reflect its more general meaning.
This'd require touching a couple hundred places, so it's not
clear it's worth the code churn.
* The SQLValueFunction node type, which was invented partly for
the same goal of improving SQL-compatibility of view output,
could perhaps be replaced with regular function calls marked
with COERCE_SQL_SYNTAX. It's unclear if this would be a net
code savings, however.
Discussion: https://postgr.es/m/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
Commit f7f70d5e2 left one inconsistency behind: we're still creating
pg_type entries for the composite types of sequences and toast tables,
but not arrays over those composites. But there seems precious little
reason to have named composite types for toast tables, and not much more
to have them for sequences (especially given the thought that sequences
may someday not be standalone relations at all).
So, let's close that inconsistency by removing these composite types,
rather than adding arrays for them. This buys back a little bit of
the initial pg_type bloat added by the previous patch, and could be
a significant savings in a large database with many toast tables.
Aside from a small logic rearrangement in heap_create_with_catalog,
this patch mostly needs to clean up some places that were assuming that
pg_class.reltype always has a valid value. Those are really pre-existing
bugs, given that it's documented otherwise; notably, the plpgsql changes
fix code that gives "cache lookup failed for type 0" on indexes today.
But none of these seem interesting enough to back-patch.
Also, remove the pg_dump/pg_upgrade infrastructure for propagating
a toast table's pg_type OID into the new database, since we no longer
need that.
Discussion: https://postgr.es/m/761F1389-C6A8-4C15-80CE-950C961F5341@gmail.com
PostgreSQL provides set of template index access methods, where opclasses have
much freedom in the semantics of indexing. These index AMs are GiST, GIN,
SP-GiST and BRIN. There opclasses define representation of keys, operations on
them and supported search strategies. So, it's natural that opclasses may be
faced some tradeoffs, which require user-side decision. This commit implements
opclass parameters allowing users to set some values, which tell opclass how to
index the particular dataset.
This commit doesn't introduce new storage in system catalog. Instead it uses
pg_attribute.attoptions, which is used for table column storage options but
unused for index attributes.
In order to evade changing signature of each opclass support function, we
implement unified way to pass options to opclass support functions. Options
are set to fn_expr as the constant bytea expression. It's possible due to the
fact that opclass support functions are executed outside of expressions, so
fn_expr is unused for them.
This commit comes with some examples of opclass options usage. We parametrize
signature length in GiST. That applies to multiple opclasses: tsvector_ops,
gist__intbig_ops, gist_ltree_ops, gist__ltree_ops, gist_trgm_ops and
gist_hstore_ops. Also we parametrize maximum number of integer ranges for
gist__int_ops. However, the main future usage of this feature is expected
to be json, where users would be able to specify which way to index particular
json parts.
Catversion is bumped.
Discussion: https://postgr.es/m/d22c3a18-31c7-1879-fc11-4c1ce2f5e5af%40postgrespro.ru
Author: Nikita Glukhov, revised by me
Reviwed-by: Nikolay Shaplov, Robert Haas, Tom Lane, Tomas Vondra, Alvaro Herrera
The core idea of this patch is to make the parser generate join alias
Vars (that is, ones with varno pointing to a JOIN RTE) only when the
alias Var is actually different from any raw join input, that is a type
coercion and/or COALESCE is necessary to generate the join output value.
Otherwise just generate varno/varattno pointing to the relevant join
input column.
In effect, this means that the planner's flatten_join_alias_vars()
transformation is already done in the parser, for all cases except
(a) columns that are merged by JOIN USING and are transformed in the
process, and (b) whole-row join Vars. In principle that would allow
us to skip doing flatten_join_alias_vars() in many more queries than
we do now, but we don't have quite enough infrastructure to know that
we can do so --- in particular there's no cheap way to know whether
there are any whole-row join Vars. I'm not sure if it's worth the
trouble to add a Query-level flag for that, and in any case it seems
like fit material for a separate patch. But even without skipping the
work entirely, this should make flatten_join_alias_vars() faster,
particularly where there are nested joins that it previously had to
flatten recursively.
An essential part of this change is to replace Var nodes'
varnoold/varoattno fields with varnosyn/varattnosyn, which have
considerably more tightly-defined meanings than the old fields: when
they differ from varno/varattno, they identify the Var's position in
an aliased JOIN RTE, and the join alias is what ruleutils.c should
print for the Var. This is necessary because the varno change
destroyed ruleutils.c's ability to find the JOIN RTE from the Var's
varno.
Another way in which this change broke ruleutils.c is that it's no
longer feasible to determine, from a JOIN RTE's joinaliasvars list,
which join columns correspond to which columns of the join's immediate
input relations. (If those are sub-joins, the joinaliasvars entries
may point to columns of their base relations, not the sub-joins.)
But that was a horrid mess requiring a lot of fragile assumptions
already, so let's just bite the bullet and add some more JOIN RTE
fields to make it more straightforward to figure that out. I added
two integer-List fields containing the relevant column numbers from
the left and right input rels, plus a count of how many merged columns
there are.
This patch depends on the ParseNamespaceColumn infrastructure that
I added in commit 5815696bc. The biggest bit of code change is
restructuring transformFromClauseItem's handling of JOINs so that
the ParseNamespaceColumn data is propagated upward correctly.
Other than that and the ruleutils fixes, everything pretty much
just works, though some processing is now inessential. I grabbed
two pieces of low-hanging fruit in that line:
1. In find_expr_references, we don't need to recurse into join alias
Vars anymore. There aren't any except for references to merged USING
columns, which are more properly handled when we scan the join's RTE.
This change actually fixes an edge-case issue: we will now record a
dependency on any type-coercion function present in a USING column's
joinaliasvar, even if that join column has no references in the query
text. The odds of the missing dependency causing a problem seem quite
small: you'd have to posit somebody dropping an implicit cast between
two data types, without removing the types themselves, and then having
a stored rule containing a whole-row Var for a join whose USING merge
depends on that cast. So I don't feel a great need to change this in
the back branches. But in theory this way is more correct.
2. markRTEForSelectPriv and markTargetListOrigin don't need to recurse
into join alias Vars either, because the cases they care about don't
apply to alias Vars for USING columns that are semantically distinct
from the underlying columns. This removes the only case in which
markVarForSelectPriv could be called with NULL for the RTE, so adjust
the comments to describe that hack as being strictly internal to
markRTEForSelectPriv.
catversion bump required due to changes in stored rules.
Discussion: https://postgr.es/m/7115.1577986646@sss.pgh.pa.us
When copying the definition of an index rebuilt concurrently for the new
entry, the index information was taken directly from the old index using
the relation cache. In this case, predicates and expressions have
some post-processing to prepare things for the planner, which loses some
information including the collations added in any of them.
This inconsistency can cause issues when attempting for example a table
rewrite, and makes the new indexes rebuilt concurrently inconsistent
with the old entries.
In order to fix the problem, fetch expressions and predicates directly
from the catalog of the old entry, and fill in IndexInfo for the new
index with that. This makes the process more consistent with
DefineIndex(), and the code is refactored with the addition of a routine
to create an IndexInfo node.
Reported-by: Manuel Rigger
Author: Michael Paquier
Discussion: https://postgr.es/m/CA+u7OA5Hp0ra235F3czPom_FyAd-3+XwSJmX95r1+sRPOJc9VQ@mail.gmail.com
Backpatch-through: 12
Move a few very simple node-creation and node-type-testing functions
from the planner's clauses.c to nodes/makefuncs and nodes/nodeFuncs.
There's nothing planner-specific about them, as evidenced by the
number of other places that were using them.
While at it, rename and_clause() etc to is_andclause() etc, to clarify
that they are node-type-testing functions not node-creation functions.
And use "static inline" implementations for the shortest ones.
Also, modify flatten_join_alias_vars() and some subsidiary functions
to take a Query not a PlannerInfo to define the join structure that
Vars should be translated according to. They were only using the
"parse" field of the PlannerInfo anyway, so this just requires removing
one level of indirection. The advantage is that now parse_agg.c can
use flatten_join_alias_vars() without the horrid kluge of creating an
incomplete PlannerInfo, which will allow that file to be decoupled from
relation.h in a subsequent patch.
Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
The original code to propagate NOT NULL and default expressions
specified when creating a partition was mostly copy-pasted from
typed-tables creation, but not being a great match it contained some
duplicity, inefficiency and bugs.
This commit fixes the bug that NOT NULL constraints declared in the
parent table would not be honored in the partition. One reported issue
that is not fixed is that a DEFAULT declared in the child is not used
when inserting through the parent. That would amount to a behavioral
change that's better not back-patched.
This rewrite makes the code simpler:
1. instead of checking for duplicate column names in its own block,
reuse the original one that already did that;
2. instead of concatenating the list of columns from parent and the one
declared in the partition and scanning the result to (incorrectly)
propagate defaults and not-null constraints, just scan the latter
searching the former for a match, and merging sensibly. This works
because we know the list in the parent is already correct and there can
only be one parent.
This rewrite makes ColumnDef->is_from_parent unused, so it's removed
on branch master; on released branches, it's kept as an unused field in
order not to cause ABI incompatibilities.
This commit also adds a test case for creating partitions with
collations mismatching that on the parent table, something that is
closely related to the code being patched. No code change is introduced
though, since that'd be a behavior change that could break some (broken)
working applications.
Amit Langote wrote a less invasive fix for the original
NOT NULL/defaults bug, but while I kept the tests he added, I ended up
not using his original code. Ashutosh Bapat reviewed Amit's fix. Amit
reviewed mine.
Author: Álvaro Herrera, Amit Langote
Reviewed-by: Ashutosh Bapat, Amit Langote
Reported-by: Jürgen Strobel (bug #15212)
Discussion: https://postgr.es/m/152746742177.1291.9847032632907407358@wrigleys.postgresql.org
This is the last major omission in our domains feature: you can now
make a domain over anything that's not a pseudotype.
The major complication from an implementation standpoint is that places
that might be creating tuples of a domain type now need to be prepared
to apply domain_check(). It seems better that unprepared code fail
with an error like "<type> is not composite" than that it silently fail
to apply domain constraints. Therefore, relevant infrastructure like
get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted
to treat domain-over-composite as a distinct case that unprepared code
won't recognize, rather than just transparently treating it the same
as plain composite. This isn't a 100% solution to the possibility of
overlooked domain checks, but it catches most places.
In passing, improve typcache.c's support for domains (it can now cache
the identity of a domain's base type), and rewrite the argument handling
logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative
per-call lookups.
I believe this is code-complete so far as the core and contrib code go.
The PLs need varying amounts of work, which will be tackled in followup
patches.
Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
Not much to say about this; does what it says on the tin.
However, formerly, if there was a column list then the ANALYZE action was
implied; now it must be specified, or you get an error. This is because
it would otherwise be a bit unclear what the user meant if some tables
have column lists and some don't.
Nathan Bossart, reviewed by Michael Paquier and Masahiko Sawada, with some
editorialization by me
Discussion: https://postgr.es/m/E061A8E3-5E3D-494D-94F0-E8A9B312BBFC@amazon.com
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE. (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
The implemented syntax deviates slightly from what the standard
requires. First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only. Second, we don't
currently support a default namespace to be specified.
This implementation relies on a new executor node based on a hardcoded
method table. (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
Now that it has only INH_NO and INH_YES values, it's just weird that
it's not a plain bool, so make it that way.
Also rename RangeVar.inhOpt to "inh", to be like RangeTblEntry.inh.
My recollection is that we gave it a different name specifically because
it had a different representation than the derived bool value, but it
no longer does. And this is a good forcing function to be sure we
catch any places that are affected by the change.
Bump catversion because of possible effect on stored RangeVar nodes.
I'm not exactly convinced that we ever store RangeVar on disk, but
we have a readfuncs function for it, so be cautious. (If we do do so,
then commit e13486eba was in error not to bump catversion.)
Follow-on to commit e13486eba.
Discussion: http://postgr.es/m/CA+TgmoYe+EG7LdYX6pkcNxr4ygkP4+A=jm9o-CPXyOvRiCNwaQ@mail.gmail.com
Add a location field to the DefElem struct, used to parse many utility
commands. Update various error messages to supply error position
information.
To propogate the error position information in a more systematic way,
create a ParseState in standard_ProcessUtility() and pass that to
interested functions implementing the utility commands. This seems
better than passing the query string and then reassembling a parse state
ad hoc, which violates the encapsulation of the ParseState type.
Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
Previously, these commands always planned the given query and went through
executor startup before deciding not to actually run the query if WITH NO
DATA is specified. This behavior is problematic for pg_dump because it
may cause errors to be raised that we would rather not see before a
REFRESH MATERIALIZED VIEW command is issued. See for example bug #13907
from Marian Krucina. This change is not sufficient to fix that particular
bug, because we also need to tweak pg_dump to issue the REFRESH later,
but it's a necessary step on the way.
A user-visible side effect of doing things this way is that the returned
command tag for WITH NO DATA cases will now be "CREATE MATERIALIZED VIEW"
or "CREATE TABLE AS", not "SELECT 0". We could preserve the old behavior
but it would take more code, and arguably that was just an implementation
artifact not intended behavior anyhow.
In 9.5 and HEAD, also get rid of the static variable CreateAsReladdr, which
was trouble waiting to happen; there is not any prohibition on nested
CREATE commands.
Back-patch to 9.3 where CREATE MATERIALIZED VIEW was introduced.
Michael Paquier and Tom Lane
Report: <20160202161407.2778.24659@wrigleys.postgresql.org>
Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons. It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime. Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself. This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.
I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.
The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems. Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b66563cf2f.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)
The test added in commit 65c3d05e18e7c530 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum(). That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.
Back-patch to 9.5 where the expanded-object facility was added. It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.
This SQL standard functionality allows to aggregate data by different
GROUP BY clauses at once. Each grouping set returns rows with columns
grouped by in other sets set to NULL.
This could previously be achieved by doing each grouping as a separate
query, conjoined by UNION ALLs. Besides being considerably more concise,
grouping sets will in many cases be faster, requiring only one scan over
the underlying data.
The current implementation of grouping sets only supports using sorting
for input. Individual sets that share a sort order are computed in one
pass. If there are sets that don't share a sort order, additional sort &
aggregation steps are performed. These additional passes are sourced by
the previous sort step; thus avoiding repeated scans of the source data.
The code is structured in a way that adding support for purely using
hash aggregation or a mix of hashing and sorting is possible. Sorting
was chosen to be supported first, as it is the most generic method of
implementation.
Instead of, as in an earlier versions of the patch, representing the
chain of sort and aggregation steps as full blown planner and executor
nodes, all but the first sort are performed inside the aggregation node
itself. This avoids the need to do some unusual gymnastics to handle
having to return aggregated and non-aggregated tuples from underlying
nodes, as well as having to shut down underlying nodes early to limit
memory usage. The optimizer still builds Sort/Agg node to describe each
phase, but they're not part of the plan tree, but instead additional
data for the aggregation node. They're a convenient and preexisting way
to describe aggregation and sorting. The first (and possibly only) sort
step is still performed as a separate execution step. That retains
similarity with existing group by plans, makes rescans fairly simple,
avoids very deep plans (leading to slow explains) and easily allows to
avoid the sorting step if the underlying data is sorted by other means.
A somewhat ugly side of this patch is having to deal with a grammar
ambiguity between the new CUBE keyword and the cube extension/functions
named cube (and rollup). To avoid breaking existing deployments of the
cube extension it has not been renamed, neither has cube been made a
reserved keyword. Instead precedence hacking is used to make GROUP BY
cube(..) refer to the CUBE grouping sets feature, and not the function
cube(). To actually group by a function cube(), unlikely as that might
be, the function name has to be quoted.
Needs a catversion bump because stored rules may change.
Author: Andrew Gierth and Atri Sharma, with contributions from Andres Freund
Reviewed-By: Andres Freund, Noah Misch, Tom Lane, Svenne Krap, Tomas
Vondra, Erik Rijkers, Marti Raudsepp, Pavel Stehule
Discussion: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
This patch introduces generic support for ordered-set and hypothetical-set
aggregate functions, as well as implementations of the instances defined in
SQL:2008 (percentile_cont(), percentile_disc(), rank(), dense_rank(),
percent_rank(), cume_dist()). We also added mode() though it is not in the
spec, as well as versions of percentile_cont() and percentile_disc() that
can compute multiple percentile values in one pass over the data.
Unlike the original submission, this patch puts full control of the sorting
process in the hands of the aggregate's support functions. To allow the
support functions to find out how they're supposed to sort, a new API
function AggGetAggref() is added to nodeAgg.c. This allows retrieval of
the aggregate call's Aggref node, which may have other uses beyond the
immediate need. There is also support for ordered-set aggregates to
install cleanup callback functions, so that they can be sure that
infrastructure such as tuplesort objects gets cleaned up.
In passing, make some fixes in the recently-added support for variadic
aggregates, and make some editorial adjustments in the recent FILTER
additions for aggregates. Also, simplify use of IsBinaryCoercible() by
allowing it to succeed whenever the target type is ANY or ANYELEMENT.
It was inconsistent that it dealt with other polymorphic target types
but not these.
Atri Sharma and Andrew Gierth; reviewed by Pavel Stehule and Vik Fearing,
and rather heavily editorialized upon by Tom Lane
This patch adds the ability to write TABLE( function1(), function2(), ...)
as a single FROM-clause entry. The result is the concatenation of the
first row from each function, followed by the second row from each
function, etc; with NULLs inserted if any function produces fewer rows than
others. This is believed to be a much more useful behavior than what
Postgres currently does with multiple SRFs in a SELECT list.
This syntax also provides a reasonable way to combine use of column
definition lists with WITH ORDINALITY: put the column definition list
inside TABLE(), where it's clear that it doesn't control the ordinality
column as well.
Also implement SQL-compliant multiple-argument UNNEST(), by turning
UNNEST(a,b,c) into TABLE(unnest(a), unnest(b), unnest(c)).
The SQL standard specifies TABLE() with only a single function, not
multiple functions, and it seems to require an implicit UNNEST() which is
not what this patch does. There may be something wrong with that reading
of the spec, though, because if it's right then the spec's TABLE() is just
a pointless alternative spelling of UNNEST(). After further review of
that, we might choose to adopt a different syntax for what this patch does,
but in any case this functionality seems clearly worthwhile.
Andrew Gierth, reviewed by Zoltán Böszörményi and Heikki Linnakangas, and
significantly revised by me