This avoids additional translatable strings for each distinct type, as
well as making our quoting style around type names more consistent
(namely, that we don't quote type names). This continues what started
as f402b9950120.
Discussion: https://postgr.es/m/20160401170642.GA57509@alvherre.pgsql
Commit e09996ff8dee3f70 was one brick shy of a load: it didn't insist
that the detected JSON number be the whole of the supplied string.
This allowed inputs such as "2016-01-01" to be misdetected as valid JSON
numbers. Per bug #13906 from Dmitry Ryabov.
In passing, be more wary of zero-length input (I'm not sure this can
happen given current callers, but better safe than sorry), and do some
minor cosmetic cleanup.
Commit bda76c1c8cfb1d11751ba6be88f0242850481733 caused both plus and
minus infinity to be rendered as "infinity", which is not only wrong
but inconsistent with the pre-9.4 behavior of to_json(). Fix that by
duplicating the coding in date_out/timestamp_out/timestamptz_out more
closely. Per bug #13687 from Stepan Perlov. Back-patch to 9.4, like
the previous commit.
In passing, also re-pgindent json.c, since it had gotten a bit messed up by
recent patches (and I was already annoyed by indentation-related problems
in back-patching this fix ...)
Sufficiently-deep recursion heretofore elicited a SIGSEGV. If an
application constructs PostgreSQL json or jsonb values from arbitrary
user input, application users could have exploited this to terminate all
active database connections. That applies to 9.3, where the json parser
adopted recursive descent, and later versions. Only row_to_json() and
array_to_json() were at risk in 9.2, both in a non-security capacity.
Back-patch to 9.2, where the json type was introduced.
Oskari Saarenmaa, reviewed by Michael Paquier.
Security: CVE-2015-5289
These functions have been looking up type info for every row they
process. Instead of doing that we only look them up the first time
through and stash the information in the aggregate state object.
Affects json_agg, json_object_agg, jsonb_agg and jsonb_object_agg.
There is plenty more work to do in making these more efficient,
especially the jsonb functions, but this is a virtually cost free
improvement that can be done right away.
Backpatch to 9.5 where the jsonb variants were introduced.
Previously, there was an inconsistency across json/jsonb operators that
operate on datums containing JSON arrays -- only some operators
supported negative array count-from-the-end subscripting. Specifically,
only a new-to-9.5 jsonb deletion operator had support (the new "jsonb -
integer" operator). This inconsistency seemed likely to be
counter-intuitive to users. To fix, allow all places where the user can
supply an integer subscript to accept a negative subscript value,
including path-orientated operators and functions, as well as other
extraction operators. This will need to be called out as an
incompatibility in the 9.5 release notes, since it's possible that users
are relying on certain established extraction operators changed here
yielding NULL in the event of a negative subscript.
For the json type, this requires adding a way of cheaply getting the
total JSON array element count ahead of time when parsing arrays with a
negative subscript involved, necessitating an ad-hoc lex and parse.
This is followed by a "conversion" from a negative subscript to its
equivalent positive-wise value using the count. From there on, it's as
if a positive-wise value was originally provided.
Note that there is still a minor inconsistency here across jsonb
deletion operators. Unlike the aforementioned new "-" deletion operator
that accepts an integer on its right hand side, the new "#-" path
orientated deletion variant does not throw an error when it appears like
an array subscript (input that could be recognized by as an integer
literal) is being used on an object, which is wrong-headed. The reason
for not being stricter is that it could be the case that an object pair
happens to have a key value that looks like an integer; in general,
these two possibilities are impossible to differentiate with rhs path
text[] argument elements. However, we still don't allow the "#-"
path-orientated deletion operator to perform array-style subscripting.
Rather, we just return the original left operand value in the event of a
negative subscript (which seems analogous to how the established
"jsonb/json #> text[]" path-orientated operator may yield NULL in the
event of an invalid subscript).
In passing, make SetArrayPath() stricter about not accepting cases where
there is trailing non-numeric garbage bytes rather than a clean NUL
byte. This means, for example, that strings like "10e10" are now not
accepted as an array subscript of 10 by some new-to-9.5 path-orientated
jsonb operators (e.g. the new #- operator). Finally, remove dead code
for jsonb subscript deletion; arguably, this should have been done in
commit b81c7b409.
Peter Geoghegan and Andrew Dunstan
Commit ab14a73a6c raised an error in these cases and later the
behaviour was copied to jsonb. This is what the XML code, which we
then adopted, does, as the XSD types don't accept infinite values.
However, json dates and timestamps are just strings as far as json is
concerned, so there is no reason not to render these values as
'infinity'.
The json portion of this is backpatched to 9.4 where the behaviour was
introduced. The jsonb portion only affects the development branch.
Per gripe on pgsql-general.
We've been trying to support \u0000 in JSON values since commit
78ed8e03c67d7333, and have introduced increasingly worse hacks to try to
make it work, such as commit 0ad1a816320a2b53. However, it fundamentally
can't work in the way envisioned, because the stored representation looks
the same as for \\u0000 which is not the same thing at all. It's also
entirely bogus to output \u0000 when de-escaped output is called for.
The right way to do this would be to store an actual 0x00 byte, and then
throw error only if asked to produce de-escaped textual output. However,
getting to that point seems likely to take considerable work and may well
never be practical in the 9.4.x series.
To preserve our options for better behavior while getting rid of the nasty
side-effects of 0ad1a816320a2b53, revert that commit in toto and instead
throw error if \u0000 is used in a context where it needs to be de-escaped.
(These are the same contexts where non-ASCII Unicode escapes throw error
if the database encoding isn't UTF8, so this behavior is by no means
without precedent.)
In passing, make both the \u0000 case and the non-ASCII Unicode case report
ERRCODE_UNTRANSLATABLE_CHARACTER / "unsupported Unicode escape sequence"
rather than claiming there's something wrong with the input syntax.
Back-patch to 9.4, where we have to do something because 0ad1a816320a2b53
broke things for many cases having nothing to do with \u0000. 9.3 also has
bogus behavior, but only for that specific escape value, so given the lack
of field complaints it seems better to leave 9.3 alone.
The functions are:
to_jsonb()
jsonb_object()
jsonb_build_object()
jsonb_build_array()
jsonb_agg()
jsonb_object_agg()
Also along the way some better logic is implemented in
json_categorize_type() to match that in the newly implemented
jsonb_categorize_type().
Andrew Dunstan, reviewed by Pavel Stehule and Alvaro Herrera.
Davide S. reported that json_agg() sometimes produced multiple trailing
right brackets. This turns out to be because json_agg_finalfn() attaches
the final right bracket, and was doing so by modifying the aggregate state
in-place. That's verboten, though unfortunately it seems there's no way
for nodeAgg.c to check for such mistakes.
Fix that back to 9.3 where the broken code was introduced. In 9.4 and
HEAD, likewise fix json_object_agg(), which had copied the erroneous logic.
Make some cosmetic cleanups as well.
We expose a function IsValidJsonNumber that internally calls the lexer
for json numbers. That allows us to use the same test everywhere,
instead of inventing a broken test for hstore conversions. The new
function is also used in datum_to_json, replacing the code that is now
moved to the new function.
Backpatch to 9.3 where hstore_to_json_loose was introduced.
Per discussion, revert the commit which added 'ignore_nulls' to
row_to_json. This capability would be better added as an independent
function rather than being bolted on to row_to_json. Additionally,
the implementation didn't address complex JSON objects, and so was
incomplete anyway.
Pointed out by Tom and discussed with Andrew and Robert.
We removed a similar ban on this in json_object recently, but the ban in
datum_to_json was left, which generate4d sprutious errors in othee json
generators, notable json_build_object.
Along the way, add an assertion that datum_to_json is not passed a null
key. All current callers comply with this rule, but the assertion will
catch any possible future misbehaviour.
Provide an option to skip NULL values in a row when generating a JSON
object from that row with row_to_json. This can reduce the size of the
JSON object in cases where columns are NULL without really reducing the
information in the JSON object.
This also makes row_to_json into a single function with default values,
rather than having multiple functions. In passing, change array_to_json
to also be a single function with default values (we don't add an
'ignore_nulls' option yet- it's not clear that there is a sensible
use-case there, and it hasn't been asked for in any case).
Pavel Stehule
Commit f30015b6d794c15d52abbb3df3a65081fbefb1ed made this happen for
timestamp and timestamptz, but it seems pretty inconsistent to not
do it for simple dates as well.
(In passing, I re-pgindent'd json.c.)
There's actually no need for any special case for unknown-type literals,
since we only need to push the value through its output function and
unknownout() works fine. The code that was here was completely bizarre
anyway, and would fail outright in cases that should work, not to mention
suffering from some copy-and-paste bugs.
Fix an obvious typo in json_build_object()'s complaint about invalid
number of arguments, and make the errhint a bit more sensible too.
Per discussion about how to word the improved hint, change the few places
in the documentation that refer to JSON object field names as "names" to
say "keys" instead, since that's what we've said in the vast majority of
places in the docs. Arguably "name" is more correct, since that's the
terminology used in RFC 7159; but we're stuck with "key" in view of the
naming of json_object_keys() so let's at least be self-consistent.
I adjusted a few code comments to match this as well, and failed to
resist the temptation to clean up some odd whitespace choices in the
same area, as well as a useless duplicate PG_ARGISNULL() check. There's
still quite a bit of code that uses the phrase "field name" in non-user-
visible ways, so I left those usages alone.
The isxdigit() calls relied on undefined behavior. The isascii() call
was well-defined, but our prevailing style is to include the cast.
Back-patch to 9.4, where the isxdigit() calls were introduced.
Per gripe from Peter Eisentraut and Tom Lane.
The output is slightly different, but still ISO 8601 compliant: to_char
doesn't output the minutes when time zone offset is an integer number of
hours, while EncodeDateTime outputs ":00".
The code is slightly adapted from code in xml.c
Previously, any backslash in text being escaped for JSON was doubled so
that the result was still valid JSON. However, this led to some perverse
results in the case of Unicode sequences, These are now detected and the
initial backslash is no longer escaped. All other backslashes are
still escaped. No validity check is performed, all that is looked for is
\uXXXX where X is a hexidecimal digit.
This is a change from the 9.2 and 9.3 behaviour as noted in the Release
notes.
Per complaint from Teodor Sigaev.
Many JSON processors require timestamp strings in ISO 8601 format in
order to convert the strings. When converting a timestamp, with or
without timezone, to a JSON datum we therefore now use such a format
rather than the type's default text output, in functions such as
to_json().
This is a change in behaviour from 9.2 and 9.3, as noted in the release
notes.
These functions were relying on typcategory to identify arrays and
composites, which is not reliable and not the normal way to do it.
Using typcategory to identify boolean, numeric types, and json itself is
also pretty questionable, though the code in those cases didn't seem to be
at risk of anything worse than wrong output. Instead, use the standard
lsyscache functions to identify arrays and composites, and rely on a direct
check of the type OID for the other cases.
In HEAD, also be sure to look through domains so that a domain is treated
the same as its base type for conversions to JSON. However, this is a
small behavioral change; given the lack of field complaints, we won't
back-patch it.
In passing, refactor so that there's only one copy of the code that decides
which conversion strategy to apply, not multiple copies that could (and
have) gotten out of sync.
This code really needs to be refactored so that there aren't so many copies
that can diverge. Not to mention that this whole approach is probably
wrong. But for the moment I'll just stick my finger in the dike.
Per report from Michael Paquier.
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the orgiginal text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.
The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.
This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.
Authors: Oleg Bartunov, Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
json_build_array() and json_build_object allow for the construction of
arbitrarily complex json trees. json_object() turns a one or two
dimensional array, or two separate arrays, into a json_object of
name/value pairs, similarly to the hstore() function.
json_object_agg() aggregates its two arguments into a single json object
as name value pairs.
Catalog version bumped.
Andrew Dunstan, reviewed by Marko Tiikkaja.
Instead of looking for characters that aren't valid in JSON numbers, we
simply pass the output string through the JSON number parser, and if it
fails the string is quoted. This means among other things that money and
domains over money will be quoted correctly and generate valid JSON.
Fixes bug #8676 reported by Anderson Cristian da Silva.
Backpatched to 9.2 where JSON generation was introduced.
The new JSON API uses a bit of an unusual typedef scheme, where for
example OkeysState is a pointer to okeysState. And that's not applied
consistently either. Change that to the more usual PostgreSQL style
where struct typedefs are upper case, and use pointers explicitly.
Several loops in the JSON parser examined a byte in memory just before
checking whether its address was in-bounds, so they could read one byte
beyond the datum's allocation. A SIGSEGV is possible. New in 9.3, so
no back-patch.