The core of this patch is hash_array() and associated typcache
infrastructure, which works just about exactly like the existing support
for array comparison.
In addition I did some work to ensure that the planner won't think that an
array type is hashable unless its element type is hashable, and similarly
for sorting. This includes adding a datatype parameter to op_hashjoinable
and op_mergejoinable, and adding an explicit "hashable" flag to
SortGroupClause. The lack of a cross-check on the element type was a
pre-existing bug in mergejoin support --- but it didn't matter so much
before, because if you couldn't sort the element type there wasn't any good
alternative to failing anyhow. Now that we have the alternative of hashing
the array type, there are cases where we can avoid a failure by being picky
at the planner stage, so it's time to be picky.
The issue of exactly how to combine the per-element hash values to produce
an array hash is still open for discussion, but the rest of this is pretty
solid, so I'll commit it as-is.
array_in discards unquoted leading and trailing whitespace in array values,
while array_out is careful to quote array elements that contain whitespace.
This is problematic when the definition of "whitespace" varies between
locales: array_in could drop characters that were meant to be part of the
value. To avoid that, lock down "whitespace" to mean only the traditional
six ASCII space characters.
This change also works around a bug in OS X and some older BSD systems, in
which isspace() could return true for character fragments in UTF8 locales.
(There may be other places in PG where that bug could cause problems, but
this is the only one complained of so far; see recent report from Steven
Schlansker.)
Back-patch to 9.0, but not further. Given the lack of previous reports
of trouble, changing this behavior in stable branches seems to offer
more risk of breaking applications than reward of avoiding problems.
input functions don't accept either. While the backend can handle such
values fine, they can cause trouble in clients and in pg_dump/restore.
This is followup to the original issue on time datatype reported by Andrew
McNamara a while ago. Like that one, none of these seem worth
back-patching.
ArrayBuildState, per trouble report from Merlin Moncure. By adopting
this fix, we are essentially deciding that aggregate final-functions
should not modify their inputs ever. Adjust documentation and comments
to match that conclusion.
how this ought to behave for multi-dimensional arrays. Per discussion,
not having it at all seems better than having it with what might prove
to be the wrong behavior. We can always add it later when we have consensus
on the correct behavior.
alias for array_length(v,1). The efficiency gain here is doubtless
negligible --- what I'm interested in is making sure that if we have
second thoughts about the definition, we will not have to force a
post-beta initdb to change the implementation.
anyelement. This lacks the WITH ORDINALITY option, as well as the multiple
input arrays option added in the most recent SQL specs. But it's still a
pretty useful subset of the spec's functionality, and it is enough to
allow obsoleting contrib/intagg.
function as a special case.
This version still has the suspicious behavior of returning null for an
empty array (rather than zero), but this may need a wholesale revision of
empty array behavior, currently under discussion.
Jim Nasby, Robert Haas, Peter Eisentraut
and bogus documentation (dimension arrays are int[] not anyarray). Also the
errhint() messages seem to be really errdetail(), since there is nothing
heuristic about them. Some other trivial cosmetic improvements.
unnecessary #include lines in it. Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.
For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.
While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
results to contain uninitialized, unpredictable values. While this was okay
as far as the datatypes themselves were concerned, it's a problem for the
parser because occurrences of the "same" literal might not be recognized as
equal by datumIsEqual (and hence not by equal()). It seems sufficient to fix
this in the input functions since the only critical use of equal() is in the
parser's comparisons of ORDER BY and DISTINCT expressions.
Per a trouble report from Marc Cousin.
Patch all the way back. Interestingly, array_in did not have the bug before
8.2, which may explain why the issue went unnoticed for so long.
strings. This patch introduces four support functions cstring_to_text,
cstring_to_text_with_len, text_to_cstring, and text_to_cstring_buffer, and
two macros CStringGetTextDatum and TextDatumGetCString. A number of
existing macros that provided variants on these themes were removed.
Most of the places that need to make such conversions now require just one
function or macro call, in place of the multiple notational layers that used
to be needed. There are no longer any direct calls of textout or textin,
and we got most of the places that were using handmade conversions via
memcpy (there may be a few still lurking, though).
This commit doesn't make any serious effort to eliminate transient memory
leaks caused by detoasting toasted text objects before they reach
text_to_cstring. We changed PG_GETARG_TEXT_P to PG_GETARG_TEXT_PP in a few
places where it was easy, but much more could be done.
Brendan Jurd and Tom Lane
This commit breaks any code that assumes that the mere act of forming a tuple
(without writing it to disk) does not "toast" any fields. While all available
regression tests pass, I'm not totally sure that we've fixed every nook and
cranny, especially in contrib.
Greg Stark with some help from Tom Lane
seen by code inspecting the expression. The best way to do this seems
to be to drop the original representation as a function invocation, and
instead make a special expression node type that represents applying
the element-type coercion function to each array element. In this way
the element function is exposed and will be checked for volatility.
Per report from Guillaume Smet.
Get rid of VARATT_SIZE and VARATT_DATA, which were simply redundant with
VARSIZE and VARDATA, and as a consequence almost no code was using the
longer names. Rename the length fields of struct varlena and various
derived structures to catch anyplace that was accessing them directly;
and clean up various places so caught. In itself this patch doesn't
change any behavior at all, but it is necessary infrastructure if we hope
to play any games with the representation of varlena headers.
Greg Stark and Tom Lane
more space is needed, instead of incrementing by a fixed amount; the old
method wastes lots of space and time when the ultimate size is large.
Per gripe from Tatsuo.
present; intervening positions are filled with nulls. This behavior
is required by SQL99 but was not implementable before 8.2 due to lack
of support for nulls in arrays. I have only made it work for the
one-dimensional case, which is all that SQL99 requires. It seems quite
complex to get it right in higher dimensions, and since we never allowed
extension at all in higher dimensions, I think that must count as a
future feature addition not a bug fix.
we probably should make them work reliably for all arrays. Fix code
to handle NULLs and multidimensional arrays, move it into arrayfuncs.c.
GIN is still restricted to indexing arrays with no null elements, however.
functions are not strict, they will be called (passing a NULL first parameter)
during any attempt to input a NULL value of their datatype. Currently, all
our input functions are strict and so this commit does not change any
behavior. However, this will make it possible to build domain input functions
that centralize checking of domain constraints, thereby closing numerous holes
in our domain support, as per previous discussion.
While at it, I took the opportunity to introduce convenience functions
InputFunctionCall, OutputFunctionCall, etc to use in code that calls I/O
functions. This eliminates a lot of grotty-looking casts, but the main
motivation is to make it easier to grep for these places if we ever need
to touch them again.
comment line where output as too long, and update typedefs for /lib
directory. Also fix case where identifiers were used as variable names
in the backend, but as typedefs in ecpg (favor the backend for
indenting).
Backpatch to 8.1.X.
functionality, but I still need to make another pass looking at places
that incidentally use arrays (such as ACL manipulation) to make sure they
are null-safe. Contrib needs work too.
I have not changed the behaviors that are still under discussion about
array comparison and what to do with lower bounds.
their OID parameter. It was possible to crash the backend with
select array_in('{123}',0,0); because that would bypass the needed step
of initializing the workspace. These seem to be the only two places
with a problem, though (record_in and record_recv don't have the issue,
and the other array functions aren't depending on user-supplied input).
Back-patch as far as 7.4; 7.3 does not have the bug.
optional arguments as text input functions, ie, typioparam OID and
atttypmod. Make all the datatypes that use typmod enforce it the same
way in typreceive as they do in typinput. This fixes a problem with
failure to enforce length restrictions during COPY FROM BINARY.
only one argument. (Per recent discussion, the option to accept multiple
arguments is pretty useless for user-defined types, and would be a likely
source of security holes if it was used.) Simplify call sites of
output/send functions to not bother passing more than one argument.