The function converted the first argument i.e. the number of tuples to
return into an unsigned integer which turns out to be huge number when
a negative value is passed. This causes the function to take much
longer time to execute. Instead, reject a negative value.
(If someone really wants to generate many more result rows, they
should consider adding a bigint or numeric variant.)
While at it, improve SQL test to test the number of tuples returned by
this function.
Author: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>
Discussion: https://www.postgresql.org/message-id/CAG-ACPW3PUUmSnM6cLa9Rw4BEC5cEMKjX8Gogc8gvQcT3cYA1A@mail.gmail.com
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426b it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
This isn't fully tested as yet, in particular I'm not sure that the
"foo--unpackaged--1.0.sql" scripts are OK. But it's time to get some
buildfarm cycles on it.
sepgsql is not converted to an extension, mainly because it seems to
require a very nonstandard installation process.
Dimitri Fontaine and Tom Lane
about best practice for including the module creation scripts: to wit
that you should suppress NOTICE messages. This avoids creating
regression failures by adding or removing comment lines in the module
scripts.
and RelationNameGetTupleDesc() as deprecated; remove uses of the
latter in the contrib library. Along the way, clean up crosstab()
code and documentation a little.
This was the last piece of code that took it upon itself to reset the
random number sequence --- now we only have srandom() in postmaster start,
backend start, and explicit setseed() operations.
>>Sounds like all that's needed for your case. But to be complete, in
>>addition to changing tablefunc.c we'd have to:
>>1) come up with a new function call signature that makes sense and does
>>not cause backward compatibility problems for other people
>>2) make needed changes to tablefunc.sql.in
>>3) adjust the README.tablefunc appropriately
>>4) adjust the regression test for new functionality
>>5) be sure we don't break any of the old cases
>>
>>If you want to submit a complete patch, it would be gratefully accepted
>>-- for review at least ;-)
>
> Here's the patch, at least for steps 1-3
Nabil Sayegh
Joe Conway
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
Create objects in public schema.
Make spacing/capitalization consistent.
Remove transaction block use for object creation.
Remove unneeded function GRANTs.
contrib/tablefunc. Specifically it replaces the use of VIEWs (for needed
composite type creation) with use of CREATE TYPE. It also performs GRANT
EXECUTE ON FUNCTION foo() TO PUBLIC for all of the created functions. There
was also a cosmetic change to two regression files.
Joe Conway