As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
/*
|
2010-09-21 00:08:53 +04:00
|
|
|
* contrib/tablefunc/tablefunc.c
|
2008-05-17 05:28:26 +04:00
|
|
|
*
|
|
|
|
*
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
* tablefunc
|
|
|
|
*
|
|
|
|
* Sample to demonstrate C functions which return setof scalar
|
|
|
|
* and setof composite.
|
|
|
|
* Joe Conway <mail@joeconway.com>
|
2003-07-27 07:51:59 +04:00
|
|
|
* And contributors:
|
|
|
|
* Nabil Sayegh <postgresql@e-trolley.de>
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*
|
2020-01-01 20:21:45 +03:00
|
|
|
* Copyright (c) 2002-2020, PostgreSQL Global Development Group
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*
|
|
|
|
* Permission to use, copy, modify, and distribute this software and its
|
|
|
|
* documentation for any purpose, without fee, and without a written agreement
|
|
|
|
* is hereby granted, provided that the above copyright notice and this
|
|
|
|
* paragraph and the following two paragraphs appear in all copies.
|
2002-09-05 00:31:48 +04:00
|
|
|
*
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
* IN NO EVENT SHALL THE AUTHORS OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
|
|
|
|
* DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
|
|
|
|
* LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
|
|
|
|
* DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
|
|
|
|
* POSSIBILITY OF SUCH DAMAGE.
|
2002-09-05 00:31:48 +04:00
|
|
|
*
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
* THE AUTHORS AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
|
|
|
|
* INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
|
|
|
|
* AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS
|
|
|
|
* ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
|
|
|
|
* PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
#include "postgres.h"
|
|
|
|
|
2002-09-05 04:43:07 +04:00
|
|
|
#include <math.h>
|
|
|
|
|
2012-08-31 00:15:44 +04:00
|
|
|
#include "access/htup_details.h"
|
2009-01-07 16:44:37 +03:00
|
|
|
#include "catalog/pg_type.h"
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
#include "executor/spi.h"
|
2012-08-29 07:43:09 +04:00
|
|
|
#include "funcapi.h"
|
|
|
|
#include "lib/stringinfo.h"
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
#include "miscadmin.h"
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
#include "tablefunc.h"
|
2019-10-23 06:56:22 +03:00
|
|
|
#include "utils/builtins.h"
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2006-05-31 02:12:16 +04:00
|
|
|
PG_MODULE_MAGIC;
|
|
|
|
|
2007-12-07 19:44:58 +03:00
|
|
|
static HTAB *load_categories_hash(char *cats_sql, MemoryContext per_query_ctx);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
static Tuplestorestate *get_crosstab_tuplestore(char *sql,
|
2019-05-22 20:04:48 +03:00
|
|
|
HTAB *crosstab_hash,
|
|
|
|
TupleDesc tupdesc,
|
|
|
|
bool randomAccess);
|
2003-07-27 07:51:59 +04:00
|
|
|
static void validateConnectbyTupleDesc(TupleDesc tupdesc, bool show_branch, bool show_serial);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
static bool compatCrosstabTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2);
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
static void compatConnectbyTupleDescs(TupleDesc tupdesc1, TupleDesc tupdesc2);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
static void get_normal_pair(float8 *x1, float8 *x2);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
static Tuplestorestate *connectby(char *relname,
|
2019-05-22 20:04:48 +03:00
|
|
|
char *key_fld,
|
|
|
|
char *parent_key_fld,
|
|
|
|
char *orderby_fld,
|
|
|
|
char *branch_delim,
|
|
|
|
char *start_with,
|
|
|
|
int max_depth,
|
|
|
|
bool show_branch,
|
|
|
|
bool show_serial,
|
|
|
|
MemoryContext per_query_ctx,
|
|
|
|
bool randomAccess,
|
|
|
|
AttInMetadata *attinmeta);
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
static void build_tuplestore_recursively(char *key_fld,
|
2019-05-22 20:04:48 +03:00
|
|
|
char *parent_key_fld,
|
|
|
|
char *relname,
|
|
|
|
char *orderby_fld,
|
|
|
|
char *branch_delim,
|
|
|
|
char *start_with,
|
|
|
|
char *branch,
|
|
|
|
int level,
|
|
|
|
int *serial,
|
|
|
|
int max_depth,
|
|
|
|
bool show_branch,
|
|
|
|
bool show_serial,
|
|
|
|
MemoryContext per_query_ctx,
|
|
|
|
AttInMetadata *attinmeta,
|
|
|
|
Tuplestorestate *tupstore);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
typedef struct
|
|
|
|
{
|
2002-09-05 00:31:48 +04:00
|
|
|
float8 mean; /* mean of the distribution */
|
|
|
|
float8 stddev; /* stddev of the distribution */
|
|
|
|
float8 carry_val; /* hold second generated value */
|
|
|
|
bool use_carry; /* use second generated value */
|
2009-06-11 18:49:15 +04:00
|
|
|
} normal_rand_fctx;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
#define xpfree(var_) \
|
|
|
|
do { \
|
|
|
|
if (var_ != NULL) \
|
|
|
|
{ \
|
|
|
|
pfree(var_); \
|
|
|
|
var_ = NULL; \
|
|
|
|
} \
|
|
|
|
} while (0)
|
|
|
|
|
2007-11-10 08:00:41 +03:00
|
|
|
#define xpstrdup(tgtvar_, srcvar_) \
|
|
|
|
do { \
|
|
|
|
if (srcvar_) \
|
|
|
|
tgtvar_ = pstrdup(srcvar_); \
|
|
|
|
else \
|
|
|
|
tgtvar_ = NULL; \
|
|
|
|
} while (0)
|
|
|
|
|
|
|
|
#define xstreq(tgtvar_, srcvar_) \
|
|
|
|
(((tgtvar_ == NULL) && (srcvar_ == NULL)) || \
|
|
|
|
((tgtvar_ != NULL) && (srcvar_ != NULL) && (strcmp(tgtvar_, srcvar_) == 0)))
|
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
/* sign, 10 digits, '\0' */
|
|
|
|
#define INT32_STRLEN 12
|
|
|
|
|
2007-12-07 19:44:58 +03:00
|
|
|
/* stored info for a crosstab category */
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
typedef struct crosstab_cat_desc
|
|
|
|
{
|
2007-12-07 19:44:58 +03:00
|
|
|
char *catname; /* full category name */
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 attidx; /* zero based */
|
2009-06-11 18:49:15 +04:00
|
|
|
} crosstab_cat_desc;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
#define MAX_CATNAME_LEN NAMEDATALEN
|
|
|
|
#define INIT_CATS 64
|
|
|
|
|
2007-12-07 19:44:58 +03:00
|
|
|
#define crosstab_HashTableLookup(HASHTAB, CATNAME, CATDESC) \
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
do { \
|
|
|
|
crosstab_HashEnt *hentry; char key[MAX_CATNAME_LEN]; \
|
|
|
|
\
|
|
|
|
MemSet(key, 0, MAX_CATNAME_LEN); \
|
|
|
|
snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATNAME); \
|
2007-12-07 19:44:58 +03:00
|
|
|
hentry = (crosstab_HashEnt*) hash_search(HASHTAB, \
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
key, HASH_FIND, NULL); \
|
|
|
|
if (hentry) \
|
|
|
|
CATDESC = hentry->catdesc; \
|
|
|
|
else \
|
|
|
|
CATDESC = NULL; \
|
|
|
|
} while(0)
|
|
|
|
|
2007-12-07 19:44:58 +03:00
|
|
|
#define crosstab_HashTableInsert(HASHTAB, CATDESC) \
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
do { \
|
|
|
|
crosstab_HashEnt *hentry; bool found; char key[MAX_CATNAME_LEN]; \
|
|
|
|
\
|
|
|
|
MemSet(key, 0, MAX_CATNAME_LEN); \
|
|
|
|
snprintf(key, MAX_CATNAME_LEN - 1, "%s", CATDESC->catname); \
|
2007-12-07 19:44:58 +03:00
|
|
|
hentry = (crosstab_HashEnt*) hash_search(HASHTAB, \
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
key, HASH_ENTER, &found); \
|
|
|
|
if (found) \
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR, \
|
|
|
|
(errcode(ERRCODE_DUPLICATE_OBJECT), \
|
|
|
|
errmsg("duplicate category name"))); \
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
hentry->catdesc = CATDESC; \
|
|
|
|
} while(0)
|
|
|
|
|
|
|
|
/* hash table */
|
|
|
|
typedef struct crosstab_hashent
|
|
|
|
{
|
2003-08-04 04:43:34 +04:00
|
|
|
char internal_catname[MAX_CATNAME_LEN];
|
|
|
|
crosstab_cat_desc *catdesc;
|
2009-06-11 18:49:15 +04:00
|
|
|
} crosstab_HashEnt;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
/*
|
|
|
|
* normal_rand - return requested number of random values
|
|
|
|
* with a Gaussian (Normal) distribution.
|
|
|
|
*
|
2003-09-14 01:44:50 +04:00
|
|
|
* inputs are int numvals, float8 mean, and float8 stddev
|
|
|
|
* returns setof float8
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*/
|
|
|
|
PG_FUNCTION_INFO_V1(normal_rand);
|
|
|
|
Datum
|
|
|
|
normal_rand(PG_FUNCTION_ARGS)
|
|
|
|
{
|
2002-09-05 00:31:48 +04:00
|
|
|
FuncCallContext *funcctx;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 call_cntr;
|
|
|
|
uint64 max_calls;
|
2002-09-05 00:31:48 +04:00
|
|
|
normal_rand_fctx *fctx;
|
|
|
|
float8 mean;
|
|
|
|
float8 stddev;
|
|
|
|
float8 carry_val;
|
|
|
|
bool use_carry;
|
|
|
|
MemoryContext oldcontext;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
/* stuff done only on the first call of the function */
|
2002-09-05 00:31:48 +04:00
|
|
|
if (SRF_IS_FIRSTCALL())
|
|
|
|
{
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
/* create a function context for cross-call persistence */
|
2002-09-05 00:31:48 +04:00
|
|
|
funcctx = SRF_FIRSTCALL_INIT();
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2002-09-05 00:31:48 +04:00
|
|
|
/*
|
2005-10-15 06:49:52 +04:00
|
|
|
* switch to memory context appropriate for multiple function calls
|
2002-09-05 00:31:48 +04:00
|
|
|
*/
|
2002-08-29 21:14:33 +04:00
|
|
|
oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
|
|
|
|
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
/* total number of tuples to be returned */
|
|
|
|
funcctx->max_calls = PG_GETARG_UINT32(0);
|
|
|
|
|
|
|
|
/* allocate memory for user context */
|
|
|
|
fctx = (normal_rand_fctx *) palloc(sizeof(normal_rand_fctx));
|
|
|
|
|
|
|
|
/*
|
2005-10-15 06:49:52 +04:00
|
|
|
* Use fctx to keep track of upper and lower bounds from call to call.
|
|
|
|
* It will also be used to carry over the spare value we get from the
|
|
|
|
* Box-Muller algorithm so that we only actually calculate a new value
|
|
|
|
* every other call.
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*/
|
|
|
|
fctx->mean = PG_GETARG_FLOAT8(1);
|
|
|
|
fctx->stddev = PG_GETARG_FLOAT8(2);
|
|
|
|
fctx->carry_val = 0;
|
|
|
|
fctx->use_carry = false;
|
|
|
|
|
|
|
|
funcctx->user_fctx = fctx;
|
|
|
|
|
2002-08-29 21:14:33 +04:00
|
|
|
MemoryContextSwitchTo(oldcontext);
|
2002-09-05 00:31:48 +04:00
|
|
|
}
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
/* stuff done on every call of the function */
|
2002-09-05 00:31:48 +04:00
|
|
|
funcctx = SRF_PERCALL_SETUP();
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
call_cntr = funcctx->call_cntr;
|
|
|
|
max_calls = funcctx->max_calls;
|
|
|
|
fctx = funcctx->user_fctx;
|
|
|
|
mean = fctx->mean;
|
|
|
|
stddev = fctx->stddev;
|
|
|
|
carry_val = fctx->carry_val;
|
|
|
|
use_carry = fctx->use_carry;
|
2002-09-05 00:31:48 +04:00
|
|
|
|
|
|
|
if (call_cntr < max_calls) /* do when there is more left to send */
|
|
|
|
{
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
float8 result;
|
|
|
|
|
2002-09-05 00:31:48 +04:00
|
|
|
if (use_carry)
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* reset use_carry and use second value obtained on last pass
|
|
|
|
*/
|
|
|
|
fctx->use_carry = false;
|
|
|
|
result = carry_val;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
float8 normval_1;
|
|
|
|
float8 normval_2;
|
|
|
|
|
|
|
|
/* Get the next two normal values */
|
|
|
|
get_normal_pair(&normval_1, &normval_2);
|
|
|
|
|
|
|
|
/* use the first */
|
|
|
|
result = mean + (stddev * normval_1);
|
|
|
|
|
|
|
|
/* and save the second */
|
|
|
|
fctx->carry_val = mean + (stddev * normval_2);
|
|
|
|
fctx->use_carry = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* send the result */
|
2002-09-05 00:31:48 +04:00
|
|
|
SRF_RETURN_NEXT(funcctx, Float8GetDatum(result));
|
|
|
|
}
|
|
|
|
else
|
2005-10-15 06:49:52 +04:00
|
|
|
/* do when there is no more left */
|
2002-09-05 00:31:48 +04:00
|
|
|
SRF_RETURN_DONE(funcctx);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* get_normal_pair()
|
|
|
|
* Assigns normally distributed (Gaussian) values to a pair of provided
|
|
|
|
* parameters, with mean 0, standard deviation 1.
|
|
|
|
*
|
|
|
|
* This routine implements Algorithm P (Polar method for normal deviates)
|
|
|
|
* from Knuth's _The_Art_of_Computer_Programming_, Volume 2, 3rd ed., pages
|
|
|
|
* 122-126. Knuth cites his source as "The polar method", G. E. P. Box, M. E.
|
|
|
|
* Muller, and G. Marsaglia, _Annals_Math,_Stat._ 29 (1958), 610-611.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
static void
|
|
|
|
get_normal_pair(float8 *x1, float8 *x2)
|
|
|
|
{
|
2002-09-05 00:31:48 +04:00
|
|
|
float8 u1,
|
|
|
|
u2,
|
|
|
|
v1,
|
|
|
|
v2,
|
|
|
|
s;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2002-09-14 23:32:54 +04:00
|
|
|
do
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
{
|
2002-09-14 23:32:54 +04:00
|
|
|
u1 = (float8) random() / (float8) MAX_RANDOM_VALUE;
|
|
|
|
u2 = (float8) random() / (float8) MAX_RANDOM_VALUE;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
v1 = (2.0 * u1) - 1.0;
|
|
|
|
v2 = (2.0 * u2) - 1.0;
|
|
|
|
|
2002-09-14 23:32:54 +04:00
|
|
|
s = v1 * v1 + v2 * v2;
|
|
|
|
} while (s >= 1.0);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2002-09-14 23:32:54 +04:00
|
|
|
if (s == 0)
|
|
|
|
{
|
|
|
|
*x1 = 0;
|
|
|
|
*x2 = 0;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
s = sqrt((-2.0 * log(s)) / s);
|
|
|
|
*x1 = v1 * s;
|
|
|
|
*x2 = v2 * s;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* crosstab - create a crosstab of rowids and values columns from a
|
|
|
|
* SQL statement returning one rowid column, one category column,
|
|
|
|
* and one value column.
|
|
|
|
*
|
|
|
|
* e.g. given sql which produces:
|
|
|
|
*
|
|
|
|
* rowid cat value
|
|
|
|
* ------+-------+-------
|
2002-09-05 00:31:48 +04:00
|
|
|
* row1 cat1 val1
|
|
|
|
* row1 cat2 val2
|
|
|
|
* row1 cat3 val3
|
|
|
|
* row1 cat4 val4
|
|
|
|
* row2 cat1 val5
|
|
|
|
* row2 cat2 val6
|
|
|
|
* row2 cat3 val7
|
|
|
|
* row2 cat4 val8
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*
|
|
|
|
* crosstab returns:
|
|
|
|
* <===== values columns =====>
|
|
|
|
* rowid cat1 cat2 cat3 cat4
|
|
|
|
* ------+-------+-------+-------+-------
|
2002-09-05 00:31:48 +04:00
|
|
|
* row1 val1 val2 val3 val4
|
|
|
|
* row2 val5 val6 val7 val8
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*
|
|
|
|
* NOTES:
|
|
|
|
* 1. SQL result must be ordered by 1,2.
|
|
|
|
* 2. The number of values columns depends on the tuple description
|
2005-05-31 03:09:07 +04:00
|
|
|
* of the function's declared return type. The return type's columns
|
2005-10-15 06:49:52 +04:00
|
|
|
* must match the datatypes of the SQL query's result. The datatype
|
|
|
|
* of the category column can be anything, however.
|
2005-05-31 03:09:07 +04:00
|
|
|
* 3. Missing values (i.e. not enough adjacent rows of same rowid to
|
2002-09-05 00:31:48 +04:00
|
|
|
* fill the number of result values columns) are filled in with nulls.
|
2005-05-31 03:09:07 +04:00
|
|
|
* 4. Extra values (i.e. too many adjacent rows of same rowid to fill
|
2002-09-05 00:31:48 +04:00
|
|
|
* the number of result values columns) are skipped.
|
2005-05-31 03:09:07 +04:00
|
|
|
* 5. Rows with all nulls in the values columns are skipped.
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*/
|
|
|
|
PG_FUNCTION_INFO_V1(crosstab);
|
|
|
|
Datum
|
|
|
|
crosstab(PG_FUNCTION_ARGS)
|
|
|
|
{
|
2008-12-01 04:30:18 +03:00
|
|
|
char *sql = text_to_cstring(PG_GETARG_TEXT_PP(0));
|
|
|
|
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
|
|
|
|
Tuplestorestate *tupstore;
|
|
|
|
TupleDesc tupdesc;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 call_cntr;
|
|
|
|
uint64 max_calls;
|
2002-09-05 00:31:48 +04:00
|
|
|
AttInMetadata *attinmeta;
|
2008-12-01 04:30:18 +03:00
|
|
|
SPITupleTable *spi_tuptable;
|
2002-09-05 00:31:48 +04:00
|
|
|
TupleDesc spi_tupdesc;
|
2008-12-01 04:30:18 +03:00
|
|
|
bool firstpass;
|
|
|
|
char *lastrowid;
|
2002-09-05 00:31:48 +04:00
|
|
|
int i;
|
|
|
|
int num_categories;
|
2008-12-01 04:30:18 +03:00
|
|
|
MemoryContext per_query_ctx;
|
2002-09-05 00:31:48 +04:00
|
|
|
MemoryContext oldcontext;
|
2008-12-01 04:30:18 +03:00
|
|
|
int ret;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 proc;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* check to see if caller supports us returning a tuplestore */
|
|
|
|
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("set-valued function called in context that cannot accept a set")));
|
|
|
|
if (!(rsinfo->allowedModes & SFRM_Materialize))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
2019-12-24 18:37:13 +03:00
|
|
|
errmsg("materialize mode required, but it is not allowed in this context")));
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* Connect to SPI manager */
|
|
|
|
if ((ret = SPI_connect()) < 0)
|
|
|
|
/* internal error */
|
|
|
|
elog(ERROR, "crosstab: SPI_connect returned %d", ret);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* Retrieve the desired rows */
|
|
|
|
ret = SPI_execute(sql, true, 0);
|
|
|
|
proc = SPI_processed;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* If no qualifying tuples, fall out early */
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
if (ret != SPI_OK_SELECT || proc == 0)
|
2008-12-01 04:30:18 +03:00
|
|
|
{
|
|
|
|
SPI_finish();
|
|
|
|
rsinfo->isDone = ExprEndResult;
|
|
|
|
PG_RETURN_NULL();
|
|
|
|
}
|
2005-05-31 03:09:07 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
spi_tuptable = SPI_tuptable;
|
|
|
|
spi_tupdesc = spi_tuptable->tupdesc;
|
2002-08-15 06:51:27 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/*----------
|
|
|
|
* The provided SQL query must always return three columns.
|
|
|
|
*
|
|
|
|
* 1. rowname
|
|
|
|
* the label or identifier for each row in the final result
|
|
|
|
* 2. category
|
|
|
|
* the label or identifier for each column in the final result
|
|
|
|
* 3. values
|
|
|
|
* the value for each column in the final result
|
|
|
|
*----------
|
|
|
|
*/
|
|
|
|
if (spi_tupdesc->natts != 3)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("invalid source data SQL statement"),
|
|
|
|
errdetail("The provided SQL must return 3 "
|
|
|
|
"columns: rowid, category, and values.")));
|
2008-12-01 02:23:52 +03:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* get a tuple descriptor for our result type */
|
|
|
|
switch (get_call_result_type(fcinfo, NULL, &tupdesc))
|
|
|
|
{
|
|
|
|
case TYPEFUNC_COMPOSITE:
|
|
|
|
/* success */
|
|
|
|
break;
|
|
|
|
case TYPEFUNC_RECORD:
|
|
|
|
/* failed to determine actual type of RECORD */
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("function returning record called in context "
|
|
|
|
"that cannot accept type record")));
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
/* result type isn't composite */
|
2015-08-03 06:49:19 +03:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
|
|
|
errmsg("return type must be a row type")));
|
2008-12-01 04:30:18 +03:00
|
|
|
break;
|
|
|
|
}
|
2008-12-01 02:23:52 +03:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/*
|
2009-06-11 18:49:15 +04:00
|
|
|
* Check that return tupdesc is compatible with the data we got from SPI,
|
|
|
|
* at least based on number and type of attributes
|
2008-12-01 04:30:18 +03:00
|
|
|
*/
|
|
|
|
if (!compatCrosstabTupleDescs(tupdesc, spi_tupdesc))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("return and sql tuple descriptions are " \
|
|
|
|
"incompatible")));
|
2002-08-15 06:51:27 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/*
|
|
|
|
* switch to long-lived memory context
|
|
|
|
*/
|
|
|
|
oldcontext = MemoryContextSwitchTo(per_query_ctx);
|
2002-08-15 06:51:27 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* make sure we have a persistent copy of the result tupdesc */
|
|
|
|
tupdesc = CreateTupleDescCopy(tupdesc);
|
2002-08-15 06:51:27 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* initialize our tuplestore in long-lived context */
|
|
|
|
tupstore =
|
|
|
|
tuplestore_begin_heap(rsinfo->allowedModes & SFRM_Materialize_Random,
|
|
|
|
false, work_mem);
|
2002-08-29 21:14:33 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
MemoryContextSwitchTo(oldcontext);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
/*
|
2009-06-11 18:49:15 +04:00
|
|
|
* Generate attribute metadata needed later to produce tuples from raw C
|
|
|
|
* strings
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*/
|
2008-12-01 04:30:18 +03:00
|
|
|
attinmeta = TupleDescGetAttInMetadata(tupdesc);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* total number of tuples to be examined */
|
|
|
|
max_calls = proc;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
/* the return tuple always must have 1 rowid + num_categories columns */
|
2008-12-01 04:30:18 +03:00
|
|
|
num_categories = tupdesc->natts - 1;
|
2002-09-05 00:31:48 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
firstpass = true;
|
|
|
|
lastrowid = NULL;
|
|
|
|
|
|
|
|
for (call_cntr = 0; call_cntr < max_calls; call_cntr++)
|
2002-09-05 00:31:48 +04:00
|
|
|
{
|
2007-11-10 08:00:41 +03:00
|
|
|
bool skip_tuple = false;
|
2008-12-01 04:30:18 +03:00
|
|
|
char **values;
|
|
|
|
|
|
|
|
/* allocate and zero space */
|
|
|
|
values = (char **) palloc0((1 + num_categories) * sizeof(char *));
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/*
|
2009-06-11 18:49:15 +04:00
|
|
|
* now loop through the sql results and assign each value in sequence
|
|
|
|
* to the next category
|
2008-12-01 04:30:18 +03:00
|
|
|
*/
|
|
|
|
for (i = 0; i < num_categories; i++)
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
{
|
2008-12-01 04:30:18 +03:00
|
|
|
HeapTuple spi_tuple;
|
|
|
|
char *rowid;
|
|
|
|
|
|
|
|
/* see if we've gone too far already */
|
|
|
|
if (call_cntr >= max_calls)
|
|
|
|
break;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/* get the next sql result tuple */
|
|
|
|
spi_tuple = spi_tuptable->vals[call_cntr];
|
|
|
|
|
|
|
|
/* get the rowid from the current sql result tuple */
|
|
|
|
rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
/*
|
2009-06-11 18:49:15 +04:00
|
|
|
* If this is the first pass through the values for this rowid,
|
|
|
|
* set the first column to rowid
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*/
|
2008-12-01 04:30:18 +03:00
|
|
|
if (i == 0)
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
{
|
2008-12-01 04:30:18 +03:00
|
|
|
xpstrdup(values[0], rowid);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2007-11-10 08:00:41 +03:00
|
|
|
/*
|
2009-06-11 18:49:15 +04:00
|
|
|
* Check to see if the rowid is the same as that of the last
|
|
|
|
* tuple sent -- if so, skip this tuple entirely
|
2007-11-10 08:00:41 +03:00
|
|
|
*/
|
2008-12-01 04:30:18 +03:00
|
|
|
if (!firstpass && xstreq(lastrowid, rowid))
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
{
|
2008-12-01 04:30:18 +03:00
|
|
|
xpfree(rowid);
|
|
|
|
skip_tuple = true;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-11-10 08:00:41 +03:00
|
|
|
/*
|
2008-12-01 04:30:18 +03:00
|
|
|
* If rowid hasn't changed on us, continue building the output
|
|
|
|
* tuple.
|
2007-11-10 08:00:41 +03:00
|
|
|
*/
|
2008-12-01 04:30:18 +03:00
|
|
|
if (xstreq(rowid, values[0]))
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
{
|
2008-12-01 04:30:18 +03:00
|
|
|
/*
|
2009-06-11 18:49:15 +04:00
|
|
|
* Get the next category item value, which is always attribute
|
|
|
|
* number three.
|
2008-12-01 04:30:18 +03:00
|
|
|
*
|
2009-06-11 18:49:15 +04:00
|
|
|
* Be careful to assign the value to the array index based on
|
|
|
|
* which category we are presently processing.
|
2008-12-01 04:30:18 +03:00
|
|
|
*/
|
|
|
|
values[1 + i] = SPI_getvalue(spi_tuple, spi_tupdesc, 3);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
/*
|
|
|
|
* increment the counter since we consume a row for each
|
2009-06-11 18:49:15 +04:00
|
|
|
* category, but not for last pass because the outer loop will
|
|
|
|
* do that for us
|
2008-12-01 04:30:18 +03:00
|
|
|
*/
|
|
|
|
if (i < (num_categories - 1))
|
|
|
|
call_cntr++;
|
|
|
|
xpfree(rowid);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
/*
|
2009-06-11 18:49:15 +04:00
|
|
|
* We'll fill in NULLs for the missing values, but we need to
|
|
|
|
* decrement the counter since this sql result row doesn't
|
|
|
|
* belong to the current output tuple.
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*/
|
2008-12-01 04:30:18 +03:00
|
|
|
call_cntr--;
|
|
|
|
xpfree(rowid);
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
}
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
if (!skip_tuple)
|
|
|
|
{
|
|
|
|
HeapTuple tuple;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2009-12-29 20:40:59 +03:00
|
|
|
/* build the tuple and store it */
|
2008-12-01 04:30:18 +03:00
|
|
|
tuple = BuildTupleFromCStrings(attinmeta, values);
|
|
|
|
tuplestore_puttuple(tupstore, tuple);
|
|
|
|
heap_freetuple(tuple);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
}
|
2008-12-01 04:30:18 +03:00
|
|
|
|
|
|
|
/* Remember current rowid */
|
|
|
|
xpfree(lastrowid);
|
|
|
|
xpstrdup(lastrowid, values[0]);
|
|
|
|
firstpass = false;
|
|
|
|
|
|
|
|
/* Clean up */
|
|
|
|
for (i = 0; i < num_categories + 1; i++)
|
|
|
|
if (values[i] != NULL)
|
|
|
|
pfree(values[i]);
|
|
|
|
pfree(values);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
}
|
2008-12-01 04:30:18 +03:00
|
|
|
|
|
|
|
/* let the caller know we're sending back a tuplestore */
|
|
|
|
rsinfo->returnMode = SFRM_Materialize;
|
|
|
|
rsinfo->setResult = tupstore;
|
|
|
|
rsinfo->setDesc = tupdesc;
|
|
|
|
|
|
|
|
/* release SPI related resources (and return to caller's context) */
|
|
|
|
SPI_finish();
|
|
|
|
|
|
|
|
return (Datum) 0;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
}
|
|
|
|
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
/*
|
2003-08-04 04:43:34 +04:00
|
|
|
* crosstab_hash - reimplement crosstab as materialized function and
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
* properly deal with missing values (i.e. don't pack remaining
|
|
|
|
* values to the left)
|
2003-08-04 04:43:34 +04:00
|
|
|
*
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
* crosstab - create a crosstab of rowids and values columns from a
|
|
|
|
* SQL statement returning one rowid column, one category column,
|
|
|
|
* and one value column.
|
|
|
|
*
|
|
|
|
* e.g. given sql which produces:
|
|
|
|
*
|
|
|
|
* rowid cat value
|
|
|
|
* ------+-------+-------
|
|
|
|
* row1 cat1 val1
|
|
|
|
* row1 cat2 val2
|
|
|
|
* row1 cat4 val4
|
|
|
|
* row2 cat1 val5
|
|
|
|
* row2 cat2 val6
|
|
|
|
* row2 cat3 val7
|
|
|
|
* row2 cat4 val8
|
|
|
|
*
|
|
|
|
* crosstab returns:
|
|
|
|
* <===== values columns =====>
|
|
|
|
* rowid cat1 cat2 cat3 cat4
|
|
|
|
* ------+-------+-------+-------+-------
|
|
|
|
* row1 val1 val2 null val4
|
|
|
|
* row2 val5 val6 val7 val8
|
|
|
|
*
|
|
|
|
* NOTES:
|
|
|
|
* 1. SQL result must be ordered by 1.
|
|
|
|
* 2. The number of values columns depends on the tuple description
|
|
|
|
* of the function's declared return type.
|
2005-05-31 03:09:07 +04:00
|
|
|
* 3. Missing values (i.e. missing category) are filled in with nulls.
|
|
|
|
* 4. Extra values (i.e. not in category results) are skipped.
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
*/
|
|
|
|
PG_FUNCTION_INFO_V1(crosstab_hash);
|
|
|
|
Datum
|
|
|
|
crosstab_hash(PG_FUNCTION_ARGS)
|
|
|
|
{
|
2008-03-26 01:42:46 +03:00
|
|
|
char *sql = text_to_cstring(PG_GETARG_TEXT_PP(0));
|
|
|
|
char *cats_sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
|
2003-08-04 04:43:34 +04:00
|
|
|
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
|
|
|
|
TupleDesc tupdesc;
|
|
|
|
MemoryContext per_query_ctx;
|
|
|
|
MemoryContext oldcontext;
|
2007-12-07 19:44:58 +03:00
|
|
|
HTAB *crosstab_hash;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* check to see if caller supports us returning a tuplestore */
|
2005-07-09 05:53:22 +04:00
|
|
|
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2005-07-09 05:53:22 +04:00
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("set-valued function called in context that cannot accept a set")));
|
2008-10-29 01:02:06 +03:00
|
|
|
if (!(rsinfo->allowedModes & SFRM_Materialize) ||
|
|
|
|
rsinfo->expectedDesc == NULL)
|
2005-07-09 05:53:22 +04:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
2019-12-24 18:37:13 +03:00
|
|
|
errmsg("materialize mode required, but it is not allowed in this context")));
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
|
|
|
|
oldcontext = MemoryContextSwitchTo(per_query_ctx);
|
|
|
|
|
|
|
|
/* get the requested return tuple description */
|
|
|
|
tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check to make sure we have a reasonable tuple descriptor
|
|
|
|
*
|
2003-08-04 04:43:34 +04:00
|
|
|
* Note we will attempt to coerce the values into whatever the return
|
|
|
|
* attribute type is and depend on the "in" function to complain if
|
|
|
|
* needed.
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
*/
|
|
|
|
if (tupdesc->natts < 2)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("query-specified return tuple and " \
|
|
|
|
"crosstab function are not compatible")));
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* load up the categories hash table */
|
2007-12-07 19:44:58 +03:00
|
|
|
crosstab_hash = load_categories_hash(cats_sql, per_query_ctx);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* let the caller know we're sending back a tuplestore */
|
|
|
|
rsinfo->returnMode = SFRM_Materialize;
|
|
|
|
|
|
|
|
/* now go build it */
|
|
|
|
rsinfo->setResult = get_crosstab_tuplestore(sql,
|
2007-12-07 19:44:58 +03:00
|
|
|
crosstab_hash,
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
tupdesc,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 22:35:54 +03:00
|
|
|
rsinfo->allowedModes & SFRM_Materialize_Random);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/*
|
|
|
|
* SFRM_Materialize mode expects us to return a NULL Datum. The actual
|
2005-10-15 06:49:52 +04:00
|
|
|
* tuples are in our tuplestore and passed back through rsinfo->setResult.
|
|
|
|
* rsinfo->setDesc is set to the tuple description that we actually used
|
|
|
|
* to build our tuples with, so the caller can verify we did what it was
|
|
|
|
* expecting.
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
*/
|
|
|
|
rsinfo->setDesc = tupdesc;
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
|
|
|
return (Datum) 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* load up the categories hash table
|
|
|
|
*/
|
2007-12-07 19:44:58 +03:00
|
|
|
static HTAB *
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
load_categories_hash(char *cats_sql, MemoryContext per_query_ctx)
|
|
|
|
{
|
2007-12-07 19:44:58 +03:00
|
|
|
HTAB *crosstab_hash;
|
2003-08-04 04:43:34 +04:00
|
|
|
HASHCTL ctl;
|
|
|
|
int ret;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 proc;
|
2003-08-04 04:43:34 +04:00
|
|
|
MemoryContext SPIcontext;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* initialize the category hash table */
|
2007-12-07 19:44:58 +03:00
|
|
|
MemSet(&ctl, 0, sizeof(ctl));
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
ctl.keysize = MAX_CATNAME_LEN;
|
|
|
|
ctl.entrysize = sizeof(crosstab_HashEnt);
|
2007-12-07 19:44:58 +03:00
|
|
|
ctl.hcxt = per_query_ctx;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/*
|
2005-10-15 06:49:52 +04:00
|
|
|
* use INIT_CATS, defined above as a guess of how many hash table entries
|
|
|
|
* to create, initially
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
*/
|
2007-12-07 19:44:58 +03:00
|
|
|
crosstab_hash = hash_create("crosstab hash",
|
|
|
|
INIT_CATS,
|
|
|
|
&ctl,
|
|
|
|
HASH_ELEM | HASH_CONTEXT);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* Connect to SPI manager */
|
|
|
|
if ((ret = SPI_connect()) < 0)
|
2003-07-24 21:52:50 +04:00
|
|
|
/* internal error */
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
elog(ERROR, "load_categories_hash: SPI_connect returned %d", ret);
|
|
|
|
|
|
|
|
/* Retrieve the category name rows */
|
2004-09-14 00:10:13 +04:00
|
|
|
ret = SPI_execute(cats_sql, true, 0);
|
2007-12-07 19:44:58 +03:00
|
|
|
proc = SPI_processed;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* Check for qualifying tuples */
|
|
|
|
if ((ret == SPI_OK_SELECT) && (proc > 0))
|
|
|
|
{
|
2003-08-04 04:43:34 +04:00
|
|
|
SPITupleTable *spi_tuptable = SPI_tuptable;
|
|
|
|
TupleDesc spi_tupdesc = spi_tuptable->tupdesc;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 i;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/*
|
2005-10-15 06:49:52 +04:00
|
|
|
* The provided categories SQL query must always return one column:
|
|
|
|
* category - the label or identifier for each column
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
*/
|
|
|
|
if (spi_tupdesc->natts != 1)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("provided \"categories\" SQL must " \
|
|
|
|
"return 1 column of at least one row")));
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
for (i = 0; i < proc; i++)
|
|
|
|
{
|
2003-08-04 04:43:34 +04:00
|
|
|
crosstab_cat_desc *catdesc;
|
|
|
|
char *catname;
|
|
|
|
HeapTuple spi_tuple;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* get the next sql result tuple */
|
|
|
|
spi_tuple = spi_tuptable->vals[i];
|
|
|
|
|
|
|
|
/* get the category from the current sql result tuple */
|
|
|
|
catname = SPI_getvalue(spi_tuple, spi_tupdesc, 1);
|
2019-12-23 21:33:25 +03:00
|
|
|
if (catname == NULL)
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("provided \"categories\" SQL must " \
|
|
|
|
"not return NULL values")));
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
SPIcontext = MemoryContextSwitchTo(per_query_ctx);
|
|
|
|
|
|
|
|
catdesc = (crosstab_cat_desc *) palloc(sizeof(crosstab_cat_desc));
|
|
|
|
catdesc->catname = catname;
|
|
|
|
catdesc->attidx = i;
|
|
|
|
|
|
|
|
/* Add the proc description block to the hashtable */
|
2007-12-07 19:44:58 +03:00
|
|
|
crosstab_HashTableInsert(crosstab_hash, catdesc);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
MemoryContextSwitchTo(SPIcontext);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
if (SPI_finish() != SPI_OK_FINISH)
|
2003-07-24 21:52:50 +04:00
|
|
|
/* internal error */
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
elog(ERROR, "load_categories_hash: SPI_finish() failed");
|
|
|
|
|
2007-12-07 19:44:58 +03:00
|
|
|
return crosstab_hash;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* create and populate the crosstab tuplestore using the provided source query
|
|
|
|
*/
|
|
|
|
static Tuplestorestate *
|
|
|
|
get_crosstab_tuplestore(char *sql,
|
2007-12-07 19:44:58 +03:00
|
|
|
HTAB *crosstab_hash,
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
TupleDesc tupdesc,
|
2008-10-29 03:00:39 +03:00
|
|
|
bool randomAccess)
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
{
|
2003-08-04 04:43:34 +04:00
|
|
|
Tuplestorestate *tupstore;
|
2007-12-07 19:44:58 +03:00
|
|
|
int num_categories = hash_get_num_entries(crosstab_hash);
|
2003-08-04 04:43:34 +04:00
|
|
|
AttInMetadata *attinmeta = TupleDescGetAttInMetadata(tupdesc);
|
|
|
|
char **values;
|
|
|
|
HeapTuple tuple;
|
|
|
|
int ret;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 proc;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
2008-10-29 03:00:39 +03:00
|
|
|
/* initialize our tuplestore (while still in query context!) */
|
|
|
|
tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* Connect to SPI manager */
|
|
|
|
if ((ret = SPI_connect()) < 0)
|
2003-07-24 21:52:50 +04:00
|
|
|
/* internal error */
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
elog(ERROR, "get_crosstab_tuplestore: SPI_connect returned %d", ret);
|
|
|
|
|
|
|
|
/* Now retrieve the crosstab source rows */
|
2004-09-14 00:10:13 +04:00
|
|
|
ret = SPI_execute(sql, true, 0);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
proc = SPI_processed;
|
|
|
|
|
|
|
|
/* Check for qualifying tuples */
|
|
|
|
if ((ret == SPI_OK_SELECT) && (proc > 0))
|
|
|
|
{
|
2003-08-04 04:43:34 +04:00
|
|
|
SPITupleTable *spi_tuptable = SPI_tuptable;
|
|
|
|
TupleDesc spi_tupdesc = spi_tuptable->tupdesc;
|
|
|
|
int ncols = spi_tupdesc->natts;
|
|
|
|
char *rowid;
|
|
|
|
char *lastrowid = NULL;
|
2007-11-10 08:00:41 +03:00
|
|
|
bool firstpass = true;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 i;
|
|
|
|
int j;
|
2003-08-04 04:43:34 +04:00
|
|
|
int result_ncols;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
2004-08-11 04:49:35 +04:00
|
|
|
if (num_categories == 0)
|
|
|
|
{
|
|
|
|
/* no qualifying category tuples */
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
2004-08-29 09:07:03 +04:00
|
|
|
errmsg("provided \"categories\" SQL must " \
|
2004-08-11 04:49:35 +04:00
|
|
|
"return 1 column of at least one row")));
|
|
|
|
}
|
|
|
|
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
/*
|
2005-10-15 06:49:52 +04:00
|
|
|
* The provided SQL query must always return at least three columns:
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
*
|
2005-11-22 21:17:34 +03:00
|
|
|
* 1. rowname the label for each row - column 1 in the final result
|
|
|
|
* 2. category the label for each value-column in the final result 3.
|
2005-10-15 06:49:52 +04:00
|
|
|
* value the values used to populate the value-columns
|
2003-08-04 04:43:34 +04:00
|
|
|
*
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
* If there are more than three columns, the last two are taken as
|
2005-10-15 06:49:52 +04:00
|
|
|
* "category" and "values". The first column is taken as "rowname".
|
|
|
|
* Additional columns (2 thru N-2) are assumed the same for the same
|
|
|
|
* "rowname", and are copied into the result tuple from the first time
|
|
|
|
* we encounter a particular rowname.
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
*/
|
|
|
|
if (ncols < 3)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
|
|
|
|
errmsg("invalid source data SQL statement"),
|
|
|
|
errdetail("The provided SQL must return 3 " \
|
2005-10-15 06:49:52 +04:00
|
|
|
" columns; rowid, category, and values.")));
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
result_ncols = (ncols - 2) + num_categories;
|
|
|
|
|
|
|
|
/* Recheck to make sure we tuple descriptor still looks reasonable */
|
|
|
|
if (tupdesc->natts != result_ncols)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_SYNTAX_ERROR),
|
|
|
|
errmsg("invalid return type"),
|
2006-03-01 09:30:32 +03:00
|
|
|
errdetail("Query-specified return " \
|
2003-07-24 21:52:50 +04:00
|
|
|
"tuple has %d columns but crosstab " \
|
2006-03-01 09:30:32 +03:00
|
|
|
"returns %d.", tupdesc->natts, result_ncols)));
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
2019-03-27 06:02:50 +03:00
|
|
|
/* allocate space and make sure it's clear */
|
|
|
|
values = (char **) palloc0(result_ncols * sizeof(char *));
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
for (i = 0; i < proc; i++)
|
|
|
|
{
|
2003-08-04 04:43:34 +04:00
|
|
|
HeapTuple spi_tuple;
|
|
|
|
crosstab_cat_desc *catdesc;
|
|
|
|
char *catname;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
/* get the next sql result tuple */
|
|
|
|
spi_tuple = spi_tuptable->vals[i];
|
|
|
|
|
|
|
|
/* get the rowid from the current sql result tuple */
|
|
|
|
rowid = SPI_getvalue(spi_tuple, spi_tupdesc, 1);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* if we're on a new output row, grab the column values up to
|
|
|
|
* column N-2 now
|
|
|
|
*/
|
2007-11-10 08:00:41 +03:00
|
|
|
if (firstpass || !xstreq(lastrowid, rowid))
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
{
|
|
|
|
/*
|
2005-10-15 06:49:52 +04:00
|
|
|
* a new row means we need to flush the old one first, unless
|
|
|
|
* we're on the very first row
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
*/
|
2007-11-10 08:00:41 +03:00
|
|
|
if (!firstpass)
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
{
|
|
|
|
/* rowid changed, flush the previous output row */
|
|
|
|
tuple = BuildTupleFromCStrings(attinmeta, values);
|
2006-06-27 06:51:40 +04:00
|
|
|
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
tuplestore_puttuple(tupstore, tuple);
|
2006-06-27 06:51:40 +04:00
|
|
|
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
for (j = 0; j < result_ncols; j++)
|
|
|
|
xpfree(values[j]);
|
|
|
|
}
|
|
|
|
|
|
|
|
values[0] = rowid;
|
|
|
|
for (j = 1; j < ncols - 2; j++)
|
|
|
|
values[j] = SPI_getvalue(spi_tuple, spi_tupdesc, j + 1);
|
2007-11-10 08:00:41 +03:00
|
|
|
|
|
|
|
/* we're no longer on the first pass */
|
|
|
|
firstpass = false;
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* look up the category and fill in the appropriate column */
|
|
|
|
catname = SPI_getvalue(spi_tuple, spi_tupdesc, ncols - 1);
|
|
|
|
|
|
|
|
if (catname != NULL)
|
|
|
|
{
|
2007-12-07 19:44:58 +03:00
|
|
|
crosstab_HashTableLookup(crosstab_hash, catname, catdesc);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
|
|
|
|
if (catdesc)
|
2003-08-04 04:43:34 +04:00
|
|
|
values[catdesc->attidx + ncols - 2] =
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
SPI_getvalue(spi_tuple, spi_tupdesc, ncols);
|
|
|
|
}
|
|
|
|
|
|
|
|
xpfree(lastrowid);
|
2007-11-10 08:00:41 +03:00
|
|
|
xpstrdup(lastrowid, rowid);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
/* flush the last output row */
|
|
|
|
tuple = BuildTupleFromCStrings(attinmeta, values);
|
|
|
|
|
2006-06-27 06:51:40 +04:00
|
|
|
tuplestore_puttuple(tupstore, tuple);
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
}
|
|
|
|
|
|
|
|
if (SPI_finish() != SPI_OK_FINISH)
|
2003-07-24 21:52:50 +04:00
|
|
|
/* internal error */
|
Attached is an update to contrib/tablefunc. It implements a new hashed
version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
2003-03-20 09:46:30 +03:00
|
|
|
elog(ERROR, "get_crosstab_tuplestore: SPI_finish() failed");
|
|
|
|
|
|
|
|
tuplestore_donestoring(tupstore);
|
|
|
|
|
|
|
|
return tupstore;
|
|
|
|
}
|
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
/*
|
|
|
|
* connectby_text - produce a result set from a hierarchical (parent/child)
|
|
|
|
* table.
|
|
|
|
*
|
|
|
|
* e.g. given table foo:
|
|
|
|
*
|
2003-07-27 07:51:59 +04:00
|
|
|
* keyid parent_keyid pos
|
|
|
|
* ------+------------+--
|
2003-08-04 04:43:34 +04:00
|
|
|
* row1 NULL 0
|
|
|
|
* row2 row1 0
|
|
|
|
* row3 row1 0
|
|
|
|
* row4 row2 1
|
|
|
|
* row5 row2 0
|
|
|
|
* row6 row4 0
|
|
|
|
* row7 row3 0
|
|
|
|
* row8 row6 0
|
|
|
|
* row9 row5 0
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
*
|
|
|
|
*
|
2003-07-27 07:51:59 +04:00
|
|
|
* connectby(text relname, text keyid_fld, text parent_keyid_fld
|
2003-08-04 04:43:34 +04:00
|
|
|
* [, text orderby_fld], text start_with, int max_depth
|
|
|
|
* [, text branch_delim])
|
2003-07-27 07:51:59 +04:00
|
|
|
* connectby('foo', 'keyid', 'parent_keyid', 'pos', 'row2', 0, '~') returns:
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
*
|
2003-08-04 04:43:34 +04:00
|
|
|
* keyid parent_id level branch serial
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
* ------+-----------+--------+-----------------------
|
2003-08-04 04:43:34 +04:00
|
|
|
* row2 NULL 0 row2 1
|
|
|
|
* row5 row2 1 row2~row5 2
|
|
|
|
* row9 row5 2 row2~row5~row9 3
|
|
|
|
* row4 row2 1 row2~row4 4
|
|
|
|
* row6 row4 2 row2~row4~row6 5
|
2003-07-27 07:51:59 +04:00
|
|
|
* row8 row6 3 row2~row4~row6~row8 6
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
*
|
|
|
|
*/
|
|
|
|
PG_FUNCTION_INFO_V1(connectby_text);
|
|
|
|
|
|
|
|
#define CONNECTBY_NCOLS 4
|
|
|
|
#define CONNECTBY_NCOLS_NOBRANCH 3
|
|
|
|
|
|
|
|
Datum
|
|
|
|
connectby_text(PG_FUNCTION_ARGS)
|
|
|
|
{
|
2008-03-26 01:42:46 +03:00
|
|
|
char *relname = text_to_cstring(PG_GETARG_TEXT_PP(0));
|
|
|
|
char *key_fld = text_to_cstring(PG_GETARG_TEXT_PP(1));
|
|
|
|
char *parent_key_fld = text_to_cstring(PG_GETARG_TEXT_PP(2));
|
|
|
|
char *start_with = text_to_cstring(PG_GETARG_TEXT_PP(3));
|
2002-09-05 00:31:48 +04:00
|
|
|
int max_depth = PG_GETARG_INT32(4);
|
|
|
|
char *branch_delim = NULL;
|
|
|
|
bool show_branch = false;
|
2003-07-27 07:51:59 +04:00
|
|
|
bool show_serial = false;
|
2002-09-05 00:31:48 +04:00
|
|
|
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
|
|
|
|
TupleDesc tupdesc;
|
|
|
|
AttInMetadata *attinmeta;
|
|
|
|
MemoryContext per_query_ctx;
|
|
|
|
MemoryContext oldcontext;
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2003-05-16 10:07:51 +04:00
|
|
|
/* check to see if caller supports us returning a tuplestore */
|
2005-07-09 05:53:22 +04:00
|
|
|
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2005-07-09 05:53:22 +04:00
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("set-valued function called in context that cannot accept a set")));
|
2008-10-29 01:02:06 +03:00
|
|
|
if (!(rsinfo->allowedModes & SFRM_Materialize) ||
|
|
|
|
rsinfo->expectedDesc == NULL)
|
2005-07-09 05:53:22 +04:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
2019-12-24 18:37:13 +03:00
|
|
|
errmsg("materialize mode required, but it is not allowed in this context")));
|
2003-05-16 10:07:51 +04:00
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
if (fcinfo->nargs == 6)
|
|
|
|
{
|
2008-03-26 01:42:46 +03:00
|
|
|
branch_delim = text_to_cstring(PG_GETARG_TEXT_PP(5));
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
show_branch = true;
|
|
|
|
}
|
> The previous patch fixed an infinite recursion bug in
> contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error
> seems to occur even if a table has commonplace tree data(see below).
>
> I would think the patch, ancestor check, should be
>
> if (strstr(branch_delim || branchstr->data || branch_delim,
> branch_delim || current_key || branch_delim))
>
> This is my image, not a real code. However, if branchstr->data includes
> branch_delim, my image will not be perfect.
Good point. Thank you Masaru for the suggested fix.
Attached is a patch to fix the bug found by Masaru. His example now
produces:
regression=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level
int,
branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+----------
11 | | 0 | 11
10 | 11 | 1 | 11-10
111 | 11 | 1 | 11-111
1 | 111 | 2 | 11-111-1
(4 rows)
While making the patch I also realized that the "no show branch" form of
the function was not going to work very well for recursion detection.
Therefore there is now a default branch delimiter ('~') that is used
internally, for that case, to enable recursion detection to work. If
you need a different delimiter for your specific data, you will have to
use the "show branch" form of the function.
Joe Conway
2002-10-03 21:11:12 +04:00
|
|
|
else
|
|
|
|
/* default is no show, tilde for the delimiter */
|
|
|
|
branch_delim = pstrdup("~");
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
|
|
|
|
oldcontext = MemoryContextSwitchTo(per_query_ctx);
|
|
|
|
|
|
|
|
/* get the requested return tuple description */
|
|
|
|
tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
|
|
|
|
|
|
|
|
/* does it meet our needs */
|
2003-07-27 07:51:59 +04:00
|
|
|
validateConnectbyTupleDesc(tupdesc, show_branch, show_serial);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
/* OK, use it then */
|
|
|
|
attinmeta = TupleDescGetAttInMetadata(tupdesc);
|
|
|
|
|
|
|
|
/* OK, go to work */
|
|
|
|
rsinfo->returnMode = SFRM_Materialize;
|
|
|
|
rsinfo->setResult = connectby(relname,
|
|
|
|
key_fld,
|
|
|
|
parent_key_fld,
|
2003-08-04 04:43:34 +04:00
|
|
|
NULL,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
branch_delim,
|
|
|
|
start_with,
|
|
|
|
max_depth,
|
|
|
|
show_branch,
|
2003-07-27 07:51:59 +04:00
|
|
|
show_serial,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
per_query_ctx,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 22:35:54 +03:00
|
|
|
rsinfo->allowedModes & SFRM_Materialize_Random,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
attinmeta);
|
|
|
|
rsinfo->setDesc = tupdesc;
|
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
|
|
|
/*
|
2002-09-05 00:31:48 +04:00
|
|
|
* SFRM_Materialize mode expects us to return a NULL Datum. The actual
|
2005-10-15 06:49:52 +04:00
|
|
|
* tuples are in our tuplestore and passed back through rsinfo->setResult.
|
|
|
|
* rsinfo->setDesc is set to the tuple description that we actually used
|
|
|
|
* to build our tuples with, so the caller can verify we did what it was
|
|
|
|
* expecting.
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
*/
|
|
|
|
return (Datum) 0;
|
|
|
|
}
|
|
|
|
|
2003-07-27 07:51:59 +04:00
|
|
|
PG_FUNCTION_INFO_V1(connectby_text_serial);
|
|
|
|
Datum
|
|
|
|
connectby_text_serial(PG_FUNCTION_ARGS)
|
|
|
|
{
|
2008-03-26 01:42:46 +03:00
|
|
|
char *relname = text_to_cstring(PG_GETARG_TEXT_PP(0));
|
|
|
|
char *key_fld = text_to_cstring(PG_GETARG_TEXT_PP(1));
|
|
|
|
char *parent_key_fld = text_to_cstring(PG_GETARG_TEXT_PP(2));
|
|
|
|
char *orderby_fld = text_to_cstring(PG_GETARG_TEXT_PP(3));
|
|
|
|
char *start_with = text_to_cstring(PG_GETARG_TEXT_PP(4));
|
2003-07-27 07:51:59 +04:00
|
|
|
int max_depth = PG_GETARG_INT32(5);
|
|
|
|
char *branch_delim = NULL;
|
|
|
|
bool show_branch = false;
|
|
|
|
bool show_serial = true;
|
|
|
|
ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
|
|
|
|
TupleDesc tupdesc;
|
|
|
|
AttInMetadata *attinmeta;
|
|
|
|
MemoryContext per_query_ctx;
|
|
|
|
MemoryContext oldcontext;
|
|
|
|
|
|
|
|
/* check to see if caller supports us returning a tuplestore */
|
2005-07-09 05:53:22 +04:00
|
|
|
if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
|
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
|
|
|
errmsg("set-valued function called in context that cannot accept a set")));
|
2008-10-29 01:02:06 +03:00
|
|
|
if (!(rsinfo->allowedModes & SFRM_Materialize) ||
|
|
|
|
rsinfo->expectedDesc == NULL)
|
2005-07-09 05:53:22 +04:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
|
2019-12-24 18:37:13 +03:00
|
|
|
errmsg("materialize mode required, but it is not allowed in this context")));
|
2003-07-27 07:51:59 +04:00
|
|
|
|
|
|
|
if (fcinfo->nargs == 7)
|
|
|
|
{
|
2008-03-26 01:42:46 +03:00
|
|
|
branch_delim = text_to_cstring(PG_GETARG_TEXT_PP(6));
|
2003-07-27 07:51:59 +04:00
|
|
|
show_branch = true;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
/* default is no show, tilde for the delimiter */
|
|
|
|
branch_delim = pstrdup("~");
|
|
|
|
|
|
|
|
per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
|
|
|
|
oldcontext = MemoryContextSwitchTo(per_query_ctx);
|
|
|
|
|
|
|
|
/* get the requested return tuple description */
|
|
|
|
tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
|
|
|
|
|
|
|
|
/* does it meet our needs */
|
|
|
|
validateConnectbyTupleDesc(tupdesc, show_branch, show_serial);
|
|
|
|
|
|
|
|
/* OK, use it then */
|
|
|
|
attinmeta = TupleDescGetAttInMetadata(tupdesc);
|
|
|
|
|
|
|
|
/* OK, go to work */
|
|
|
|
rsinfo->returnMode = SFRM_Materialize;
|
|
|
|
rsinfo->setResult = connectby(relname,
|
|
|
|
key_fld,
|
|
|
|
parent_key_fld,
|
2003-08-04 04:43:34 +04:00
|
|
|
orderby_fld,
|
2003-07-27 07:51:59 +04:00
|
|
|
branch_delim,
|
|
|
|
start_with,
|
|
|
|
max_depth,
|
|
|
|
show_branch,
|
|
|
|
show_serial,
|
|
|
|
per_query_ctx,
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 22:35:54 +03:00
|
|
|
rsinfo->allowedModes & SFRM_Materialize_Random,
|
2003-07-27 07:51:59 +04:00
|
|
|
attinmeta);
|
|
|
|
rsinfo->setDesc = tupdesc;
|
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* SFRM_Materialize mode expects us to return a NULL Datum. The actual
|
2005-10-15 06:49:52 +04:00
|
|
|
* tuples are in our tuplestore and passed back through rsinfo->setResult.
|
|
|
|
* rsinfo->setDesc is set to the tuple description that we actually used
|
|
|
|
* to build our tuples with, so the caller can verify we did what it was
|
|
|
|
* expecting.
|
2003-07-27 07:51:59 +04:00
|
|
|
*/
|
|
|
|
return (Datum) 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
/*
|
|
|
|
* connectby - does the real work for connectby_text()
|
|
|
|
*/
|
|
|
|
static Tuplestorestate *
|
|
|
|
connectby(char *relname,
|
|
|
|
char *key_fld,
|
|
|
|
char *parent_key_fld,
|
2003-08-04 04:43:34 +04:00
|
|
|
char *orderby_fld,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
char *branch_delim,
|
|
|
|
char *start_with,
|
|
|
|
int max_depth,
|
|
|
|
bool show_branch,
|
2003-08-04 04:43:34 +04:00
|
|
|
bool show_serial,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
MemoryContext per_query_ctx,
|
2008-10-29 03:00:39 +03:00
|
|
|
bool randomAccess,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
AttInMetadata *attinmeta)
|
|
|
|
{
|
2002-09-05 00:31:48 +04:00
|
|
|
Tuplestorestate *tupstore = NULL;
|
|
|
|
int ret;
|
|
|
|
MemoryContext oldcontext;
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2003-08-04 04:43:34 +04:00
|
|
|
int serial = 1;
|
2003-07-27 07:51:59 +04:00
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
/* Connect to SPI manager */
|
|
|
|
if ((ret = SPI_connect()) < 0)
|
2003-07-24 21:52:50 +04:00
|
|
|
/* internal error */
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
elog(ERROR, "connectby: SPI_connect returned %d", ret);
|
|
|
|
|
|
|
|
/* switch to longer term context to create the tuple store */
|
|
|
|
oldcontext = MemoryContextSwitchTo(per_query_ctx);
|
|
|
|
|
|
|
|
/* initialize our tuplestore */
|
2008-10-29 03:00:39 +03:00
|
|
|
tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
MemoryContextSwitchTo(oldcontext);
|
|
|
|
|
|
|
|
/* now go get the whole tree */
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
build_tuplestore_recursively(key_fld,
|
|
|
|
parent_key_fld,
|
|
|
|
relname,
|
|
|
|
orderby_fld,
|
|
|
|
branch_delim,
|
|
|
|
start_with,
|
|
|
|
start_with, /* current_branch */
|
Phase 2 of pgindent updates.
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 22:18:54 +03:00
|
|
|
0, /* initial level is 0 */
|
|
|
|
&serial, /* initial serial is 1 */
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
max_depth,
|
|
|
|
show_branch,
|
|
|
|
show_serial,
|
|
|
|
per_query_ctx,
|
|
|
|
attinmeta,
|
|
|
|
tupstore);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
SPI_finish();
|
|
|
|
|
|
|
|
return tupstore;
|
|
|
|
}
|
|
|
|
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
static void
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
build_tuplestore_recursively(char *key_fld,
|
|
|
|
char *parent_key_fld,
|
|
|
|
char *relname,
|
2003-07-27 07:51:59 +04:00
|
|
|
char *orderby_fld,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
char *branch_delim,
|
|
|
|
char *start_with,
|
|
|
|
char *branch,
|
|
|
|
int level,
|
2003-07-27 07:51:59 +04:00
|
|
|
int *serial,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
int max_depth,
|
|
|
|
bool show_branch,
|
2003-07-27 07:51:59 +04:00
|
|
|
bool show_serial,
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
MemoryContext per_query_ctx,
|
|
|
|
AttInMetadata *attinmeta,
|
|
|
|
Tuplestorestate *tupstore)
|
|
|
|
{
|
2002-09-05 00:31:48 +04:00
|
|
|
TupleDesc tupdesc = attinmeta->tupdesc;
|
|
|
|
int ret;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 proc;
|
2003-07-27 07:51:59 +04:00
|
|
|
int serial_column;
|
2006-10-04 04:30:14 +04:00
|
|
|
StringInfoData sql;
|
2003-10-02 07:51:40 +04:00
|
|
|
char **values;
|
|
|
|
char *current_key;
|
|
|
|
char *current_key_parent;
|
|
|
|
char current_level[INT32_STRLEN];
|
|
|
|
char serial_str[INT32_STRLEN];
|
|
|
|
char *current_branch;
|
|
|
|
HeapTuple tuple;
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2002-09-05 00:31:48 +04:00
|
|
|
if (max_depth > 0 && level > max_depth)
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
return;
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2006-03-01 09:51:01 +03:00
|
|
|
initStringInfo(&sql);
|
2003-10-02 07:51:40 +04:00
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
/* Build initial sql statement */
|
2003-07-27 07:51:59 +04:00
|
|
|
if (!show_serial)
|
|
|
|
{
|
2006-03-01 09:51:01 +03:00
|
|
|
appendStringInfo(&sql, "SELECT %s, %s FROM %s WHERE %s = %s AND %s IS NOT NULL AND %s <> %s",
|
2003-08-04 04:43:34 +04:00
|
|
|
key_fld,
|
|
|
|
parent_key_fld,
|
|
|
|
relname,
|
|
|
|
parent_key_fld,
|
2004-02-24 08:25:36 +03:00
|
|
|
quote_literal_cstr(start_with),
|
2003-10-02 07:51:40 +04:00
|
|
|
key_fld, key_fld, parent_key_fld);
|
2003-08-04 04:43:34 +04:00
|
|
|
serial_column = 0;
|
2003-07-27 07:51:59 +04:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2006-03-01 09:51:01 +03:00
|
|
|
appendStringInfo(&sql, "SELECT %s, %s FROM %s WHERE %s = %s AND %s IS NOT NULL AND %s <> %s ORDER BY %s",
|
2003-08-04 04:43:34 +04:00
|
|
|
key_fld,
|
|
|
|
parent_key_fld,
|
|
|
|
relname,
|
|
|
|
parent_key_fld,
|
2004-02-24 08:25:36 +03:00
|
|
|
quote_literal_cstr(start_with),
|
2003-10-02 07:51:40 +04:00
|
|
|
key_fld, key_fld, parent_key_fld,
|
2003-08-04 04:43:34 +04:00
|
|
|
orderby_fld);
|
|
|
|
serial_column = 1;
|
2003-07-27 07:51:59 +04:00
|
|
|
}
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2003-10-02 07:51:40 +04:00
|
|
|
if (show_branch)
|
|
|
|
values = (char **) palloc((CONNECTBY_NCOLS + serial_column) * sizeof(char *));
|
|
|
|
else
|
|
|
|
values = (char **) palloc((CONNECTBY_NCOLS_NOBRANCH + serial_column) * sizeof(char *));
|
|
|
|
|
|
|
|
/* First time through, do a little setup */
|
|
|
|
if (level == 0)
|
|
|
|
{
|
|
|
|
/* root value is the one we initially start with */
|
|
|
|
values[0] = start_with;
|
|
|
|
|
|
|
|
/* root value has no parent */
|
|
|
|
values[1] = NULL;
|
|
|
|
|
|
|
|
/* root level is 0 */
|
|
|
|
sprintf(current_level, "%d", level);
|
|
|
|
values[2] = current_level;
|
|
|
|
|
|
|
|
/* root branch is just starting root value */
|
|
|
|
if (show_branch)
|
|
|
|
values[3] = start_with;
|
|
|
|
|
|
|
|
/* root starts the serial with 1 */
|
|
|
|
if (show_serial)
|
|
|
|
{
|
|
|
|
sprintf(serial_str, "%d", (*serial)++);
|
|
|
|
if (show_branch)
|
|
|
|
values[4] = serial_str;
|
|
|
|
else
|
|
|
|
values[3] = serial_str;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* construct the tuple */
|
|
|
|
tuple = BuildTupleFromCStrings(attinmeta, values);
|
|
|
|
|
|
|
|
/* now store it */
|
|
|
|
tuplestore_puttuple(tupstore, tuple);
|
|
|
|
|
|
|
|
/* increment level */
|
|
|
|
level++;
|
|
|
|
}
|
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
/* Retrieve the desired rows */
|
2006-03-01 09:51:01 +03:00
|
|
|
ret = SPI_execute(sql.data, true, 0);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
proc = SPI_processed;
|
|
|
|
|
|
|
|
/* Check for qualifying tuples */
|
|
|
|
if ((ret == SPI_OK_SELECT) && (proc > 0))
|
|
|
|
{
|
2002-09-05 00:31:48 +04:00
|
|
|
HeapTuple spi_tuple;
|
|
|
|
SPITupleTable *tuptable = SPI_tuptable;
|
|
|
|
TupleDesc spi_tupdesc = tuptable->tupdesc;
|
Widen query numbers-of-tuples-processed counters to uint64.
This patch widens SPI_processed, EState's es_processed field, PortalData's
portalPos field, FuncCallContext's call_cntr and max_calls fields,
ExecutorRun's count argument, PortalRunFetch's result, and the max number
of rows in a SPITupleTable to uint64, and deals with (I hope) all the
ensuing fallout. Some of these values were declared uint32 before, and
others "long".
I also removed PortalData's posOverflow field, since that logic seems
pretty useless given that portalPos is now always 64 bits.
The user-visible results are that command tags for SELECT etc will
correctly report tuple counts larger than 4G, as will plpgsql's GET
GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples
than that are still not exactly the norm, but they're becoming more
common.
Most values associated with FETCH/MOVE distances, such as PortalRun's count
argument and the count argument of most SPI functions that have one, remain
declared as "long". It's not clear whether it would be worth promoting
those to int64; but it would definitely be a large dollop of additional
API churn on top of this, and it would only help 32-bit platforms which
seem relatively less likely to see any benefit.
Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
2016-03-13 00:05:10 +03:00
|
|
|
uint64 i;
|
2006-10-04 04:30:14 +04:00
|
|
|
StringInfoData branchstr;
|
|
|
|
StringInfoData chk_branchstr;
|
|
|
|
StringInfoData chk_current_key;
|
> The previous patch fixed an infinite recursion bug in
> contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error
> seems to occur even if a table has commonplace tree data(see below).
>
> I would think the patch, ancestor check, should be
>
> if (strstr(branch_delim || branchstr->data || branch_delim,
> branch_delim || current_key || branch_delim))
>
> This is my image, not a real code. However, if branchstr->data includes
> branch_delim, my image will not be perfect.
Good point. Thank you Masaru for the suggested fix.
Attached is a patch to fix the bug found by Masaru. His example now
produces:
regression=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level
int,
branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+----------
11 | | 0 | 11
10 | 11 | 1 | 11-10
111 | 11 | 1 | 11-111
1 | 111 | 2 | 11-111-1
(4 rows)
While making the patch I also realized that the "no show branch" form of
the function was not going to work very well for recursion detection.
Therefore there is now a default branch delimiter ('~') that is used
internally, for that case, to enable recursion detection to work. If
you need a different delimiter for your specific data, you will have to
use the "show branch" form of the function.
Joe Conway
2002-10-03 21:11:12 +04:00
|
|
|
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
/*
|
|
|
|
* Check that return tupdesc is compatible with the one we got from
|
|
|
|
* the query.
|
|
|
|
*/
|
|
|
|
compatConnectbyTupleDescs(tupdesc, spi_tupdesc);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2007-03-03 22:32:55 +03:00
|
|
|
initStringInfo(&branchstr);
|
|
|
|
initStringInfo(&chk_branchstr);
|
|
|
|
initStringInfo(&chk_current_key);
|
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
for (i = 0; i < proc; i++)
|
|
|
|
{
|
> Now I'm testing connectby() in the /contrib/tablefunc in 7.3b1, which would
> be a useful function for many users. However, I found the fact that
> if connectby_tree has the following data, connectby() tries to search the end
> of roots without knowing that the relations are infinite(-5-9-10-11-9-10-11-)
.
> I hope connectby() supports a check routine to find infinite relations.
>
>
> CREATE TABLE connectby_tree(keyid int, parent_keyid int);
> INSERT INTO connectby_tree VALUES(1,NULL);
> INSERT INTO connectby_tree VALUES(2,1);
> INSERT INTO connectby_tree VALUES(3,1);
> INSERT INTO connectby_tree VALUES(4,2);
> INSERT INTO connectby_tree VALUES(5,2);
> INSERT INTO connectby_tree VALUES(6,4);
> INSERT INTO connectby_tree VALUES(7,3);
> INSERT INTO connectby_tree VALUES(8,6);
> INSERT INTO connectby_tree VALUES(9,5);
>
> INSERT INTO connectby_tree VALUES(10,9);
> INSERT INTO connectby_tree VALUES(11,10);
> INSERT INTO connectby_tree VALUES(9,11); <-- infinite
>
The attached patch fixes the infinite recursion bug in
contrib/tablefunc/tablefunc.c:connectby found by Masaru Sugawara.
test=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '2', 4, '~') AS t(keyid int, parent_keyid int, level
int, branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+-------------
2 | | 0 | 2
4 | 2 | 1 | 2~4
6 | 4 | 2 | 2~4~6
8 | 6 | 3 | 2~4~6~8
5 | 2 | 1 | 2~5
9 | 5 | 2 | 2~5~9
10 | 9 | 3 | 2~5~9~10
11 | 10 | 4 | 2~5~9~10~11
(8 rows)
test=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '2', 5, '~') AS t(keyid int, parent_keyid int, level
int, branch text);
ERROR: infinite recursion detected
I implemented it by checking the branch string for repeated keys
(whether or not the branch is returned). The performance hit was pretty
minimal -- about 1% for a moderately complex test case (220000 record
table, 9 level tree with 3800 members).
Joe Conway
2002-09-12 04:19:44 +04:00
|
|
|
/* initialize branch for this pass */
|
2013-10-31 18:55:59 +04:00
|
|
|
appendStringInfoString(&branchstr, branch);
|
2006-03-01 09:51:01 +03:00
|
|
|
appendStringInfo(&chk_branchstr, "%s%s%s", branch_delim, branch, branch_delim);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
/* get the next sql result tuple */
|
|
|
|
spi_tuple = tuptable->vals[i];
|
|
|
|
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
/* get the current key (might be NULL) */
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
current_key = SPI_getvalue(spi_tuple, spi_tupdesc, 1);
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
|
|
|
|
/* get the parent key (might be NULL) */
|
|
|
|
current_key_parent = SPI_getvalue(spi_tuple, spi_tupdesc, 2);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
/* get the current level */
|
|
|
|
sprintf(current_level, "%d", level);
|
|
|
|
|
> The previous patch fixed an infinite recursion bug in
> contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error
> seems to occur even if a table has commonplace tree data(see below).
>
> I would think the patch, ancestor check, should be
>
> if (strstr(branch_delim || branchstr->data || branch_delim,
> branch_delim || current_key || branch_delim))
>
> This is my image, not a real code. However, if branchstr->data includes
> branch_delim, my image will not be perfect.
Good point. Thank you Masaru for the suggested fix.
Attached is a patch to fix the bug found by Masaru. His example now
produces:
regression=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level
int,
branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+----------
11 | | 0 | 11
10 | 11 | 1 | 11-10
111 | 11 | 1 | 11-111
1 | 111 | 2 | 11-111-1
(4 rows)
While making the patch I also realized that the "no show branch" form of
the function was not going to work very well for recursion detection.
Therefore there is now a default branch delimiter ('~') that is used
internally, for that case, to enable recursion detection to work. If
you need a different delimiter for your specific data, you will have to
use the "show branch" form of the function.
Joe Conway
2002-10-03 21:11:12 +04:00
|
|
|
/* check to see if this key is also an ancestor */
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
if (current_key)
|
|
|
|
{
|
|
|
|
appendStringInfo(&chk_current_key, "%s%s%s",
|
|
|
|
branch_delim, current_key, branch_delim);
|
|
|
|
if (strstr(chk_branchstr.data, chk_current_key.data))
|
2015-08-03 06:49:19 +03:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_INVALID_RECURSION),
|
|
|
|
errmsg("infinite recursion detected")));
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
}
|
> The previous patch fixed an infinite recursion bug in
> contrib/tablefunc/tablefunc.c:connectby. But, other unmanageable error
> seems to occur even if a table has commonplace tree data(see below).
>
> I would think the patch, ancestor check, should be
>
> if (strstr(branch_delim || branchstr->data || branch_delim,
> branch_delim || current_key || branch_delim))
>
> This is my image, not a real code. However, if branchstr->data includes
> branch_delim, my image will not be perfect.
Good point. Thank you Masaru for the suggested fix.
Attached is a patch to fix the bug found by Masaru. His example now
produces:
regression=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '11', 0, '-') AS t(keyid int, parent_keyid int, level
int,
branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+----------
11 | | 0 | 11
10 | 11 | 1 | 11-10
111 | 11 | 1 | 11-111
1 | 111 | 2 | 11-111-1
(4 rows)
While making the patch I also realized that the "no show branch" form of
the function was not going to work very well for recursion detection.
Therefore there is now a default branch delimiter ('~') that is used
internally, for that case, to enable recursion detection to work. If
you need a different delimiter for your specific data, you will have to
use the "show branch" form of the function.
Joe Conway
2002-10-03 21:11:12 +04:00
|
|
|
|
|
|
|
/* OK, extend the branch */
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
if (current_key)
|
|
|
|
appendStringInfo(&branchstr, "%s%s", branch_delim, current_key);
|
2006-03-01 09:51:01 +03:00
|
|
|
current_branch = branchstr.data;
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
/* build a tuple */
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
values[0] = current_key;
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
values[1] = current_key_parent;
|
|
|
|
values[2] = current_level;
|
|
|
|
if (show_branch)
|
|
|
|
values[3] = current_branch;
|
2003-07-27 07:51:59 +04:00
|
|
|
if (show_serial)
|
|
|
|
{
|
|
|
|
sprintf(serial_str, "%d", (*serial)++);
|
|
|
|
if (show_branch)
|
|
|
|
values[4] = serial_str;
|
|
|
|
else
|
|
|
|
values[3] = serial_str;
|
|
|
|
}
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
tuple = BuildTupleFromCStrings(attinmeta, values);
|
|
|
|
|
|
|
|
/* store the tuple for later use */
|
|
|
|
tuplestore_puttuple(tupstore, tuple);
|
|
|
|
|
|
|
|
heap_freetuple(tuple);
|
|
|
|
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
/* recurse using current_key as the new start_with */
|
|
|
|
if (current_key)
|
|
|
|
build_tuplestore_recursively(key_fld,
|
|
|
|
parent_key_fld,
|
|
|
|
relname,
|
|
|
|
orderby_fld,
|
|
|
|
branch_delim,
|
|
|
|
current_key,
|
|
|
|
current_branch,
|
|
|
|
level + 1,
|
|
|
|
serial,
|
|
|
|
max_depth,
|
|
|
|
show_branch,
|
|
|
|
show_serial,
|
|
|
|
per_query_ctx,
|
|
|
|
attinmeta,
|
|
|
|
tupstore);
|
|
|
|
|
|
|
|
xpfree(current_key);
|
|
|
|
xpfree(current_key_parent);
|
> Now I'm testing connectby() in the /contrib/tablefunc in 7.3b1, which would
> be a useful function for many users. However, I found the fact that
> if connectby_tree has the following data, connectby() tries to search the end
> of roots without knowing that the relations are infinite(-5-9-10-11-9-10-11-)
.
> I hope connectby() supports a check routine to find infinite relations.
>
>
> CREATE TABLE connectby_tree(keyid int, parent_keyid int);
> INSERT INTO connectby_tree VALUES(1,NULL);
> INSERT INTO connectby_tree VALUES(2,1);
> INSERT INTO connectby_tree VALUES(3,1);
> INSERT INTO connectby_tree VALUES(4,2);
> INSERT INTO connectby_tree VALUES(5,2);
> INSERT INTO connectby_tree VALUES(6,4);
> INSERT INTO connectby_tree VALUES(7,3);
> INSERT INTO connectby_tree VALUES(8,6);
> INSERT INTO connectby_tree VALUES(9,5);
>
> INSERT INTO connectby_tree VALUES(10,9);
> INSERT INTO connectby_tree VALUES(11,10);
> INSERT INTO connectby_tree VALUES(9,11); <-- infinite
>
The attached patch fixes the infinite recursion bug in
contrib/tablefunc/tablefunc.c:connectby found by Masaru Sugawara.
test=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '2', 4, '~') AS t(keyid int, parent_keyid int, level
int, branch text);
keyid | parent_keyid | level | branch
-------+--------------+-------+-------------
2 | | 0 | 2
4 | 2 | 1 | 2~4
6 | 4 | 2 | 2~4~6
8 | 6 | 3 | 2~4~6~8
5 | 2 | 1 | 2~5
9 | 5 | 2 | 2~5~9
10 | 9 | 3 | 2~5~9~10
11 | 10 | 4 | 2~5~9~10~11
(8 rows)
test=# SELECT * FROM connectby('connectby_tree', 'keyid',
'parent_keyid', '2', 5, '~') AS t(keyid int, parent_keyid int, level
int, branch text);
ERROR: infinite recursion detected
I implemented it by checking the branch string for repeated keys
(whether or not the branch is returned). The performance hit was pretty
minimal -- about 1% for a moderately complex test case (220000 record
table, 9 level tree with 3800 members).
Joe Conway
2002-09-12 04:19:44 +04:00
|
|
|
|
|
|
|
/* reset branch for next pass */
|
2007-03-03 22:32:55 +03:00
|
|
|
resetStringInfo(&branchstr);
|
|
|
|
resetStringInfo(&chk_branchstr);
|
|
|
|
resetStringInfo(&chk_current_key);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
}
|
2007-03-03 22:32:55 +03:00
|
|
|
|
|
|
|
xpfree(branchstr.data);
|
|
|
|
xpfree(chk_branchstr.data);
|
|
|
|
xpfree(chk_current_key.data);
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check expected (query runtime) tupdesc suitable for Connectby
|
|
|
|
*/
|
|
|
|
static void
|
2017-08-20 21:19:07 +03:00
|
|
|
validateConnectbyTupleDesc(TupleDesc td, bool show_branch, bool show_serial)
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
{
|
2003-08-04 04:43:34 +04:00
|
|
|
int serial_column = 0;
|
2003-07-27 07:51:59 +04:00
|
|
|
|
|
|
|
if (show_serial)
|
2003-08-04 04:43:34 +04:00
|
|
|
serial_column = 1;
|
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
/* are there the correct number of columns */
|
|
|
|
if (show_branch)
|
|
|
|
{
|
2017-08-20 21:19:07 +03:00
|
|
|
if (td->natts != (CONNECTBY_NCOLS + serial_column))
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2003-07-24 21:52:50 +04:00
|
|
|
errmsg("invalid return type"),
|
|
|
|
errdetail("Query-specified return tuple has " \
|
|
|
|
"wrong number of columns.")));
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
2017-08-20 21:19:07 +03:00
|
|
|
if (td->natts != CONNECTBY_NCOLS_NOBRANCH + serial_column)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2003-07-24 21:52:50 +04:00
|
|
|
errmsg("invalid return type"),
|
|
|
|
errdetail("Query-specified return tuple has " \
|
|
|
|
"wrong number of columns.")));
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
}
|
|
|
|
|
|
|
|
/* check that the types of the first two columns match */
|
2017-08-20 21:19:07 +03:00
|
|
|
if (TupleDescAttr(td, 0)->atttypid != TupleDescAttr(td, 1)->atttypid)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2003-07-24 21:52:50 +04:00
|
|
|
errmsg("invalid return type"),
|
|
|
|
errdetail("First two columns must be the same type.")));
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
/* check that the type of the third column is INT4 */
|
2017-08-20 21:19:07 +03:00
|
|
|
if (TupleDescAttr(td, 2)->atttypid != INT4OID)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2003-07-24 21:52:50 +04:00
|
|
|
errmsg("invalid return type"),
|
|
|
|
errdetail("Third column must be type %s.",
|
2003-08-04 04:43:34 +04:00
|
|
|
format_type_be(INT4OID))));
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2003-06-25 22:13:50 +04:00
|
|
|
/* check that the type of the fourth column is TEXT if applicable */
|
2017-08-20 21:19:07 +03:00
|
|
|
if (show_branch && TupleDescAttr(td, 3)->atttypid != TEXTOID)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2003-07-24 21:52:50 +04:00
|
|
|
errmsg("invalid return type"),
|
|
|
|
errdetail("Fourth column must be type %s.",
|
2003-08-04 04:43:34 +04:00
|
|
|
format_type_be(TEXTOID))));
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2003-07-27 07:51:59 +04:00
|
|
|
/* check that the type of the fifth column is INT4 */
|
2017-08-20 21:19:07 +03:00
|
|
|
if (show_branch && show_serial &&
|
|
|
|
TupleDescAttr(td, 4)->atttypid != INT4OID)
|
2015-08-03 06:49:19 +03:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 22:35:54 +03:00
|
|
|
errmsg("query-specified return tuple not valid for Connectby: "
|
|
|
|
"fifth column must be type %s",
|
|
|
|
format_type_be(INT4OID))));
|
2003-07-27 07:51:59 +04:00
|
|
|
|
|
|
|
/* check that the type of the fifth column is INT4 */
|
2017-08-20 21:19:07 +03:00
|
|
|
if (!show_branch && show_serial &&
|
|
|
|
TupleDescAttr(td, 3)->atttypid != INT4OID)
|
2015-08-03 06:49:19 +03:00
|
|
|
ereport(ERROR,
|
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 22:35:54 +03:00
|
|
|
errmsg("query-specified return tuple not valid for Connectby: "
|
|
|
|
"fourth column must be type %s",
|
|
|
|
format_type_be(INT4OID))));
|
2003-08-04 04:43:34 +04:00
|
|
|
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
/* OK, the tupdesc is valid for our purposes */
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Check if spi sql tupdesc and return tupdesc are compatible
|
|
|
|
*/
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
static void
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
compatConnectbyTupleDescs(TupleDesc ret_tupdesc, TupleDesc sql_tupdesc)
|
|
|
|
{
|
2002-09-05 00:31:48 +04:00
|
|
|
Oid ret_atttypid;
|
|
|
|
Oid sql_atttypid;
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
int32 ret_atttypmod;
|
|
|
|
int32 sql_atttypmod;
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
/*
|
|
|
|
* Result must have at least 2 columns.
|
|
|
|
*/
|
|
|
|
if (sql_tupdesc->natts < 2)
|
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
errmsg("invalid return type"),
|
|
|
|
errdetail("Query must return at least two columns.")));
|
|
|
|
|
|
|
|
/*
|
|
|
|
* These columns must match the result type indicated by the calling
|
|
|
|
* query.
|
|
|
|
*/
|
2017-08-20 21:19:07 +03:00
|
|
|
ret_atttypid = TupleDescAttr(ret_tupdesc, 0)->atttypid;
|
|
|
|
sql_atttypid = TupleDescAttr(sql_tupdesc, 0)->atttypid;
|
|
|
|
ret_atttypmod = TupleDescAttr(ret_tupdesc, 0)->atttypmod;
|
|
|
|
sql_atttypmod = TupleDescAttr(sql_tupdesc, 0)->atttypmod;
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
if (ret_atttypid != sql_atttypid ||
|
|
|
|
(ret_atttypmod >= 0 && ret_atttypmod != sql_atttypmod))
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2003-07-24 21:52:50 +04:00
|
|
|
errmsg("invalid return type"),
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
errdetail("SQL key field type %s does " \
|
|
|
|
"not match return key field type %s.",
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 22:35:54 +03:00
|
|
|
format_type_with_typemod(ret_atttypid, ret_atttypmod),
|
|
|
|
format_type_with_typemod(sql_atttypid, sql_atttypmod))));
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
2017-08-20 21:19:07 +03:00
|
|
|
ret_atttypid = TupleDescAttr(ret_tupdesc, 1)->atttypid;
|
|
|
|
sql_atttypid = TupleDescAttr(sql_tupdesc, 1)->atttypid;
|
|
|
|
ret_atttypmod = TupleDescAttr(ret_tupdesc, 1)->atttypmod;
|
|
|
|
sql_atttypmod = TupleDescAttr(sql_tupdesc, 1)->atttypmod;
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
if (ret_atttypid != sql_atttypid ||
|
|
|
|
(ret_atttypmod >= 0 && ret_atttypmod != sql_atttypmod))
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2003-07-24 21:52:50 +04:00
|
|
|
errmsg("invalid return type"),
|
Handle unexpected query results, especially NULLs, safely in connectby().
connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me
2015-01-30 04:18:33 +03:00
|
|
|
errdetail("SQL parent key field type %s does " \
|
|
|
|
"not match return parent key field type %s.",
|
Phase 3 of pgindent updates.
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
2017-06-21 22:35:54 +03:00
|
|
|
format_type_with_typemod(ret_atttypid, ret_atttypmod),
|
|
|
|
format_type_with_typemod(sql_atttypid, sql_atttypmod))));
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
|
|
|
|
/* OK, the two tupdescs are compatible for our purposes */
|
|
|
|
}
|
|
|
|
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
/*
|
|
|
|
* Check if two tupdescs match in type of attributes
|
|
|
|
*/
|
|
|
|
static bool
|
Attached is an update to contrib/tablefunc. It introduces a new
function, connectby(), which can serve as a reference implementation for
the changes made in the last few days -- namely the ability of a
function to return an entire tuplestore, and the ability of a function
to make use of the query provided "expected" tuple description.
Description:
connectby(text relname, text keyid_fld, text parent_keyid_fld,
text start_with, int max_depth [, text branch_delim])
- returns keyid, parent_keyid, level, and an optional branch string
- requires anonymous composite type syntax in the FROM clause. See
the instructions in the documentation below.
Joe Conway
2002-09-02 09:44:05 +04:00
|
|
|
compatCrosstabTupleDescs(TupleDesc ret_tupdesc, TupleDesc sql_tupdesc)
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
{
|
|
|
|
int i;
|
2002-09-05 00:31:48 +04:00
|
|
|
Form_pg_attribute ret_attr;
|
|
|
|
Oid ret_atttypid;
|
|
|
|
Form_pg_attribute sql_attr;
|
|
|
|
Oid sql_atttypid;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
2008-12-01 04:30:18 +03:00
|
|
|
if (ret_tupdesc->natts < 2 ||
|
|
|
|
sql_tupdesc->natts < 3)
|
|
|
|
return false;
|
|
|
|
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
/* check the rowid types match */
|
2017-08-20 21:19:07 +03:00
|
|
|
ret_atttypid = TupleDescAttr(ret_tupdesc, 0)->atttypid;
|
|
|
|
sql_atttypid = TupleDescAttr(sql_tupdesc, 0)->atttypid;
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
if (ret_atttypid != sql_atttypid)
|
2003-07-24 21:52:50 +04:00
|
|
|
ereport(ERROR,
|
2015-08-03 06:49:19 +03:00
|
|
|
(errcode(ERRCODE_DATATYPE_MISMATCH),
|
2003-07-24 21:52:50 +04:00
|
|
|
errmsg("invalid return type"),
|
|
|
|
errdetail("SQL rowid datatype does not match " \
|
|
|
|
"return rowid datatype.")));
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
/*
|
2005-10-15 06:49:52 +04:00
|
|
|
* - attribute [1] of the sql tuple is the category; no need to check it -
|
|
|
|
* attribute [2] of the sql tuple should match attributes [1] to [natts]
|
|
|
|
* of the return tuple
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
*/
|
2017-08-20 21:19:07 +03:00
|
|
|
sql_attr = TupleDescAttr(sql_tupdesc, 2);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
for (i = 1; i < ret_tupdesc->natts; i++)
|
|
|
|
{
|
2017-08-20 21:19:07 +03:00
|
|
|
ret_attr = TupleDescAttr(ret_tupdesc, i);
|
As mentioned above, here is my contrib/tablefunc patch. It includes
three functions which exercise the tablefunc API.
show_all_settings()
- returns the same information as SHOW ALL, but as a query result
normal_rand(int numvals, float8 mean, float8 stddev, int seed)
- returns a set of normally distributed float8 values
- This routine implements Algorithm P (Polar method for normal
deviates) from Knuth's _The_Art_of_Computer_Programming_, Volume 2,
3rd ed., pages 122-126. Knuth cites his source as "The polar
method", G. E. P. Box, M. E. Muller, and G. Marsaglia,
_Annals_Math,_Stat._ 29 (1958), 610-611.
crosstabN(text sql)
- returns a set of row_name plus N category value columns
- crosstab2(), crosstab3(), and crosstab4() are defined for you,
but you can create additional crosstab functions per directions
in the README.
Joe Conway
2002-07-30 20:31:11 +04:00
|
|
|
|
|
|
|
if (ret_attr->atttypid != sql_attr->atttypid)
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* OK, the two tupdescs are compatible for our purposes */
|
|
|
|
return true;
|
|
|
|
}
|