version of crosstab. This fixes a major deficiency in real-world use of
the original version. Easiest to undestand with an illustration:
Data:
-------------------------------------------------------------------
select * from cth;
id | rowid | rowdt | attribute | val
----+-------+---------------------+----------------+---------------
1 | test1 | 2003-03-01 00:00:00 | temperature | 42
2 | test1 | 2003-03-01 00:00:00 | test_result | PASS
3 | test1 | 2003-03-01 00:00:00 | volts | 2.6987
4 | test2 | 2003-03-02 00:00:00 | temperature | 53
5 | test2 | 2003-03-02 00:00:00 | test_result | FAIL
6 | test2 | 2003-03-02 00:00:00 | test_startdate | 01 March 2003
7 | test2 | 2003-03-02 00:00:00 | volts | 3.1234
(7 rows)
Original crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1,2',4)
AS c(rowid text, temperature text, test_result text, test_startdate
text, volts text);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+----------------+--------
test1 | 42 | PASS | 2.6987 |
test2 | 53 | FAIL | 01 March 2003 | 3.1234
(2 rows)
Hashed crosstab:
-------------------------------------------------------------------
SELECT * FROM crosstab(
'SELECT rowid, attribute, val FROM cth ORDER BY 1',
'SELECT DISTINCT attribute FROM cth ORDER BY 1')
AS c(rowid text, temperature int4, test_result text, test_startdate
timestamp, volts float8);
rowid | temperature | test_result | test_startdate | volts
-------+-------------+-------------+---------------------+--------
test1 | 42 | PASS | | 2.6987
test2 | 53 | FAIL | 2003-03-01 00:00:00 | 3.1234
(2 rows)
Notice that the original crosstab slides data over to the left in the
result tuple when it encounters missing data. In order to work around
this you have to be make your source sql do all sorts of contortions
(cartesian join of distinct rowid with distinct attribute; left join
that back to the real source data). The new version avoids this by
building a hash table using a second distinct attribute query.
The new version also allows for "extra" columns (see the README) and
allows the result columns to be coerced into differing datatypes if they
are suitable (as shown above).
In testing a "real-world" data set (69 distinct rowid's, 27 distinct
categories/attributes, multiple missing data points) I saw about a
5-fold improvement in execution time (from about 2200 ms old, to 440 ms
new).
I left the original version intact because: 1) BC, 2) it is probably
slightly faster if you know that you have no missing attributes.
README and regression test adjustments included. If there are no
objections, please apply.
Joe Conway
now, my changes seem to work. Some possible minor bugs got squished
on the way but I can't be sure without more feedback from people who
really put the code to the test.
The new patch mostly simplifies variable handling and reduces code
duplication. Changes in the command parser eliminate some redundant
variables (boolean state + depth counter), replaces some
"else if" constructs with switches, and so on. It is meant to be
applied together with my previous patch, although I hope they don't
conflict; I went back to the CVS version for this one.
One more thing I thought should perhaps be changed: an IGNOREEOF
value of n will ignore only n-1 EOFs. I didn't want to touch this
for fear of breaking existing applications, but it does seem a tad
illogical.
Jeroen T. Vermeulen
changes to the SQL to retrieve attributes for older versions of Postgres is
probably wise. Also, please make sure that I have mapped the storage types
to the correct storage names, as this is relatively poorly documented.
I think that this patch might need to be considered for back-porting to
7.3.3 since at the moment, people will be losing valuable information after
upgrades.
Will dump:
CREATE TABLE test (
a text,
b text,
c text,
d text
);
ALTER TABLE ONLY test ALTER COLUMN a SET STATISTICS 55;
ALTER TABLE ONLY test ALTER COLUMN a SET STORAGE PLAIN;
ALTER TABLE ONLY test ALTER COLUMN b SET STATISTICS 1000;
ALTER TABLE ONLY test ALTER COLUMN c SET STORAGE EXTERNAL;
ALTER TABLE ONLY test ALTER COLUMN d SET STORAGE MAIN;
Christopher Kings-Lynne
what is capable using integer-datatime timestamps. It does attempt
to exercise the maximum allowable timestamp range.
Also is a small error check when converting a timestamp from external
to internal format that prevents out of range timestamps from being
entered.
Files patched:
Index: src/backend/utils/adt/timestamp.c
Added range check to prevent out of range timestamps
from being used.
Index: src/test/regress/sql/horology.sql
Index: src/test/regress/expected/horology-no-DST-before-1970.out
Index: src/test/regress/expected/horology-solaris-1947.out
Limited range of timestamps being checked to
Jan 1, 4713 BC to Dec 31, 294276
In creating this patch, I have seen some definite problems with integer
timestamps and how they react when used near their limits. For example,
the following statement gives the correct result:
SELECT timestamp without time zone 'Jan 1, 4713 BC'
+ interval '109203489 days' AS "Dec 31, 294276";
However, this statement which is the logical inverse of the above
gives incorrect results:
SELECT timestamp without time zone '12/31/294276'
- timestamp without time zone 'Jan 1, 4713 BC' AS "109203489 Days";
John Cochran
7.3.2). It removes some code duplication and #ifdeffing, and some
unstructured ugliness such as tacky breaks and an unneeded continue.
Breaks up a large function into smaller functions and reduces required
nesting levels, and kills a variable or two.
Jeroen T. Vermeulen
patch fix it -- but this patch doesn't contains tests or docs fixes. I
will send it later.
Fixed outputs:
select to_char(x, '9999.999') as x,
to_char(x, 'S9999.999') as s,
to_char(x, 'SG9999.999') as sg,
to_char(x, 'MI9999.999') as mi,
to_char(x, 'PL9999.999') as pl,
to_char(x, 'PLMI9999.999') as plmi,
to_char(x, '9999.999SG') as sg2,
to_char(x, '9999.999PL') as pl2,
to_char(x, '9999.999MI') as mi2 from num;
Karel Zak
> >
> > - Add check in pg_dump to see if the value returned is the max /min
> > values and replace with NO MAXVALUE, NO MINVALUE.
> >
> > - Change START and INCREMENT to use START WITH and INCREMENT BY syntax.
> > This makes it a touch easier to port to other databases with sequences
> > (Oracle). PostgreSQL supports both syntaxes already.
>
> + char bufm[100],
> + bufx[100];
>
> This seems to be an arbitary size. Why not set it to the actual maximum
> length?
>
> Also:
>
> + snprintf(bufm, 100, INT64_FORMAT, SEQ_MINVALUE);
> + snprintf(bufx, 100, INT64_FORMAT, SEQ_MAXVALUE);
>
> sizeof(bufm), sizeof(bufx) is probably the more
> maintenance-friendly/standard way to do it.
I changed the code to use sizeof - but will wait for a response from
Peter before changing the size. It's consistent throughout the sequence
code to be 100 for this purpose.
Rod Taylor <rbt@rbt.ca>
Envrironment and Files section, explained exactly what -w
does)
This is a patch which allows pg_ctl to make an intelligent
guess as to the proper port when running 'psql -l' to
determine if the database has started up (the -w flag).
The environment variable PGPORT is used. If that is not found,
it checks if a specific port has been set inside the postgresql.conf
file. If it is has not, it uses the port that Postgres was
compiled with.
Greg Sabino Mullane greg@turnstep.com
> weird behavior across fork boundaries; (b) the additional memory space
> that has to be duplicated into child processes will cost something per
> child launch, even if the child never uses it. But these are only
> arguments that it might not *always* be a prudent thing to do, not that
> we shouldn't give the DBA the tool to do it if he wants. So fire away.
Here is a patch for the above, including a documentation update. It
creates a new GUC variable "preload_libraries", that accepts a list in
the form:
preload_libraries = '$libdir/mylib1:initfunc,$libdir/mylib2'
If ":initfunc" is omitted or not found, no initialization function is
executed, but the library is still preloaded. If "$libdir/mylib" isn't
found, the postmaster refuses to start.
In my testing with PL/R, it reduces the first call to a PL/R function
(after connecting) from almost 2 seconds, down to about 8 ms.
Joe Conway
> like that patch still needs some work...
Yeah. I'm really, really, *really* sorry for submitting it in the state
it was in. I shouldn't have done that just before moving to another
country. I found the problem last night, but couldn't get to a Net
connection until now.
The problem is in src/bin/psql/common.c, around line 250-335 somewhere
depending on the version. The 2nd and 3rd clauses of the "while" loop
condition:
(rstatus == PGRES_COPY_IN) &&
(rstatus == PGRES_COPY_OUT))
should of course be:
(rstatus != PGRES_COPY_IN) &&
(rstatus != PGRES_COPY_OUT))
Jeroen T. Vermeulen
. replace CREATE OR REPLACE AGGREGATE with a separate DROP and CREATE
. add DROP for all CREATE OPERATORs
. use IMMUTABLE and STRICT instead of WITH (isStrict)
. add IMMUTABLE and STRICT to int_array_aggregate's accumulator function
Gregory Stark
< * Add GUC log_statement_duration to print statement and >= min duration
> * Add GUC log_statement_and_duration to print statement and >= min duration