<!-- $PostgreSQL: pgsql/doc/src/sgml/xaggr.sgml,v 1.35 2007/02/01 00:28:18 momjian Exp $ -->

 <sect1 id="xaggr">
  <title>User-Defined Aggregates</title>

  <indexterm zone="xaggr">
   <primary>aggregate function</primary>
   <secondary>user-defined</secondary>
  </indexterm>

  <para>
   Aggregate functions in <productname>PostgreSQL</productname>
   are expressed in terms of <firstterm>state values</firstterm>
   and <firstterm>state transition functions</firstterm>.
   That is, an aggregate operates using a state value that is updated
   as each successive input row is processed.
   To define a new aggregate
   function, one selects a data type for the state value,
   an initial value for the state, and a state transition
   function. The state transition function is just an
   ordinary function that could also be used outside the
   context of the aggregate. A <firstterm>final function</firstterm>
   can also be specified, in case the desired result of the aggregate
   is different from the data that needs to be kept in the running
   state value.
  </para>

  <para>
   Thus, in addition to the argument and result data types seen by a user
   of the aggregate, there is an internal state-value data type that
   might be different from both the argument and result types.
  </para>

  <para>
   If we define an aggregate that does not use a final function,
   we have an aggregate that computes a running function of
   the column values from each row. <function>sum</> is an
   example of this kind of aggregate. <function>sum</> starts at
   zero and always adds the current row's value to
   its running total. For example, if we want to make a <function>sum</>
   aggregate that works on a data type for complex numbers,
   we only need the addition function for that data type.
   The aggregate definition would be:

<screen>
CREATE AGGREGATE sum (complex)
(
    sfunc = complex_add,
    stype = complex,
    initcond = '(0,0)'
);

SELECT sum(a) FROM test_complex;

    sum
-----------
 (34,53.9)
</screen>

   (Notice that we are relying on function overloading: there is more than
   one aggregate named <function>sum</>, but
   <productname>PostgreSQL</productname> can figure out which kind
   of sum applies to a column of type <type>complex</type>.)
  </para>

  <para>
   The above definition of <function>sum</function> will return zero (the initial
   state condition) if there are no nonnull input values.
   Perhaps we want to return null in that case instead, since the SQL standard
   expects <function>sum</function> to behave that way. We can do this simply by
   omitting the <literal>initcond</literal> phrase, so that the initial state
   condition is null. Ordinarily this would mean that the <literal>sfunc</literal>
   would need to check for a null state-condition input, but for
   <function>sum</function> and some other simple aggregates like
   <function>max</> and <function>min</>,
   it is sufficient to insert the first nonnull input value into
   the state variable and then start applying the transition function
   at the second nonnull input value. <productname>PostgreSQL</productname>
   will do that automatically if the initial condition is null and
   the transition function is marked <quote>strict</> (i.e., not to be called
   for null inputs).
  </para>
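
  <para>
   As a rough sketch of this behavior, the following defines an aggregate
   over <type>float8</type> with no <literal>initcond</literal>. The name
   <function>my_sum</function> is hypothetical and chosen only for
   illustration; <function>float8pl</function> is the built-in strict
   addition function for <type>float8</type>. Because the first nonnull
   input is copied directly into the state variable, this arrangement
   assumes that the input type and the state type are the same.

<programlisting>
-- my_sum is a hypothetical name used only for this illustration.
-- float8pl is strict and no initcond is given, so the state starts as null.
CREATE AGGREGATE my_sum (float8)
(
    sfunc = float8pl,
    stype = float8
);

-- Null inputs are skipped; the first nonnull input seeds the state.
SELECT my_sum(x)
  FROM (VALUES (1.5::float8), (NULL::float8), (2.5::float8)) AS t(x);

-- With no nonnull inputs the state is never set, so the result is null.
SELECT my_sum(x) FROM (SELECT NULL::float8 AS x) AS t;
</programlisting>

   The first query should return 4, and the second should return null
   rather than zero.
  </para>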

  <para>
   Another bit of default behavior for a <quote>strict</> transition function
   is that the previous state value is retained unchanged whenever a
   null input value is encountered. Thus, null values are ignored. If you
   need some other behavior for null inputs, do not declare your
   transition function as strict; instead code it to test for null inputs and
   do whatever is needed.
  </para>
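
  <para>
   To illustrate a transition function that handles null inputs itself,
   here is a minimal sketch of an aggregate that counts the null values in
   a <type>float8</type> column. The names <function>null_count</function>
   and <function>null_count_step</function> are hypothetical. The
   transition function is deliberately not declared strict, so it is
   called for every row, including those whose input is null.

<programlisting>
-- Hypothetical transition function: not STRICT, so it sees null inputs.
-- $1 is the running count, $2 is the current input value.
CREATE FUNCTION null_count_step(bigint, float8) RETURNS bigint AS $$
    SELECT CASE WHEN $2 IS NULL THEN $1 + 1 ELSE $1 END;
$$ LANGUAGE SQL IMMUTABLE;

-- The state starts at zero, so the state argument is never null.
CREATE AGGREGATE null_count (float8)
(
    sfunc = null_count_step,
    stype = bigint,
    initcond = '0'
);

SELECT null_count(x)
  FROM (VALUES (1.0::float8), (NULL::float8), (NULL::float8)) AS t(x);
</programlisting>

   The query should return 2. Had the transition function been declared
   strict, the null rows would have been skipped and the count would have
   stayed at zero.
  </para>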

  <para>
   <function>avg</> (average) is a more complex example of an aggregate.
   It requires
   two pieces of running state: the sum of the inputs and the count
   of the number of inputs. The final result is obtained by dividing
   these quantities. Average is typically implemented by using an
   array as the state value. For example,
   the built-in implementation of <function>avg(float8)</function>
   looks like:

<programlisting>
CREATE AGGREGATE avg (float8)
(
    sfunc = float8_accum,
    stype = float8[],
    finalfunc = float8_avg,
    initcond = '{0,0,0}'
);
</programlisting>

   (<function>float8_accum</> requires a three-element array, not just
   two elements, because it accumulates the sum of squares as well as
   the sum and count of the inputs. This is so that it can be used for
   some other aggregates besides <function>avg</>.)
  </para>
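
  <para>
   To illustrate how a final function turns the running state into a
   result, here is a sketch of an average written with SQL-language
   functions rather than the built-in C functions. The names
   <function>my_avg</function>, <function>my_avg_step</function>, and
   <function>my_avg_final</function> are hypothetical; the state is a
   two-element array holding the running sum and the count of inputs.

<programlisting>
-- Hypothetical transition function: state is {sum, count}.
-- STRICT, so null inputs leave the state unchanged.
CREATE FUNCTION my_avg_step(float8[], float8) RETURNS float8[] AS $$
    SELECT ARRAY[$1[1] + $2, $1[2] + 1];
$$ LANGUAGE SQL IMMUTABLE STRICT;

-- Hypothetical final function: divide sum by count, null if no inputs.
CREATE FUNCTION my_avg_final(float8[]) RETURNS float8 AS $$
    SELECT CASE WHEN $1[2] = 0 THEN NULL ELSE $1[1] / $1[2] END;
$$ LANGUAGE SQL IMMUTABLE STRICT;

CREATE AGGREGATE my_avg (float8)
(
    sfunc = my_avg_step,
    stype = float8[],
    finalfunc = my_avg_final,
    initcond = '{0,0}'
);

SELECT my_avg(x)
  FROM (VALUES (1::float8), (2::float8), (6::float8)) AS t(x);
</programlisting>

   The query should return 3. Here the argument type
   (<type>float8</type>), the state type (<type>float8[]</type>), and the
   result type (<type>float8</type>) are declared separately, and the
   state type differs from the other two.
  </para>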

  <para>
   Aggregate functions can use polymorphic
   state transition functions or final functions, so that the same functions
   can be used to implement multiple aggregates.
   See <xref linkend="extend-types-polymorphic">
   for an explanation of polymorphic functions.
   Going a step further, the aggregate function itself can be specified
   with polymorphic input type(s) and state type, allowing a single
   aggregate definition to serve for multiple input data types.
   Here is an example of a polymorphic aggregate:

<programlisting>
CREATE AGGREGATE array_accum (anyelement)
(
    sfunc = array_append,
    stype = anyarray,
    initcond = '{}'
);
</programlisting>

   Here, the actual state type for any aggregate call is the array type
   having the actual input type as elements.
  </para>

  <para>
   Here's the output using two different actual data types as arguments:

<programlisting>
SELECT attrelid::regclass, array_accum(attname)
    FROM pg_attribute
    WHERE attnum &gt; 0 AND attrelid = 'pg_tablespace'::regclass
    GROUP BY attrelid;

   attrelid    |              array_accum
---------------+---------------------------------------
 pg_tablespace | {spcname,spcowner,spclocation,spcacl}
(1 row)

SELECT attrelid::regclass, array_accum(atttypid)
    FROM pg_attribute
    WHERE attnum &gt; 0 AND attrelid = 'pg_tablespace'::regclass
    GROUP BY attrelid;

   attrelid    |   array_accum
---------------+-----------------
 pg_tablespace | {19,26,25,1034}
(1 row)
</programlisting>
  </para>

  <para>
   A function written in C can detect that it is being called as an
   aggregate transition or final function by checking whether it was passed
   an <structname>AggState</> node as the function call <quote>context</>,
   for example by:
<programlisting>
        if (fcinfo-&gt;context &amp;&amp; IsA(fcinfo-&gt;context, AggState))
</programlisting>
   One reason for checking this is that when it is true, the first input
   must be a temporary transition value and can therefore safely be modified
   in-place rather than allocating a new copy. (This is the <emphasis>only</>
   case where it is safe for a function to modify a pass-by-reference input.)
   See <literal>int8inc()</> for an example.
  </para>

  <para>
   For further details see the
   <xref linkend="sql-createaggregate" endterm="sql-createaggregate-title">
   command.
  </para>
 </sect1>