docs: Improve window function docs
Specifically, the behavior of general-purpose and statistical aggregates as window functions was not clearly documented, and terms were inconsistently used. Also add docs about the difference between cume_dist and percent_rank, rather than just the formulas. Discussion: 20170406214918.GA5757@momjian.us
This commit is contained in:
parent
a4777f3556
commit
1c1a4726eb
@ -328,8 +328,8 @@ COMMIT;
|
||||
A <firstterm>window function</> performs a calculation across a set of
|
||||
table rows that are somehow related to the current row. This is comparable
|
||||
to the type of calculation that can be done with an aggregate function.
|
||||
But unlike regular aggregate functions, use of a window function does not
|
||||
cause rows to become grouped into a single output row — the
|
||||
However, window functions do not cause rows to become grouped into a single
|
||||
output row like non-window aggregate calls would. Instead, the
|
||||
rows retain their separate identities. Behind the scenes, the window
|
||||
function is able to access more than just the current row of the query
|
||||
result.
|
||||
@ -363,20 +363,19 @@ SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM emps
|
||||
<structname>empsalary</>, and there is one output row for each row in the
|
||||
table. The fourth column represents an average taken across all the table
|
||||
rows that have the same <structfield>depname</> value as the current row.
|
||||
(This actually is the same function as the regular <function>avg</>
|
||||
aggregate function, but the <literal>OVER</> clause causes it to be
|
||||
treated as a window function and computed across an appropriate set of
|
||||
rows.)
|
||||
(This actually is the same function as the non-window <function>avg</>
|
||||
aggregate, but the <literal>OVER</> clause causes it to be
|
||||
treated as a window function and computed across the window frame.)
|
||||
</para>
|
||||
|
||||
<para>
|
||||
A window function call always contains an <literal>OVER</> clause
|
||||
directly following the window function's name and argument(s). This is what
|
||||
syntactically distinguishes it from a regular function or aggregate
|
||||
function. The <literal>OVER</> clause determines exactly how the
|
||||
syntactically distinguishes it from a normal function or non-window
|
||||
aggregate. The <literal>OVER</> clause determines exactly how the
|
||||
rows of the query are split up for processing by the window function.
|
||||
The <literal>PARTITION BY</> list within <literal>OVER</> specifies
|
||||
dividing the rows into groups, or partitions, that share the same
|
||||
The <literal>PARTITION BY</> clause within <literal>OVER</>
|
||||
divides the rows into groups, or partitions, that share the same
|
||||
values of the <literal>PARTITION BY</> expression(s). For each row,
|
||||
the window function is computed across the rows that fall into the
|
||||
same partition as the current row.
|
||||
@ -411,8 +410,8 @@ FROM empsalary;
|
||||
</screen>
|
||||
|
||||
As shown here, the <function>rank</> function produces a numerical rank
|
||||
within the current row's partition for each distinct <literal>ORDER BY</>
|
||||
value, in the order defined by the <literal>ORDER BY</> clause.
|
||||
for each distinct <literal>ORDER BY</> value in the current row's
|
||||
partition, using the order defined by the <literal>ORDER BY</> clause.
|
||||
<function>rank</> needs no explicit parameter, because its behavior
|
||||
is entirely determined by the <literal>OVER</> clause.
|
||||
</para>
|
||||
@ -424,20 +423,20 @@ FROM empsalary;
|
||||
if any. For example, a row removed because it does not meet the
|
||||
<literal>WHERE</> condition is not seen by any window function.
|
||||
A query can contain multiple window functions that slice up the data
|
||||
in different ways by means of different <literal>OVER</> clauses, but
|
||||
in different ways using different <literal>OVER</> clauses, but
|
||||
they all act on the same collection of rows defined by this virtual table.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
We already saw that <literal>ORDER BY</> can be omitted if the ordering
|
||||
of rows is not important. It is also possible to omit <literal>PARTITION
|
||||
BY</>, in which case there is just one partition containing all the rows.
|
||||
BY</>, in which case there is a single partition containing all rows.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
There is another important concept associated with window functions:
|
||||
for each row, there is a set of rows within its partition called its
|
||||
<firstterm>window frame</>. Many (but not all) window functions act only
|
||||
<firstterm>window frame</>. Some window functions act only
|
||||
on the rows of the window frame, rather than of the whole partition.
|
||||
By default, if <literal>ORDER BY</> is supplied then the frame consists of
|
||||
all rows from the start of the partition up through the current row, plus
|
||||
@ -515,7 +514,7 @@ SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
|
||||
elsewhere, such as in <literal>GROUP BY</>, <literal>HAVING</>
|
||||
and <literal>WHERE</literal> clauses. This is because they logically
|
||||
execute after the processing of those clauses. Also, window functions
|
||||
execute after regular aggregate functions. This means it is valid to
|
||||
execute after non-window aggregate functions. This means it is valid to
|
||||
include an aggregate function call in the arguments of a window function,
|
||||
but not vice versa.
|
||||
</para>
|
||||
|
@ -13166,13 +13166,14 @@ NULL baz</literallayout>(3 rows)</entry>
|
||||
|
||||
<para>
|
||||
<firstterm>Aggregate functions</firstterm> compute a single result
|
||||
from a set of input values. The built-in normal aggregate functions
|
||||
are listed in
|
||||
<xref linkend="functions-aggregate-table"> and
|
||||
<xref linkend="functions-aggregate-statistics-table">.
|
||||
The built-in ordered-set aggregate functions
|
||||
are listed in <xref linkend="functions-orderedset-table"> and
|
||||
<xref linkend="functions-hypothetical-table">. Grouping operations,
|
||||
from a set of input values. The built-in general-purpose aggregate
|
||||
functions are listed in <xref linkend="functions-aggregate-table">
|
||||
and statistical aggregates in <xref
|
||||
linkend="functions-aggregate-statistics-table">.
|
||||
The built-in within-group ordered-set aggregate functions
|
||||
are listed in <xref linkend="functions-orderedset-table">
|
||||
while the built-in within-group hypothetical-set ones are in <xref
|
||||
linkend="functions-hypothetical-table">. Grouping operations,
|
||||
which are closely related to aggregate functions, are listed in
|
||||
<xref linkend="functions-grouping-table">.
|
||||
The special syntax considerations for aggregate
|
||||
@ -14413,17 +14414,19 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
|
||||
<para>
|
||||
The built-in window functions are listed in
|
||||
<xref linkend="functions-window-table">. Note that these functions
|
||||
<emphasis>must</> be invoked using window function syntax; that is an
|
||||
<emphasis>must</> be invoked using window function syntax, i.e., an
|
||||
<literal>OVER</> clause is required.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
In addition to these functions, any built-in or user-defined normal
|
||||
aggregate function (but not ordered-set or hypothetical-set aggregates)
|
||||
In addition to these functions, any built-in or user-defined
|
||||
general-purpose or statistical
|
||||
aggregate (i.e., not ordered-set or hypothetical-set aggregates)
|
||||
can be used as a window function; see
|
||||
<xref linkend="functions-aggregate"> for a list of the built-in aggregates.
|
||||
Aggregate functions act as window functions only when an <literal>OVER</>
|
||||
clause follows the call; otherwise they act as regular aggregates.
|
||||
clause follows the call; otherwise they act as non-window aggregates
|
||||
and return a single row for the entire set.
|
||||
</para>
|
||||
|
||||
<table id="functions-window-table">
|
||||
@ -14488,7 +14491,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
|
||||
<entry>
|
||||
<type>double precision</type>
|
||||
</entry>
|
||||
<entry>relative rank of the current row: (<function>rank</> - 1) / (total rows - 1)</entry>
|
||||
<entry>relative rank of the current row: (<function>rank</> - 1) / (total partition rows - 1)</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
@ -14501,7 +14504,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
|
||||
<entry>
|
||||
<type>double precision</type>
|
||||
</entry>
|
||||
<entry>relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)</entry>
|
||||
<entry>cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows</entry>
|
||||
</row>
|
||||
|
||||
<row>
|
||||
@ -14634,9 +14637,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
|
||||
All of the functions listed in
|
||||
<xref linkend="functions-window-table"> depend on the sort ordering
|
||||
specified by the <literal>ORDER BY</> clause of the associated window
|
||||
definition. Rows that are not distinct in the <literal>ORDER BY</>
|
||||
ordering are said to be <firstterm>peers</>; the four ranking functions
|
||||
are defined so that they give the same answer for any two peer rows.
|
||||
definition. Rows that are not distinct when considering only the
|
||||
<literal>ORDER BY</> columns are said to be <firstterm>peers</>.
|
||||
The four ranking functions (including <function>cume_dist</>) are
|
||||
defined so that they give the same answer for all peer rows.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
@ -14679,6 +14683,14 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
|
||||
</para>
|
||||
</note>
|
||||
|
||||
<para>
|
||||
<function>cume_dist</> computes the fraction of partition rows that
|
||||
are less than or equal to the current row and its peers, while
|
||||
<function>percent_rank</> computes the fraction of partition rows that
|
||||
are less than the current row, assuming the current row does not exist
|
||||
in the partition.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="functions-subquery">
|
||||
|
@ -1664,7 +1664,8 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
|
||||
<para>
|
||||
Placing <literal>ORDER BY</> within the aggregate's regular argument
|
||||
list, as described so far, is used when ordering the input rows for
|
||||
a <quote>normal</> aggregate for which ordering is optional. There is a
|
||||
general-purpose and statistical aggregates, for which ordering is
|
||||
optional. There is a
|
||||
subclass of aggregate functions called <firstterm>ordered-set
|
||||
aggregates</> for which an <replaceable>order_by_clause</replaceable>
|
||||
is <emphasis>required</>, usually because the aggregate's computation is
|
||||
@ -1675,7 +1676,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
|
||||
inside <literal>WITHIN GROUP (...)</>, as shown in the final syntax
|
||||
alternative above. The expressions in
|
||||
the <replaceable>order_by_clause</replaceable> are evaluated once per
|
||||
input row just like normal aggregate arguments, sorted as per
|
||||
input row just like regular aggregate arguments, sorted as per
|
||||
the <replaceable>order_by_clause</replaceable>'s requirements, and fed
|
||||
to the aggregate function as input arguments. (This is unlike the case
|
||||
for a non-<literal>WITHIN GROUP</> <replaceable>order_by_clause</>,
|
||||
@ -1683,7 +1684,7 @@ SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
|
||||
argument expressions preceding <literal>WITHIN GROUP</>, if any, are
|
||||
called <firstterm>direct arguments</> to distinguish them from
|
||||
the <firstterm>aggregated arguments</> listed in
|
||||
the <replaceable>order_by_clause</replaceable>. Unlike normal aggregate
|
||||
the <replaceable>order_by_clause</replaceable>. Unlike regular aggregate
|
||||
arguments, direct arguments are evaluated only once per aggregate call,
|
||||
not once per input row. This means that they can contain variables only
|
||||
if those variables are grouped by <literal>GROUP BY</>; this restriction
|
||||
@ -1779,10 +1780,10 @@ FROM generate_series(1,10) AS s(i);
|
||||
<para>
|
||||
A <firstterm>window function call</firstterm> represents the application
|
||||
of an aggregate-like function over some portion of the rows selected
|
||||
by a query. Unlike regular aggregate function calls, this is not tied
|
||||
by a query. Unlike non-window aggregate calls, this is not tied
|
||||
to grouping of the selected rows into a single output row — each
|
||||
row remains separate in the query output. However the window function
|
||||
is able to scan all the rows that would be part of the current row's
|
||||
has access to all the rows that would be part of the current row's
|
||||
group according to the grouping specification (<literal>PARTITION BY</>
|
||||
list) of the window function call.
|
||||
The syntax of a window function call is one of the following:
|
||||
@ -1831,20 +1832,20 @@ UNBOUNDED FOLLOWING
|
||||
named window in the <literal>WINDOW</literal> clause; see the
|
||||
<xref linkend="sql-select"> reference page for details. It's worth
|
||||
pointing out that <literal>OVER wname</> is not exactly equivalent to
|
||||
<literal>OVER (wname)</>; the latter implies copying and modifying the
|
||||
<literal>OVER (wname ...)</>; the latter implies copying and modifying the
|
||||
window definition, and will be rejected if the referenced window
|
||||
specification includes a frame clause.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The <literal>PARTITION BY</> option groups the rows of the query into
|
||||
The <literal>PARTITION BY</> clause groups the rows of the query into
|
||||
<firstterm>partitions</>, which are processed separately by the window
|
||||
function. <literal>PARTITION BY</> works similarly to a query-level
|
||||
<literal>GROUP BY</> clause, except that its expressions are always just
|
||||
expressions and cannot be output-column names or numbers.
|
||||
Without <literal>PARTITION BY</>, all rows produced by the query are
|
||||
treated as a single partition.
|
||||
The <literal>ORDER BY</> option determines the order in which the rows
|
||||
The <literal>ORDER BY</> clause determines the order in which the rows
|
||||
of a partition are processed by the window function. It works similarly
|
||||
to a query-level <literal>ORDER BY</> clause, but likewise cannot use
|
||||
output-column names or numbers. Without <literal>ORDER BY</>, rows are
|
||||
@ -1921,17 +1922,17 @@ UNBOUNDED FOLLOWING
|
||||
<para>
|
||||
The built-in window functions are described in <xref
|
||||
linkend="functions-window-table">. Other window functions can be added by
|
||||
the user. Also, any built-in or user-defined normal aggregate function
|
||||
can be used as a window function. Ordered-set aggregates presently
|
||||
cannot be used as window functions, however.
|
||||
the user. Also, any built-in or user-defined general-purpose or
|
||||
statistical aggregate can be used as a window function. (Ordered-set
|
||||
and hypothetical-set aggregates cannot presently be used as window functions.)
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The syntaxes using <literal>*</> are used for calling parameter-less
|
||||
aggregate functions as window functions, for example
|
||||
<literal>count(*) OVER (PARTITION BY x ORDER BY y)</>.
|
||||
The asterisk (<literal>*</>) is customarily not used for non-aggregate window functions.
|
||||
Aggregate window functions, unlike normal aggregate functions, do not
|
||||
The asterisk (<literal>*</>) is customarily not used for
|
||||
window-specific functions. Window-specific functions do not
|
||||
allow <literal>DISTINCT</> or <literal>ORDER BY</> to be used within the
|
||||
function argument list.
|
||||
</para>
|
||||
|
Loading…
x
Reference in New Issue
Block a user