1531 lines
52 KiB
Plaintext
1531 lines
52 KiB
Plaintext
<!--
|
|
$Header: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v 1.68 2002/08/23 04:27:19 momjian Exp $
|
|
-->
|
|
|
|
<chapter id="sql-syntax">
|
|
<title>SQL Syntax</title>
|
|
|
|
<indexterm zone="sql-syntax">
|
|
<primary>syntax</primary>
|
|
<secondary>SQL</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
This chapter describes the syntax of SQL. It forms the foundation
|
|
for understanding the following chapters which will go into detail
|
|
about how the SQL commands are applied to define and modify data.
|
|
</para>
|
|
|
|
<para>
|
|
We also advise users who are already familiar with SQL to read this
|
|
chapter carefully because there are several rules and concepts that
|
|
are implemented inconsistently among SQL databases or that are
|
|
specific to <productname>PostgreSQL</productname>.
|
|
</para>
|
|
|
|
<sect1 id="sql-syntax-lexical">
|
|
<title>Lexical Structure</title>
|
|
|
|
<para>
|
|
SQL input consists of a sequence of
|
|
<firstterm>commands</firstterm>. A command is composed of a
|
|
sequence of <firstterm>tokens</firstterm>, terminated by a
|
|
semicolon (<quote>;</quote>). The end of the input stream also
|
|
terminates a command. Which tokens are valid depends on the syntax
|
|
of the particular command.
|
|
</para>
|
|
|
|
<para>
|
|
A token can be a <firstterm>key word</firstterm>, an
|
|
<firstterm>identifier</firstterm>, a <firstterm>quoted
|
|
identifier</firstterm>, a <firstterm>literal</firstterm> (or
|
|
constant), or a special character symbol. Tokens are normally
|
|
separated by whitespace (space, tab, newline), but need not be if
|
|
there is no ambiguity (which is generally only the case if a
|
|
special character is adjacent to some other token type).
|
|
</para>
|
|
|
|
<para>
|
|
Additionally, <firstterm>comments</firstterm> can occur in SQL
|
|
input. They are not tokens, they are effectively equivalent to
|
|
whitespace.
|
|
</para>
|
|
|
|
<para>
|
|
For example, the following is (syntactically) valid SQL input:
|
|
<programlisting>
|
|
SELECT * FROM MY_TABLE;
|
|
UPDATE MY_TABLE SET A = 5;
|
|
INSERT INTO MY_TABLE VALUES (3, 'hi there');
|
|
</programlisting>
|
|
This is a sequence of three commands, one per line (although this
|
|
is not required; more than one command can be on a line, and
|
|
commands can usefully be split across lines).
|
|
</para>
|
|
|
|
<para>
|
|
The SQL syntax is not very consistent regarding what tokens
|
|
identify commands and which are operands or parameters. The first
|
|
few tokens are generally the command name, so in the above example
|
|
we would usually speak of a <quote>SELECT</quote>, an
|
|
<quote>UPDATE</quote>, and an <quote>INSERT</quote> command. But
|
|
for instance the <command>UPDATE</command> command always requires
|
|
a <token>SET</token> token to appear in a certain position, and
|
|
this particular variation of <command>INSERT</command> also
|
|
requires a <token>VALUES</token> in order to be complete. The
|
|
precise syntax rules for each command are described in the
|
|
<citetitle>Reference Manual</citetitle>.
|
|
</para>
|
|
|
|
<sect2 id="sql-syntax-identifiers">
|
|
<title>Identifiers and Key Words</title>
|
|
|
|
<indexterm zone="sql-syntax-identifiers">
|
|
<primary>identifiers</primary>
|
|
</indexterm>
|
|
|
|
<indexterm zone="sql-syntax-identifiers">
|
|
<primary>key words</primary>
|
|
<secondary>syntax</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or
|
|
<token>VALUES</token> in the example above are examples of
|
|
<firstterm>key words</firstterm>, that is, words that have a fixed
|
|
meaning in the SQL language. The tokens <token>MY_TABLE</token>
|
|
and <token>A</token> are examples of
|
|
<firstterm>identifiers</firstterm>. They identify names of
|
|
tables, columns, or other database objects, depending on the
|
|
command they are used in. Therefore they are sometimes simply
|
|
called <quote>names</quote>. Key words and identifiers have the
|
|
same lexical structure, meaning that one cannot know whether a
|
|
token is an identifier or a key word without knowing the language.
|
|
A complete list of key words can be found in <xref
|
|
linkend="sql-keywords-appendix">.
|
|
</para>
|
|
|
|
<para>
|
|
SQL identifiers and key words must begin with a letter
|
|
(<literal>a</literal>-<literal>z</literal>, but also letters with
|
|
diacritical marks and non-Latin letters) or an underscore
|
|
(<literal>_</literal>). Subsequent characters in an identifier or
|
|
key word can be letters, digits
|
|
(<literal>0</literal>-<literal>9</literal>), or underscores,
|
|
although the SQL standard will not define a key word that contains
|
|
digits or starts or ends with an underscore.
|
|
</para>
|
|
|
|
<para>
|
|
The system uses no more than <symbol>NAMEDATALEN</symbol>-1
|
|
characters of an identifier; longer names can be written in
|
|
commands, but they will be truncated. By default,
|
|
<symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier length
|
|
is 63 (but at the time the system is built,
|
|
<symbol>NAMEDATALEN</symbol> can be changed in
|
|
<filename>src/include/postgres_ext.h</filename>).
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm>
|
|
<primary>case sensitivity</primary>
|
|
<secondary>SQL commands</secondary>
|
|
</indexterm>
|
|
Identifier and key word names are case insensitive. Therefore
|
|
<programlisting>
|
|
UPDATE MY_TABLE SET A = 5;
|
|
</programlisting>
|
|
can equivalently be written as
|
|
<programlisting>
|
|
uPDaTE my_TabLE SeT a = 5;
|
|
</programlisting>
|
|
A convention often used is to write key words in upper
|
|
case and names in lower case, e.g.,
|
|
<programlisting>
|
|
UPDATE my_table SET a = 5;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
<indexterm>
|
|
<primary>quotes</primary>
|
|
<secondary>and identifiers</secondary>
|
|
</indexterm>
|
|
There is a second kind of identifier: the <firstterm>delimited
|
|
identifier</firstterm> or <firstterm>quoted
|
|
identifier</firstterm>. It is formed by enclosing an arbitrary
|
|
sequence of characters in double-quotes
|
|
(<literal>"</literal>). <!-- " font-lock mania --> A delimited
|
|
identifier is always an identifier, never a key word. So
|
|
<literal>"select"</literal> could be used to refer to a column or
|
|
table named <quote>select</quote>, whereas an unquoted
|
|
<literal>select</literal> would be taken as a key word and
|
|
would therefore provoke a parse error when used where a table or
|
|
column name is expected. The example can be written with quoted
|
|
identifiers like this:
|
|
<programlisting>
|
|
UPDATE "my_table" SET "a" = 5;
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Quoted identifiers can contain any character other than a double
|
|
quote itself. This allows constructing table or column names that
|
|
would otherwise not be possible, such as ones containing spaces or
|
|
ampersands. The length limitation still applies.
|
|
</para>
|
|
|
|
<para>
|
|
Quoting an identifier also makes it case-sensitive, whereas
|
|
unquoted names are always folded to lower case. For example, the
|
|
identifiers <literal>FOO</literal>, <literal>foo</literal> and
|
|
<literal>"foo"</literal> are considered the same by
|
|
<productname>PostgreSQL</productname>, but <literal>"Foo"</literal>
|
|
and <literal>"FOO"</literal> are different from these three and
|
|
each other.
|
|
<footnote>
|
|
<para>
|
|
The folding of unquoted names to lower case in <productname>PostgreSQL</>
|
|
is incompatible with the SQL standard, which says that unquoted
|
|
names should be folded to upper case. Thus, <literal>foo</literal>
|
|
should be equivalent to <literal>"FOO"</literal> not
|
|
<literal>"foo"</literal> according to the standard. If you want to
|
|
write portable applications you are advised to always quote a particular
|
|
name or never quote it.
|
|
</para>
|
|
</footnote>
|
|
</para>
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="sql-syntax-constants">
|
|
<title>Constants</title>
|
|
|
|
<indexterm zone="sql-syntax-constants">
|
|
<primary>constants</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
There are four kinds of <firstterm>implicitly-typed
|
|
constants</firstterm> in <productname>PostgreSQL</productname>:
|
|
strings, bit strings, integers, and floating-point numbers.
|
|
Constants can also be specified with explicit types, which can
|
|
enable more accurate representation and more efficient handling by
|
|
the system. The implicit constants are described below; explicit
|
|
constants are discussed afterwards.
|
|
</para>
|
|
|
|
<sect3 id="sql-syntax-strings">
|
|
<title>String Constants</title>
|
|
|
|
<indexterm zone="sql-syntax-strings">
|
|
<primary>character strings</primary>
|
|
<secondary>constants</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
<indexterm>
|
|
<primary>quotes</primary>
|
|
<secondary>escaping</secondary>
|
|
</indexterm>
|
|
A string constant in SQL is an arbitrary sequence of characters
|
|
bounded by single quotes (<quote>'</quote>), e.g., <literal>'This
|
|
is a string'</literal>. SQL allows single quotes to be embedded
|
|
in strings by typing two adjacent single quotes (e.g.,
|
|
<literal>'Dianne''s horse'</literal>). In
|
|
<productname>PostgreSQL</productname> single quotes may
|
|
alternatively be escaped with a backslash (<quote>\</quote>,
|
|
e.g., <literal>'Dianne\'s horse'</literal>).
|
|
</para>
|
|
|
|
<para>
|
|
C-style backslash escapes are also available:
|
|
<literal>\b</literal> is a backspace, <literal>\f</literal> is a
|
|
form feed, <literal>\n</literal> is a newline,
|
|
<literal>\r</literal> is a carriage return, <literal>\t</literal>
|
|
is a tab, and <literal>\<replaceable>xxx</replaceable></literal>,
|
|
where <replaceable>xxx</replaceable> is an octal number, is the
|
|
character with the corresponding ASCII code. Any other character
|
|
following a backslash is taken literally. Thus, to include a
|
|
backslash in a string constant, type two backslashes.
|
|
</para>
|
|
|
|
<para>
|
|
The character with the code zero cannot be in a string constant.
|
|
</para>
|
|
|
|
<para>
|
|
Two string constants that are only separated by whitespace
|
|
<emphasis>with at least one newline</emphasis> are concatenated
|
|
and effectively treated as if the string had been written in one
|
|
constant. For example:
|
|
<programlisting>
|
|
SELECT 'foo'
|
|
'bar';
|
|
</programlisting>
|
|
is equivalent to
|
|
<programlisting>
|
|
SELECT 'foobar';
|
|
</programlisting>
|
|
but
|
|
<programlisting>
|
|
SELECT 'foo' 'bar';
|
|
</programlisting>
|
|
is not valid syntax, and <productname>PostgreSQL</productname> is
|
|
consistent with <acronym>SQL9x</acronym> in this regard.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="sql-syntax-bit-strings">
|
|
<title>Bit-String Constants</title>
|
|
|
|
<indexterm zone="sql-syntax-bit-strings">
|
|
<primary>bit strings</primary>
|
|
<secondary>constants</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Bit-string constants look like string constants with a
|
|
<literal>B</literal> (upper or lower case) immediately before the
|
|
opening quote (no intervening whitespace), e.g.,
|
|
<literal>B'1001'</literal>. The only characters allowed within
|
|
bit-string constants are <literal>0</literal> and
|
|
<literal>1</literal>. Bit-string constants can be continued
|
|
across lines in the same way as regular string constants.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Integer Constants</title>
|
|
|
|
<para>
|
|
Integer constants in SQL are sequences of decimal digits (0
|
|
though 9) with no decimal point and no exponent. The range of legal values
|
|
depends on which integer data type is used, but the plain
|
|
<type>integer</type> type accepts values ranging from -2147483648
|
|
to +2147483647. (The optional plus or minus sign is actually a
|
|
separate unary operator and not part of the integer constant.)
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Floating-Point Constants</title>
|
|
|
|
<indexterm>
|
|
<primary>floating point</primary>
|
|
<secondary>constants</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Floating-point constants are accepted in these general forms:
|
|
<synopsis>
|
|
<replaceable>digits</replaceable>.<optional><replaceable>digits</replaceable></optional><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
|
|
<optional><replaceable>digits</replaceable></optional>.<replaceable>digits</replaceable><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
|
|
<replaceable>digits</replaceable>e<optional>+-</optional><replaceable>digits</replaceable>
|
|
</synopsis>
|
|
where <replaceable>digits</replaceable> is one or more decimal
|
|
digits. At least one digit must be before or after the decimal
|
|
point. At least one digit must follow the exponent delimiter
|
|
(<literal>e</literal>) if that field is present.
|
|
Thus, a floating-point constant is distinguished from an integer
|
|
constant by the presence of either the decimal point or the
|
|
exponent clause (or both). There must not be a space or other
|
|
characters embedded in the constant.
|
|
</para>
|
|
|
|
<para>
|
|
These are some examples of valid floating-point constants:
|
|
<literallayout>
|
|
3.5
|
|
4.
|
|
.001
|
|
5e2
|
|
1.925e-3
|
|
</literallayout>
|
|
</para>
|
|
|
|
<para>
|
|
Floating-point constants are of type <type>DOUBLE
|
|
PRECISION</type>. <type>REAL</type> can be specified explicitly
|
|
by using <acronym>SQL</acronym> string notation or
|
|
<productname>PostgreSQL</productname> type notation:
|
|
|
|
<programlisting>
|
|
REAL '1.23' -- string style
|
|
'1.23'::REAL -- PostgreSQL (historical) style
|
|
</programlisting>
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3 id="sql-syntax-constants-generic">
|
|
<title>Constants of Other Types</title>
|
|
|
|
<indexterm>
|
|
<primary>data types</primary>
|
|
<secondary>constants</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A constant of an <emphasis>arbitrary</emphasis> type can be
|
|
entered using any one of the following notations:
|
|
<synopsis>
|
|
<replaceable>type</replaceable> '<replaceable>string</replaceable>'
|
|
'<replaceable>string</replaceable>'::<replaceable>type</replaceable>
|
|
CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
|
|
</synopsis>
|
|
The string's text is passed to the input conversion
|
|
routine for the type called <replaceable>type</replaceable>. The
|
|
result is a constant of the indicated type. The explicit type
|
|
cast may be omitted if there is no ambiguity as to the type the
|
|
constant must be (for example, when it is passed as an argument
|
|
to a non-overloaded function), in which case it is automatically
|
|
coerced.
|
|
</para>
|
|
|
|
<para>
|
|
It is also possible to specify a type coercion using a function-like
|
|
syntax:
|
|
<synopsis>
|
|
<replaceable>typename</replaceable> ( '<replaceable>string</replaceable>' )
|
|
</synopsis>
|
|
but not all type names may be used in this way; see <xref
|
|
linkend="sql-syntax-type-casts"> for details.
|
|
</para>
|
|
|
|
<para>
|
|
The <literal>::</literal>, <literal>CAST()</literal>, and
|
|
function-call syntaxes can also be used to specify run-time type
|
|
conversions of arbitrary expressions, as discussed in <xref
|
|
linkend="sql-syntax-type-casts">. But the form
|
|
<replaceable>type</replaceable> '<replaceable>string</replaceable>'
|
|
can only be used to specify the type of a literal constant.
|
|
Another restriction on
|
|
<replaceable>type</replaceable> '<replaceable>string</replaceable>'
|
|
is that it does not work for array types; use <literal>::</literal>
|
|
or <literal>CAST()</literal> to specify the type of an array constant.
|
|
</para>
|
|
</sect3>
|
|
|
|
<sect3>
|
|
<title>Array constants</title>
|
|
|
|
<indexterm>
|
|
<primary>arrays</primary>
|
|
<secondary>constants</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The general format of an array constant is the following:
|
|
<synopsis>
|
|
'{ <replaceable>val1</replaceable> <replaceable>delim</replaceable> <replaceable>val2</replaceable> <replaceable>delim</replaceable> ... }'
|
|
</synopsis>
|
|
where <replaceable>delim</replaceable> is the delimiter character
|
|
for the type, as recorded in its <literal>pg_type</literal>
|
|
entry. (For all built-in types, this is the comma character
|
|
<quote><literal>,</literal></>.) Each <replaceable>val</replaceable> is either a constant
|
|
of the array element type, or a subarray. An example of an
|
|
array constant is
|
|
<programlisting>
|
|
'{{1,2,3},{4,5,6},{7,8,9}}'
|
|
</programlisting>
|
|
This constant is a two-dimensional, 3-by-3 array consisting of three
|
|
subarrays of integers.
|
|
</para>
|
|
|
|
<para>
|
|
Individual array elements can be placed between double-quote
|
|
marks (<literal>"</literal>) <!-- " --> to avoid ambiguity
|
|
problems with respect to whitespace. Without quote marks, the
|
|
array-value parser will skip leading whitespace.
|
|
</para>
|
|
|
|
<para>
|
|
(Array constants are actually only a special case of the generic
|
|
type constants discussed in the previous section. The constant
|
|
is initially treated as a string and passed to the array input
|
|
conversion routine. An explicit type specification might be
|
|
necessary.)
|
|
</para>
|
|
</sect3>
|
|
</sect2>
|
|
|
|
|
|
<sect2 id="sql-syntax-operators">
|
|
<title>Operators</title>
|
|
|
|
<indexterm zone="sql-syntax-operators">
|
|
<primary>operators</primary>
|
|
<secondary>syntax</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
An operator is a sequence of up to <symbol>NAMEDATALEN</symbol>-1
|
|
(63 by default) characters from the following list:
|
|
<literallayout>
|
|
+ - * / < > = ~ ! @ # % ^ & | ` ? $
|
|
</literallayout>
|
|
|
|
There are a few restrictions on operator names, however:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<literal>$</> (dollar) cannot be a single-character operator, although it
|
|
can be part of a multiple-character operator name.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
<literal>--</literal> and <literal>/*</literal> cannot appear
|
|
anywhere in an operator name, since they will be taken as the
|
|
start of a comment.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
A multiple-character operator name cannot end in <literal>+</> or <literal>-</>,
|
|
unless the name also contains at least one of these characters:
|
|
<literallayout>
|
|
~ ! @ # % ^ & | ` ? $
|
|
</literallayout>
|
|
For example, <literal>@-</literal> is an allowed operator name,
|
|
but <literal>*-</literal> is not. This restriction allows
|
|
<productname>PostgreSQL</productname> to parse SQL-compliant
|
|
queries without requiring spaces between tokens.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>
|
|
When working with non-SQL-standard operator names, you will usually
|
|
need to separate adjacent operators with spaces to avoid ambiguity.
|
|
For example, if you have defined a left unary operator named <literal>@</literal>,
|
|
you cannot write <literal>X*@Y</literal>; you must write
|
|
<literal>X* @Y</literal> to ensure that
|
|
<productname>PostgreSQL</productname> reads it as two operator names
|
|
not one.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Special Characters</title>
|
|
|
|
<para>
|
|
Some characters that are not alphanumeric have a special meaning
|
|
that is different from being an operator. Details on the usage can
|
|
be found at the location where the respective syntax element is
|
|
described. This section only exists to advise the existence and
|
|
summarize the purposes of these characters.
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
A dollar sign (<literal>$</literal>) followed by digits is used
|
|
to represent the positional parameters in the body of a function
|
|
definition. In other contexts the dollar sign may be part of an
|
|
operator name.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Parentheses (<literal>()</literal>) have their usual meaning to
|
|
group expressions and enforce precedence. In some cases
|
|
parentheses are required as part of the fixed syntax of a
|
|
particular SQL command.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Brackets (<literal>[]</literal>) are used to select the elements
|
|
of an array. See <xref linkend="arrays"> for more information
|
|
on arrays.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
Commas (<literal>,</literal>) are used in some syntactical
|
|
constructs to separate the elements of a list.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
The semicolon (<literal>;</literal>) terminates an SQL command.
|
|
It cannot appear anywhere within a command, except within a
|
|
string constant or quoted identifier.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
The colon (<literal>:</literal>) is used to select
|
|
<quote>slices</quote> from arrays. (See <xref
|
|
linkend="arrays">.) In certain SQL dialects (such as Embedded
|
|
SQL), the colon is used to prefix variable names.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
The asterisk (<literal>*</literal>) has a special meaning when
|
|
used in the <command>SELECT</command> command or with the
|
|
<function>COUNT</function> aggregate function.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
The period (<literal>.</literal>) is used in floating-point
|
|
constants, and to separate table and column names.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="sql-syntax-comments">
|
|
<title>Comments</title>
|
|
|
|
<indexterm zone="sql-syntax-comments">
|
|
<primary>comments</primary>
|
|
<secondary>in SQL</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A comment is an arbitrary sequence of characters beginning with
|
|
double dashes and extending to the end of the line, e.g.:
|
|
<programlisting>
|
|
-- This is a standard SQL92 comment
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
Alternatively, C-style block comments can be used:
|
|
<programlisting>
|
|
/* multiline comment
|
|
* with nesting: /* nested block comment */
|
|
*/
|
|
</programlisting>
|
|
where the comment begins with <literal>/*</literal> and extends to
|
|
the matching occurrence of <literal>*/</literal>. These block
|
|
comments nest, as specified in SQL99 but unlike C, so that one can
|
|
comment out larger blocks of code that may contain existing block
|
|
comments.
|
|
</para>
|
|
|
|
<para>
|
|
A comment is removed from the input stream before further syntax
|
|
analysis and is effectively replaced by whitespace.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="sql-precedence">
|
|
<title>Lexical Precedence</title>
|
|
|
|
<indexterm zone="sql-precedence">
|
|
<primary>operators</primary>
|
|
<secondary>precedence</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The precedence and associativity of the operators is hard-wired
|
|
into the parser. Most operators have the same precedence and are
|
|
left-associative. This may lead to non-intuitive behavior; for
|
|
example the Boolean operators <literal><</> and <literal>></> have a different
|
|
precedence than the Boolean operators <literal><=</> and <literal>>=</>. Also,
|
|
you will sometimes need to add parentheses when using combinations
|
|
of binary and unary operators. For instance
|
|
<programlisting>
|
|
SELECT 5 ! - 6;
|
|
</programlisting>
|
|
will be parsed as
|
|
<programlisting>
|
|
SELECT 5 ! (- 6);
|
|
</programlisting>
|
|
because the parser has no idea -- until it is too late -- that
|
|
<token>!</token> is defined as a postfix operator, not an infix one.
|
|
To get the desired behavior in this case, you must write
|
|
<programlisting>
|
|
SELECT (5 !) - 6;
|
|
</programlisting>
|
|
This is the price one pays for extensibility.
|
|
</para>
|
|
|
|
<table tocentry="1">
|
|
<title>Operator Precedence (decreasing)</title>
|
|
|
|
<tgroup cols="3">
|
|
<thead>
|
|
<row>
|
|
<entry>Operator/Element</entry>
|
|
<entry>Associativity</entry>
|
|
<entry>Description</entry>
|
|
</row>
|
|
</thead>
|
|
|
|
<tbody>
|
|
<row>
|
|
<entry><token>.</token></entry>
|
|
<entry>left</entry>
|
|
<entry>table/column name separator</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>::</token></entry>
|
|
<entry>left</entry>
|
|
<entry><productname>PostgreSQL</productname>-style typecast</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>[</token> <token>]</token></entry>
|
|
<entry>left</entry>
|
|
<entry>array element selection</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>-</token></entry>
|
|
<entry>right</entry>
|
|
<entry>unary minus</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>^</token></entry>
|
|
<entry>left</entry>
|
|
<entry>exponentiation</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>*</token> <token>/</token> <token>%</token></entry>
|
|
<entry>left</entry>
|
|
<entry>multiplication, division, modulo</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>+</token> <token>-</token></entry>
|
|
<entry>left</entry>
|
|
<entry>addition, subtraction</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>IS</token></entry>
|
|
<entry></entry>
|
|
<entry>test for TRUE, FALSE, UNKNOWN, NULL</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>ISNULL</token></entry>
|
|
<entry></entry>
|
|
<entry>test for NULL</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>NOTNULL</token></entry>
|
|
<entry></entry>
|
|
<entry>test for NOT NULL</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry>(any other)</entry>
|
|
<entry>left</entry>
|
|
<entry>all other native and user-defined operators</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>IN</token></entry>
|
|
<entry></entry>
|
|
<entry>set membership</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>BETWEEN</token></entry>
|
|
<entry></entry>
|
|
<entry>containment</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>OVERLAPS</token></entry>
|
|
<entry></entry>
|
|
<entry>time interval overlap</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>LIKE</token> <token>ILIKE</token> <token>SIMILAR</token></entry>
|
|
<entry></entry>
|
|
<entry>string pattern matching</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token><</token> <token>></token></entry>
|
|
<entry></entry>
|
|
<entry>less than, greater than</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>=</token></entry>
|
|
<entry>right</entry>
|
|
<entry>equality, assignment</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>NOT</token></entry>
|
|
<entry>right</entry>
|
|
<entry>logical negation</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>AND</token></entry>
|
|
<entry>left</entry>
|
|
<entry>logical conjunction</entry>
|
|
</row>
|
|
|
|
<row>
|
|
<entry><token>OR</token></entry>
|
|
<entry>left</entry>
|
|
<entry>logical disjunction</entry>
|
|
</row>
|
|
</tbody>
|
|
</tgroup>
|
|
</table>
|
|
|
|
<para>
|
|
Note that the operator precedence rules also apply to user-defined
|
|
operators that have the same names as the built-in operators
|
|
mentioned above. For example, if you define a
|
|
<quote>+</quote> operator for some custom data type it will have
|
|
the same precedence as the built-in <quote>+</quote> operator, no
|
|
matter what yours does.
|
|
</para>
|
|
|
|
<para>
|
|
When a schema-qualified operator name is used in the
|
|
<literal>OPERATOR</> syntax, as for example in
|
|
<programlisting>
|
|
SELECT 3 OPERATOR(pg_catalog.+) 4;
|
|
</programlisting>
|
|
the <literal>OPERATOR</> construct is taken to have the default precedence
|
|
shown above for <quote>any other</> operator. This is true no matter
|
|
which specific operator name appears inside <literal>OPERATOR()</>.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="sql-naming">
|
|
<title>Schemas and Naming Conventions</title>
|
|
|
|
<indexterm>
|
|
<primary>schemas</primary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>search path</primary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>namespaces</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A <productname>PostgreSQL</productname> database cluster (installation)
|
|
contains one or more named databases. Users and groups of users are
|
|
shared across the entire cluster, but no other data is shared across
|
|
databases. Any given client connection to the server can access
|
|
only the data in a single database, the one specified in the connection
|
|
request.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
Users of a cluster do not necessarily have the privilege to access every
|
|
database in the cluster. Sharing of user names means that there
|
|
cannot be different users named, say, <literal>joe</> in two databases
|
|
in the same cluster; but the system can be configured to allow
|
|
<literal>joe</> access to only some of the databases.
|
|
</para>
|
|
</note>
|
|
|
|
<sect2>
|
|
<title>Schema Object Names</title>
|
|
|
|
<para>
|
|
A database contains one or more named <firstterm>schemas</>, which
|
|
in turn contain tables. Schemas also contain other kinds of named
|
|
objects, including datatypes, functions, and operators. The same
|
|
object name can be used in different schemas without conflict; for
|
|
example, both <literal>schema1</> and <literal>myschema</> may
|
|
contain tables named <literal>mytable</>. Unlike databases, schemas
|
|
are not rigidly separated: a user may access objects in any of the
|
|
schemas in the database he is connected to, if he has privileges
|
|
to do so.
|
|
</para>
|
|
|
|
<indexterm>
|
|
<primary>qualified names</primary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>names</primary>
|
|
<secondary>qualified</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
To name a table precisely, write a <firstterm>qualified name</> consisting
|
|
of the schema name and table name separated by a dot:
|
|
<synopsis>
|
|
<replaceable>schema</><literal>.</><replaceable>table</>
|
|
</synopsis>
|
|
Actually, the even more general syntax
|
|
<synopsis>
|
|
<replaceable>database</><literal>.</><replaceable>schema</><literal>.</><replaceable>table</>
|
|
</synopsis>
|
|
can be used too, but at present this is just for pro-forma compliance
|
|
with the SQL standard; if you write a database name it must be the
|
|
same as the database you are connected to.
|
|
</para>
|
|
|
|
<indexterm>
|
|
<primary>unqualified names</primary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>names</primary>
|
|
<secondary>unqualified</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
Qualified names are tedious to write, and it's often best not to
|
|
wire a particular schema name into applications anyway. Therefore
|
|
tables are often referred to by <firstterm>unqualified names</>,
|
|
which consist of just the table name. The system determines which table
|
|
is meant by following a <firstterm>search path</>, which is a list
|
|
of schemas to look in. The first matching table in the search path
|
|
is taken to be the one wanted. If there is no match in the search
|
|
path, an error is reported, even if matching table names exist
|
|
in other schemas in the database.
|
|
</para>
|
|
|
|
<para>
|
|
The first schema named in the search path is called the current schema.
|
|
Aside from being the first schema searched, it is also the schema in
|
|
which new tables will be created if the <command>CREATE TABLE</>
|
|
command does not specify a schema name.
|
|
</para>
|
|
|
|
<para>
|
|
The search path works in the same way for datatype names, function names,
|
|
and operator names as it does for table names. Datatype and function
|
|
names can be qualified in exactly the same way as table names. If you
|
|
need to write a qualified operator name in an expression, there is a
|
|
special provision: you must write
|
|
<synopsis>
|
|
<literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operator</><literal>)</>
|
|
</synopsis>
|
|
This is needed to avoid syntactic ambiguity. An example is
|
|
<programlisting>
|
|
SELECT 3 OPERATOR(pg_catalog.+) 4;
|
|
</programlisting>
|
|
In practice one usually relies on the search path for operators,
|
|
so as not to have to write anything so ugly as that.
|
|
</para>
|
|
|
|
<para>
|
|
The standard search path in <productname>PostgreSQL</productname>
|
|
contains first the schema having the same name as the session user
|
|
(if it exists), and second the schema named <literal>public</>
|
|
(if it exists, which it does by default). This arrangement allows
|
|
a flexible combination of private and shared tables. If no per-user
|
|
schemas are created then all user tables will exist in the shared
|
|
<literal>public</> schema, providing behavior that is backwards-compatible
|
|
with pre-7.3 <productname>PostgreSQL</productname> releases.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
There is no concept of a <literal>public</> schema in the SQL standard.
|
|
To achieve closest conformance to the standard, the DBA should
|
|
create per-user schemas for every user, and not use (perhaps even
|
|
remove) the <literal>public</> schema.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
In addition to <literal>public</> and user-created schemas, each database
|
|
contains a
|
|
<literal>pg_catalog</> schema, which contains the system tables
|
|
and all the built-in datatypes, functions, and operators.
|
|
<literal>pg_catalog</> is always effectively part of the search path.
|
|
If it is not named explicitly in the path then it is implicitly searched
|
|
<emphasis>before</> searching the path's schemas. This ensures that
|
|
built-in names will always be findable. However, you may explicitly
|
|
place <literal>pg_catalog</> at the end of your search path if you
|
|
prefer to have user-defined names override built-in names.
|
|
</para>
|
|
|
|
<para>
|
|
The search path is determined by the GUC variable SEARCH_PATH and
|
|
may be changed at any time. See <xref linkend="set-search-path">.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="sql-reserved-names">
|
|
<title>Reserved names</title>
|
|
|
|
<indexterm>
|
|
<primary>reserved names</primary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>names</primary>
|
|
<secondary>reserved</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
There are several restrictions on the names that can be chosen for
|
|
user-defined database objects. These restrictions vary depending
|
|
on the kind of object. (Note that these restrictions are
|
|
separate from whether the name is a key word or not; quoting a
|
|
name will not allow you to escape these restrictions.)
|
|
</para>
|
|
|
|
<para>
|
|
Schema names beginning with <literal>pg_</> are reserved for system
|
|
purposes and may not be created by users.
|
|
</para>
|
|
|
|
<para>
|
|
In <productname>PostgreSQL</productname> versions before 7.3, table
|
|
names beginning with <literal>pg_</> were reserved. This is no longer
|
|
true: you may create such a table name if you wish, in any non-system
|
|
schema. However, it's best to continue to avoid such names,
|
|
to ensure that you won't suffer a conflict if some future version
|
|
defines a system catalog named the same as your table. (With the
|
|
default search path, an unqualified reference to your table name
|
|
would be resolved as the system catalog instead.) System catalogs will
|
|
continue to follow the convention of having names beginning with
|
|
<literal>pg_</>, so that they will not conflict with unqualified
|
|
user-table names so long as users avoid the <literal>pg_</> prefix.
|
|
</para>
|
|
|
|
<para>
|
|
Every table has several <firstterm>system columns</> that are
|
|
implicitly defined by the system. Therefore, these names cannot
|
|
be used as names of user-defined columns:
|
|
|
|
<indexterm>
|
|
<primary>columns</primary>
|
|
<secondary>system columns</secondary>
|
|
</indexterm>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><structfield>oid</></term>
|
|
<listitem>
|
|
<para>
|
|
<indexterm>
|
|
<primary>OID</primary>
|
|
</indexterm>
|
|
The object identifier (object ID) of a row. This is a serial number
|
|
that is automatically added by <productname>PostgreSQL</productname> to all table rows (unless
|
|
the table was created WITHOUT OIDS, in which case this column is
|
|
not present). See <xref linkend="datatype-oid"> for more info.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><structfield>tableoid</></term>
|
|
<listitem>
|
|
<para>
|
|
The OID of the table containing this row. This attribute is
|
|
particularly handy for queries that select from inheritance
|
|
hierarchies, since without it, it's difficult to tell which
|
|
individual table a row came from. The
|
|
<structfield>tableoid</structfield> can be joined against the
|
|
<structfield>oid</structfield> column of
|
|
<classname>pg_class</classname> to obtain the table name.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><structfield>xmin</></term>
|
|
<listitem>
|
|
<para>
|
|
The identity (transaction ID) of the inserting transaction for
|
|
this tuple. (Note: A tuple is an individual state of a row;
|
|
each update of a row creates a new tuple for the same logical row.)
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><structfield>cmin</></term>
|
|
<listitem>
|
|
<para>
|
|
The command identifier (starting at zero) within the inserting
|
|
transaction.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><structfield>xmax</></term>
|
|
<listitem>
|
|
<para>
|
|
The identity (transaction ID) of the deleting transaction,
|
|
or zero for an undeleted tuple. It is possible for this field
|
|
to be nonzero in a visible tuple: that usually indicates that the
|
|
deleting transaction hasn't committed yet, or that an attempted
|
|
deletion was rolled back.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><structfield>cmax</></term>
|
|
<listitem>
|
|
<para>
|
|
The command identifier within the deleting transaction, or zero.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><structfield>ctid</></term>
|
|
<listitem>
|
|
<para>
|
|
The physical location of the tuple within its table.
|
|
Note that although the <structfield>ctid</structfield>
|
|
can be used to locate the tuple very quickly, a row's
|
|
<structfield>ctid</structfield> will change each time it is updated
|
|
or moved by <command>VACUUM FULL</>.
|
|
Therefore <structfield>ctid</structfield> is useless as a long-term
|
|
row identifier.
|
|
The OID, or even better a user-defined serial number, should
|
|
be used to identify logical rows.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="sql-expressions">
|
|
<title>Value Expressions</title>
|
|
|
|
<para>
|
|
Value expressions are used in a variety of contexts, such
|
|
as in the target list of the <command>SELECT</command> command, as
|
|
new column values in <command>INSERT</command> or
|
|
<command>UPDATE</command>, or in search conditions in a number of
|
|
commands. The result of a value expression is sometimes called a
|
|
<firstterm>scalar</firstterm>, to distinguish it from the result of
|
|
a table expression (which is a table). Value expressions are
|
|
therefore also called <firstterm>scalar expressions</firstterm> (or
|
|
even simply <firstterm>expressions</firstterm>). The expression
|
|
syntax allows the calculation of values from primitive parts using
|
|
arithmetic, logical, set, and other operations.
|
|
</para>
|
|
|
|
<para>
|
|
A value expression is one of the following:
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
A constant or literal value; see <xref linkend="sql-syntax-constants">.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
A column reference.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
A positional parameter reference, in the body of a function declaration.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
An operator invocation.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
A function call.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
An aggregate expression.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
A type cast.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>
|
|
A scalar subquery.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<synopsis>( <replaceable>expression</replaceable> )</synopsis>
|
|
<para>
|
|
Parentheses are used to group subexpressions and override precedence.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
<para>
|
|
In addition to this list, there are a number of constructs that can
|
|
be classified as an expression but do not follow any general syntax
|
|
rules. These generally have the semantics of a function or
|
|
operator and are explained in the appropriate location in <xref
|
|
linkend="functions">. An example is the <literal>IS NULL</literal>
|
|
clause.
|
|
</para>
|
|
|
|
<para>
|
|
We have already discussed constants in <xref
|
|
linkend="sql-syntax-constants">. The following sections discuss
|
|
the remaining options.
|
|
</para>
|
|
|
|
<sect2>
|
|
<title>Column References</title>
|
|
|
|
<para>
|
|
A column can be referenced in the form:
|
|
<synopsis>
|
|
<replaceable>correlation</replaceable>.<replaceable>columnname</replaceable> `['<replaceable>subscript</replaceable>`]'
|
|
</synopsis>
|
|
|
|
<replaceable>correlation</replaceable> is the name of a
|
|
table (possibly qualified), or an alias for a table defined by means of a
|
|
FROM clause, or
|
|
the key words <literal>NEW</literal> or <literal>OLD</literal>.
|
|
(NEW and OLD can only appear in the action portion of a rule,
|
|
while other correlation names can be used in any SQL statement.)
|
|
The correlation name and separating dot may be omitted if the column name
|
|
is unique
|
|
across all the tables being used in the current query. If
|
|
<replaceable>column</replaceable> is of an array type, then the
|
|
optional <replaceable>subscript</replaceable> selects a specific
|
|
element or elements in the array. If no subscript is provided, then the
|
|
whole array is selected. (See <xref linkend="arrays"> for more about
|
|
arrays.)
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Positional Parameters</title>
|
|
|
|
<para>
|
|
A positional parameter reference is used to indicate a parameter
|
|
in an SQL function. Typically this is used in SQL function
|
|
definition statements. The form of a parameter is:
|
|
<synopsis>
|
|
$<replaceable>number</replaceable>
|
|
</synopsis>
|
|
</para>
|
|
|
|
<para>
|
|
For example, consider the definition of a function,
|
|
<function>dept</function>, as
|
|
|
|
<programlisting>
|
|
CREATE FUNCTION dept (text) RETURNS dept
|
|
AS 'SELECT * FROM dept WHERE name = $1'
|
|
LANGUAGE SQL;
|
|
</programlisting>
|
|
|
|
Here the <literal>$1</literal> will be replaced by the first
|
|
function argument when the function is invoked.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Operator Invocations</title>
|
|
|
|
<para>
|
|
There are three possible syntaxes for an operator invocation:
|
|
<simplelist>
|
|
<member><replaceable>expression</replaceable> <replaceable>operator</replaceable> <replaceable>expression</replaceable> (binary infix operator)</member>
|
|
<member><replaceable>operator</replaceable> <replaceable>expression</replaceable> (unary prefix operator)</member>
|
|
<member><replaceable>expression</replaceable> <replaceable>operator</replaceable> (unary postfix operator)</member>
|
|
</simplelist>
|
|
where the <replaceable>operator</replaceable> token follows the syntax
|
|
rules of <xref linkend="sql-syntax-operators">, or is one of the
|
|
keywords <token>AND</token>, <token>OR</token>, and
|
|
<token>NOT</token>, or is a qualified operator name
|
|
<synopsis>
|
|
<literal>OPERATOR(</><replaceable>schema</><literal>.</><replaceable>operatorname</><literal>)</>
|
|
</synopsis>
|
|
Which particular operators exist and whether
|
|
they are unary or binary depends on what operators have been
|
|
defined by the system or the user. <xref linkend="functions">
|
|
describes the built-in operators.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Function Calls</title>
|
|
|
|
<para>
|
|
The syntax for a function call is the name of a function
|
|
(possibly qualified with a schema name), followed by its argument list
|
|
enclosed in parentheses:
|
|
|
|
<synopsis>
|
|
<replaceable>function</replaceable> (<optional><replaceable>expression</replaceable> <optional>, <replaceable>expression</replaceable> ... </optional></optional> )
|
|
</synopsis>
|
|
</para>
|
|
|
|
<para>
|
|
For example, the following computes the square root of 2:
|
|
<programlisting>
|
|
sqrt(2)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
The list of built-in functions is in <xref linkend="functions">.
|
|
Other functions may be added by the user.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="syntax-aggregates">
|
|
<title>Aggregate Expressions</title>
|
|
|
|
<indexterm zone="syntax-aggregates">
|
|
<primary>aggregate functions</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
An <firstterm>aggregate expression</firstterm> represents the
|
|
application of an aggregate function across the rows selected by a
|
|
query. An aggregate function reduces multiple inputs to a single
|
|
output value, such as the sum or average of the inputs. The
|
|
syntax of an aggregate expression is one of the following:
|
|
|
|
<simplelist>
|
|
<member><replaceable>aggregate_name</replaceable> (<replaceable>expression</replaceable>)</member>
|
|
<member><replaceable>aggregate_name</replaceable> (ALL <replaceable>expression</replaceable>)</member>
|
|
<member><replaceable>aggregate_name</replaceable> (DISTINCT <replaceable>expression</replaceable>)</member>
|
|
<member><replaceable>aggregate_name</replaceable> ( * )</member>
|
|
</simplelist>
|
|
|
|
where <replaceable>aggregate_name</replaceable> is a previously
|
|
defined aggregate (possibly a qualified name), and
|
|
<replaceable>expression</replaceable> is
|
|
any value expression that does not itself contain an aggregate
|
|
expression.
|
|
</para>
|
|
|
|
<para>
|
|
The first form of aggregate expression invokes the aggregate
|
|
across all input rows for which the given expression yields a
|
|
non-NULL value. (Actually, it is up to the aggregate function
|
|
whether to ignore NULLs or not --- but all the standard ones do.)
|
|
The second form is the same as the first, since
|
|
<literal>ALL</literal> is the default. The third form invokes the
|
|
aggregate for all distinct non-NULL values of the expression found
|
|
in the input rows. The last form invokes the aggregate once for
|
|
each input row regardless of NULL or non-NULL values; since no
|
|
particular input value is specified, it is generally only useful
|
|
for the <function>count()</function> aggregate function.
|
|
</para>
|
|
|
|
<para>
|
|
For example, <literal>count(*)</literal> yields the total number
|
|
of input rows; <literal>count(f1)</literal> yields the number of
|
|
input rows in which <literal>f1</literal> is non-NULL;
|
|
<literal>count(distinct f1)</literal> yields the number of
|
|
distinct non-NULL values of <literal>f1</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
The predefined aggregate functions are described in <xref
|
|
linkend="functions-aggregate">. Other aggregate functions may be added
|
|
by the user.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2 id="sql-syntax-type-casts">
|
|
<title>Type Casts</title>
|
|
|
|
<indexterm>
|
|
<primary>data types</primary>
|
|
<secondary>type casts</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
A type cast specifies a conversion from one data type to another.
|
|
<productname>PostgreSQL</productname> accepts two equivalent syntaxes
|
|
for type casts:
|
|
<synopsis>
|
|
CAST ( <replaceable>expression</replaceable> AS <replaceable>type</replaceable> )
|
|
<replaceable>expression</replaceable>::<replaceable>type</replaceable>
|
|
</synopsis>
|
|
The <literal>CAST</> syntax conforms to SQL92; the syntax with
|
|
<literal>::</literal> is historical <productname>PostgreSQL</productname>
|
|
usage.
|
|
</para>
|
|
|
|
<para>
|
|
When a cast is applied to a value expression of a known type, it
|
|
represents a run-time type conversion. The cast will succeed only
|
|
if a suitable type conversion function is available. Notice that this
|
|
is subtly different from the use of casts with constants, as shown in
|
|
<xref linkend="sql-syntax-constants-generic">. A cast applied to an
|
|
unadorned string literal represents the initial assignment of a type
|
|
to a literal constant value, and so it will succeed for any type
|
|
(if the contents of the string literal are acceptable input syntax for the
|
|
data type).
|
|
</para>
|
|
|
|
<para>
|
|
An explicit type cast may usually be omitted if there is no ambiguity as
|
|
to the type that a value expression must produce (for example, when it is
|
|
assigned to a table column); the system will automatically apply a
|
|
type cast in such cases. However, automatic casting is only done for
|
|
cast functions that are marked <quote>okay to apply implicitly</>
|
|
in the system catalogs. Other cast functions must be invoked with
|
|
explicit casting syntax. This restriction is intended to prevent
|
|
surprising conversions from being applied silently.
|
|
</para>
|
|
|
|
<para>
|
|
It is also possible to specify a type cast using a function-like
|
|
syntax:
|
|
<synopsis>
|
|
<replaceable>typename</replaceable> ( <replaceable>expression</replaceable> )
|
|
</synopsis>
|
|
However, this only works for types whose names are also valid as
|
|
function names. For example, <literal>double precision</literal>
|
|
can't be used this way, but the equivalent <literal>float8</literal>
|
|
can. Also, the names <literal>interval</>, <literal>time</>, and
|
|
<literal>timestamp</> can only be used in this fashion if they are
|
|
double-quoted, because of syntactic conflicts. Therefore, the use of
|
|
the function-like cast syntax leads to inconsistencies and should
|
|
probably be avoided in new applications.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Scalar Subqueries</title>
|
|
|
|
<para>
|
|
A scalar subquery is an ordinary
|
|
<command>SELECT</command> in parentheses that returns exactly one
|
|
row with one column. The <command>SELECT</command> query is executed
|
|
and the single returned value is used in the surrounding value expression.
|
|
It is an error to use a query that
|
|
returns more than one row or more than one column as a scalar subquery.
|
|
(But if, during a particular execution, the subquery returns no rows,
|
|
there is no error; the scalar result is taken to be NULL.)
|
|
The subquery can refer to variables from the surrounding query,
|
|
which will act as constants during any one evaluation of the subquery.
|
|
See also <xref linkend="functions-subquery">.
|
|
</para>
|
|
|
|
<para>
|
|
For example, the following finds the largest city population in each
|
|
state:
|
|
<programlisting>
|
|
SELECT name, (SELECT max(pop) FROM cities WHERE cities.state = states.name)
|
|
FROM states;
|
|
</programlisting>
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Expression Evaluation</title>
|
|
|
|
<para>
|
|
The order of evaluation of subexpressions is not defined. In
|
|
particular, the inputs of an operator or function are not necessarily
|
|
evaluated left-to-right or in any other fixed order.
|
|
</para>
|
|
|
|
<para>
|
|
Furthermore, if the result of an expression can be determined by
|
|
evaluating only some parts of it, then other subexpressions
|
|
might not be evaluated at all. For instance, if one wrote
|
|
<programlisting>
|
|
SELECT true OR somefunc();
|
|
</programlisting>
|
|
then <literal>somefunc()</literal> would (probably) not be called
|
|
at all. The same would be the case if one wrote
|
|
<programlisting>
|
|
SELECT somefunc() OR true;
|
|
</programlisting>
|
|
Note that this is not the same as the left-to-right
|
|
<quote>short-circuiting</quote> of Boolean operators that is found
|
|
in some programming languages.
|
|
</para>
|
|
|
|
<para>
|
|
As a consequence, it is unwise to use functions with side effects
|
|
as part of complex expressions. It is particularly dangerous to
|
|
rely on side effects or evaluation order in WHERE and HAVING clauses,
|
|
since those clauses are extensively reprocessed as part of
|
|
developing an execution plan. Boolean
|
|
expressions (AND/OR/NOT combinations) in those clauses may be reorganized
|
|
in any manner allowed by the laws of Boolean algebra.
|
|
</para>
|
|
|
|
<para>
|
|
When it is essential to force evaluation order, a CASE construct may
|
|
be used. For example, this is an untrustworthy way of trying to
|
|
avoid division by zero in a WHERE clause:
|
|
<programlisting>
|
|
SELECT ... WHERE x <> 0 AND y/x > 1.5;
|
|
</programlisting>
|
|
but this is safe:
|
|
<programlisting>
|
|
SELECT ... WHERE CASE WHEN x <> 0 THEN y/x > 1.5 ELSE false END;
|
|
</programlisting>
|
|
A CASE construct used in this fashion will defeat optimization attempts,
|
|
so it should only be done when necessary.
|
|
</para>
|
|
</sect2>
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode:sgml
|
|
sgml-omittag:nil
|
|
sgml-shorttag:t
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
sgml-parent-document:nil
|
|
sgml-default-dtd-file:"./reference.ced"
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:("/usr/lib/sgml/catalog")
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
-->
|