352 lines
12 KiB
Plaintext
352 lines
12 KiB
Plaintext
<!-- $Header: /cvsroot/pgsql/doc/src/sgml/array.sgml,v 1.25 2003/03/13 01:30:26 petere Exp $ -->
|
|
|
|
<sect1 id="arrays">
|
|
<title>Arrays</title>
|
|
|
|
<indexterm>
|
|
<primary>arrays</primary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
<productname>PostgreSQL</productname> allows columns of a table to be
|
|
defined as variable-length multidimensional arrays. Arrays of any
|
|
built-in type or user-defined type can be created.
|
|
</para>
|
|
|
|
<sect2>
|
|
<title>Declaration of Array Types</title>
|
|
|
|
<para>
|
|
To illustrate the use of array types, we create this table:
|
|
<programlisting>
|
|
CREATE TABLE sal_emp (
|
|
name text,
|
|
pay_by_quarter integer[],
|
|
schedule text[][]
|
|
);
|
|
</programlisting>
|
|
As shown, an array data type is named by appending square brackets
|
|
(<literal>[]</>) to the data type name of the array elements. The
|
|
above command will create a table named
|
|
<structname>sal_emp</structname> with a column of type
|
|
<type>text</type> (<structfield>name</structfield>), a
|
|
one-dimensional array of type <type>integer</type>
|
|
(<structfield>pay_by_quarter</structfield>), which represents the
|
|
employee's salary by quarter, and a two-dimensional array of
|
|
<type>text</type> (<structfield>schedule</structfield>), which
|
|
represents the employee's weekly schedule.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Array Value Input</title>
|
|
|
|
<para>
|
|
Now we can show some <command>INSERT</command> statements. To write an array
|
|
value, we enclose the element values within curly braces and separate them
|
|
by commas. If you know C, this is not unlike the syntax for
|
|
initializing structures. (More details appear below.)
|
|
|
|
<programlisting>
|
|
INSERT INTO sal_emp
|
|
VALUES ('Bill',
|
|
'{10000, 10000, 10000, 10000}',
|
|
'{{"meeting", "lunch"}, {}}');
|
|
|
|
INSERT INTO sal_emp
|
|
VALUES ('Carol',
|
|
'{20000, 25000, 25000, 25000}',
|
|
'{{"talk", "consult"}, {"meeting"}}');
|
|
</programlisting>
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
A limitation of the present array implementation is that individual
|
|
elements of an array cannot be SQL null values. The entire array can be set
|
|
to null, but you can't have an array with some elements null and some
|
|
not. Fixing this is on the to-do list.
|
|
</para>
|
|
</note>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Array Value References</title>
|
|
|
|
<para>
|
|
Now, we can run some queries on the table.
|
|
First, we show how to access a single element of an array at a time.
|
|
This query retrieves the names of the employees whose pay changed in
|
|
the second quarter:
|
|
|
|
<programlisting>
|
|
SELECT name FROM sal_emp WHERE pay_by_quarter[1] <> pay_by_quarter[2];
|
|
|
|
name
|
|
-------
|
|
Carol
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
The array subscript numbers are written within square brackets.
|
|
By default <productname>PostgreSQL</productname> uses the
|
|
one-based numbering convention for arrays, that is,
|
|
an array of <replaceable>n</> elements starts with <literal>array[1]</literal> and
|
|
ends with <literal>array[<replaceable>n</>]</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
This query retrieves the third quarter pay of all employees:
|
|
|
|
<programlisting>
|
|
SELECT pay_by_quarter[3] FROM sal_emp;
|
|
|
|
pay_by_quarter
|
|
----------------
|
|
10000
|
|
25000
|
|
(2 rows)
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
We can also access arbitrary rectangular slices of an array, or
|
|
subarrays. An array slice is denoted by writing
|
|
<literal><replaceable>lower-bound</replaceable>:<replaceable>upper-bound</replaceable></literal>
|
|
for one or more array dimensions. For example, this query retrieves the first
|
|
item on Bill's schedule for the first two days of the week:
|
|
|
|
<programlisting>
|
|
SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
|
|
|
|
schedule
|
|
--------------------
|
|
{{meeting},{""}}
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
We could also have written
|
|
|
|
<programlisting>
|
|
SELECT schedule[1:2][1] FROM sal_emp WHERE name = 'Bill';
|
|
</programlisting>
|
|
|
|
with the same result. An array subscripting operation is always taken to
|
|
represent an array slice if any of the subscripts are written in the
|
|
form
|
|
<literal><replaceable>lower</replaceable>:<replaceable>upper</replaceable></literal>.
|
|
A lower bound of 1 is assumed for any subscript where only one value
|
|
is specified.
|
|
</para>
|
|
|
|
<para>
|
|
An array value can be replaced completely:
|
|
|
|
<programlisting>
|
|
UPDATE sal_emp SET pay_by_quarter = '{25000,25000,27000,27000}'
|
|
WHERE name = 'Carol';
|
|
</programlisting>
|
|
|
|
or updated at a single element:
|
|
|
|
<programlisting>
|
|
UPDATE sal_emp SET pay_by_quarter[4] = 15000
|
|
WHERE name = 'Bill';
|
|
</programListing>
|
|
|
|
or updated in a slice:
|
|
|
|
<programlisting>
|
|
UPDATE sal_emp SET pay_by_quarter[1:2] = '{27000,27000}'
|
|
WHERE name = 'Carol';
|
|
</programlisting>
|
|
</para>
|
|
|
|
<para>
|
|
An array can be enlarged by assigning to an element adjacent to
|
|
those already present, or by assigning to a slice that is adjacent
|
|
to or overlaps the data already present. For example, if an array
|
|
value currently has 4 elements, it will have five elements after an
|
|
update that assigns to <literal>array[5]</>. Currently, enlargement in
|
|
this fashion is only allowed for one-dimensional arrays, not
|
|
multidimensional arrays.
|
|
</para>
|
|
|
|
<para>
|
|
Array slice assignment allows creation of arrays that do not use one-based
|
|
subscripts. For example one might assign to <literal>array[-2:7]</> to
|
|
create an array with subscript values running from -2 to 7.
|
|
</para>
|
|
|
|
<para>
|
|
The syntax for <command>CREATE TABLE</command> allows fixed-length
|
|
arrays to be defined:
|
|
|
|
<programlisting>
|
|
CREATE TABLE tictactoe (
|
|
squares integer[3][3]
|
|
);
|
|
</programlisting>
|
|
|
|
However, the current implementation does not enforce the array size
|
|
limits --- the behavior is the same as for arrays of unspecified
|
|
length.
|
|
</para>
|
|
|
|
<para>
|
|
Actually, the current implementation does not enforce the declared
|
|
number of dimensions either. Arrays of a particular element type are
|
|
all considered to be of the same type, regardless of size or number
|
|
of dimensions. So, declaring number of dimensions or sizes in
|
|
<command>CREATE TABLE</command> is simply documentation, it does not
|
|
affect runtime behavior.
|
|
</para>
|
|
|
|
<para>
|
|
The current dimensions of any array value can be retrieved with the
|
|
<function>array_dims</function> function:
|
|
|
|
<programlisting>
|
|
SELECT array_dims(schedule) FROM sal_emp WHERE name = 'Carol';
|
|
|
|
array_dims
|
|
------------
|
|
[1:2][1:1]
|
|
(1 row)
|
|
</programlisting>
|
|
|
|
<function>array_dims</function> produces a <type>text</type> result,
|
|
which is convenient for people to read but perhaps not so convenient
|
|
for programs. <function>array_upper</function> and <function>
|
|
array_lower</function> return the upper/lower bound of the
|
|
given array dimension, respectively.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Searching in Arrays</title>
|
|
|
|
<para>
|
|
To search for a value in an array, you must check each value of the
|
|
array. This can be done by hand (if you know the size of the array).
|
|
For example:
|
|
|
|
<programlisting>
|
|
SELECT * FROM sal_emp WHERE pay_by_quarter[1] = 10000 OR
|
|
pay_by_quarter[2] = 10000 OR
|
|
pay_by_quarter[3] = 10000 OR
|
|
pay_by_quarter[4] = 10000;
|
|
</programlisting>
|
|
|
|
However, this quickly becomes tedious for large arrays, and is not
|
|
helpful if the size of the array is unknown. Although it is not built
|
|
into <productname>PostgreSQL</productname>,
|
|
there is an extension available that defines new functions and
|
|
operators for iterating over array values. Using this, the above
|
|
query could be:
|
|
|
|
<programlisting>
|
|
SELECT * FROM sal_emp WHERE pay_by_quarter[1:4] *= 10000;
|
|
</programlisting>
|
|
|
|
To search the entire array (not just specified slices), you could
|
|
use:
|
|
|
|
<programlisting>
|
|
SELECT * FROM sal_emp WHERE pay_by_quarter *= 10000;
|
|
</programlisting>
|
|
|
|
In addition, you could find rows where the array had all values
|
|
equal to 10 000 with:
|
|
|
|
<programlisting>
|
|
SELECT * FROM sal_emp WHERE pay_by_quarter **= 10000;
|
|
</programlisting>
|
|
|
|
To install this optional module, look in the
|
|
<filename>contrib/array</filename> directory of the
|
|
<productname>PostgreSQL</productname> source distribution.
|
|
</para>
|
|
|
|
<tip>
|
|
<para>
|
|
Arrays are not sets; using arrays in the manner described in the
|
|
previous paragraph is often a sign of database misdesign. The
|
|
array field should generally be split off into a separate table.
|
|
Tables can obviously be searched easily.
|
|
</para>
|
|
</tip>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Array Input and Output Syntax</title>
|
|
|
|
<para>
|
|
The external representation of an array value consists of items that
|
|
are interpreted according to the I/O conversion rules for the array's
|
|
element type, plus decoration that indicates the array structure.
|
|
The decoration consists of curly braces (<literal>{</> and <literal>}</>)
|
|
around the array value plus delimiter characters between adjacent items.
|
|
The delimiter character is usually a comma (<literal>,</>) but can be
|
|
something else: it is determined by the <literal>typdelim</> setting
|
|
for the array's element type. (Among the standard data types provided
|
|
in the <productname>PostgreSQL</productname> distribution, type
|
|
<literal>box</> uses a semicolon (<literal>;</>) but all the others
|
|
use comma.) In a multidimensional array, each dimension (row, plane,
|
|
cube, etc.) gets its own level of curly braces, and delimiters
|
|
must be written between adjacent curly-braced entities of the same level.
|
|
You may write whitespace before a left brace, after a right
|
|
brace, or before any individual item string. Whitespace after an item
|
|
is not ignored, however: after skipping leading whitespace, everything
|
|
up to the next right brace or delimiter is taken as the item value.
|
|
</para>
|
|
</sect2>
|
|
|
|
<sect2>
|
|
<title>Quoting Array Elements</title>
|
|
|
|
<para>
|
|
As shown above, when writing an array value you may write double
|
|
quotes around any individual array
|
|
element. You <emphasis>must</> do so if the element value would otherwise
|
|
confuse the array-value parser. For example, elements containing curly
|
|
braces, commas (or whatever the delimiter character is), double quotes,
|
|
backslashes, or leading white space must be double-quoted. To put a double
|
|
quote or backslash in an array element value, precede it with a backslash.
|
|
Alternatively, you can use backslash-escaping to protect all data characters
|
|
that would otherwise be taken as array syntax or ignorable white space.
|
|
</para>
|
|
|
|
<para>
|
|
The array output routine will put double quotes around element values
|
|
if they are empty strings or contain curly braces, delimiter characters,
|
|
double quotes, backslashes, or white space. Double quotes and backslashes
|
|
embedded in element values will be backslash-escaped. For numeric
|
|
data types it is safe to assume that double quotes will never appear, but
|
|
for textual data types one should be prepared to cope with either presence
|
|
or absence of quotes. (This is a change in behavior from pre-7.2
|
|
<productname>PostgreSQL</productname> releases.)
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
Remember that what you write in an SQL command will first be interpreted
|
|
as a string literal, and then as an array. This doubles the number of
|
|
backslashes you need. For example, to insert a <type>text</> array
|
|
value containing a backslash and a double quote, you'd need to write
|
|
<programlisting>
|
|
INSERT ... VALUES ('{"\\\\","\\""}');
|
|
</programlisting>
|
|
The string-literal processor removes one level of backslashes, so that
|
|
what arrives at the array-value parser looks like <literal>{"\\","\""}</>.
|
|
In turn, the strings fed to the <type>text</> data type's input routine
|
|
become <literal>\</> and <literal>"</> respectively. (If we were working
|
|
with a data type whose input routine also treated backslashes specially,
|
|
<type>bytea</> for example, we might need as many as eight backslashes
|
|
in the command to get one backslash into the stored array element.)
|
|
</para>
|
|
</note>
|
|
</sect2>
|
|
|
|
</sect1>
|