From 0add759825e746c76671badfcf38c329ee7f151a Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Thu, 23 Dec 2004 23:07:38 +0000 Subject: [PATCH] More minor updates and copy-editing. --- doc/src/sgml/func.sgml | 111 +++++++++++++++---------------------- doc/src/sgml/indices.sgml | 6 +- doc/src/sgml/mvcc.sgml | 10 ++-- doc/src/sgml/perform.sgml | 35 ++++++------ doc/src/sgml/typeconv.sgml | 70 ++++++++++++----------- 5 files changed, 108 insertions(+), 124 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 2de07c04ae..96fc4ea698 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1,5 +1,5 @@ @@ -1347,7 +1347,8 @@ PostgreSQL documentation The to_ascii function supports conversion from - LATIN1, LATIN2, and WIN1250 only. + LATIN1, LATIN2, LATIN9, + and WIN1250 encodings only. @@ -2483,11 +2484,11 @@ cast(-44 as bit(12)) 111111010100 There are three separate approaches to pattern matching provided by PostgreSQL: the traditional SQL LIKE operator, the - more recent >SIMILAR TO operator (since + more recent SIMILAR TO operator (added in SQL:1999), and POSIX-style regular expressions. Additionally, a pattern matching function, substring, is available, using either - SIMILAR TO-style or POSIX-style regular + SIMILAR TO-style or POSIX-style regular expressions. @@ -2544,7 +2545,7 @@ cast(-44 as bit(12)) 111111010100 LIKE pattern matches always cover the entire - string. To match a pattern anywhere within a string, the + string. To match a sequence anywhere within a string, the pattern must therefore start and end with a percent sign. @@ -2578,7 +2579,7 @@ cast(-44 as bit(12)) 111111010100 The key word ILIKE can be used instead of - LIKE to make the match case insensitive according + LIKE to make the match case-insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension. 
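Illustration for the LIKE and ILIKE hunks above (func.sgml, around lines 2545 and 2579). This is a minimal sketch, not part of the patch; the string literals are made up for the example, and the results in the comments assume a default PostgreSQL session.

```sql
-- LIKE always has to cover the entire string, so a bare fragment does not match.
SELECT 'foobar' LIKE 'oba';      -- false
-- Wrapping the fragment in percent signs matches the sequence anywhere.
SELECT 'foobar' LIKE '%oba%';    -- true

-- ILIKE (a PostgreSQL extension, not in the SQL standard) ignores case.
SELECT 'FooBar' LIKE '%OBA%';    -- false
SELECT 'FooBar' ILIKE '%OBA%';   -- true
```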
@@ -2818,9 +2819,11 @@ substring('foobar' from '#"o_b#"%' for '#') NULL @@ -3073,7 +3076,7 @@ substring('foobar' from 'o(.)b') o The forms using {...} - are known as bounds. + are known as bounds. The numbers m and n within a bound are unsigned decimal integers with permissible values from 0 to 255 inclusive. @@ -3603,9 +3606,10 @@ substring('foobar' from 'o(.)b') o Normally the flavor of RE being used is determined by regex_flavor. However, this can be overridden by a director prefix. - If an RE of any flavor begins with ***:, - the rest of the RE is taken as an ARE. - If an RE of any flavor begins with ***=, + If an RE begins with ***:, + the rest of the RE is taken as an ARE regardless of + regex_flavor. + If an RE begins with ***=, the rest of the RE is taken to be a literal string, with all characters considered ordinary characters. @@ -3703,8 +3707,8 @@ substring('foobar' from 'o(.)b') o Embedded options take effect at the ) terminating the sequence. - They are available only at the start of an ARE, - and may not be used later within it. + They may appear only at the start of an ARE (after the + ***: director if any). @@ -3732,13 +3736,13 @@ substring('foobar' from 'o(.)b') o - white space and comments are illegal within multi-character symbols, - like the ARE (?: or the BRE \( + white space and comments cannot appear within multi-character symbols, + such as (?: - Expanded-syntax white-space characters are blank, tab, newline, and + For this purpose, white-space characters are blank, tab, newline, and any character that belongs to the space character class. 
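Illustration for the regular-expression hunks above (bounds, director prefixes, embedded options). This is a hedged sketch rather than text from the patch; the literals are invented, and the results assume advanced regular expressions are in effect (the default regex_flavor).

```sql
-- A bound {m,n} limits how many times the preceding atom may repeat.
SELECT 'aaa'  ~ '^a{2,3}$';   -- true
SELECT 'aaaa' ~ '^a{2,3}$';   -- false

-- Without a director, '.' is a metacharacter and matches any character.
SELECT 'abc' ~ 'a.c';         -- true
-- With the ***= director, the rest of the RE is a literal string.
SELECT 'abc' ~ '***=a.c';     -- false
SELECT 'a.c' ~ '***=a.c';     -- true

-- Embedded options may appear only at the start of an ARE; (?i) requests
-- case-insensitive matching.
SELECT 'ABC' ~ '(?i)abc';     -- true
```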
@@ -4330,7 +4334,7 @@ substring('foobar' from 'o(.)b') o - Usage notes for the date/time formatting: + Usage notes for date/time formatting: @@ -4506,7 +4510,7 @@ substring('foobar' from 'o(.)b') o - Usage notes for the numeric formatting: + Usage notes for numeric formatting: @@ -5068,10 +5072,10 @@ EXTRACT (field FROM source The extract function retrieves subfields - from date/time values, such as year or hour. - source is a value expression that - evaluates to type timestamp or interval. - (Expressions of type date or time will + such as year or hour from date/time values. + source must be a value expression of + type timestamp, time, or interval. + (Expressions of type date will be cast to timestamp and can therefore be used as well.) field is an identifier or string that selects what field to extract from the source value. @@ -5699,7 +5703,7 @@ SELECT TIMESTAMP 'now'; - + You do not want to use the third form when specifying a DEFAULT clause while creating a table. The system will convert now @@ -5710,7 +5714,7 @@ SELECT TIMESTAMP 'now'; because they are function calls. Thus they will give the desired behavior of defaulting to the time of row insertion. - + @@ -6803,7 +6807,7 @@ SELECT NULLIF(value, '(none)') ... shows the functions available for use with array types. See - for more discussion and examples for the use of these functions. + for more discussion and examples of the use of these functions. @@ -6827,10 +6831,7 @@ SELECT NULLIF(value, '(none)') ... anyarray - - concatenate two arrays, returning NULL - for NULL inputs - + concatenate two arraysarray_cat(ARRAY[1,2,3], ARRAY[4,5]){1,2,3,4,5} @@ -6842,10 +6843,7 @@ SELECT NULLIF(value, '(none)') ... anyarray - - append an element to the end of an array, returning - NULL for NULL inputs - + append an element to the end of an arrayarray_append(ARRAY[1,2], 3){1,2,3} @@ -6857,10 +6855,7 @@ SELECT NULLIF(value, '(none)') ... 
anyarray - - append an element to the beginning of an array, returning - NULL for NULL inputs - + append an element to the beginning of an arrayarray_prepend(1, ARRAY[2,3]){1,2,3} @@ -6872,10 +6867,7 @@ SELECT NULLIF(value, '(none)') ... text - - returns a text representation of array dimension lower and upper bounds, - generating an ERROR for NULL inputs - + returns a text representation of array's dimensionsarray_dims(array[[1,2,3], [4,5,6]])[1:2][1:3] @@ -6887,10 +6879,7 @@ SELECT NULLIF(value, '(none)') ... integer - - returns lower bound of the requested array dimension, returning - NULL for NULL inputs - + returns lower bound of the requested array dimensionarray_lower(array_prepend(0, ARRAY[1,2,3]), 1)0 @@ -6902,10 +6891,7 @@ SELECT NULLIF(value, '(none)') ... integer - - returns upper bound of the requested array dimension, returning - NULL for NULL inputs - + returns upper bound of the requested array dimensionarray_upper(ARRAY[1,2,3,4], 1)4 @@ -6917,10 +6903,7 @@ SELECT NULLIF(value, '(none)') ... text - - concatenates array elements using provided delimiter, returning - NULL for NULL inputs - + concatenates array elements using provided delimiterarray_to_string(array[1, 2, 3], '~^~')1~^~2~^~3 @@ -6932,10 +6915,7 @@ SELECT NULLIF(value, '(none)') ... text[] - - splits string into array elements using provided delimiter, returning - NULL for NULL inputs - + splits string into array elements using provided delimiterstring_to_array( 'xx~^~yy~^~zz', '~^~'){xx,yy,zz} @@ -7181,7 +7161,7 @@ SELECT NULLIF(value, '(none)') ... It should be noted that except for count, these functions return a null value when no rows are selected. In particular, sum of no rows returns null, not - zero as one might expect. The function coalesce may be + zero as one might expect. The coalesce function may be used to substitute zero for null when necessary. 
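Illustration for the aggregate note in the hunk above (func.sgml, around line 7161): sum over no rows yields null, and coalesce can substitute zero. A minimal sketch; the table demo_amounts and its column are hypothetical names used only for this example.

```sql
-- Hypothetical empty table, so every aggregate below sees zero rows.
CREATE TEMP TABLE demo_amounts (amount integer);

SELECT count(amount)            AS n,          -- 0: count never returns null
       sum(amount)              AS total,      -- null: no rows were selected
       coalesce(sum(amount), 0) AS total_or_0  -- 0: null replaced by zero
FROM demo_amounts;
```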
@@ -8045,9 +8025,10 @@ select current_date + s.a as dates from generate_series(0,14,7) as s(a); - The session_user is the user that initiated a - database connection; it is fixed for the duration of that - connection. The current_user is the user identifier + The session_user is normally the user who initiated + the current database connection; but superusers can change this setting + with . + The current_user is the user identifier that is applicable for permission checking. Normally, it is equal to the session user, but it changes during the execution of functions with the attribute SECURITY DEFINER. @@ -8106,8 +8087,8 @@ SET search_path TO schema , schema, .. inet_server_addr returns the IP address on which the server accepted the current connection, and inet_server_port returns the port number. - All these functions return NULL if the connection is via a Unix-domain - socket. + All these functions return NULL if the current connection is via a + Unix-domain socket. @@ -8325,7 +8306,7 @@ SELECT has_function_privilege('joeuser', 'myfunc(int, text)', 'execute'); - To evaluate whether a user holds a grant option on the privilege, + To test whether a user holds a grant option on the privilege, append WITH GRANT OPTION to the privilege key word; for example 'UPDATE WITH GRANT OPTION'. diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index f35e65aba1..e37d2b85e8 100644 --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -1,4 +1,4 @@ - + Indexes @@ -71,7 +71,7 @@ CREATE INDEX test1_id_index ON test1 (id); - Once the index is created, no further intervention is required: the + Once an index is created, no further intervention is required: the system will update the index when the table is modified, and it will use the index in queries when it thinks this would be more efficient than a sequential table scan. 
But you may have to run the @@ -761,7 +761,7 @@ CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target) - It is especially fatal to use proportionally reduced data sets. + It is especially fatal to use very small test data sets. While selecting 1000 out of 100000 rows could be a candidate for an index, selecting 1 out of 100 rows will hardly be, because the 100 rows will probably fit within a single disk page, and there diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml index 3b3b68a97f..01f697d426 100644 --- a/doc/src/sgml/mvcc.sgml +++ b/doc/src/sgml/mvcc.sgml @@ -1,5 +1,5 @@ @@ -206,9 +206,9 @@ $PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.45 2004/11/15 06:32:14 neilc Exp $
- In PostgreSQL, you can use all four - possible transaction isolation levels. Internally, there are only - two distinct isolation levels, which correspond to the levels Read + In PostgreSQL, you can request any of the + four standard transaction isolation levels. But internally, there are + only two distinct isolation levels, which correspond to the levels Read Committed and Serializable. When you select the level Read Uncommitted you really get Read Committed, and when you select Repeatable Read you really get Serializable, so the actual @@ -217,7 +217,7 @@ $PostgreSQL: pgsql/doc/src/sgml/mvcc.sgml,v 2.45 2004/11/15 06:32:14 neilc Exp $ define which phenomena must not happen, they do not define which phenomena must happen. The reason that PostgreSQL only provides two isolation levels is that this is the only - sensible way to map the isolation levels to the multiversion + sensible way to map the standard isolation levels to the multiversion concurrency control architecture. The behavior of the available isolation levels is detailed in the following subsections. diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index 687d322812..5448913586 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -1,5 +1,5 @@ @@ -78,7 +78,7 @@ $PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.48 2004/12/01 19:00:27 tgl Exp estimates are converted into disk-page units using some fairly arbitrary fudge factors. If you want to experiment with these factors, see the list of run-time configuration parameters in - .) + .)
@@ -657,16 +657,6 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; point would be rolled back, so you won't be stuck with partially loaded data. - - - If you are issuing a large sequence of INSERT - commands to bulk load some data, also consider using to create a - prepared INSERT statement. Since you are - executing the same command multiple times, it is more efficient to - prepare the command once and then use EXECUTE - as many times as required. - @@ -683,12 +673,20 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; use this method to populate a table.
+ + If you cannot use COPY, it may help to use to create a + prepared INSERT statement, and then use + EXECUTE as many times as required. This avoids + some of the overhead of repeatedly parsing and planning + INSERT. + + Note that loading a large number of rows using COPY is almost always faster than using - INSERT, even if multiple - INSERT commands are batched into a single - transaction. + INSERT, even if PREPARE is used and + multiple insertions are batched into a single transaction. @@ -719,10 +717,10 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; Temporarily increasing the - configuration variable when restoring large amounts of data can + configuration variable when loading large amounts of data can lead to improved performance. This is because when a B-tree index is created from scratch, the existing content of the table needs - to be sorted. Allowing the external merge sort to use more memory + to be sorted. Allowing the merge sort to use more memory means that fewer merge passes will be required. A larger setting for maintenance_work_mem may also speed up validation of foreign-key constraints. @@ -754,8 +752,7 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse; Whenever you have significantly altered the distribution of data within a table, running is strongly recommended. This - includes when bulk loading large amounts of data into - PostgreSQL. Running + includes bulk loading large amounts of data into the table. Running ANALYZE (or VACUUM ANALYZE) ensures that the planner has up-to-date statistics about the table. With no statistics or obsolete statistics, the planner may diff --git a/doc/src/sgml/typeconv.sgml b/doc/src/sgml/typeconv.sgml index 05fb4f4a0c..ae0e3aea4d 100644 --- a/doc/src/sgml/typeconv.sgml +++ b/doc/src/sgml/typeconv.sgml @@ -1,5 +1,5 @@ @@ -22,8 +22,7 @@ In many cases a user will not need to understand the details of the type conversion mechanism. 
However, the implicit conversions done by PostgreSQL can affect the results of a query. When necessary, these results -can be tailored by a user or programmer -using explicit type conversion. +can be tailored by using explicit type conversion. @@ -43,16 +42,17 @@ has an associated data type which determines its behavior and allowed usage. PostgreSQL has an extensible type system that is much more general and flexible than other SQL implementations. Hence, most type conversion behavior in PostgreSQL -should be governed by general rules rather than by ad hoc heuristics, to allow +is governed by general rules rather than by ad hoc +heuristics. This allows mixed-type expressions to be meaningful even with user-defined types. -The PostgreSQL scanner/parser decodes lexical -elements into only five fundamental categories: integers, floating-point numbers, strings, -names, and key words. Constants of most non-numeric types are first classified as -strings. The SQL language definition allows specifying type -names with strings, and this mechanism can be used in +The PostgreSQL scanner/parser divides lexical +elements into only five fundamental categories: integers, non-integer numbers, +strings, identifiers, and key words. Constants of most non-numeric types are +first classified as strings. The SQL language definition +allows specifying type names with strings, and this mechanism can be used in PostgreSQL to start the parser down the correct path. For example, the query @@ -79,28 +79,30 @@ parser: +Function calls + + + +Much of the PostgreSQL type system is built around a +rich set of functions. Functions can have one or more arguments. +Since PostgreSQL permits function +overloading, the function name alone does not uniquely identify the function +to be called; the parser must select the right function based on the data +types of the supplied arguments. 
+ + + + + Operators PostgreSQL allows expressions with prefix and postfix unary (one-argument) operators, -as well as binary (two-argument) operators. - - - - - -Function calls - - - -Much of the PostgreSQL type system is built around a -rich set of functions. Function calls can have one or more arguments. -Since PostgreSQL permits function -overloading, the function name alone does not uniquely identify the function -to be called; the parser must select the right function based on the data -types of the supplied arguments. +as well as binary (two-argument) operators. Like functions, operators can +be overloaded, and so the same problem of selecting the right operator +exists. @@ -125,7 +127,7 @@ with, and perhaps converted to, the types of the target columns. Since all query results from a unionized SELECT statement must appear in a single set of columns, the types of the results of each SELECT clause must be matched up and converted to a uniform set. -Similarly, the branch expressions of a CASE construct must be +Similarly, the result expressions of a CASE construct must be converted to a common type so that the CASE expression as a whole has a known output type. The same holds for ARRAY constructs. @@ -728,12 +730,16 @@ type. -If the target is a fixed-length type (e.g., char or varchar -declared with a length) then try to find a sizing function for the target -type. A sizing function is a function of the same name as the type, -taking two arguments of which the first is that type and the second is of type -integer, and returning the same type. If one is found, it is applied, -passing the column's declared length as the second parameter. +Check to see if there is a sizing cast for the target type. A sizing +cast is a cast from that type to itself. If one is found in the +pg_cast catalog, apply it to the expression before storing +into the destination column. 
The implementation function for such a cast +always takes an extra parameter of type integer, which receives +the destination column's declared length (actually, its +atttypmod value; the interpretation of +atttypmod varies for different datatypes). The cast function +is responsible for applying any length-dependent semantics such as size +checking or truncation.
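Illustration for the sizing-cast paragraph above (typeconv.sgml, around line 730). This is a sketch under stated assumptions, not part of the patch: demo_sizing is a hypothetical table, and the first query assumes the pg_cast columns castsource, casttarget, and castfunc. No output is shown for it because the set of casts depends on the installation.

```sql
-- Sizing casts are casts from a type to itself; list the ones in pg_cast.
SELECT castsource::regtype, castfunc::regprocedure
FROM pg_cast
WHERE castsource = casttarget;

-- Storing into a length-constrained column applies the sizing cast.
CREATE TEMP TABLE demo_sizing (c varchar(5));
INSERT INTO demo_sizing VALUES ('abc');        -- fits within the declared length
INSERT INTO demo_sizing VALUES ('abcdefgh');   -- fails: value too long for the column

-- An explicit cast uses the same length-dependent logic but truncates instead.
SELECT 'abcdefgh'::varchar(5);                 -- abcde
```

The contrast between the failing INSERT and the explicit cast is the "size checking or truncation" behavior that the paragraph makes the cast function responsible for.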