diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 3c9c3ce527..016993c0eb 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -1,5 +1,5 @@ @@ -965,23 +965,296 @@ SELECT b, char_length(b) FROM test2; - - Binary Data + + Binary Strings + + The bytea data type allows storage of binary strings. + + + + Binary String Types + + + + Type Name + Storage + Description + + + + + bytea + 4 bytes plus the actual string + Variable (not specifically limited) + length binary string + + + +
- The bytea data type allows storage of binary data, - specifically allowing storage of NULLs which are entered as - '\\000'. The first backslash is interpreted by the - single quotes, and the second is recognized by bytea and - precedes a three digit octal value. For a similar reason, a - backslash must be entered into a field as '\\\\' or - '\\134'. You may also have to escape line feeds and - carriage return if your interface automatically translates these. It - can store values of any length. Bytea is a non-standard - data type. + A binary string is a sequence of octets that does not have either a + character set or collation associated with it. Bytea specifically + allows storage of NULLs and other 'non-printable' ASCII + characters. + + + Certain ASCII characters MUST be escaped (but all + characters MAY be escaped) when used as part of a string literal in an + SQL statement. In general, to escape a character, it + is converted into the three-digit octal number equal to its decimal + ASCII value, and preceded by two backslashes. The + single quote (') and backslash (\) characters have special alternate + escape sequences. Details are in + . + + + + <acronym>SQL</acronym> Literal Escaped <acronym>ASCII</acronym> + Characters + + + + Decimal ASCII Value + Description + Input Escaped Representation + Example + Printed Result + + + + + + 0 + null byte + '\\000' + select '\\000'::bytea; + \000 + + + + 39 + single quote + '\'' or '\\047' + select '\''::bytea; + ' + + + + 92 + backslash + '\\\\' or '\\134' + select '\\\\'::bytea; + \\ + + + + +
+ + + Note that the result in each of the examples above was exactly one + byte in length, even though the output representations of the null byte + and backslash are each more than one character. Bytea output characters + are also escaped. In general, each "non-printable" character is + converted into the three-digit octal number equal to its decimal + ASCII value, and preceded by one backslash. Most + "printable" characters are represented by their standard + ASCII representation. The backslash (\) character + has a special alternate output representation. Details are in + . + + + + <acronym>SQL</acronym> Output Escaped <acronym>ASCII</acronym> + Characters + + + + Decimal ASCII Value + Description + Output Escaped Representation + Example + Printed Result + + + + + + + 39 + single quote + ' + select '\\047'::bytea; + ' + + + + 92 + backslash + \\ + select '\\134'::bytea; + \\ + + + + 0 to 31 and 127 to 255 + non-printable characters + \### (octal value) + select '\\001'::bytea; + \001 + + + + 32 to 126 + printable characters + ASCII representation + select '\\176'::bytea; + ~ + + + + +
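The output rules in the table above can be modelled in a few lines. The following is a minimal Python sketch, not PostgreSQL code: the function name `bytea_escape_output` is hypothetical, and it assumes the rules are exactly those listed in the table (backslash doubled, decimal values 32 to 126 printed as-is, everything else as a backslash plus three octal digits).

```python
def bytea_escape_output(data: bytes) -> str:
    """Model of the bytea output escaping described in the table above.

    Hypothetical helper for illustration only; it is not part of PostgreSQL.
    """
    parts = []
    for b in data:
        if b == 92:
            # Backslash has its own output representation: two backslashes.
            parts.append("\\\\")
        elif 32 <= b <= 126:
            # Printable ASCII (including the single quote, 39) is shown as-is.
            parts.append(chr(b))
        else:
            # 0 to 31 and 127 to 255: one backslash plus three octal digits.
            parts.append("\\%03o" % b)
    return "".join(parts)


if __name__ == "__main__":
    for sample in (b"\x00", b"'", b"\\", b"\x01", b"~"):
        print(repr(sample), "->", bytea_escape_output(sample))
```

Each sample is a single byte on input, which matches the note that the printed result can be longer than the stored value.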
+ + + Escaped octets in SQL string literals (input strings) must be + preceded by two backslashes because the literal must pass + through two parsers in the PostgreSQL backend. The first backslash + is interpreted as an escape character by the string literal parser, + and therefore is consumed, leaving the characters that follow it. + The second backslash is recognized by the bytea input function + as the prefix of a three-digit octal value. For example, a string + literal passed to the backend as '\\001' becomes + '\001' after passing through the string literal + parser. The '\001' is then sent to the bytea + input function, where it is converted to a single byte with a decimal + ASCII value of 1. + + + + For a similar reason, a backslash must be input as + '\\\\' (or '\\134'). The first + and third backslashes are interpreted as escape characters by the + string literal parser, and therefore are consumed, leaving the + second and fourth backslashes untouched. The second and fourth + backslashes are recognized by the bytea input function as a single + backslash. For example, a string literal passed to the backend as + '\\\\' becomes '\\' after passing + through the string literal parser. The '\\' is then + sent to the bytea input function, where it is converted to a single + byte with a decimal ASCII value of 92. + + + + A single quote is a bit different in that it must be input as + '\'' (or '\\047'), NOT as + '\\''. This is because, while the literal parser + interprets the single quote as a special character, and will consume + the single backslash, the bytea input function does NOT recognize + a single quote as a special character. Therefore a string + literal passed to the backend as '\'' becomes + ''' after passing through the string literal + parser. The ''' is then sent to the bytea + input function, where it retains its single-byte decimal + ASCII value of 39. 
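The two parsing stages described above can be traced with a small model. This is a hedged Python sketch, not the backend's implementation: `literal_parser` and `bytea_input` are hypothetical names, the first stage is deliberately simplified (a backslash is consumed and the next character is kept, ignoring any other escapes the real literal parser understands), and the argument is the text between the enclosing single quotes.

```python
def literal_parser(body: str) -> str:
    """Stage 1 (simplified assumption): consume each backslash, keep what follows."""
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def bytea_input(text: str) -> bytes:
    """Stage 2 (model): turn '\\\\' and '\\ooo' sequences into single bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] != "\\":
            # Ordinary character, including the single quote: stored as-is.
            out.append(ord(text[i]))
            i += 1
        elif text[i + 1:i + 2] == "\\":
            # Two backslashes become one byte with decimal value 92.
            out.append(92)
            i += 2
        else:
            digits = text[i + 1:i + 4]
            if len(digits) != 3 or any(c not in "01234567" for c in digits):
                raise ValueError("invalid escape in model input")
            # A backslash plus three octal digits becomes one byte.
            out.append(int(digits, 8))
            i += 4
    return bytes(out)


if __name__ == "__main__":
    for body in (r"\\001", r"\\\\", r"\'", r"\\047"):
        stage1 = literal_parser(body)
        print(body, "->", stage1, "->", bytea_input(stage1))
```

Running the sketch shows `\\001` reduced to `\001` and then to one byte, `\\\\` reduced to `\\` and then to byte 92, and `\'` reduced to a bare quote that the second stage passes through unchanged.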
+ + + + Depending on the front end to PostgreSQL you use, you may have + additional work to do in terms of escaping and unescaping bytea + strings. For example, you may also have to escape line feeds and + carriage returns if your interface automatically translates these. + Or you may have to double up on backslashes if the parser for your + language of choice also treats them as an escape character. + + + + Compatibility + + Bytea provides most of the functionality of the SQL99 binary string + type per SQL99 section 4.3. A comparison of PostgreSQL bytea and SQL99 + Binary Strings is presented in + . + + + + Comparison of SQL99 Binary String and BYTEA types + + + + SQL99 + BYTEA + + + + + + Name of data type BINARY LARGE OBJECT or BLOB + Name of data type BYTEA + + + + Sequence of octets that does not have either a character set + or collation associated with it. + same + + + + Described by a binary data type descriptor containing the + name of the data type and the maximum length + in octets + Described by a binary data type descriptor containing the + name of the data type with no specific maximum length + + + + + All binary strings are mutually comparable in accordance + with the rules of comparison predicates. + same + + + + Binary string values can only be compared for equality. + + Binary string values can be compared for equality, greater + than, greater than or equal, less than, less than or equal + + + + + Operators operating on and returning binary strings + include concatenation, substring, overlay, and trim + Operators operating on and returning binary strings + include concatenation, substring, and trim. The + 'leading' and 'trailing' + arguments for trim are not yet implemented. + + + + + Other operators involving binary strings + include length, position, and the like predicate + same + + + + A binary string literal is comprised of an even number of + hexadecimal digits, in single quotes, preceded by "X", + e.g. 
X'1a43fe' + A binary string literal is comprised of ASCII characters + escaped according to the rules shown in + + + + +
+
+ Date/Time Types diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e9d9b47aeb..8af9fd0679 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1,5 +1,5 @@ @@ -1133,7 +1133,7 @@ Postgres documentation text Encodes binary data to ASCII-only representation. Supported - types are: 'base64', 'hex'. + types are: 'base64', 'hex', 'escape'. encode('123\\000\\001', 'base64') MTIzAAE= @@ -1164,6 +1164,186 @@ Postgres documentation + + Binary String Functions and Operators + + + This section describes functions and operators for examining and + manipulating binary string values. Strings in this context include + values of the type BYTEA. + + + + SQL defines some string functions with a special syntax where + certain keywords rather than commas are used to separate the + arguments. Details are in . + Some functions are also implemented using the regular syntax for + function invocation. (See .) + + + + <acronym>SQL</acronym> Binary String Functions and Operators + + + + Function + Return Type + Description + Example + Result + + + + + + string || string + bytea + + string concatenation + + binary strings + concatenation + + + '\\\\Postgre'::bytea || '\\047SQL\\000'::bytea + \\Postgre'SQL\000 + + + + octet_length(string) + integer + number of bytes in binary string + octet_length('jo\\000se'::bytea) + 5 + + + + position(substring in string) + integer + location of specified substring + position('\\000om'::bytea in 'Th\\000omas'::bytea) + 3 + + + + substring(string from integer for integer) + bytea + + extract substring + + substring + + + substring('Th\\000omas'::bytea from 2 for 3) + h\000o + + + + + trim(both + characters from + string) + + bytea + + Removes the longest string containing only the + characters from the + beginning/end/both ends of the string. + + trim('\\000'::bytea from '\\000Tom\\000'::bytea) + Tom + + + + +
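The results in the table above can be cross-checked against Python's `bytes` type, which is also a plain sequence of octets. This is only an analogy under stated assumptions: the helper names below are hypothetical, `position` and `substring` assume 1-based positions with a start of at least 1, and nothing here is PostgreSQL code.

```python
def octet_length(s: bytes) -> int:
    """Number of bytes, including embedded null bytes."""
    return len(s)


def position(sub: bytes, s: bytes) -> int:
    """1-based location of sub in s, or 0 when it does not occur."""
    return s.find(sub) + 1


def substring(s: bytes, start: int, count: int) -> bytes:
    """count bytes of s beginning at 1-based position start (start >= 1 assumed)."""
    return s[start - 1:start - 1 + count]


def trim_both(chars: bytes, s: bytes) -> bytes:
    """Remove any of the bytes in chars from both ends of s."""
    return s.strip(chars)


if __name__ == "__main__":
    print(b"\\Postgre" + b"'SQL\x00")           # concatenation
    print(octet_length(b"jo\x00se"))            # 5
    print(position(b"\x00om", b"Th\x00omas"))   # 3
    print(substring(b"Th\x00omas", 2, 3))       # b'h\x00o'
    print(trim_both(b"\x00", b"\x00Tom\x00"))   # b'Tom'
```

The point of the comparison is that an embedded null byte counts toward length and position like any other octet.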
+ + + Additional binary string manipulation functions are available and are + listed below. Some of them are used internally to implement the + SQL-standard string functions listed above. + + + + Other Binary String Functions + + + + Function + Return Type + Description + Example + Result + + + + + + btrim(string bytea, trim bytea) + bytea + + Remove (trim) the longest string consisting only of characters + in trim from the start and end of + string. + + btrim('\\000trim\\000'::bytea,'\\000'::bytea) + trim + + + + length(string) + integer + + length of binary string + + binary strings + length + + + length + binary strings + binary strings, length + + + length('jo\\000se'::bytea) + 5 + + + + + encode(string bytea, + type text) + + text + + Encodes a binary string to an ASCII-only representation. Supported + types are: 'base64', 'hex', 'escape'. + + encode('123\\000456'::bytea, 'escape') + 123\000456 + + + + + decode(string text, + type text) + + bytea + + Decodes a binary string from a string previously + encoded with encode(). The type parameter is the same as in encode(). + + decode('123\\000456', 'escape') + 123\000456 + + + + +
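As a sanity check on the 'base64' and 'hex' types named for encode() and decode(), the same transformations can be reproduced with Python's standard library. This is a sketch under assumptions: the `encode` and `decode` wrappers below are hypothetical stand-ins that cover only 'base64' and 'hex', and the 'escape' type is intentionally not modelled.

```python
import base64
import binascii


def encode(data: bytes, fmt: str) -> str:
    """ASCII-only text form of data; only 'base64' and 'hex' are modelled."""
    if fmt == "base64":
        return base64.b64encode(data).decode("ascii")
    if fmt == "hex":
        return binascii.hexlify(data).decode("ascii")
    raise ValueError("format not modelled in this sketch: %r" % fmt)


def decode(text: str, fmt: str) -> bytes:
    """Inverse of encode() for the same two formats."""
    if fmt == "base64":
        return base64.b64decode(text)
    if fmt == "hex":
        return binascii.unhexlify(text)
    raise ValueError("format not modelled in this sketch: %r" % fmt)


if __name__ == "__main__":
    data = b"123\x00\x01"  # the bytes behind '123\\000\\001'
    print(encode(data, "base64"))  # MTIzAAE=
    print(encode(data, "hex"))     # 3132330001
    print(decode(encode(data, "base64"), "base64") == data)
```

The base64 output agrees with the `MTIzAAE=` result shown for `encode('123\\000\\001', 'base64')` earlier in this diff.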
+ +
+ + Pattern Matching