860 lines
38 KiB
Plaintext
860 lines
38 KiB
Plaintext
|
|
Frequently Asked Questions (FAQ) for PostgreSQL
|
|
|
|
Last updated: Mon Mar 3 11:22:50 EST 2008
|
|
|
|
Current maintainer: Bruce Momjian (bruce@momjian.us)
|
|
|
|
The most recent version of this document can be viewed at
|
|
http://www.postgresql.org/files/documentation/faqs/FAQ.html.
|
|
|
|
Platform-specific questions are answered at
|
|
http://www.postgresql.org/docs/faq/.
|
|
_________________________________________________________________
|
|
|
|
General Questions
|
|
|
|
1.1) What is PostgreSQL? How is it pronounced? What is Postgres?
|
|
1.2) Who controls PostgreSQL?
|
|
1.3) What is the copyright of PostgreSQL?
|
|
1.4) What platforms does PostgreSQL support?
|
|
1.5) Where can I get PostgreSQL?
|
|
1.6) What is the most recent release?
|
|
1.7) Where can I get support?
|
|
1.8) How do I submit a bug report?
|
|
1.9) How do I find out about known bugs or missing features?
|
|
1.10) What documentation is available?
|
|
1.11) How can I learn SQL?
|
|
1.12) How do I submit a patch or join the development team?
|
|
1.13) How does PostgreSQL compare to other DBMSs?
|
|
1.14) Will PostgreSQL handle recent daylight saving time changes in
|
|
various countries?
|
|
1.15) How do I unsubscribe from the PostgreSQL email lists? How do I
|
|
avoid receiving duplicate emails?
|
|
|
|
User Client Questions
|
|
|
|
2.1) What interfaces are available for PostgreSQL?
|
|
2.2) What tools are available for using PostgreSQL with Web pages?
|
|
2.3) Does PostgreSQL have a graphical user interface?
|
|
|
|
Administrative Questions
|
|
|
|
3.1) How do I install PostgreSQL somewhere other than
|
|
/usr/local/pgsql?
|
|
3.2) How do I control connections from other hosts?
|
|
3.3) How do I tune the database engine for better performance?
|
|
3.4) What debugging features are available?
|
|
3.5) Why do I get "Sorry, too many clients" when trying to connect?
|
|
3.6 What is the upgrade process for PostgreSQL?
|
|
3.7) What computer hardware should I use?
|
|
|
|
Operational Questions
|
|
|
|
4.1) How do I SELECT only the first few rows of a query? A random row?
|
|
4.2) How do I find out what tables, indexes, databases, and users are
|
|
defined? How do I see the queries used by psql to display them?
|
|
4.3) How do you change a column's data type?
|
|
4.4) What is the maximum size for a row, a table, and a database?
|
|
4.5) How much database disk space is required to store data from a
|
|
typical text file?
|
|
4.6) Why are my queries slow? Why don't they use my indexes?
|
|
4.7) How do I see how the query optimizer is evaluating my query?
|
|
4.8) How do I perform regular expression searches and case-insensitive
|
|
regular expression searches? How do I use an index for
|
|
case-insensitive searches?
|
|
4.9) In a query, how do I detect if a field is NULL? How do I
|
|
concatenate possible NULLs? How can I sort on whether a field is NULL
|
|
or not?
|
|
4.10) What is the difference between the various character types?
|
|
4.11.1) How do I create a serial/auto-incrementing field?
|
|
4.11.2) How do I get the value of a SERIAL insert?
|
|
4.11.3) Doesn't currval() lead to a race condition with other users?
|
|
4.11.4) Why aren't my sequence numbers reused on transaction abort?
|
|
Why are there gaps in the numbering of my sequence/SERIAL column?
|
|
4.12) What is an OID? What is a CTID?
|
|
4.13) Why do I get the error "ERROR: Memory exhausted in
|
|
AllocSetAlloc()"?
|
|
4.14) How do I tell what PostgreSQL version I am running?
|
|
4.15) How do I create a column that will default to the current time?
|
|
4.16) How do I perform an outer join?
|
|
4.17) How do I perform queries using multiple databases?
|
|
4.18) How do I return multiple rows or columns from a function?
|
|
4.19) Why do I get "relation with OID ##### does not exist" errors
|
|
when accessing temporary tables in PL/PgSQL functions?
|
|
4.20) What replication solutions are available?
|
|
4.21) Why are my table and column names not recognized in my query?
|
|
Why is capitalization not preserved?
|
|
_________________________________________________________________
|
|
|
|
General Questions
|
|
|
|
1.1) What is PostgreSQL? How is it pronounced? What is Postgres?
|
|
|
|
PostgreSQL is pronounced Post-Gres-Q-L. (For those curious about how
|
|
to say "PostgreSQL", an audio file is available.)
|
|
|
|
PostgreSQL is an object-relational database system that has the
|
|
features of traditional commercial database systems with enhancements
|
|
to be found in next-generation DBMS systems. PostgreSQL is free and
|
|
the complete source code is available.
|
|
|
|
PostgreSQL development is performed by a team of mostly volunteer
|
|
developers spread throughout the world and communicating via the
|
|
Internet. It is a community project and is not controlled by any
|
|
company. To get involved, see the developer's FAQ at
|
|
http://www.postgresql.org/docs/faqs.FAQ_DEV.html
|
|
|
|
Postgres is a widely-used nickname for PostgreSQL. It was the original
|
|
name of the project at Berkeley and is strongly preferred over other
|
|
nicknames. If you find 'PostgreSQL' hard to pronounce, call it
|
|
'Postgres' instead.
|
|
|
|
1.2) Who controls PostgreSQL?
|
|
|
|
If you are looking for a PostgreSQL gatekeeper, central committee, or
|
|
controlling company, give up --- there isn't one. We do have a core
|
|
committee and CVS committers, but these groups are more for
|
|
administrative purposes than control. The project is directed by the
|
|
community of developers and users, which anyone can join. All you need
|
|
to do is subscribe to the mailing lists and participate in the
|
|
discussions. (See the Developer's FAQ for information on how to get
|
|
involved in PostgreSQL development.)
|
|
|
|
1.3) What is the copyright of PostgreSQL?
|
|
|
|
PostgreSQL is distributed under the classic BSD license. Basically, it
|
|
allows users to do anything they want with the code, including
|
|
reselling binaries without the source code. The only restriction is
|
|
that you not hold us legally liable for problems with the software.
|
|
There is also the requirement that this copyright appear in all copies
|
|
of the software. Here is the actual BSD license we use:
|
|
|
|
PostgreSQL Data Base Management System
|
|
|
|
Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
|
|
Portions Copyright (c) 1994-1996 Regents of the University of
|
|
California
|
|
|
|
Permission to use, copy, modify, and distribute this software and its
|
|
documentation for any purpose, without fee, and without a written
|
|
agreement is hereby granted, provided that the above copyright notice
|
|
and this paragraph and the following two paragraphs appear in all
|
|
copies.
|
|
|
|
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY
|
|
FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES,
|
|
INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND
|
|
ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN
|
|
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
|
|
THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
|
|
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
|
|
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE
|
|
PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF
|
|
CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT,
|
|
UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
|
|
|
|
1.4) What platforms does PostgreSQL support?
|
|
|
|
In general, any modern Unix-compatible platform should be able to run
|
|
PostgreSQL. The platforms that had received explicit testing at the
|
|
time of release are listed in the installation instructions.
|
|
|
|
PostgreSQL also runs natively on Microsoft Windows NT-based operating
|
|
systems like Win2000 SP4, WinXP, and Win2003. A prepackaged installer
|
|
is available at http://pgfoundry.org/projects/pginstaller. MSDOS-based
|
|
versions of Windows (Win95, Win98, WinMe) can run PostgreSQL using
|
|
Cygwin.
|
|
|
|
There is also a Novell Netware 6 port at http://forge.novell.com, and
|
|
an OS/2 (eComStation) version at
|
|
http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&button=Search&key=postgre
|
|
SQL&stype=all&sort=type&dir=%2F.
|
|
|
|
1.5) Where can I get PostgreSQL?
|
|
|
|
Via web browser, use http://www.postgresql.org/ftp/, and via ftp, use
|
|
ftp://ftp.postgresql.org/pub/.
|
|
|
|
1.6) What is the most recent release?
|
|
|
|
The latest release of PostgreSQL is version 8.3.
|
|
|
|
We plan to have a major release every year, with minor releases every
|
|
few months.
|
|
|
|
1.7) Where can I get support?
|
|
|
|
The PostgreSQL community provides assistance to many of its users via
|
|
email. The main web site to subscribe to the email lists is
|
|
http://www.postgresql.org/community/lists/. The general or bugs lists
|
|
are a good place to start.
|
|
|
|
The major IRC channel is #postgresql on Freenode (irc.freenode.net).
|
|
To connect you can use the Unix program irc -c '#postgresql' "$USER"
|
|
irc.freenode.net or use any other IRC clients. A Spanish one also
|
|
exists on the same network, (#postgresql-es), a French one,
|
|
(#postgresqlfr), and a Brazilian one, (#postgresql-br). There is also
|
|
a PostgreSQL channel on EFNet.
|
|
|
|
A list of commercial support companies is available at
|
|
http://www.postgresql.org/support/professional_support.
|
|
|
|
1.8) How do I submit a bug report?
|
|
|
|
Visit the PostgreSQL bug form at
|
|
http://www.postgresql.org/support/submitbug. Also check out our ftp
|
|
site ftp://ftp.postgresql.org/pub/ to see if there is a more recent
|
|
PostgreSQL version.
|
|
|
|
Bugs submitted using the bug form or posted to any PostgreSQL mailing
|
|
list typically generates one of the following replies:
|
|
* It is not a bug, and why
|
|
* It is a known bug and is already on the TODO list
|
|
* The bug has been fixed in the current release
|
|
* The bug has been fixed but is not packaged yet in an official
|
|
release
|
|
* A request is made for more detailed information:
|
|
+ Operating system
|
|
+ PostgreSQL version
|
|
+ Reproducible test case
|
|
+ Debugging information
|
|
+ Debugger backtrace output
|
|
* The bug is new. The following might happen:
|
|
+ A patch is created and will be included in the next major or
|
|
minor release
|
|
+ The bug cannot be fixed immediately and is added to the TODO
|
|
list
|
|
|
|
1.9) How do I find out about known bugs or missing features?
|
|
|
|
PostgreSQL supports an extended subset of SQL:2003. See our TODO list
|
|
for known bugs, missing features, and future plans.
|
|
|
|
A feature request usually results in one of the following replies:
|
|
* The feature is already on the TODO list
|
|
* The feature is not desired because:
|
|
+ It duplicates existing functionality that already follows the
|
|
SQL standard
|
|
+ The feature would increase code complexity but add little
|
|
benefit
|
|
+ The feature would be insecure or unreliable
|
|
* The new feature is added to the TODO list
|
|
|
|
PostgreSQL does not use a bug tracking system because we find it more
|
|
efficient to respond directly to email and keep the TODO list
|
|
up-to-date. In practice, bugs don't last very long in the software,
|
|
and bugs that affect a large number of users are fixed rapidly. The
|
|
only place to find all changes, improvements, and fixes in a
|
|
PostgreSQL release is to read the CVS log messages. Even the release
|
|
notes do not list every change made to the software.
|
|
|
|
1.10) What documentation is available?
|
|
|
|
PostgreSQL includes extensive documentation, including a large manual,
|
|
manual pages, and some test examples. See the /doc directory. You can
|
|
also browse the manuals online at http://www.postgresql.org/docs.
|
|
|
|
There are two PostgreSQL books available online at
|
|
http://www.postgresql.org/docs/books/awbook.html and
|
|
http://www.commandprompt.com/ppbook/. There are a number of PostgreSQL
|
|
books available for purchase. One of the most popular ones is by Korry
|
|
Douglas. A list of book reviews can be found at
|
|
http://www.postgresql.org/docs/books/. There is also a collection of
|
|
PostgreSQL technical articles at
|
|
http://www.postgresql.org/docs/techdocs.
|
|
|
|
The command line client program psql has some \d commands to show
|
|
information about types, operators, functions, aggregates, etc. - use
|
|
\? to display the available commands.
|
|
|
|
Our web site contains even more documentation.
|
|
|
|
1.11) How can I learn SQL?
|
|
|
|
First, consider the PostgreSQL-specific books mentioned above. Many of
|
|
our users also like The Practical SQL Handbook, Bowman, Judith S., et
|
|
al., Addison-Wesley. Others like The Complete Reference SQL, Groff et
|
|
al., McGraw-Hill.
|
|
|
|
There are also many nice tutorials available online:
|
|
* http://www.intermedia.net/support/sql/sqltut.shtm
|
|
* http://sqlcourse.com
|
|
* http://www.w3schools.com/sql/default.asp
|
|
* http://mysite.verizon.net/Graeme_Birchall/id1.html
|
|
|
|
1.12) How do I submit a patch or join the development team?
|
|
|
|
See the Developer's FAQ.
|
|
|
|
1.13) How does PostgreSQL compare to other DBMSs?
|
|
|
|
There are several ways of measuring software: features, performance,
|
|
reliability, support, and price.
|
|
|
|
Features
|
|
PostgreSQL has most features present in large commercial DBMSs,
|
|
like transactions, subselects, triggers, views, foreign key
|
|
referential integrity, and sophisticated locking. We have some
|
|
features they do not have, like user-defined types,
|
|
inheritance, rules, and multi-version concurrency control to
|
|
reduce lock contention.
|
|
|
|
Performance
|
|
PostgreSQL's performance is comparable to other commercial and
|
|
open source databases. It is faster for some things, slower for
|
|
others. Our performance is usually +/-10% compared to other
|
|
databases.
|
|
|
|
Reliability
|
|
We realize that a DBMS must be reliable, or it is worthless. We
|
|
strive to release well-tested, stable code that has a minimum
|
|
of bugs. Each release has at least one month of beta testing,
|
|
and our release history shows that we can provide stable, solid
|
|
releases that are ready for production use. We believe we
|
|
compare favorably to other database software in this area.
|
|
|
|
Support
|
|
Our mailing lists provide contact with a large group of
|
|
developers and users to help resolve any problems encountered.
|
|
While we cannot guarantee a fix, commercial DBMSs do not always
|
|
supply a fix either. Direct access to developers, the user
|
|
community, manuals, and the source code often make PostgreSQL
|
|
support superior to other DBMSs. There is commercial
|
|
per-incident support available for those who need it. (See FAQ
|
|
section 1.7.)
|
|
|
|
Price
|
|
We are free for all use, both commercial and non-commercial.
|
|
You can add our code to your product with no limitations,
|
|
except those outlined in our BSD-style license stated above.
|
|
|
|
1.14) Will PostgreSQL handle recent daylight saving time changes in various
|
|
countries?
|
|
|
|
USA daylight saving time changes are included in PostgreSQL release
|
|
8.0.[4+], and all later major releases, e.g. 8.1. Canada and Western
|
|
Australia changes are included in 8.0.[10+], 8.1.[6+], and all later
|
|
major releases. PostgreSQL releases prior to 8.0 use the operating
|
|
system's timezone database for daylight saving information.
|
|
|
|
1.15) How do I unsubscribe from the PostgreSQL email lists? How do I avoid
|
|
receiving duplicate emails?
|
|
|
|
The PostgreSQL Majordomo page allows subscribing or unsubscribing from
|
|
any of the PostgreSQL email lists. (You might need to have your
|
|
Majordomo password emailed to you to log in.)
|
|
|
|
All PostgreSQL email lists are configured so a group reply goes to the
|
|
email list and the original email author. This is done so users
|
|
receive the quickest possible email replies. If you would prefer not
|
|
to receive duplicate email from the list in cases where you already
|
|
receive an email directly, check eliminatecc from the Majordomo Change
|
|
Settings page. You can also prevent yourself from receiving copies of
|
|
emails you post to the lists by unchecking selfcopy.
|
|
_________________________________________________________________
|
|
|
|
User Client Questions
|
|
|
|
2.1) What interfaces are available for PostgreSQL?
|
|
|
|
The PostgreSQL install includes only the C and embedded C interfaces.
|
|
All other interfaces are independent projects that are downloaded
|
|
separately; being separate allows them to have their own release
|
|
schedule and development teams.
|
|
|
|
Some programming languages like PHP include an interface to
|
|
PostgreSQL. Interfaces for languages like Perl, TCL, Python, and many
|
|
others are available at http://pgfoundry.org.
|
|
|
|
2.2) What tools are available for using PostgreSQL with Web pages?
|
|
|
|
A nice introduction to Database-backed Web pages can be seen at:
|
|
http://www.webreview.com
|
|
|
|
For Web integration, PHP (http://www.php.net) is an excellent
|
|
interface.
|
|
|
|
For complex cases, many use the Perl and DBD::Pg with CGI.pm or
|
|
mod_perl.
|
|
|
|
2.3) Does PostgreSQL have a graphical user interface?
|
|
|
|
There are a large number of GUI Tools that are available for
|
|
PostgreSQL from both commercial and open source developers. A detailed
|
|
list can be found in the PostgreSQL Community Documentation
|
|
_________________________________________________________________
|
|
|
|
Administrative Questions
|
|
|
|
3.1) How do I install PostgreSQL somewhere other than /usr/local/pgsql?
|
|
|
|
Specify the --prefix option when running configure.
|
|
|
|
3.2) How do I control connections from other hosts?
|
|
|
|
By default, PostgreSQL only allows connections from the local machine
|
|
using Unix domain sockets or TCP/IP connections. Other machines will
|
|
not be able to connect unless you modify listen_addresses in the
|
|
postgresql.conf file, enable host-based authentication by modifying
|
|
the $PGDATA/pg_hba.conf file, and restart the database server.
|
|
|
|
3.3) How do I tune the database engine for better performance?
|
|
|
|
There are three major areas for potential performance improvement:
|
|
|
|
Query Changes
|
|
This involves modifying queries to obtain better performance:
|
|
|
|
+ Creation of indexes, including expression and partial indexes
|
|
+ Use of COPY instead of multiple INSERTs
|
|
+ Grouping of multiple statements into a single transaction to
|
|
reduce commit overhead
|
|
+ Use of CLUSTER when retrieving many rows from an index
|
|
+ Use of LIMIT for returning a subset of a query's output
|
|
+ Use of Prepared queries
|
|
+ Use of ANALYZE to maintain accurate optimizer statistics
|
|
+ Regular use of VACUUM or pg_autovacuum
|
|
+ Dropping of indexes during large data changes
|
|
|
|
Server Configuration
|
|
A number of postgresql.conf settings affect performance. For
|
|
more details, see Administration Guide/Server Run-time
|
|
Environment/Run-time Configuration for a full listing, and for
|
|
commentary see
|
|
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_co
|
|
nf_e.html and
|
|
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html.
|
|
|
|
Hardware Selection
|
|
The effect of hardware on performance is detailed in
|
|
http://www.powerpostgresql.com/PerfList/ and
|
|
http://momjian.us/main/writings/pgsql/hw_performance/index.html
|
|
.
|
|
|
|
3.4) What debugging features are available?
|
|
|
|
There are many log_* server configuration variables that enable
|
|
printing of query and process statistics which can be very useful for
|
|
debugging and performance measurements.
|
|
|
|
3.5) Why do I get "Sorry, too many clients" when trying to connect?
|
|
|
|
You have reached the default limit of 100 database sessions. You need
|
|
to increase the server's limit on how many concurrent backend
|
|
processes it can start by changing the max_connections value in
|
|
postgresql.conf and restarting the server.
|
|
|
|
3.6) What is the upgrade process for PostgreSQL?
|
|
|
|
See http://www.postgresql.org/support/versioning for a general
|
|
discussion about upgrading, and
|
|
http://www.postgresql.org/docs/current/static/install-upgrading.html
|
|
for specific instructions.
|
|
|
|
3.7) What computer hardware should I use?
|
|
|
|
Because PC hardware is mostly compatible, people tend to believe that
|
|
all PC hardware is of equal quality. It is not. ECC RAM, SCSI, and
|
|
quality motherboards are more reliable and have better performance
|
|
than less expensive hardware. PostgreSQL will run on almost any
|
|
hardware, but if reliability and performance are important it is wise
|
|
to research your hardware options thoroughly. Our email lists can be
|
|
used to discuss hardware options and tradeoffs.
|
|
_________________________________________________________________
|
|
|
|
Operational Questions
|
|
|
|
4.1) How do I SELECT only the first few rows of a query? A random row?
|
|
|
|
To retrieve only a few rows, if you know at the number of rows needed
|
|
at the time of the SELECT use LIMIT . If an index matches the ORDER BY
|
|
it is possible the entire query does not have to be executed. If you
|
|
don't know the number of rows at SELECT time, use a cursor and FETCH.
|
|
|
|
To SELECT a random row, use:
|
|
SELECT col
|
|
FROM tab
|
|
ORDER BY random()
|
|
LIMIT 1;
|
|
|
|
4.2) How do I find out what tables, indexes, databases, and users are
|
|
defined? How do I see the queries used by psql to display them?
|
|
|
|
Use the \dt command to see tables in psql. For a complete list of
|
|
commands inside psql you can use \?. Alternatively you can read the
|
|
source code for psql in file pgsql/src/bin/psql/describe.c, it
|
|
contains SQL commands that generate the output for psql's backslash
|
|
commands. You can also start psql with the -E option so it will print
|
|
out the queries it uses to execute the commands you give. PostgreSQL
|
|
also provides an SQL compliant INFORMATION SCHEMA interface you can
|
|
query to get information about the database.
|
|
|
|
There are also system tables beginning with pg_ that describe these
|
|
too.
|
|
|
|
Use psql -l will list all databases.
|
|
|
|
Also try the file pgsql/src/tutorial/syscat.source. It illustrates
|
|
many of the SELECTs needed to get information from the database system
|
|
tables.
|
|
|
|
4.3) How do you change a column's data type?
|
|
|
|
Changing the data type of a column can be done easily in 8.0 and later
|
|
with ALTER TABLE ALTER COLUMN TYPE.
|
|
|
|
In earlier releases, do this:
|
|
BEGIN;
|
|
ALTER TABLE tab ADD COLUMN new_col new_data_type;
|
|
UPDATE tab SET new_col = CAST(old_col AS new_data_type);
|
|
ALTER TABLE tab DROP COLUMN old_col;
|
|
COMMIT;
|
|
|
|
You might then want to do VACUUM FULL tab to reclaim the disk space
|
|
used by the expired rows.
|
|
|
|
4.4) What is the maximum size for a row, a table, and a database?
|
|
|
|
These are the limits:
|
|
|
|
Maximum size for a database? unlimited (32 TB databases exist)
|
|
Maximum size for a table? 32 TB
|
|
Maximum size for a row? 400 GB
|
|
Maximum size for a field? 1 GB
|
|
Maximum number of rows in a table? unlimited
|
|
Maximum number of columns in a table? 250-1600 depending on column
|
|
types
|
|
Maximum number of indexes on a table? unlimited
|
|
|
|
Of course, these are not actually unlimited, but limited to available
|
|
disk space and memory/swap space. Performance may suffer when these
|
|
values get unusually large.
|
|
|
|
The maximum table size of 32 TB does not require large file support
|
|
from the operating system. Large tables are stored as multiple 1 GB
|
|
files so file system size limits are not important.
|
|
|
|
The maximum table size, row size, and maximum number of columns can be
|
|
quadrupled by increasing the default block size to 32k. The maximum
|
|
table size can also be increased using table partitioning.
|
|
|
|
One limitation is that indexes can not be created on columns longer
|
|
than about 2,000 characters. Fortunately, such indexes are rarely
|
|
needed. Uniqueness is best guaranteed by a function index of an MD5
|
|
hash of the long column, and full text indexing allows for searching
|
|
of words within the column.
|
|
|
|
4.5) How much database disk space is required to store data from a typical
|
|
text file?
|
|
|
|
A PostgreSQL database may require up to five times the disk space to
|
|
store data from a text file.
|
|
|
|
As an example, consider a file of 100,000 lines with an integer and
|
|
text description on each line. Suppose the text string avergages
|
|
twenty bytes in length. The flat file would be 2.8 MB. The size of the
|
|
PostgreSQL database file containing this data can be estimated as 5.2
|
|
MB:
|
|
24 bytes: each row header (approximate)
|
|
24 bytes: one int field and one text field
|
|
+ 4 bytes: pointer on page to tuple
|
|
----------------------------------------
|
|
52 bytes per row
|
|
|
|
The data page size in PostgreSQL is 8192 bytes (8 KB), so:
|
|
|
|
8192 bytes per page
|
|
------------------- = 158 rows per database page (rounded down)
|
|
52 bytes per row
|
|
|
|
100000 data rows
|
|
-------------------- = 633 database pages (rounded up)
|
|
158 rows per page
|
|
|
|
633 database pages * 8192 bytes per page = 5,185,536 bytes (5.2 MB)
|
|
|
|
Indexes do not require as much overhead, but do contain the data that
|
|
is being indexed, so they can be large also.
|
|
|
|
NULLs are stored as bitmaps, so they use very little space.
|
|
|
|
4.6) Why are my queries slow? Why don't they use my indexes?
|
|
|
|
Indexes are not used by every query. Indexes are used only if the
|
|
table is larger than a minimum size, and the query selects only a
|
|
small percentage of the rows in the table. This is because the random
|
|
disk access caused by an index scan can be slower than a straight read
|
|
through the table, or sequential scan.
|
|
|
|
To determine if an index should be used, PostgreSQL must have
|
|
statistics about the table. These statistics are collected using
|
|
VACUUM ANALYZE, or simply ANALYZE. Using statistics, the optimizer
|
|
knows how many rows are in the table, and can better determine if
|
|
indexes should be used. Statistics are also valuable in determining
|
|
optimal join order and join methods. Statistics collection should be
|
|
performed periodically as the contents of the table change.
|
|
|
|
Indexes are normally not used for ORDER BY or to perform joins. A
|
|
sequential scan followed by an explicit sort is usually faster than an
|
|
index scan of a large table. However, LIMIT combined with ORDER BY
|
|
often will use an index because only a small portion of the table is
|
|
returned.
|
|
|
|
If you believe the optimizer is incorrect in choosing a sequential
|
|
scan, use SET enable_seqscan TO 'off' and run query again to see if an
|
|
index scan is indeed faster.
|
|
|
|
When using wild-card operators such as LIKE or ~, indexes can only be
|
|
used in certain circumstances:
|
|
* The beginning of the search string must be anchored to the start
|
|
of the string, i.e.
|
|
+ LIKE patterns must not start with %.
|
|
+ ~ (regular expression) patterns must start with ^.
|
|
* The search string can not start with a character class, e.g.
|
|
[a-e].
|
|
* Case-insensitive searches such as ILIKE and ~* do not utilize
|
|
indexes. Instead, use expression indexes, which are described in
|
|
section 4.8.
|
|
* The default C locale must be used during initdb because it is not
|
|
possible to know the next-greatest character in a non-C locale.
|
|
You can create a special text_pattern_ops index for such cases
|
|
that work only for LIKE indexing. It is also possible to use full
|
|
text indexing for word searches.
|
|
|
|
4.7) How do I see how the query optimizer is evaluating my query?
|
|
|
|
See the EXPLAIN manual page.
|
|
|
|
4.8) How do I perform regular expression searches and case-insensitive
|
|
regular expression searches? How do I use an index for case-insensitive
|
|
searches?
|
|
|
|
The ~ operator does regular expression matching, and ~* does
|
|
case-insensitive regular expression matching. The case-insensitive
|
|
variant of LIKE is called ILIKE.
|
|
|
|
Case-insensitive equality comparisons are normally expressed as:
|
|
SELECT *
|
|
FROM tab
|
|
WHERE lower(col) = 'abc';
|
|
|
|
This will not use an standard index. However, if you create an
|
|
expression index, it will be used:
|
|
CREATE INDEX tabindex ON tab (lower(col));
|
|
|
|
If the above index is created as UNIQUE, though the column can store
|
|
upper and lowercase characters, it can not have identical values that
|
|
differ only in case. To force a particular case to be stored in the
|
|
column, use a CHECK constraint or a trigger.
|
|
|
|
4.9) In a query, how do I detect if a field is NULL? How do I concatenate
|
|
possible NULLs? How can I sort on whether a field is NULL or not?
|
|
|
|
You test the column with IS NULL and IS NOT NULL, like this:
|
|
SELECT *
|
|
FROM tab
|
|
WHERE col IS NULL;
|
|
|
|
To concatentate with possible NULLs, use COALESCE(), like this:
|
|
SELECT COALESCE(col1, '') || COALESCE(col2, '')
|
|
FROM tab
|
|
|
|
To sort by the NULL status, use the IS NULL and IS NOT NULL modifiers
|
|
in your ORDER BY clause. Things that are true will sort higher than
|
|
things that are false, so the following will put NULL entries at the
|
|
top of the resulting list:
|
|
SELECT *
|
|
FROM tab
|
|
ORDER BY (col IS NOT NULL)
|
|
|
|
4.10) What is the difference between the various character types?
|
|
|
|
Type Internal Name Notes
|
|
VARCHAR(n) varchar size specifies maximum length, no padding
|
|
CHAR(n) bpchar blank padded to the specified fixed length
|
|
TEXT text no specific upper limit on length
|
|
BYTEA bytea variable-length byte array (null-byte safe)
|
|
"char" char one character
|
|
|
|
You will see the internal name when examining system catalogs and in
|
|
some error messages.
|
|
|
|
The first four types above are "varlena" types (i.e., the first four
|
|
bytes on disk are the length, followed by the data). Thus the actual
|
|
space used is slightly greater than the declared size. However, long
|
|
values are also subject to compression, so the space on disk might
|
|
also be less than expected.
|
|
VARCHAR(n) is best when storing variable-length strings and it limits
|
|
how long a string can be. TEXT is for strings of unlimited length,
|
|
with a maximum of one gigabyte.
|
|
|
|
CHAR(n) is for storing strings that are all the same length. CHAR(n)
|
|
pads with blanks to the specified length, while VARCHAR(n) only stores
|
|
the characters supplied. BYTEA is for storing binary data,
|
|
particularly values that include NULL bytes. All the types described
|
|
here have similar performance characteristics.
|
|
|
|
4.11.1) How do I create a serial/auto-incrementing field?
|
|
|
|
PostgreSQL supports a SERIAL data type. It auto-creates a sequence.
|
|
For example, this:
|
|
CREATE TABLE person (
|
|
id SERIAL,
|
|
name TEXT
|
|
);
|
|
|
|
is automatically translated into this:
|
|
CREATE SEQUENCE person_id_seq;
|
|
CREATE TABLE person (
|
|
id INT4 NOT NULL DEFAULT nextval('person_id_seq'),
|
|
name TEXT
|
|
);
|
|
|
|
Automatically created sequence are named <table>_<serialcolumn>_seq,
|
|
where table and serialcolumn are the names of the table and SERIAL
|
|
column, respectively. See the create_sequence manual page for more
|
|
information about sequences.
|
|
|
|
4.11.2) How do I get the value of a SERIAL insert?
|
|
|
|
The simplest way is to retrieve the assigned SERIAL value with
|
|
RETURNING. Using the example table in 4.11.1, it would look like this:
|
|
INSERT INTO person (name) VALUES ('Blaise Pascal') RETURNING id;
|
|
|
|
You can also call nextval() and use that value in the INSERT, or call
|
|
currval() after the INSERT.
|
|
|
|
4.11.3) Doesn't currval() lead to a race condition with other users?
|
|
|
|
No. currval() returns the current value assigned by your session, not
|
|
by all sessions.
|
|
|
|
4.11.4) Why aren't my sequence numbers reused on transaction abort? Why are
|
|
there gaps in the numbering of my sequence/SERIAL column?
|
|
|
|
To improve concurrency, sequence values are given out to running
|
|
transactions as needed and are not locked until the transaction
|
|
completes. This causes gaps in numbering from aborted transactions.
|
|
|
|
4.12) What is an OID? What is a CTID?
|
|
|
|
If a table is created WITH OIDS, each row gets a unique a OID. OIDs
|
|
are automatically assigned unique 4-byte integers that are unique
|
|
across the entire installation. However, they overflow at 4 billion,
|
|
and then the OIDs start being duplicated. PostgreSQL uses OIDs to link
|
|
its internal system tables together.
|
|
|
|
To uniquely number rows in user tables, it is best to use SERIAL
|
|
rather than OIDs because SERIAL sequences are unique only within a
|
|
single table. and are therefore less likely to overflow. SERIAL8 is
|
|
available for storing eight-byte sequence values.
|
|
|
|
CTIDs are used to identify specific physical rows with block and
|
|
offset values. CTIDs change after rows are modified or reloaded. They
|
|
are used by index entries to point to physical rows.
|
|
|
|
4.13) Why do I get the error "ERROR: Memory exhausted in AllocSetAlloc()"?
|
|
|
|
You probably have run out of virtual memory on your system, or your
|
|
kernel has a low limit for certain resources. Try this before starting
|
|
the server:
|
|
ulimit -d 262144
|
|
limit datasize 256m
|
|
|
|
Depending on your shell, only one of these may succeed, but it will
|
|
set your process data segment limit much higher and perhaps allow the
|
|
query to complete. This command applies to the current process, and
|
|
all subprocesses created after the command is run. If you are having a
|
|
problem with the SQL client because the backend is returning too much
|
|
data, try it before starting the client.
|
|
|
|
4.14) How do I tell what PostgreSQL version I am running?
|
|
|
|
From psql, type SELECT version();
|
|
|
|
4.15) How do I create a column that will default to the current time?
|
|
|
|
Use CURRENT_TIMESTAMP:
|
|
CREATE TABLE test (x int, modtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP );
|
|
|
|
4.16) How do I perform an outer join?
|
|
|
|
PostgreSQL supports outer joins using the SQL standard syntax. Here
|
|
are two examples:
|
|
SELECT *
|
|
FROM t1 LEFT OUTER JOIN t2 ON (t1.col = t2.col);
|
|
|
|
or
|
|
SELECT *
|
|
FROM t1 LEFT OUTER JOIN t2 USING (col);
|
|
|
|
These identical queries join t1.col to t2.col, and also return any
|
|
unjoined rows in t1 (those with no match in t2). A RIGHT join would
|
|
add unjoined rows of t2. A FULL join would return the matched rows
|
|
plus all unjoined rows from t1 and t2. The word OUTER is optional and
|
|
is assumed in LEFT, RIGHT, and FULL joins. Ordinary joins are called
|
|
INNER joins.
|
|
|
|
4.17) How do I perform queries using multiple databases?
|
|
|
|
There is no way to query a database other than the current one.
|
|
Because PostgreSQL loads database-specific system catalogs, it is
|
|
uncertain how a cross-database query should even behave.
|
|
|
|
contrib/dblink allows cross-database queries using function calls. Of
|
|
course, a client can also make simultaneous connections to different
|
|
databases and merge the results on the client side.
|
|
|
|
4.18) How do I return multiple rows or columns from a function?
|
|
|
|
It is easy using set-returning functions,
|
|
http://www.postgresql.org/docs/techdocs.17.
|
|
|
|
4.19) Why do I get "relation with OID ##### does not exist" errors when
|
|
accessing temporary tables in PL/PgSQL functions?
|
|
|
|
In PostgreSQL versions < 8.3, PL/PgSQL caches function scripts, and an
|
|
unfortunate side effect is that if a PL/PgSQL function accesses a
|
|
temporary table, and that table is later dropped and recreated, and
|
|
the function called again, the function will fail because the cached
|
|
function contents still point to the old temporary table. The solution
|
|
is to use EXECUTE for temporary table access in PL/PgSQL. This will
|
|
cause the query to be reparsed every time.
|
|
|
|
This problem does not occur in PostgreSQL 8.3 and later.
|
|
|
|
4.20) What replication solutions are available?
|
|
|
|
Though "replication" is a single term, there are several technologies
|
|
for doing replication, with advantages and disadvantages for each.
|
|
|
|
Master/slave replication allows a single master to receive read/write
|
|
queries, while slaves can only accept read/SELECT queries. The most
|
|
popular freely available master-slave PostgreSQL replication solution
|
|
is Slony-I.
|
|
|
|
Multi-master replication allows read/write queries to be sent to
|
|
multiple replicated computers. This capability also has a severe
|
|
impact on performance due to the need to synchronize changes between
|
|
servers. PGCluster is the most popular such solution freely available
|
|
for PostgreSQL.
|
|
|
|
There are also commercial and hardware-based replication solutions
|
|
available supporting a variety of replication models.
|
|
|
|
4.21) Why are my table and column names not recognized in my query? Why is
|
|
capitalization not preserved?
|
|
|
|
The most common cause of unrecognized names is the use of
|
|
double-quotes around table or column names during table creation. When
|
|
double-quotes are used, table and column names (called identifiers)
|
|
are stored case-sensitive, meaning you must use double-quotes when
|
|
referencing the names in a query. Some interfaces, like pgAdmin,
|
|
automatically double-quote identifiers during table creation. So, for
|
|
identifiers to be recognized, you must either:
|
|
* Avoid double-quoting identifiers when creating tables
|
|
* Use only lowercase characters in identifiers
|
|
* Double-quote identifiers when referencing them in queries
|