878 lines
38 KiB
Plaintext
878 lines
38 KiB
Plaintext
|
|
Frequently Asked Questions (FAQ) for PostgreSQL
|
|
|
|
Last updated: Thu Feb 24 23:18:31 EST 2005
|
|
|
|
Current maintainer: Bruce Momjian (pgman@candle.pha.pa.us)
|
|
|
|
The most recent version of this document can be viewed at
|
|
http://www.postgresql.org/files/documentation/faqs/FAQ.html.
|
|
|
|
Platform-specific questions are answered at
|
|
http://www.postgresql.org/docs/faq/.
|
|
_________________________________________________________________
|
|
|
|
General Questions
|
|
|
|
1.1) What is PostgreSQL? How is it pronounced?
|
|
1.2) What is the copyright of PostgreSQL?
|
|
1.3) What platforms does PostgreSQL support?
|
|
1.4) Where can I get PostgreSQL?
|
|
1.5) Where can I get support?
|
|
1.6) How do I submit a bug report?
|
|
1.7) What is the latest release?
|
|
1.8) What documentation is available?
|
|
1.9) How do I find out about known bugs or missing features?
|
|
1.10) How can I learn SQL?
|
|
1.11) How do I join the development team?
|
|
1.12) How does PostgreSQL compare to other DBMSs?
|
|
|
|
User Client Questions
|
|
|
|
2.1) What interfaces are available for PostgreSQL?
|
|
2.2) What tools are available for using PostgreSQL with Web pages?
|
|
2.3) Does PostgreSQL have a graphical user interface?
|
|
|
|
Administrative Questions
|
|
|
|
3.1) How do I install PostgreSQL somewhere other than
|
|
/usr/local/pgsql?
|
|
3.2) How do I control connections from other hosts?
|
|
3.3) How do I tune the database engine for better performance?
|
|
3.4) What debugging features are available?
|
|
3.5) Why do I get "Sorry, too many clients" when trying to connect?
|
|
3.6) Why do I need to do a dump and restore to upgrade PostgreSQL
|
|
releases?
|
|
3.7) What computer hardware should I use?
|
|
|
|
Operational Questions
|
|
|
|
4.1) How do I SELECT only the first few rows of a query? A random row?
|
|
4.2) How do I find out what tables, indexes, databases, and users are
|
|
defined? How do I see the queries used by psql to display them?
|
|
4.3) How do you change a column's data type?
|
|
4.4) What is the maximum size for a row, a table, and a database?
|
|
4.5) How much database disk space is required to store data from a
|
|
typical text file?
|
|
4.6) Why are my queries slow? Why don't they use my indexes?
|
|
4.7) How do I see how the query optimizer is evaluating my query?
|
|
4.8) How do I perform regular expression searches and case-insensitive
|
|
regular expression searches? How do I use an index for
|
|
case-insensitive searches?
|
|
4.9) In a query, how do I detect if a field is NULL? How can I sort on
|
|
whether a field is NULL or not?
|
|
4.10) What is the difference between the various character types?
|
|
4.11.1) How do I create a serial/auto-incrementing field?
|
|
4.11.2) How do I get the value of a SERIAL insert?
|
|
4.11.3) Doesn't currval() lead to a race condition with other users?
|
|
4.11.4) Why aren't my sequence numbers reused on transaction abort?
|
|
Why are there gaps in the numbering of my sequence/SERIAL column?
|
|
4.12) What is an OID? What is a TID?
|
|
4.13) Why do I get the error "ERROR: Memory exhausted in
|
|
AllocSetAlloc()"?
|
|
4.14) How do I tell what PostgreSQL version I am running?
|
|
4.15) Why does my large-object operations get "invalid large obj
|
|
descriptor"?
|
|
4.16) How do I create a column that will default to the current time?
|
|
4.17) How do I perform an outer join?
|
|
4.18) How do I perform queries using multiple databases?
|
|
4.19) How do I return multiple rows or columns from a function?
|
|
4.20) Why do I get "relation with OID ##### does not exist" errors
|
|
when accessing temporary tables in PL/PgSQL functions?
|
|
4.21) What encryption options are available?
|
|
|
|
Extending PostgreSQL
|
|
|
|
5.1) I wrote a user-defined function. When I run it in psql, why does
|
|
it dump core?
|
|
5.2) How can I contribute some nifty new types and functions to
|
|
PostgreSQL?
|
|
5.3) How do I write a C function to return a tuple?
|
|
5.4) I have changed a source file. Why does the recompile not see the
|
|
change?
|
|
_________________________________________________________________
|
|
|
|
General Questions
|
|
|
|
1.1) What is PostgreSQL? How is it pronounced?
|
|
|
|
PostgreSQL is pronounced Post-Gres-Q-L, also called just Postgres.
|
|
|
|
PostgreSQL is an object-relational database system that has the
|
|
features of traditional commercial database systems with enhancements
|
|
to be found in next-generation DBMS systems. PostgreSQL is free and
|
|
the complete source code is available.
|
|
|
|
PostgreSQL development is performed by a team of mostly volunteer
|
|
developers spread throughout the world and communicating via the
|
|
Internet. It is a community project and is not controlled by any
|
|
company. To get involved, see the developer's FAQ at
|
|
http://www.postgresql.org/files/documentation/faqs/FAQ_DEV.html
|
|
|
|
1.2) What is the copyright of PostgreSQL?
|
|
|
|
PostgreSQL is distributed under the classic BSD license. It has no
|
|
restrictions on how the source code can be used. We like it and have
|
|
no intention of changing it.
|
|
|
|
This is the BSD license we use:
|
|
|
|
PostgreSQL Data Base Management System
|
|
|
|
Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
|
|
Portions Copyright (c) 1994-1996 Regents of the University of
|
|
California
|
|
|
|
Permission to use, copy, modify, and distribute this software and its
|
|
documentation for any purpose, without fee, and without a written
|
|
agreement is hereby granted, provided that the above copyright notice
|
|
and this paragraph and the following two paragraphs appear in all
|
|
copies.
|
|
|
|
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY
|
|
FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES,
|
|
INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND
|
|
ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN
|
|
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
|
|
|
THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES,
|
|
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
|
|
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE
|
|
PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE UNIVERSITY OF
|
|
CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT,
|
|
UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
|
|
|
|
1.3) What platforms does PostgreSQL support?
|
|
|
|
In general, any modern Unix-compatible platform should be able to run
|
|
PostgreSQL. The platforms that had received explicit testing at the
|
|
time of release are listed in the installation instructions.
|
|
|
|
Starting with version 8.0, PostgreSQL now runs natively on Microsoft
|
|
Windows NT-based operating systems like Win2000, WinXP, and Win2003. A
|
|
prepackaged installer is available at
|
|
http://pgfoundry.org/projects/pginstaller. MSDOS-based versions of
|
|
Windows (Win95, Win98, WinMe) can run PostgreSQL using Cygwin.
|
|
|
|
There is also a Novell Netware 6 port at http://forge.novell.com, and
|
|
an OS/2 (eComStation) version at
|
|
http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&button=Search&key=postgre
|
|
SQL&stype=all&sort=type&dir=%2F.
|
|
|
|
1.4) Where can I get PostgreSQL?
|
|
|
|
The primary anonymous ftp site for PostgreSQL is
|
|
ftp://ftp.PostgreSQL.org/pub/. For mirror sites, see our main web
|
|
site.
|
|
|
|
1.5) Where can I get support?
|
|
|
|
The PostgreSQL community provides assistance to many of its users via
|
|
email. The main web site to subscribe to the email lists is
|
|
http://www.postgresql.org/community/lists/. The general or bugs lists
|
|
are a good place to start.
|
|
|
|
The major IRC channel is #postgresql on Freenode (irc.freenode.net).
|
|
To connect you can use the Unix program irc -c '#postgresql' "$USER"
|
|
irc.freenode.net or use any of the other popular IRC clients. A
|
|
Spanish one also exists on the same network, (#postgresql-es), and a
|
|
French one, (#postgresqlfr). There is also a PostgreSQL channel on
|
|
EFNet.
|
|
|
|
A list of commercial support companies is available at
|
|
http://techdocs.postgresql.org/companies.php.
|
|
|
|
1.6) How do I submit a bug report?
|
|
|
|
Visit the PostgreSQL bug form at
|
|
http://www.postgresql.org/support/submitbug.
|
|
|
|
Also check out our ftp site ftp://ftp.PostgreSQL.org/pub/ to see if
|
|
there is a more recent PostgreSQL version.
|
|
|
|
1.7) What is the latest release?
|
|
|
|
The latest release of PostgreSQL is version 8.0.0.
|
|
|
|
We plan to have major releases every ten to twelve months.
|
|
|
|
1.8) What documentation is available?
|
|
|
|
PostgreSQL includes extensive documentation, including a large manual,
|
|
manual pages, and some test examples. See the /doc directory. You can
|
|
also browse the manuals online at http://www.PostgreSQL.org/docs.
|
|
|
|
There are two PostgreSQL books available online at
|
|
http://www.postgresql.org/docs/books/awbook.html and
|
|
http://www.commandprompt.com/ppbook/. There is a list of PostgreSQL
|
|
books available for purchase at
|
|
http://techdocs.PostgreSQL.org/techdocs/bookreviews.php. There is also
|
|
a collection of PostgreSQL technical articles at
|
|
http://techdocs.PostgreSQL.org/.
|
|
|
|
The command line client program psql has some \d commands to show
|
|
information about types, operators, functions, aggregates, etc. - use
|
|
\? to display the available commands.
|
|
|
|
Our web site contains even more documentation.
|
|
|
|
1.9) How do I find out about known bugs or missing features?
|
|
|
|
PostgreSQL supports an extended subset of SQL-92. See our TODO list
|
|
for known bugs, missing features, and future plans.
|
|
|
|
1.10) How can I learn SQL?
|
|
|
|
The PostgreSQL book at
|
|
http://www.postgresql.org/docs/books/awbook.html teaches SQL. There is
|
|
another PostgreSQL book at http://www.commandprompt.com/ppbook. There
|
|
is a nice tutorial at
|
|
http://www.intermedia.net/support/sql/sqltut.shtm, at
|
|
http://ourworld.compuserve.com/homepages/graeme_birchall/HTM_COOK.HTM,
|
|
and at http://sqlcourse.com.
|
|
|
|
Another one is "Teach Yourself SQL in 21 Days, Second Edition" at
|
|
http://members.tripod.com/er4ebus/sql/index.htm
|
|
|
|
Many of our users like The Practical SQL Handbook, Bowman, Judith S.,
|
|
et al., Addison-Wesley. Others like The Complete Reference SQL, Groff
|
|
et al., McGraw-Hill.
|
|
|
|
1.11) How do I join the development team?
|
|
|
|
First, download the latest source and read the PostgreSQL Developers
|
|
FAQ and documentation on our web site, or in the distribution. Second,
|
|
subscribe to the pgsql-hackers and pgsql-patches mailing lists. Third,
|
|
submit high quality patches to pgsql-patches.
|
|
|
|
There are about a dozen people who have commit privileges to the
|
|
PostgreSQL CVS archive. They each have submitted so many high-quality
|
|
patches that it was impossible for the existing committers to keep up,
|
|
and we had confidence that patches they committed were of high
|
|
quality.
|
|
|
|
1.12) How does PostgreSQL compare to other DBMSs?
|
|
|
|
There are several ways of measuring software: features, performance,
|
|
reliability, support, and price.
|
|
|
|
Features
|
|
PostgreSQL has most features present in large commercial DBMSs,
|
|
like transactions, subselects, triggers, views, foreign key
|
|
referential integrity, and sophisticated locking. We have some
|
|
features they do not have, like user-defined types,
|
|
inheritance, rules, and multi-version concurrency control to
|
|
reduce lock contention.
|
|
|
|
Performance
|
|
PostgreSQL's performance is comparable to other commercial and
|
|
open source databases. It is faster for some things, slower for
|
|
others. In comparison to MySQL or leaner database systems, we
|
|
are faster for multiple users, complex queries, and a
|
|
read/write query load. MySQL is faster for simple SELECT
|
|
queries done by a few users. Of course, MySQL does not have
|
|
most of the features mentioned in the Features section above.
|
|
We are built for reliability and features, and we continue to
|
|
improve performance in every release.
|
|
|
|
Reliability
|
|
We realize that a DBMS must be reliable, or it is worthless. We
|
|
strive to release well-tested, stable code that has a minimum
|
|
of bugs. Each release has at least one month of beta testing,
|
|
and our release history shows that we can provide stable, solid
|
|
releases that are ready for production use. We believe we
|
|
compare favorably to other database software in this area.
|
|
|
|
Support
|
|
Our mailing lists provide contact with a large group of
|
|
developers and users to help resolve any problems encountered.
|
|
While we cannot guarantee a fix, commercial DBMSs do not always
|
|
supply a fix either. Direct access to developers, the user
|
|
community, manuals, and the source code often make PostgreSQL
|
|
support superior to other DBMSs. There is commercial
|
|
per-incident support available for those who need it. (See FAQ
|
|
section 1.5.)
|
|
|
|
Price
|
|
We are free for all use, both commercial and non-commercial.
|
|
You can add our code to your product with no limitations,
|
|
except those outlined in our BSD-style license stated above.
|
|
_________________________________________________________________
|
|
|
|
User Client Questions
|
|
|
|
2.1) What interfaces are available for PostgreSQL?
|
|
|
|
The PostgreSQL install includes only the C and embedded C interfaces.
|
|
All other interfaces are independent projects that are downloaded
|
|
separately; being separate allows them to have their own release
|
|
schedule and development teams.
|
|
|
|
Some programming languages like PHP include an interface to
|
|
PostgreSQL. Interfaces for languages like Perl, TCL, Python, and many
|
|
others are available at http://gborg.postgresql.org in the
|
|
Drivers/Interfaces section and via Internet search.
|
|
|
|
2.2) What tools are available for using PostgreSQL with Web pages?
|
|
|
|
A nice introduction to Database-backed Web pages can be seen at:
|
|
http://www.webreview.com
|
|
|
|
For Web integration, PHP (http://www.php.net) is an excellent
|
|
interface.
|
|
|
|
For complex cases, many use the Perl and DBD::Pg with CGI.pm or
|
|
mod_perl.
|
|
|
|
2.3) Does PostgreSQL have a graphical user interface?
|
|
|
|
Yes, there are several graphical interfaces to PostgreSQL available.
|
|
These include pgAdmin III (http://www.pgadmin.org, PgAccess
|
|
http://www.pgaccess.org), RHDB Admin (http://sources.redhat.com/rhdb/
|
|
), TORA ( http://www.globecom.net/tora/, partly commercial), and
|
|
Rekall ( http://www.rekallrevealed.org/). There is also PhpPgAdmin (
|
|
http://phppgadmin.sourceforge.net/ ), a web-based interface to
|
|
PostgreSQL.
|
|
|
|
See http://techdocs.postgresql.org/guides/GUITools for a more detailed
|
|
list.
|
|
_________________________________________________________________
|
|
|
|
Administrative Questions
|
|
|
|
3.1) How do I install PostgreSQL somewhere other than /usr/local/pgsql?
|
|
|
|
Specify the --prefix option when running configure.
|
|
|
|
3.2) How do I control connections from other hosts?
|
|
|
|
By default, PostgreSQL only allows connections from the local machine
|
|
using Unix domain sockets or TCP/IP connections. Other machines will
|
|
not be able to connect unless you modify listen_addresses in the
|
|
postgresql.conf file, enable host-based authentication by modifying
|
|
the $PGDATA/pg_hba.conf file, and restart the server.
|
|
|
|
3.3) How do I tune the database engine for better performance?
|
|
|
|
There are three major areas for potential performance improvement:
|
|
|
|
Query Changes
|
|
This involves modifying queries to obtain better performance:
|
|
|
|
+ Creation of indexes, including expression and partial indexes
|
|
+ Use of COPY instead of multiple INSERTs
|
|
+ Grouping of multiple statements into a single transaction to
|
|
reduce commit overhead
|
|
+ Use of CLUSTER when retrieving many rows from an index
|
|
+ Use of LIMIT for returning a subset of a query's output
|
|
+ Use of Prepared queries
|
|
+ Use of ANALYZE to maintain accurate optimizer statistics
|
|
+ Regular use of VACUUM or pg_autovacuum
|
|
+ Dropping of indexes during large data changes
|
|
|
|
Server Configuration
|
|
A number of postgresql.conf settings affect performance. For
|
|
more details, see Administration Guide/Server Run-time
|
|
Environment/Run-time Configuration for a full listing, and for
|
|
commentary see
|
|
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_co
|
|
nf_e.html and
|
|
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html.
|
|
|
|
Hardware Selection
|
|
The effect of hardware on performance is detailed in
|
|
http://candle.pha.pa.us/main/writings/pgsql/hw_performance/inde
|
|
x.html and http://www.powerpostgresql.com/PerfList/.
|
|
|
|
3.4) What debugging features are available?
|
|
|
|
There are many log_* server configuration variables that enable
|
|
printing of query and process statistics which can be very useful for
|
|
debugging and performance measurements.
|
|
|
|
The following detailed debug instructions are to be used to provide
|
|
more detailed information for server developers debugging a problem.
|
|
|
|
It is also possible to debug the server if it isn't operating
|
|
properly. First, by running configure with the --enable-cassert
|
|
option, many assert()s monitor the progress of the backend and halt
|
|
the program when something unexpected occurs.
|
|
|
|
The postmaster has a -d option that allows even more detailed
|
|
information to be reported. The -d option takes a number that
|
|
specifies the debug level. Be warned that high debug level values
|
|
generate large log files.
|
|
|
|
If postmaster is not running, you can actually run the postgres
|
|
backend from the command line, and type your SQL statement directly.
|
|
This is recommended only for debugging purposes. Note that a newline
|
|
terminates the query, not a semicolon. If you have compiled with
|
|
debugging symbols, you can use a debugger to see what is happening.
|
|
Because the backend was not started from postmaster, it is not running
|
|
in an identical environment and locking/backend interaction problems
|
|
may not be duplicated.
|
|
|
|
If postmaster is running, start psql in one window, then find the PID
|
|
of the postgres process used by psql using SELECT pg_backend_pid().
|
|
Use a debugger to attach to the postgres PID. You can set breakpoints
|
|
in the debugger and issue queries from psql. If you are debugging
|
|
postgres startup, you can set PGOPTIONS="-W n", then start psql. This
|
|
will cause startup to delay for n seconds so you can attach to the
|
|
process with the debugger, set any breakpoints, and continue through
|
|
the startup sequence.
|
|
|
|
You can also compile with profiling to see what functions are taking
|
|
execution time. The backend profile files will be deposited in the
|
|
pgsql/data/base/dbname directory. The client profile file will be put
|
|
in the client's current directory. Linux requires a compile with
|
|
-DLINUX_PROFILE for proper profiling.
|
|
|
|
3.5) Why do I get "Sorry, too many clients" when trying to connect?
|
|
|
|
You have reached the default limit is 100 database sessions. You need
|
|
to increase the postmaster's limit on how many concurrent backend
|
|
processes it can start by changing the max_connections value in
|
|
postgresql.conf and restarting the postmaster.
|
|
|
|
3.6) Why do I need to do a dump and restore to upgrade between major
|
|
PostgreSQL releases?
|
|
|
|
The PostgreSQL team makes only small changes between minor releases,
|
|
so upgrading from 7.4 to 7.4.1 does not require a dump and restore.
|
|
However, major releases (e.g. from 7.3 to 7.4) often change the
|
|
internal format of system tables and data files. These changes are
|
|
often complex, so we don't maintain backward compatibility for data
|
|
files. A dump outputs data in a generic format that can then be loaded
|
|
in using the new internal format.
|
|
|
|
3.7) What computer hardware should I use?
|
|
|
|
Because PC hardware is mostly compatible, people tend to believe that
|
|
all PC hardware is of equal quality. It is not. ECC RAM, SCSI, and
|
|
quality motherboards are more reliable and have better performance
|
|
than less expensive hardware. PostgreSQL will run on almost any
|
|
hardware, but if reliability and performance are important it is wise
|
|
to research your hardware options thoroughly. Our email lists can be
|
|
used to discuss hardware options and tradeoffs.
|
|
_________________________________________________________________
|
|
|
|
Operational Questions
|
|
|
|
4.1) How do I SELECT only the first few rows of a query? A random row?
|
|
|
|
To retrieve only a few rows, if you know at the number of rows needed
|
|
at the time of the SELECT use LIMIT . If an index matches the ORDER BY
|
|
it is possible the entire query does not have to be executed. If you
|
|
don't know the number of rows at SELECT time, use a cursor and FETCH.
|
|
|
|
To SELECT a random row, use:
|
|
SELECT col
|
|
FROM tab
|
|
ORDER BY random()
|
|
LIMIT 1;
|
|
|
|
4.2) How do I find out what tables, indexes, databases, and users are
|
|
defined? How do I see the queries used by psql to display them?
|
|
|
|
Use the \dt command to see tables in psql. For a complete list of
|
|
commands inside psql you can use \?. Alternatively you can read the
|
|
source code for psql in file pgsql/src/bin/psql/describe.c, it
|
|
contains SQL commands that generate the output for psql's backslash
|
|
commands. You can also start psql with the -E option so it will print
|
|
out the queries it uses to execute the commands you give. PostgreSQL
|
|
also provides an SQL compliant INFORMATION SCHEMA interface you can
|
|
query to get information about the database.
|
|
|
|
There are also system tables beginning with pg_ that describe these
|
|
too.
|
|
|
|
Use psql -l will list all databases.
|
|
|
|
Also try the file pgsql/src/tutorial/syscat.source. It illustrates
|
|
many of the SELECTs needed to get information from the database system
|
|
tables.
|
|
|
|
4.3) How do you change a column's data type?
|
|
|
|
Changing the data type of a column can be done easily in 8.0 and later
|
|
with ALTER TABLE ALTER COLUMN TYPE.
|
|
|
|
In earlier releases, do this:
|
|
BEGIN;
|
|
ALTER TABLE tab ADD COLUMN new_col new_data_type;
|
|
UPDATE tab SET new_col = CAST(old_col AS new_data_type);
|
|
ALTER TABLE tab DROP COLUMN old_col;
|
|
COMMIT;
|
|
|
|
You might then want to do VACUUM FULL tab to reclaim the disk space
|
|
used by the expired rows.
|
|
|
|
4.4) What is the maximum size for a row, a table, and a database?
|
|
|
|
These are the limits:
|
|
|
|
Maximum size for a database? unlimited (32 TB databases exist)
|
|
Maximum size for a table? 32 TB
|
|
Maximum size for a row? 1.6TB
|
|
Maximum size for a field? 1 GB
|
|
Maximum number of rows in a table? unlimited
|
|
Maximum number of columns in a table? 250-1600 depending on column
|
|
types
|
|
Maximum number of indexes on a table? unlimited
|
|
|
|
Of course, these are not actually unlimited, but limited to available
|
|
disk space and memory/swap space. Performance may suffer when these
|
|
values get unusually large.
|
|
|
|
The maximum table size of 32 TB does not require large file support
|
|
from the operating system. Large tables are stored as multiple 1 GB
|
|
files so file system size limits are not important.
|
|
|
|
The maximum table size and maximum number of columns can be quadrupled
|
|
by increasing the default block size to 32k.
|
|
|
|
4.5) How much database disk space is required to store data from a typical
|
|
text file?
|
|
|
|
A PostgreSQL database may require up to five times the disk space to
|
|
store data from a text file.
|
|
|
|
As an example, consider a file of 100,000 lines with an integer and
|
|
text description on each line. Suppose the text string avergages
|
|
twenty bytes in length. The flat file would be 2.8 MB. The size of the
|
|
PostgreSQL database file containing this data can be estimated as 6.4
|
|
MB:
|
|
32 bytes: each row header (approximate)
|
|
24 bytes: one int field and one text field
|
|
+ 4 bytes: pointer on page to tuple
|
|
----------------------------------------
|
|
60 bytes per row
|
|
|
|
The data page size in PostgreSQL is 8192 bytes (8 KB), so:
|
|
|
|
8192 bytes per page
|
|
------------------- = 136 rows per database page (rounded down)
|
|
60 bytes per row
|
|
|
|
100000 data rows
|
|
-------------------- = 735 database pages (rounded up)
|
|
128 rows per page
|
|
|
|
735 database pages * 8192 bytes per page = 6,021,120 bytes (6 MB)
|
|
|
|
Indexes do not require as much overhead, but do contain the data that
|
|
is being indexed, so they can be large also.
|
|
|
|
NULLs are stored as bitmaps, so they use very little space.
|
|
|
|
4.6) Why are my queries slow? Why don't they use my indexes?
|
|
|
|
Indexes are not automatically used by every query. Indexes are only
|
|
used if the table is larger than a minimum size, and the query selects
|
|
only a small percentage of the rows in the table. This is because the
|
|
random disk access caused by an index scan can be slower than a
|
|
straight read through the table, or sequential scan.
|
|
|
|
To determine if an index should be used, PostgreSQL must have
|
|
statistics about the table. These statistics are collected using
|
|
VACUUM ANALYZE, or simply ANALYZE. Using statistics, the optimizer
|
|
knows how many rows are in the table, and can better determine if
|
|
indexes should be used. Statistics are also valuable in determining
|
|
optimal join order and join methods. Statistics collection should be
|
|
performed periodically as the contents of the table change.
|
|
|
|
Indexes are normally not used for ORDER BY or to perform joins. A
|
|
sequential scan followed by an explicit sort is usually faster than an
|
|
index scan of a large table.
|
|
However, LIMIT combined with ORDER BY often will use an index because
|
|
only a small portion of the table is returned. In fact, though MAX()
|
|
and MIN() don't use indexes, it is possible to retrieve such values
|
|
using an index with ORDER BY and LIMIT:
|
|
SELECT col
|
|
FROM tab
|
|
ORDER BY col [ DESC ]
|
|
LIMIT 1;
|
|
|
|
If you believe the optimizer is incorrect in choosing a sequential
|
|
scan, use SET enable_seqscan TO 'off' and run tests to see if an index
|
|
scan is indeed faster.
|
|
|
|
When using wild-card operators such as LIKE or ~, indexes can only be
|
|
used in certain circumstances:
|
|
* The beginning of the search string must be anchored to the start
|
|
of the string, i.e.
|
|
+ LIKE patterns must not start with %.
|
|
+ ~ (regular expression) patterns must start with ^.
|
|
* The search string can not start with a character class, e.g.
|
|
[a-e].
|
|
* Case-insensitive searches such as ILIKE and ~* do not utilize
|
|
indexes. Instead, use expression indexes, which are described in
|
|
section 4.8.
|
|
* The default C locale must be used during initdb because it is not
|
|
possible to know the next-greatest character in a non-C locale.
|
|
You can create a special text_pattern_ops index for such cases
|
|
that work only for LIKE indexing.
|
|
|
|
In pre-8.0 releases, indexes often can not be used unless the data
|
|
types exactly match the index's column types. This was particularly
|
|
true of int2, int8, and numeric column indexes.
|
|
|
|
4.7) How do I see how the query optimizer is evaluating my query?
|
|
|
|
See the EXPLAIN manual page.
|
|
|
|
4.8) How do I perform regular expression searches and case-insensitive
|
|
regular expression searches? How do I use an index for case-insensitive
|
|
searches?
|
|
|
|
The ~ operator does regular expression matching, and ~* does
|
|
case-insensitive regular expression matching. The case-insensitive
|
|
variant of LIKE is called ILIKE.
|
|
|
|
Case-insensitive equality comparisons are normally expressed as:
|
|
SELECT *
|
|
FROM tab
|
|
WHERE lower(col) = 'abc';
|
|
|
|
This will not use an standard index. However, if you create a
|
|
expresssion index, it will be used:
|
|
CREATE INDEX tabindex ON tab (lower(col));
|
|
|
|
4.9) In a query, how do I detect if a field is NULL? How can I sort on
|
|
whether a field is NULL or not?
|
|
|
|
You test the column with IS NULL and IS NOT NULL.
|
|
SELECT *
|
|
FROM tab
|
|
WHERE col IS NULL;
|
|
|
|
To sort by the NULL status, use the IS NULL and IS NOT NULL modifiers
|
|
in your WHERE clause. Things that are true will sort higher than
|
|
things that are false, so the following will put NULL entries at the
|
|
top of the resulting list:
|
|
SELECT *
|
|
FROM tab
|
|
ORDER BY (col IS NOT NULL)
|
|
|
|
4.10) What is the difference between the various character types?
|
|
|
|
Type Internal Name Notes
|
|
VARCHAR(n) varchar size specifies maximum length, no padding
|
|
CHAR(n) bpchar blank padded to the specified fixed length
|
|
TEXT text no specific upper limit on length
|
|
BYTEA bytea variable-length byte array (null-byte safe)
|
|
"char" char one character
|
|
|
|
You will see the internal name when examining system catalogs and in
|
|
some error messages.
|
|
|
|
The first four types above are "varlena" types (i.e., the first four
|
|
bytes on disk are the length, followed by the data). Thus the actual
|
|
space used is slightly greater than the declared size. However, these
|
|
data types are also subject to compression or being stored out-of-line
|
|
by TOAST, so the space on disk might also be less than expected.
|
|
VARCHAR(n) is best when storing variable-length strings and it limits
|
|
how long a string can be. TEXT is for strings of unlimited length,
|
|
with a maximum of one gigabyte.
|
|
|
|
CHAR(n) is for storing strings that are all the same length. CHAR(n)
|
|
pads with blanks to the specified length, while VARCHAR(n) only stores
|
|
the characters supplied. BYTEA is for storing binary data,
|
|
particularly values that include NULL bytes. All the types described
|
|
here have similar performance characteristics.
|
|
|
|
4.11.1) How do I create a serial/auto-incrementing field?
|
|
|
|
PostgreSQL supports a SERIAL data type. It auto-creates a sequence.
|
|
For example, this:
|
|
CREATE TABLE person (
|
|
id SERIAL,
|
|
name TEXT
|
|
);
|
|
|
|
is automatically translated into this:
|
|
CREATE SEQUENCE person_id_seq;
|
|
CREATE TABLE person (
|
|
id INT4 NOT NULL DEFAULT nextval('person_id_seq'),
|
|
name TEXT
|
|
);
|
|
|
|
See the create_sequence manual page for more information about
|
|
sequences.
|
|
|
|
4.11.2) How do I get the value of a SERIAL insert?
|
|
|
|
One approach is to retrieve the next SERIAL value from the sequence
|
|
object with the nextval() function before inserting and then insert it
|
|
explicitly. Using the example table in 4.11.1, an example in a
|
|
pseudo-language would look like this:
|
|
new_id = execute("SELECT nextval('person_id_seq')");
|
|
execute("INSERT INTO person (id, name) VALUES (new_id, 'Blaise Pascal')");
|
|
|
|
You would then also have the new value stored in new_id for use in
|
|
other queries (e.g., as a foreign key to the person table). Note that
|
|
the name of the automatically created SEQUENCE object will be named
|
|
<table>_<serialcolumn>_seq, where table and serialcolumn are the names
|
|
of your table and your SERIAL column, respectively.
|
|
|
|
Alternatively, you could retrieve the assigned SERIAL value with the
|
|
currval() function after it was inserted by default, e.g.,
|
|
execute("INSERT INTO person (name) VALUES ('Blaise Pascal')");
|
|
new_id = execute("SELECT currval('person_id_seq')");
|
|
|
|
4.11.3) Doesn't currval() lead to a race condition with other users?
|
|
|
|
No. currval() returns the current value assigned by your session, not
|
|
by all sessions.
|
|
|
|
4.11.4) Why aren't my sequence numbers reused on transaction abort? Why are
|
|
there gaps in the numbering of my sequence/SERIAL column?
|
|
|
|
To improve concurrency, sequence values are given out to running
|
|
transactions as needed and are not locked until the transaction
|
|
completes. This causes gaps in numbering from aborted transactions.
|
|
|
|
4.12) What is an OID? What is a TID?
|
|
|
|
Every row that is created in PostgreSQL gets a unique OID unless
|
|
created WITHOUT OIDS. OIDs are autotomatically assigned unique 4-byte
|
|
integers that are unique across the entire installation. However, they
|
|
overflow at 4 billion, and then the OIDs start being duplicated.
|
|
PostgreSQL uses OIDs to link its internal system tables together.
|
|
|
|
To uniquely number columns in user tables, it is best to use SERIAL
|
|
rather than OIDs because SERIAL sequences are unique only within a
|
|
single table. and are therefore less likely to overflow. SERIAL8 is
|
|
available for storing eight-byte sequence values.
|
|
|
|
TIDs are used to identify specific physical rows with block and offset
|
|
values. TIDs change after rows are modified or reloaded. They are used
|
|
by index entries to point to physical rows.
|
|
|
|
4.13) Why do I get the error "ERROR: Memory exhausted in AllocSetAlloc()"?
|
|
|
|
You probably have run out of virtual memory on your system, or your
|
|
kernel has a low limit for certain resources. Try this before starting
|
|
postmaster:
|
|
ulimit -d 262144
|
|
limit datasize 256m
|
|
|
|
Depending on your shell, only one of these may succeed, but it will
|
|
set your process data segment limit much higher and perhaps allow the
|
|
query to complete. This command applies to the current process, and
|
|
all subprocesses created after the command is run. If you are having a
|
|
problem with the SQL client because the backend is returning too much
|
|
data, try it before starting the client.
|
|
|
|
4.14) How do I tell what PostgreSQL version I am running?
|
|
|
|
From psql, type SELECT version();
|
|
|
|
4.15) Why does my large-object operations get "invalid large obj
|
|
descriptor"?
|
|
|
|
You need to put BEGIN WORK and COMMIT around any use of a large object
|
|
handle, that is, surrounding lo_open ... lo_close.
|
|
|
|
Currently PostgreSQL enforces the rule by closing large object handles
|
|
at transaction commit. So the first attempt to do anything with the
|
|
handle will draw invalid large obj descriptor. So code that used to
|
|
work (at least most of the time) will now generate that error message
|
|
if you fail to use a transaction.
|
|
|
|
If you are using a client interface like ODBC you may need to set
|
|
auto-commit off.
|
|
|
|
4.16) How do I create a column that will default to the current time?
|
|
|
|
Use CURRENT_TIMESTAMP:
|
|
CREATE TABLE test (x int, modtime TIMESTAMP DEFAULT CURRENT_TIMESTAMP );
|
|
|
|
4.17) How do I perform an outer join?
|
|
|
|
PostgreSQL supports outer joins using the SQL standard syntax. Here
|
|
are two examples:
|
|
SELECT *
|
|
FROM t1 LEFT OUTER JOIN t2 ON (t1.col = t2.col);
|
|
|
|
or
|
|
SELECT *
|
|
FROM t1 LEFT OUTER JOIN t2 USING (col);
|
|
|
|
These identical queries join t1.col to t2.col, and also return any
|
|
unjoined rows in t1 (those with no match in t2). A RIGHT join would
|
|
add unjoined rows of t2. A FULL join would return the matched rows
|
|
plus all unjoined rows from t1 and t2. The word OUTER is optional and
|
|
is assumed in LEFT, RIGHT, and FULL joins. Ordinary joins are called
|
|
INNER joins.
|
|
|
|
4.18) How do I perform queries using multiple databases?
|
|
|
|
There is no way to query a database other than the current one.
|
|
Because PostgreSQL loads database-specific system catalogs, it is
|
|
uncertain how a cross-database query should even behave.
|
|
|
|
contrib/dblink allows cross-database queries using function calls. Of
|
|
course, a client can also make simultaneous connections to different
|
|
databases and merge the results on the client side.
|
|
|
|
4.19) How do I return multiple rows or columns from a function?
|
|
|
|
It is easy using set-returning functions,
|
|
http://techdocs.postgresql.org/guides/SetReturningFunctions.
|
|
|
|
4.20) Why do I get "relation with OID ##### does not exist" errors when
|
|
accessing temporary tables in PL/PgSQL functions?
|
|
|
|
PL/PgSQL caches function scripts, and an unfortunate side effect is
|
|
that if a PL/PgSQL function accesses a temporary table, and that table
|
|
is later dropped and recreated, and the function called again, the
|
|
function will fail because the cached function contents still point to
|
|
the old temporary table. The solution is to use EXECUTE for temporary
|
|
table access in PL/PgSQL. This will cause the query to be reparsed
|
|
every time.
|
|
|
|
4.21) What encryption options are available?
|
|
|
|
* contrib/pgcrypto contains many encryption functions for use in SQL
|
|
queries.
|
|
* To encrypt transmission from the client to the server, the server
|
|
must have the ssl option set to true in postgresql.conf, and an
|
|
applicable host or hostssl record must exist in pg_hba.conf, and
|
|
the client sslmode must not be disable. (Note that it is also
|
|
possible to use a third-party encrypted transport, such as stunnel
|
|
or ssh, rather than PostgreSQL's native SSL connections.)
|
|
* Database user passwords are automatically encrypted when stored in
|
|
the system tables.
|
|
* The server can run using an encrypted file system.
|
|
_________________________________________________________________
|
|
|
|
Extending PostgreSQL
|
|
|
|
5.1) I wrote a user-defined function. When I run it in psql, why does it
|
|
dump core?
|
|
|
|
The problem could be a number of things. Try testing your user-defined
|
|
function in a stand-alone test program first.
|
|
|
|
5.2) How can I contribute some nifty new types and functions to PostgreSQL?
|
|
|
|
Send your extensions to the pgsql-hackers mailing list, and they will
|
|
eventually end up in the contrib/ subdirectory.
|
|
|
|
5.3) How do I write a C function to return a tuple?
|
|
|
|
In versions of PostgreSQL beginning with 7.3, table-returning
|
|
functions are fully supported in C, PL/PgSQL, and SQL. See the
|
|
Programmer's Guide for more information. An example of a
|
|
table-returning function defined in C can be found in
|
|
contrib/tablefunc.
|
|
|
|
5.4) I have changed a source file. Why does the recompile not see the
|
|
change?
|
|
|
|
The Makefiles do not have the proper dependencies for include files.
|
|
You have to do a make clean and then another make. If you are using
|
|
GCC you can use the --enable-depend option of configure to have the
|
|
compiler compute the dependencies automatically.
|