postgres/doc/src/sgml/filelayout.sgml
Tom Lane 7f4b5a003b Add some real documentation about the overall filesystem layout used by
a Postgres database.  Update page.sgml to match 8.0 tuple header layout.
2004-11-12 21:50:53 +00:00

162 lines
5.1 KiB
Plaintext

<!--
$PostgreSQL: pgsql/doc/src/sgml/filelayout.sgml,v 1.1 2004/11/12 21:50:53 tgl Exp $
-->
<chapter id="file-layout">
<title>Database File Layout</title>
<abstract>
<para>
A description of the database physical storage layout.
</para>
</abstract>
<para>
This section provides an overview of the physical format used by
<productname>PostgreSQL</productname> databases.
</para>
<para>
All the data needed for a database cluster is stored within the cluster's data
directory, commonly referred to as <varname>PGDATA</> (after the name of the
environment variable that can be used to define it). A common location for
<varname>PGDATA</> is <filename>/var/lib/pgsql/data</>. Multiple clusters,
managed by different postmasters, can exist on the same machine.
</para>
<para>
The <varname>PGDATA</> directory contains several subdirectories and control
files, as shown in <xref linkend="pgdata-contents-table">. In addition to
these required items, the cluster configuration files
<filename>postgresql.conf</filename>, <filename>pg_hba.conf</filename>, and
<filename>pg_ident.conf</filename> are traditionally stored in
<varname>PGDATA</> (although beginning in
<productname>PostgreSQL</productname> 8.0 it is possible to keep them
elsewhere).
</para>
<table tocentry="1" id="pgdata-contents-table">
<title>Contents of <varname>PGDATA</></title>
<tgroup cols="2">
<thead>
<row>
<entry>
Item
</entry>
<entry>Description</entry>
</row>
</thead>
<tbody>
<row>
<entry><filename>PG_VERSION</></entry>
<entry>A file containing the major version number of <productname>PostgreSQL</productname></entry>
</row>
<row>
<entry><filename>base</></entry>
<entry>Subdirectory containing per-database subdirectories</entry>
</row>
<row>
<entry><filename>global</></entry>
<entry>Subdirectory containing cluster-wide tables, such as
<structname>pg_database</></entry>
</row>
<row>
<entry><filename>pg_clog</></entry>
<entry>Subdirectory containing transaction commit status data</entry>
</row>
<row>
<entry><filename>pg_subtrans</></entry>
<entry>Subdirectory containing subtransaction status data</entry>
</row>
<row>
<entry><filename>pg_tblspc</></entry>
<entry>Subdirectory containing symbolic links to tablespaces</entry>
</row>
<row>
<entry><filename>pg_xlog</></entry>
<entry>Subdirectory containing WAL (Write Ahead Log) files</entry>
</row>
<row>
<entry><filename>postmaster.opts</></entry>
<entry>A file recording the command-line options the postmaster was
last started with</entry>
</row>
<row>
<entry><filename>postmaster.pid</></entry>
<entry>A lock file recording the current postmaster PID and shared memory
segment ID (not present after postmaster shutdown)</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
For each database in the cluster there is a subdirectory within
<varname>PGDATA</><filename>/base</>, named after the database's OID in
<structname>pg_database</>. This subdirectory is the default location
for the database's files; in particular, its system catalogs are stored
there.
</para>
<para>
Each table and index is stored in a separate file, named after the table
or index's <firstterm>filenode</> number, which can be found in
<structname>pg_class</>.<structfield>relfilenode</>.
</para>
<caution>
<para>
Note that while a table's filenode often matches its OID, this is
<emphasis>not</> necessarily the case; some operations, like
<command>TRUNCATE</>, <command>REINDEX</>, <command>CLUSTER</> and some forms
of <command>ALTER TABLE</>, can change the filenode while preserving the OID.
Avoid assuming that filenode and table OID are the same.
</para>
</caution>
<para>
When a table or index exceeds 1Gb, it is divided into gigabyte-sized
<firstterm>segments</>. The first segment's file name is the same as the
filenode; subsequent segments are named filenode.1, filenode.2, etc.
This arrangement avoids problems on platforms that have file size limitations.
The contents of tables and indexes are discussed further in
<xref linkend="page">.
</para>
<para>
A table that has columns with potentially large entries will have an
associated <firstterm>TOAST</> table, which is used for out-of-line storage of
field values that are too large to keep in the table rows proper.
<structname>pg_class</>.<structfield>reltoastrelid</> links from a table to
its TOAST table, if any.
</para>
<para>
Tablespaces make the scenario more complicated. Each non-default tablespace
has a symbolic link inside the <varname>PGDATA</><filename>/pg_tblspc</>
directory, which points to the physical tablespace directory (as specified in
its <command>CREATE TABLESPACE</> command). The symbolic link is named after
the tablespace's OID. Inside the physical tablespace directory there is
a subdirectory for each database that has elements in the tablespace, named
after the database's OID. Tables within that directory follow the filenode
naming scheme. The <literal>pg_default</> tablespace is not accessed through
<filename>pg_tblspc</>, but corresponds to
<varname>PGDATA</><filename>/base</>. Similarly, the <literal>pg_global</>
tablespace is not accessed through <filename>pg_tblspc</>, but corresponds to
<varname>PGDATA</><filename>/global</>.
</para>
</chapter>