
<!--
$PostgreSQL: pgsql/doc/src/sgml/filelayout.sgml,v 1.1 2004/11/12 21:50:53 tgl Exp $
-->

<chapter id="file-layout">

<title>Database File Layout</title>

<abstract>
<para>
A description of the database physical storage layout.
</para>
</abstract>

<para>
This chapter provides an overview of the physical format used by
<productname>PostgreSQL</productname> databases.
</para>

<para>
All the data needed for a database cluster is stored within the cluster's data
directory, commonly referred to as <varname>PGDATA</> (after the name of the
environment variable that can be used to define it).  A common location for
<varname>PGDATA</> is <filename>/var/lib/pgsql/data</>.  Multiple clusters,
managed by different postmasters, can exist on the same machine.
</para>

<para>
The <varname>PGDATA</> directory contains several subdirectories and control
files, as shown in <xref linkend="pgdata-contents-table">.  In addition to
these required items, the cluster configuration files
<filename>postgresql.conf</filename>, <filename>pg_hba.conf</filename>, and
<filename>pg_ident.conf</filename> are traditionally stored in
<varname>PGDATA</> (although beginning in
<productname>PostgreSQL</productname> 8.0 it is possible to keep them
elsewhere).
</para>

<table tocentry="1" id="pgdata-contents-table">
<title>Contents of <varname>PGDATA</></title>
<tgroup cols="2">
<thead>
<row>
<entry>Item</entry>
<entry>Description</entry>
</row>
</thead>

<tbody>

<row>
 <entry><filename>PG_VERSION</></entry>
 <entry>A file containing the major version number of <productname>PostgreSQL</productname></entry>
</row>

<row>
 <entry><filename>base</></entry>
 <entry>Subdirectory containing per-database subdirectories</entry>
</row>

<row>
 <entry><filename>global</></entry>
 <entry>Subdirectory containing cluster-wide tables, such as
 <structname>pg_database</></entry>
</row>

<row>
 <entry><filename>pg_clog</></entry>
 <entry>Subdirectory containing transaction commit status data</entry>
</row>

<row>
 <entry><filename>pg_subtrans</></entry>
 <entry>Subdirectory containing subtransaction status data</entry>
</row>

<row>
 <entry><filename>pg_tblspc</></entry>
 <entry>Subdirectory containing symbolic links to tablespaces</entry>
</row>

<row>
 <entry><filename>pg_xlog</></entry>
 <entry>Subdirectory containing WAL (write-ahead log) files</entry>
</row>

<row>
 <entry><filename>postmaster.opts</></entry>
 <entry>A file recording the command-line options the postmaster was
last started with</entry>
</row>

<row>
 <entry><filename>postmaster.pid</></entry>
 <entry>A lock file recording the current postmaster PID and shared memory
segment ID (not present after postmaster shutdown)</entry>
</row>

</tbody>
</tgroup>
</table>
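
<para>
As an illustration only, the following Python sketch checks which of the
items listed in <xref linkend="pgdata-contents-table"> exist under a given
directory.  The helper name <literal>missing_items</> is hypothetical and not
part of <productname>PostgreSQL</productname>.  The two
<filename>postmaster.*</> files are left out of the check because they are
only written once a postmaster has been started.
</para>

<programlisting>
```python
import os

# Items from the table above that exist even when no postmaster has
# ever been started.  postmaster.opts and postmaster.pid are omitted.
REQUIRED_ITEMS = [
    "PG_VERSION", "base", "global", "pg_clog",
    "pg_subtrans", "pg_tblspc", "pg_xlog",
]

def missing_items(pgdata):
    """Return the required items not found under pgdata (hypothetical helper)."""
    return [name for name in REQUIRED_ITEMS
            if not os.path.exists(os.path.join(pgdata, name))]
```
</programlisting>

<para>
Calling <literal>missing_items(os.environ["PGDATA"])</> on an intact data
directory of this layout would be expected to return an empty list.
</para>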

<para>
For each database in the cluster there is a subdirectory within
<varname>PGDATA</><filename>/base</>, named after the database's OID in
<structname>pg_database</>.  This subdirectory is the default location
for the database's files; in particular, its system catalogs are stored
there.
</para>

<para>
Each table and index is stored in a separate file, named after the table
or index's <firstterm>filenode</> number, which can be found in
<structname>pg_class</>.<structfield>relfilenode</>.
</para>

<caution>
<para>
Note that while a table's filenode often matches its OID, this is
<emphasis>not</> necessarily the case; some operations, like
<command>TRUNCATE</>, <command>REINDEX</>, <command>CLUSTER</> and some forms
of <command>ALTER TABLE</>, can change the filenode while preserving the OID.
Avoid assuming that filenode and table OID are the same.
</para>
</caution>

<para>
When a table or index exceeds 1 GB, it is divided into gigabyte-sized
<firstterm>segments</>.  The first segment's file name is the same as the
filenode; subsequent segments are named filenode.1, filenode.2, etc.
This arrangement avoids problems on platforms that have file size limitations.
The contents of tables and indexes are discussed further in
<xref linkend="page">.
</para>
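
<para>
The segment naming rule can be sketched in Python as follows.  The function
name is hypothetical, the segment size is taken to be exactly 1 GB as
described above, and edge cases (such as a relation whose size is an exact
multiple of the segment size) are not modeled.
</para>

<programlisting>
```python
SEGMENT_SIZE = 1024 ** 3  # assumed segment size: 1 GB

def segment_file_names(relfilenode, size_in_bytes):
    """List the file names expected for a relation of the given total size."""
    # Ceiling division; a relation always has at least its first file.
    count = max(1, -(-size_in_bytes // SEGMENT_SIZE))
    # The first segment is named after the filenode alone.
    names = [str(relfilenode)]
    # Later segments get a numeric suffix: filenode.1, filenode.2, ...
    names += ["%d.%d" % (relfilenode, n) for n in range(1, count)]
    return names
```
</programlisting>

<para>
Under these assumptions, a 2.5 GB table with the made-up filenode 16384 would
occupy three files: <filename>16384</>, <filename>16384.1</> and
<filename>16384.2</>.
</para>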

<para>
A table that has columns with potentially large entries will have an
associated <firstterm>TOAST</> table, which is used for out-of-line storage of
field values that are too large to keep in the table rows proper.
<structname>pg_class</>.<structfield>reltoastrelid</> links from a table to
its TOAST table, if any.
</para>

<para>
Tablespaces make the layout more complicated.  Each user-defined tablespace
has a symbolic link inside the <varname>PGDATA</><filename>/pg_tblspc</>
directory, which points to the physical tablespace directory (as specified in
its <command>CREATE TABLESPACE</> command).  The symbolic link is named after
the tablespace's OID.  Inside the physical tablespace directory there is
a subdirectory for each database that has elements in the tablespace, named
after the database's OID.  Tables within that directory follow the filenode
naming scheme.  The <literal>pg_default</> tablespace is not accessed through
<filename>pg_tblspc</>, but corresponds to
<varname>PGDATA</><filename>/base</>.  Similarly, the <literal>pg_global</>
tablespace is not accessed through <filename>pg_tblspc</>, but corresponds to
<varname>PGDATA</><filename>/global</>.
</para>
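
<para>
The tablespace rules above can be summarized in a short Python sketch.  The
function name and its parameters are hypothetical; it only encodes the three
cases described in this chapter and assumes Unix-style path separators.
</para>

<programlisting>
```python
import posixpath

def tablespace_directory(pgdata, tablespace_name, tablespace_oid, database_oid):
    """Return the directory that holds a database's files in a tablespace.

    Illustrative helper only; it is not part of PostgreSQL.
    """
    if tablespace_name == "pg_global":
        # Cluster-wide tables live directly in PGDATA/global.
        return posixpath.join(pgdata, "global")
    if tablespace_name == "pg_default":
        # The default tablespace corresponds to PGDATA/base.
        return posixpath.join(pgdata, "base", str(database_oid))
    # Any other tablespace is reached through its symbolic link in
    # PGDATA/pg_tblspc, then a subdirectory named after the database OID.
    return posixpath.join(pgdata, "pg_tblspc",
                          str(tablespace_oid), str(database_oid))
```
</programlisting>

<para>
A table's file is then found inside the returned directory under its filenode
name, exactly as in the default location.
</para>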

</chapter>