diff --git a/doc/src/sgml/page.sgml b/doc/src/sgml/page.sgml index bb82142e61..7551085dc9 100644 --- a/doc/src/sgml/page.sgml +++ b/doc/src/sgml/page.sgml @@ -22,9 +22,13 @@ refers to data that is stored in PostgreSQL tables. - shows how pages in both normal PostgreSQL tables - and PostgreSQL indexes -(e.g., a B-tree index) are structured. + + shows how pages in both normal + PostgreSQL tables and + PostgreSQL indexes (e.g., a B-tree index) +are structured. This structure is also used for TOAST tables and sequences. +There are five parts to each page. + @@ -42,114 +46,256 @@ Item + + PageHeaderData + 20 bytes long. Contains general information about the page that is needed to access it. + + itemPointerData +List of (offset,length) pairs pointing to the actual items. -filler +Free space +The unallocated space. All new tuples are allocated from here, generally from the end. -itemData... - - - -Unallocated Space - - - -ItemContinuationData +items +The actual items themselves. Different access methods store different data here. Special Space - - - -ItemData 2 - - - -ItemData 1 - - - -ItemIdData - - - -PageHeaderData +Access method specific data. Different access methods store different data. Unused by normal tables.
- + - -The first 8 bytes of each page consists of a page header -(PageHeaderData). -Within the header, the first three 2-byte integer fields -(lower, -upper, -and -special) -represent byte offsets to the start of unallocated space, to the end -of unallocated space, and to the start of special space. -Special space is a region at the end of the page that is allocated at -page initialization time and contains information specific to an -access method. The last 2 bytes of the page header, -opaque, -encode the page size and information on the internal fragmentation of -the page. Page size is stored in each page because frames in the -buffer pool may be subdivided into equal sized pages on a frame by -frame basis within a table. The internal fragmentation information is -used to aid in determining when page reorganization should occur. - + The first 20 bytes of each page consist of a page header + (PageHeaderData). Its format is detailed in . The first two fields hold + WAL-related information. These are followed by three 2-byte integer fields + (lower, upper, and + special). These represent byte offsets to the start + of unallocated space, to the end of unallocated space, and to the start of + the special space. + + + + + PageHeaderData Layout + PageHeaderData Layout + + + + Field + Type + Length + Description + + + + + pd_lsn + XLogRecPtr + 8 bytes + LSN: next byte after last byte of xlog + + + pd_sui + StartUpID + 4 bytes + SUI of last changes (currently it's used by heap AM only) + + + pd_lower + LocationIndex + 2 bytes + Offset to start of free space. + + + pd_upper + LocationIndex + 2 bytes + Offset to end of free space. + + + pd_special + LocationIndex + 2 bytes + Offset to start of special space. + + + pd_opaque + OpaqueData + 2 bytes + AM-generic information. Currently just stores the page size. + + + +
- -Following the page header are item identifiers -(ItemIdData). -New item identifiers are allocated from the first four bytes of -unallocated space. Because an item identifier is never moved until it -is freed, its index may be used to indicate the location of an item on -a page. In fact, every pointer to an item -(ItemPointer) -created by PostgreSQL consists of a frame number and an index of an item -identifier. An item identifier contains a byte-offset to the start of -an item, its length in bytes, and a set of attribute bits which affect -its interpretation. - + + Special space is a region at the end of the page that is allocated at page + initialization time and contains information specific to an access method. + The last 2 bytes of the page header, opaque, + currently store only the page size. Page size is stored in each page + because frames in the buffer pool may be subdivided into equal-sized pages + on a frame-by-frame basis within a table (is this true? - mvo). - -The items themselves are stored in space allocated backwards from -the end of unallocated space. Usually, the items are not interpreted. -However when the item is too long to be placed on a single page or -when fragmentation of the item is desired, the item is divided and -each piece is handled as distinct items in the following manner. The -first through the next to last piece are placed in an item -continuation structure -(ItemContinuationData). -This structure contains -itemPointerData -which points to the next piece and the piece itself. The last piece -is handled normally. - + + + + Following the page header are item identifiers + (ItemIdData). New item identifiers are allocated + from the first four bytes of unallocated space. Because an item + identifier is never moved until it is freed, its index may be used to + indicate the location of an item on a page. 
In fact, every pointer to an + item (ItemPointer, also known as + CTID) created by + PostgreSQL consists of a frame number and an + index of an item identifier. An item identifier contains a byte-offset to + the start of an item, its length in bytes, and a set of attribute bits + which affect its interpretation. + + + + + + The items themselves are stored in space allocated backwards from the end + of unallocated space. The exact structure varies depending on what the + table is to contain. Sequences and tables both use a structure named + HeapTupleHeaderData, described below. + + + + + + The final section is the "special section", which may contain anything the + access method wishes to store. Ordinary tables do not use this at all + (indicated by setting the offset to the page size). + + + + + + All tuples are structured the same way: a header of around 31 bytes, + followed by an optional null bitmask and the data. The header is detailed + below in . The null bitmask is + only present if the HEAP_HASNULL bit is set in the + t_infomask. If it is present, it takes up the space + between the end of the header and the beginning of the data, as indicated + by the t_hoff field. In this list of bits, a 1 bit + indicates not-null and a 0 bit indicates null. + + + + + HeapTupleHeaderData Layout + HeapTupleHeaderData Layout + + + + Field + Type + Length + Description + + + + + t_oid + Oid + 4 bytes + OID of this tuple + + + t_cmin + CommandId + 4 bytes + insert CID stamp + + + t_cmax + CommandId + 4 bytes + delete CID stamp + + + t_xmin + TransactionId + 4 bytes + insert XID stamp + + + t_xmax + TransactionId + 4 bytes + delete XID stamp + + + t_ctid + ItemPointerData + 6 bytes + current TID of this or newer tuple + + + t_natts + int16 + 2 bytes + number of attributes + + + t_infomask + uint16 + 2 bytes + Various flags + + + t_hoff + uint8 + 1 byte + length of tuple header. Also offset of data. + + + +
+ + + + All the details may be found in src/include/storage/bufpage.h. + + + + + + Interpreting the actual data can only be done with information obtained + from other tables, mostly pg_attribute. The + relevant fields are attlen and + attalign. There is no way to directly get a + particular attribute, except when there are only fixed width fields and no + NULLs. All this trickery is wrapped up in the functions + heap_getattr, fastgetattr + and heap_getsysattr. + + + + + To read the data you need to examine each attribute in turn. First check + whether the field is NULL according to the null bitmap. If it is, go on to + the next attribute. Then make sure you have the right alignment. If the field is a + fixed width field, then its bytes are simply stored in place. If it's a + variable length field (attlen == -1) then it's a bit more complicated, + using the variable length structure varattrib. + Depending on the flags, the data may be either inline, compressed, or in + another table (TOAST). + +
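
Review note: the attribute walk described in the last paragraph of the patch can be sketched in C roughly as follows. This is a minimal illustration, not the code behind heap_getattr or fastgetattr. The AttrDesc struct and the functions att_is_null, align_up, and attr_offset are hypothetical names invented for this sketch. It also assumes that attalign is given as a byte count (the catalog stores a type code instead), that null bitmap bits are numbered from the least significant bit of each byte, and it does not decode variable length fields at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two pg_attribute columns named in the text. */
typedef struct {
    int attlen;    /* > 0: fixed width in bytes; -1: variable length */
    int attalign;  /* ASSUMPTION: alignment as a byte count (1, 2, 4 or 8) */
} AttrDesc;

/*
 * A 1 bit means not-null and a 0 bit means null, as the text states.
 * ASSUMPTION: attribute i lives at bit (i % 8) of byte (i / 8).
 */
static int att_is_null(const uint8_t *bitmap, int attnum)
{
    return !(bitmap[attnum >> 3] & (1 << (attnum & 7)));
}

/* Round off up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/*
 * Walk the attributes in order and return the byte offset of attribute
 * `target` within the tuple's data area.
 *
 * bitmap may be NULL when the tuple has no null bitmap (no NULLs).
 * Returns -1 if the target is NULL, -2 if a variable length attribute
 * precedes the target (this sketch cannot skip over those), and -3 if
 * target is out of range.
 */
static long attr_offset(const AttrDesc *atts, int natts,
                        const uint8_t *bitmap, int target)
{
    size_t off = 0;

    if (target < 0 || target >= natts)
        return -3;

    for (int i = 0; i <= target; i++) {
        /* Step 1: NULL attributes occupy no space in the data area. */
        if (bitmap != NULL && att_is_null(bitmap, i)) {
            if (i == target)
                return -1;
            continue;
        }

        /* Step 2: apply the attribute's alignment before reading it. */
        off = align_up(off, (size_t) atts[i].attalign);
        if (i == target)
            return (long) off;

        /* Step 3: fixed width fields are skipped by their length. */
        if (atts[i].attlen < 0)
            return -2;  /* variable length: would need its stored length */
        off += (size_t) atts[i].attlen;
    }
    return -3;  /* not reached */
}
```

For three attributes of widths 4, 2, and 8 bytes (aligned to 4, 2, and 8) where the second is NULL, the third attribute lands at offset 8: the first occupies bytes 0 to 3, the NULL one takes no space, and the offset is then rounded up to a multiple of 8. This also shows why a direct lookup is only possible with fixed width fields and no NULLs, since only then are the offsets the same for every tuple.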