doc: add section about heap-only tuples (HOT)

Reported-by: Jonathan S. Katz Discussion: https://postgr.es/m/c59ffbd5-96ac-a5a5-a401-14f627ca1405@postgresql.org Backpatch-through: 11
2022-08-12 15:05:12 -04:00 · 2022-08-12 15:05:12 -04:00 · a9885f2c77
commit a9885f2c77
parent a4a24feff4
9 changed files with 87 additions and 12 deletions
--- a/doc/src/sgml/acronyms.sgml
+++ b/doc/src/sgml/acronyms.sgml
@ -299,9 +299,7 @@
    <term><acronym>HOT</acronym></term>
    <listitem>
     <para>
-      <ulink
+      <link linkend="storage-hot">Heap-Only Tuples</link>
      url="https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD">Heap-Only
      Tuples</ulink>
     </para>
    </listitem>
   </varlistentry>
--- a/doc/src/sgml/btree.sgml
+++ b/doc/src/sgml/btree.sgml
@ -708,8 +708,9 @@ options(<replaceable>relopts</replaceable> <type>local_relopts *</type>) returns
   entry.  <quote>Version duplicates</quote> may sometimes accumulate
   and adversely affect query latency and throughput.  This typically
   occurs with <command>UPDATE</command>-heavy workloads where most
-   individual updates cannot apply the <acronym>HOT</acronym>
+   individual updates cannot apply the
-   optimization (often because at least one indexed column gets
+   <link linkend="storage-hot"><acronym>HOT</acronym> optimization</link>
   (often because at least one indexed column gets
   modified, necessitating a new set of index tuple versions &mdash;
   one new tuple for <emphasis>each and every</emphasis> index).  In
   effect, B-Tree deduplication ameliorates index bloat caused by
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@ -4287,7 +4287,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
      <para>
       If true, queries must not use the index until the <structfield>xmin</structfield>
       of this <structname>pg_index</structname> row is below their <symbol>TransactionXmin</symbol>
-       event horizon, because the table may contain broken HOT chains with
+       event horizon, because the table may contain broken <link linkend="storage-hot">HOT chains</link> with
       incompatible rows that they can see
      </para></entry>
     </row>
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@ -4195,7 +4195,8 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
      <listitem>
       <para>
        Specifies the number of transactions by which <command>VACUUM</command> and
-        <acronym>HOT</acronym> updates will defer cleanup of dead row versions. The
+        <link linkend="storage-hot"><acronym>HOT</acronym> updates</link>
        will defer cleanup of dead row versions. The
        default is zero transactions, meaning that dead row versions can be
        removed as soon as possible, that is, as soon as they are no longer
        visible to any open transaction.  You may wish to set this to a
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@ -45,7 +45,8 @@
   extant versions of the same logical row; to an index, each tuple is
   an independent object that needs its own index entry.  Thus, an
   update of a row always creates all-new index entries for the row, even if
-   the key values did not change.  (HOT tuples are an exception to this
+   the key values did not change.  (<link linkend="storage-hot">HOT
   tuples</link> are an exception to this
   statement; but indexes do not deal with those, either.)  Index entries for
   dead tuples are reclaimed (by vacuuming) when the dead tuples themselves
   are reclaimed.
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@ -103,7 +103,9 @@ CREATE INDEX test1_id_index ON test1 (id);
  <para>
   After an index is created, the system has to keep it synchronized with the
-   table.  This adds overhead to data manipulation operations.
+   table.  This adds overhead to data manipulation operations.  Indexes can
   also prevent the creation of <link linkend="storage-hot">heap-only
   tuples</link>.
   Therefore indexes that are seldom or never used in queries
   should be removed.
  </para>
@ -733,7 +735,7 @@ CREATE INDEX people_names ON people ((first_name || ' ' || last_name));
  <para>
   Index expressions are relatively expensive to maintain, because the
   derived expression(s) must be computed for each row insertion
-   and non-HOT update.  However, the index expressions are
+   and <link linkend="storage-hot">non-HOT update.</link>  However, the index expressions are
   <emphasis>not</emphasis> recomputed during an indexed search, since they are
   already stored in the index.  In both examples above, the system
   sees the query as just <literal>WHERE indexedcolumn = 'constant'</literal>
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@ -3721,7 +3721,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
       <structfield>n_tup_upd</structfield> <type>bigint</type>
      </para>
      <para>
-       Number of rows updated (includes HOT updated rows)
+       Number of rows updated (includes <link linkend="storage-hot">HOT updated rows</link>)
      </para></entry>
     </row>
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@ -1357,7 +1357,9 @@ WITH ( MODULUS <replaceable class="parameter">numeric_literal</replaceable>, REM
      to the indicated percentage; the remaining space on each page is
      reserved for updating rows on that page.  This gives <command>UPDATE</command>
      a chance to place the updated copy of a row on the same page as the
-      original, which is more efficient than placing it on a different page.
+      original, which is more efficient than placing it on a different
      page, and makes <link linkend="storage-hot">heap-only tuple
      updates</link> more likely.
      For a table whose entries are never updated, complete packing is the
      best choice, but in heavily updated tables smaller fillfactors are
      appropriate.  This parameter cannot be set for TOAST tables.
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@ -1070,4 +1070,74 @@ data. Empty in ordinary tables.</entry>
 </sect2>
 </sect1>
 <sect1 id="storage-hot">
 <title>Heap-Only Tuples (<acronym>HOT</acronym>)</title>
 <para>
  To allow for high concurrency, <productname>PostgreSQL</productname>
  uses <link linkend="mvcc-intro">multiversion concurrency
  control</link> (<acronym>MVCC</acronym>) to store rows.  However,
  <acronym>MVCC</acronym> has some downsides for update queries.
  Specifically, updates require new versions of rows to be added to
  tables.  This can also require new index entries for each updated row,
  and removal of old versions of rows and their index entries can be
  expensive.
 </para>
 <para>
  To help reduce the overhead of updates,
  <productname>PostgreSQL</productname> has an optimization called
  heap-only tuples (<acronym>HOT</acronym>).  This optimization is
  possible when:
  <itemizedlist>
   <listitem>
    <para>
     The update does not modify any columns referenced by the table's
     indexes, including expression and partial indexes.
     </para>
   </listitem>
   <listitem>
    <para>
     There is sufficient free space on the page containing the old row
     for the updated row.
    </para>
   </listitem>
  </itemizedlist>
  In such cases, heap-only tuples provide two optimizations:
  <itemizedlist>
   <listitem>
    <para>
     New index entries are not needed to represent updated rows.
    </para>
   </listitem>
   <listitem>
    <para>
     Old versions of updated rows can be completely removed during normal
     operation, including <command>SELECT</command>s, instead of requiring
     periodic vacuum operations.  (This is possible because indexes
     do not reference their <link linkend="storage-page-layout">page
     item identifiers</link>.)
    </para>
   </listitem>
  </itemizedlist>
 </para>
 <para>
  In summary, heap-only tuple updates can only be created
  if columns used by indexes are not updated.  You can
  increase the likelihood of sufficient page space for
  <acronym>HOT</acronym> updates by decreasing a table's <link
  linkend="sql-createtable"><literal>fillfactor</literal></link>.
  If you don't, <acronym>HOT</acronym> updates will still happen because
  new rows will naturally migrate to new pages and existing pages with
  sufficient free space for new row versions.  The system view <link
  linkend="monitoring-pg-stat-all-tables-view">pg_stat_all_tables</link>
  allows monitoring of the occurrence of HOT and non-HOT updates.
 </para>
 </sect1>
 </chapter>