Doc: move info for btree opclass implementors into main documentation.
Up to now, useful info for writing a new btree opclass has been buried in the backend's nbtree/README file. Let's move it into the SGML docs, in preparation for extending it with info about "in_range" functions in the upcoming window RANGE patch. To do this, I chose to create a new chapter for btree indexes in Part VII (Internals), parallel to the chapters that exist for the newer index AMs. This is a pretty short chapter as-is. At some point somebody might care to flesh it out with more detail about btree internals, but that is beyond the scope of my ambition for today. Discussion: https://postgr.es/m/23141.1517874668@sss.pgh.pa.us
This commit is contained in:
parent
f069c91a57
commit
3785f7eee3
267
doc/src/sgml/btree.sgml
Normal file
267
doc/src/sgml/btree.sgml
Normal file
@ -0,0 +1,267 @@
|
||||
<!-- doc/src/sgml/btree.sgml -->
|
||||
|
||||
<chapter id="btree">
|
||||
<title>B-Tree Indexes</title>
|
||||
|
||||
<indexterm>
|
||||
<primary>index</primary>
|
||||
<secondary>B-Tree</secondary>
|
||||
</indexterm>
|
||||
|
||||
<sect1 id="btree-intro">
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>
|
||||
<productname>PostgreSQL</productname> includes an implementation of the
|
||||
standard <acronym>btree</acronym> (multi-way binary tree) index data
|
||||
structure. Any data type that can be sorted into a well-defined linear
|
||||
order can be indexed by a btree index. The only limitation is that an
|
||||
index entry cannot exceed approximately one-third of a page (after TOAST
|
||||
compression, if applicable).
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Because each btree operator class imposes a sort order on its data type,
|
||||
btree operator classes (or, really, operator families) have come to be
|
||||
used as <productname>PostgreSQL</productname>'s general representation
|
||||
and understanding of sorting semantics. Therefore, they've acquired
|
||||
some features that go beyond what would be needed just to support btree
|
||||
indexes, and parts of the system that are quite distant from the
|
||||
btree AM make use of them.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="btree-behavior">
|
||||
<title>Behavior of B-Tree Operator Classes</title>
|
||||
|
||||
<para>
|
||||
As shown in <xref linkend="xindex-btree-strat-table"/>, a btree operator
|
||||
class must provide five comparison operators,
|
||||
<literal><</literal>,
|
||||
<literal><=</literal>,
|
||||
<literal>=</literal>,
|
||||
<literal>>=</literal> and
|
||||
<literal>></literal>.
|
||||
One might expect that <literal><></literal> should also be part of
|
||||
the operator class, but it is not, because it would almost never be
|
||||
useful to use a <literal><></literal> WHERE clause in an index
|
||||
search. (For some purposes, the planner treats <literal><></literal>
|
||||
as associated with a btree operator class; but it finds that operator via
|
||||
the <literal>=</literal> operator's negator link, rather than
|
||||
from <structname>pg_amop</structname>.)
|
||||
</para>
|
||||
|
||||
<para>
|
||||
When several data types share near-identical sorting semantics, their
|
||||
operator classes can be grouped into an operator family. Doing so is
|
||||
advantageous because it allows the planner to make deductions about
|
||||
cross-type comparisons. Each operator class within the family should
|
||||
contain the single-type operators (and associated support functions)
|
||||
for its input data type, while cross-type comparison operators and
|
||||
support functions are <quote>loose</quote> in the family. It is
|
||||
recommendable that a complete set of cross-type operators be included
|
||||
in the family, thus ensuring that the planner can represent any
|
||||
comparison conditions that it deduces from transitivity.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
There are some basic assumptions that a btree operator family must
|
||||
satisfy:
|
||||
</para>
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
An <literal>=</literal> operator must be an equivalence relation; that
|
||||
is, for all non-null values <replaceable>A</replaceable>,
|
||||
<replaceable>B</replaceable>, <replaceable>C</replaceable> of the
|
||||
data type:
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
<replaceable>A</replaceable> <literal>=</literal>
|
||||
<replaceable>A</replaceable> is true
|
||||
(<firstterm>reflexive law</firstterm>)
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
if <replaceable>A</replaceable> <literal>=</literal>
|
||||
<replaceable>B</replaceable>,
|
||||
then <replaceable>B</replaceable> <literal>=</literal>
|
||||
<replaceable>A</replaceable>
|
||||
(<firstterm>symmetric law</firstterm>)
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
if <replaceable>A</replaceable> <literal>=</literal>
|
||||
<replaceable>B</replaceable> and <replaceable>B</replaceable>
|
||||
<literal>=</literal> <replaceable>C</replaceable>,
|
||||
then <replaceable>A</replaceable> <literal>=</literal>
|
||||
<replaceable>C</replaceable>
|
||||
(<firstterm>transitive law</firstterm>)
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
A <literal><</literal> operator must be a strong ordering relation;
|
||||
that is, for all non-null values <replaceable>A</replaceable>,
|
||||
<replaceable>B</replaceable>, <replaceable>C</replaceable>:
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
<replaceable>A</replaceable> <literal><</literal>
|
||||
<replaceable>A</replaceable> is false
|
||||
(<firstterm>irreflexive law</firstterm>)
|
||||
</para>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
if <replaceable>A</replaceable> <literal><</literal>
|
||||
<replaceable>B</replaceable>
|
||||
and <replaceable>B</replaceable> <literal><</literal>
|
||||
<replaceable>C</replaceable>,
|
||||
then <replaceable>A</replaceable> <literal><</literal>
|
||||
<replaceable>C</replaceable>
|
||||
(<firstterm>transitive law</firstterm>)
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
Furthermore, the ordering is total; that is, for all non-null
|
||||
values <replaceable>A</replaceable>, <replaceable>B</replaceable>:
|
||||
|
||||
<itemizedlist>
|
||||
<listitem>
|
||||
<para>
|
||||
exactly one of <replaceable>A</replaceable> <literal><</literal>
|
||||
<replaceable>B</replaceable>, <replaceable>A</replaceable>
|
||||
<literal>=</literal> <replaceable>B</replaceable>, and
|
||||
<replaceable>B</replaceable> <literal><</literal>
|
||||
<replaceable>A</replaceable> is true
|
||||
(<firstterm>trichotomy law</firstterm>)
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
(The trichotomy law justifies the definition of the comparison support
|
||||
function, of course.)
|
||||
</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>
|
||||
The other three operators are defined in terms of <literal>=</literal>
|
||||
and <literal><</literal> in the obvious way, and must act consistently
|
||||
with them.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For an operator family supporting multiple data types, the above laws must
|
||||
hold when <replaceable>A</replaceable>, <replaceable>B</replaceable>,
|
||||
<replaceable>C</replaceable> are taken from any data types in the family.
|
||||
The transitive laws are the trickiest to ensure, as in cross-type
|
||||
situations they represent statements that the behaviors of two or three
|
||||
different operators are consistent.
|
||||
As an example, it would not work to put <type>float8</type>
|
||||
and <type>numeric</type> into the same operator family, at least not with
|
||||
the current semantics that <type>numeric</type> values are converted
|
||||
to <type>float8</type> for comparison to a <type>float8</type>. Because
|
||||
of the limited accuracy of <type>float8</type>, this means there are
|
||||
distinct <type>numeric</type> values that will compare equal to the
|
||||
same <type>float8</type> value, and thus the transitive law would fail.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Another requirement for a multiple-data-type family is that any implicit
|
||||
or binary-coercion casts that are defined between data types included in
|
||||
the operator family must not change the associated sort ordering.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
It should be fairly clear why a btree index requires these laws to hold
|
||||
within a single data type: without them there is no ordering to arrange
|
||||
the keys with. Also, index searches using a comparison key of a
|
||||
different data type require comparisons to behave sanely across two
|
||||
data types. The extensions to three or more data types within a family
|
||||
are not strictly required by the btree index mechanism itself, but the
|
||||
planner relies on them for optimization purposes.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="btree-support-funcs">
|
||||
<title>B-Tree Support Functions</title>
|
||||
|
||||
<para>
|
||||
As shown in <xref linkend="xindex-btree-support-table"/>, btree defines
|
||||
one required and one optional support function.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
For each combination of data types that a btree operator family provides
|
||||
comparison operators for, it must provide a comparison support function,
|
||||
registered in <structname>pg_amproc</structname> with support function
|
||||
number 1 and
|
||||
<structfield>amproclefttype</structfield>/<structfield>amprocrighttype</structfield>
|
||||
equal to the left and right data types for the comparison (i.e., the
|
||||
same data types that the matching operators are registered with
|
||||
in <structname>pg_amop</structname>).
|
||||
The comparison function must take two non-null values
|
||||
<replaceable>A</replaceable> and <replaceable>B</replaceable> and
|
||||
return an <type>int32</type> value that
|
||||
is <literal><</literal> <literal>0</literal>, <literal>0</literal>,
|
||||
or <literal>></literal> <literal>0</literal>
|
||||
when <replaceable>A</replaceable> <literal><</literal>
|
||||
<replaceable>B</replaceable>, <replaceable>A</replaceable>
|
||||
<literal>=</literal> <replaceable>B</replaceable>,
|
||||
or <replaceable>A</replaceable> <literal>></literal>
|
||||
<replaceable>B</replaceable>, respectively. The function must not
|
||||
return <literal>INT_MIN</literal> for the <replaceable>A</replaceable>
|
||||
<literal><</literal> <replaceable>B</replaceable> case,
|
||||
since the value may be negated before being tested for sign. A null
|
||||
result is disallowed, too.
|
||||
See <filename>src/backend/access/nbtree/nbtcompare.c</filename> for
|
||||
examples.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
If the compared values are of a collatable data type, the appropriate
|
||||
collation OID will be passed to the comparison support function, using
|
||||
the standard <function>PG_GET_COLLATION()</function> mechanism.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Optionally, a btree operator family may provide <firstterm>sort
|
||||
support</firstterm> function(s), registered under support function number
|
||||
2. These functions allow implementing comparisons for sorting purposes
|
||||
in a more efficient way than naively calling the comparison support
|
||||
function. The APIs involved in this are defined in
|
||||
<filename>src/include/utils/sortsupport.h</filename>.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="btree-implementation">
|
||||
<title>Implementation</title>
|
||||
|
||||
<para>
|
||||
An introduction to the btree index implementation can be found in
|
||||
<filename>src/backend/access/nbtree/README</filename>.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
@ -83,6 +83,7 @@
|
||||
<!ENTITY bki SYSTEM "bki.sgml">
|
||||
<!ENTITY catalogs SYSTEM "catalogs.sgml">
|
||||
<!ENTITY geqo SYSTEM "geqo.sgml">
|
||||
<!ENTITY btree SYSTEM "btree.sgml">
|
||||
<!ENTITY gist SYSTEM "gist.sgml">
|
||||
<!ENTITY spgist SYSTEM "spgist.sgml">
|
||||
<!ENTITY gin SYSTEM "gin.sgml">
|
||||
|
@ -252,6 +252,7 @@
|
||||
&geqo;
|
||||
&indexam;
|
||||
&generic-wal;
|
||||
&btree;
|
||||
&gist;
|
||||
&spgist;
|
||||
&gin;
|
||||
|
@ -35,7 +35,7 @@
|
||||
<productname>PostgreSQL</productname>, but all index methods are
|
||||
described in <classname>pg_am</classname>. It is possible to add a
|
||||
new index access method by writing the necessary code and
|
||||
then creating a row in <classname>pg_am</classname> — but that is
|
||||
then creating an entry in <classname>pg_am</classname> — but that is
|
||||
beyond the scope of this chapter (see <xref linkend="indexam"/>).
|
||||
</para>
|
||||
|
||||
@ -404,6 +404,8 @@
|
||||
B-trees require a single support function, and allow a second one to be
|
||||
supplied at the operator class author's option, as shown in <xref
|
||||
linkend="xindex-btree-support-table"/>.
|
||||
The requirements for these support functions are explained further in
|
||||
<xref linkend="btree-support-funcs"/>.
|
||||
</para>
|
||||
|
||||
<table tocentry="1" id="xindex-btree-support-table">
|
||||
@ -426,8 +428,8 @@
|
||||
</row>
|
||||
<row>
|
||||
<entry>
|
||||
Return the addresses of C-callable sort support function(s),
|
||||
as documented in <filename>utils/sortsupport.h</filename> (optional)
|
||||
Return the addresses of C-callable sort support function(s)
|
||||
(optional)
|
||||
</entry>
|
||||
<entry>2</entry>
|
||||
</row>
|
||||
@ -1056,11 +1058,8 @@ ALTER OPERATOR FAMILY integer_ops USING btree ADD
|
||||
|
||||
<para>
|
||||
In a B-tree operator family, all the operators in the family must sort
|
||||
compatibly, meaning that the transitive laws hold across all the data types
|
||||
supported by the family: <quote>if A = B and B = C, then A = C</quote>,
|
||||
and <quote>if A < B and B < C, then A < C</quote>. Moreover, implicit
|
||||
or binary coercion casts between types represented in the operator family
|
||||
must not change the associated sort ordering. For each
|
||||
compatibly, as is specified in detail in <xref linkend="btree-behavior"/>.
|
||||
For each
|
||||
operator in the family there must be a support function having the same
|
||||
two input data types as the operator. It is recommended that a family be
|
||||
complete, i.e., for each combination of data types, all operators are
|
||||
|
@ -623,56 +623,3 @@ routines must treat it accordingly. The actual key stored in the
|
||||
item is irrelevant, and need not be stored at all. This arrangement
|
||||
corresponds to the fact that an L&Y non-leaf page has one more pointer
|
||||
than key.
|
||||
|
||||
Notes to Operator Class Implementors
|
||||
------------------------------------
|
||||
|
||||
With this implementation, we require each supported combination of
|
||||
datatypes to supply us with a comparison procedure via pg_amproc.
|
||||
This procedure must take two nonnull values A and B and return an int32 < 0,
|
||||
0, or > 0 if A < B, A = B, or A > B, respectively. The procedure must
|
||||
not return INT_MIN for "A < B", since the value may be negated before
|
||||
being tested for sign. A null result is disallowed, too. See nbtcompare.c
|
||||
for examples.
|
||||
|
||||
There are some basic assumptions that a btree operator family must satisfy:
|
||||
|
||||
An = operator must be an equivalence relation; that is, for all non-null
|
||||
values A,B,C of the datatype:
|
||||
|
||||
A = A is true reflexive law
|
||||
if A = B, then B = A symmetric law
|
||||
if A = B and B = C, then A = C transitive law
|
||||
|
||||
A < operator must be a strong ordering relation; that is, for all non-null
|
||||
values A,B,C:
|
||||
|
||||
A < A is false irreflexive law
|
||||
if A < B and B < C, then A < C transitive law
|
||||
|
||||
Furthermore, the ordering is total; that is, for all non-null values A,B:
|
||||
|
||||
exactly one of A < B, A = B, and B < A is true trichotomy law
|
||||
|
||||
(The trichotomy law justifies the definition of the comparison support
|
||||
procedure, of course.)
|
||||
|
||||
The other three operators are defined in terms of these two in the obvious way,
|
||||
and must act consistently with them.
|
||||
|
||||
For an operator family supporting multiple datatypes, the above laws must hold
|
||||
when A,B,C are taken from any datatypes in the family. The transitive laws
|
||||
are the trickiest to ensure, as in cross-type situations they represent
|
||||
statements that the behaviors of two or three different operators are
|
||||
consistent. As an example, it would not work to put float8 and numeric into
|
||||
an opfamily, at least not with the current semantics that numerics are
|
||||
converted to float8 for comparison to a float8. Because of the limited
|
||||
accuracy of float8, this means there are distinct numeric values that will
|
||||
compare equal to the same float8 value, and thus the transitive law fails.
|
||||
|
||||
It should be fairly clear why a btree index requires these laws to hold within
|
||||
a single datatype: without them there is no ordering to arrange the keys with.
|
||||
Also, index searches using a key of a different datatype require comparisons
|
||||
to behave sanely across two datatypes. The extensions to three or more
|
||||
datatypes within a family are not strictly required by the btree index
|
||||
mechanism itself, but the planner relies on them for optimization purposes.
|
||||
|
Loading…
x
Reference in New Issue
Block a user