<HTML>
<HEAD>
	<TITLE>The POSTGRES95 User Manual - THE QUERY LANGUAGE</TITLE>
</HEAD>

<BODY>

<font size=-1>
<A HREF="pg95user.html">[ TOC ]</A> 
<A HREF="xaggr.html">[ Previous ]</A> 
<A HREF="libpq.html">[ Next ]</A> 
</font>
<HR>
<H1>11.  INTERFACING EXTENSIONS TO INDICES</H1>
<HR>
     The procedures described thus far let you define a  new
     type,  new  functions  and  new operators.  However, we
     cannot yet define a secondary index (such as a  <B>B-tree</B>,
     <B>R-tree</B>  or  hash  access method) over a new type or its
     operators.<p>
     
     <A HREF="extend.html#about-the-postgres-system-catalogs">Look back at Figure 3</A>.  
     The right half shows the  catalogs  
     that we must modify in order to tell POSTGRES how
     to use a user-defined type and/or  user-defined  operators 
     with an index (i.e., <CODE>pg_am, pg_amop, pg_amproc</CODE> and
     <CODE>pg_opclass</CODE>).  Unfortunately, there is no simple command
     to  do  this.   We will demonstrate how to modify these
     catalogs through a  running  example:  a  new  operator
     class  for the <B>B-tree</B> access method that sorts integers
     in ascending absolute value order.<p>
     
     The <CODE>pg_am</CODE> class contains one instance for  every  user
     defined  access  method.   Support  for the heap access
     method is built into POSTGRES, but every  other  access
     method is described here.  The schema is
<p>
<center>
<table border=1>
<tr>
  <td>amname       </td><td> name of the access method                                     </td>
</tr>
<td>amowner        </td><td> object id of the owner's instance in pg_user                  </td>
</tr>
<tr>
<td>amkind         </td><td> not used at present, but set to 'o' as a place holder         </td>
</tr>
<tr>
<td>amstrategies   </td><td> number of strategies for this access method (see below)       </td>
</tr>
<tr>
<td>amsupport      </td><td> number of support routines for this access method (see below) </td>
</tr>
<tr>
<td>amgettuple<br>
    aminsert<br>
    ...</td>
<td>procedure  identifiers  for  interface routines to the access
    method.  For example, regproc ids for opening,  closing,  and
    getting instances from the access method appear here. </td>
</tr>
</table>
</center>

<p>

     The  <B>object  ID</B>  of  the instance in <CODE>pg_am</CODE> is used as a
     foreign key in lots of other classes.  You  don't  need
     to  add a new instance to this class; all you're interested in 
     is the <B>object ID</B> of the access method instance
     you want to extend:

<pre>         SELECT oid FROM pg_am WHERE amname = 'btree'

         +----+
         |oid |
         +----+
         |403 |
         +----+
</pre>

     The  <CODE>amstrategies</CODE>  attribute exists to standardize 
     comparisons  across  data  types.   For  example,  <B>B-tree</B>s
     impose  a  strict  ordering on keys, lesser to greater.
     Since POSTGRES allows the  user  to  define  operators,
     POSTGRES  cannot look at the name of an operator (eg, &gt;
     or &lt;) and tell what kind of comparison it is.  In fact,
     some  access  methods don't impose any ordering at all.
     For example, <B>R-tree</B>s  express  a  rectangle-containment
     relationship, whereas a hashed data structure expresses
     only bitwise similarity based on the value  of  a  hash
     function.  POSTGRES needs some consistent way of taking
     a qualification in your query, looking at the  operator
     and  then  deciding  if  a  usable  index exists.  This
     implies that POSTGRES needs to know, for example,  that
     the  &lt;=  and  &gt; operators partition a <B>B-tree</B>.  POSTGRES
     uses strategies to express these relationships  between
     operators and the way they can be used to scan indices.<p>
     
     Defining a new set of strategies is beyond the scope of
     this  discussion,  but we'll explain how <B>B-tree</B> strategies 
     work because you'll need to know that to add a new
     operator  class.   In the <CODE>pg_am</CODE> class, the amstrategies
     attribute is the number of strategies defined for  this
     access  method.   For <B>B-tree</B>s, this number is 5.  These
     strategies correspond to
<p>

<center>     
<table border=1>
<tr>
                   <td>less than             </td><td> 1 </td>
</tr>
<tr>
                   <td>less than or equal    </td><td> 2 </td>
</tr>
<tr>
                   <td>equal                 </td><td> 3 </td>
</tr>
<tr>
                   <td>greater than or equal </td><td> 4 </td>
</tr>
<tr>
                   <td>greater than          </td><td> 5 </td>
</tr>
</table>
</center>
<p>

     The idea is that you'll need to add  procedures  corresponding  
     to the comparisons above to the <CODE>pg_amop</CODE> relation 
     (see below).  The access method code can use these
     strategy  numbers,  regardless  of data type, to figure
     out how to partition the <B>B-tree</B>,  compute  selectivity,
     and  so  on.   Don't  worry about the details of adding
     procedures yet; just understand that there  must  be  a
     set  of these procedures for <CODE>int2, int4, oid,</CODE> and every
     other data type on which a <B>B-tree</B> can operate.
<p>
     Sometimes, strategies aren't enough information for the
     system  to figure out how to use an index.  Some access
     methods require other  support  routines  in  order  to
     work.   For  example,  the <B>B-tree</B> access method must be
     able to compare two keys and determine whether  one  is
     greater  than, equal to, or less than the other.  
     Similarly, the <B>R-tree</B> access method must be able to compute
     intersections,  unions, and sizes of rectangles.  These
     operations do not correspond to user qualifications  in
     SQL  queries;  they are administrative routines used by
     the access methods, internally.<p>
     
     In order to manage  diverse  support  routines  
     consistently   across  all  POSTGRES  access  methods,  <CODE>pg_am</CODE>
     includes an attribute called <CODE>amsupport</CODE>.  This attribute
     records  the  number  of  support  routines  used by an
     access method.  For <B>B-tree</B>s, this number is one --  the
     routine  to  take  two  keys  and  return -1, 0, or +1,
     depending on whether the first key is less than,  equal
     to, or greater than the second.<A HREF="#8"><font size=-1>[8]</font></A><p>
     
     The <CODE>amstrategies</CODE> entry in pg_am is just the  number  of
     strategies  defined  for the access method in question.
     The procedures for less than, less  equal,  and  so  on
     don't  appear  in  <CODE>pg_am</CODE>.  Similarly, <CODE>amsupport</CODE> is just
     the number of support routines required by  the  access
     method.  The actual routines are listed elsewhere.<p>
     
     The  next  class of interest is pg_opclass.  This class
     exists only to  associate  a  name  with  an  oid.   In
     pg_amop,  every <B>B-tree</B> operator class has a set of 
     procedures,  one  through  five,  above.   Some   existing
     opclasses  are  <CODE>int2_ops,  int4_ops,  and oid_ops</CODE>.  You
     need to add an instance with  your  opclass  name  (for
     example,  <CODE>complex_abs_ops</CODE>)  to  <CODE>pg_opclass</CODE>.  The <CODE>oid</CODE> of
     this instance is a foreign key in other classes.
     
<pre>         INSERT INTO pg_opclass (opcname) VALUES ('complex_abs_ops');

         SELECT oid, opcname
           FROM pg_opclass
           WHERE opcname = 'complex_abs_ops';

         +------+--------------+
         |oid   | opcname      |
         +------+--------------+
         |17314 | int4_abs_ops |
         +------+--------------+
</pre>

     Note that the oid for your <CODE>pg_opclass</CODE> instance will  be
     different!   You should substitute your value for 17314
     wherever it appears in this discussion.<p>
     
     So now we have an access method and an operator  class.
     We  still  need  a  set of operators; the procedure for
     defining operators was discussed earlier in  this  manual.   
     For  the  complex_abs_ops  operator  class on Btrees, 
     the operators we require are:
     
<pre>         absolute value less-than
         absolute value less-than-or-equal
         absolute value equal
         absolute value greater-than-or-equal
         absolute value greater-than
</pre>

     Suppose the code that implements the functions  defined
     is stored in the file
     
<pre>
         /usr/local/postgres95/src/tutorial/complex.c
</pre>

     Part  of  the  code  look like this: (note that we will
     only show the equality operator for  the  rest  of  the
     examples.  The  other  four operators are very similar.
     Refer to <CODE>complex.c</CODE> or <CODE>complex.sql</CODE> for the details.)

<pre>         #define Mag(c) ((c)-&gt;x&#42;(c)-&gt;x + (c)-&gt;y&#42;(c)-&gt;y)

         bool
         complex_abs_eq(Complex &#42;a, Complex &#42;b)
         {
             double amag = Mag(a), bmag = Mag(b);
             return (amag==bmag);
         }
</pre>

     There are a couple of important things that are happening below.<p>
     
     First, note that operators for less-than, less-than-or
     equal, equal, greater-than-or-equal,  and  greater-than
     for <CODE>int4</CODE> are being defined.  All of these operators are
     already defined for <CODE>int4</CODE> under the names &lt;, &lt;=, =,  &gt;=,
     and  &gt;.   The  new  operators  behave  differently,  of
     course.  In order to guarantee that POSTGRES uses these
     new operators rather than the old ones, they need to be
     named differently from the old ones.   This  is  a  key
     point: you can overload operators in POSTGRES, but only
     if the operator isn't already defined for the  argument
     types.   That  is,  if  you  have  &lt; defined for (int4,
     int4), you can't define it again.   POSTGRES  does  not
     check  this  when you define your operator, so be careful.  
     To avoid this problem, odd names will be used for
     the operators.  If you get this wrong, the access methods 
     are likely to crash when you try to do scans.<p>
     
     The other important point  is  that  all  the  operator
     functions  return  Boolean  values.  The access methods
     rely on this fact.  (On the  other  hand,  the  support
     function  returns whatever the particular access method
     expects -- in this case, a signed integer.)
     The final routine in the file is the "support  routine"
     mentioned  when we discussed the amsupport attribute of
     the <CODE>pg_am</CODE> class.  We will use this later on.  For  now,
     ignore it.

<pre>         CREATE FUNCTION complex_abs_eq(complex, complex)
              RETURNS bool
              AS '/usr/local/postgres95/tutorial/obj/complex.so'
              LANGUAGE 'c';
</pre>

     Now  define the operators that use them.  As noted, the
     operator names must be unique among all operators  that
     take  two <CODE>int4</CODE> operands.  In order to see if the 
     operator names listed below are taken, we can do a query  on
     <CODE>pg_operator</CODE>:
     
<pre>         /&#42;
          &#42; this query uses the regular expression operator (~)
          &#42; to find three-character operator names that end in
          &#42; the character &amp;
          &#42;/
         SELECT &#42;
         FROM pg_operator
         WHERE oprname ~ '^..&amp;&#36;'::text;
</pre>

     to  see  if  your name is taken for the types you want.
     The important things here are the procedure (which  are
     the  <B>C</B> functions defined above) and the restriction and
     join selectivity functions.  You should  just  use  the
     ones  used  below--note  that  there are different such
     functions for the less-than,  equal,  and  greater-than
     cases.   These  must  be supplied, or the access method
     will crash when it tries  to  use  the  operator.   You
     should  copy  the  names for restrict and join, but use
     the procedure names you defined in the last step.
     
<pre>         CREATE OPERATOR = (
            leftarg = complex, rightarg = complex, procedure = complex_abs_eq,
            restrict = eqsel, join = eqjoinsel
         )
</pre>

     Notice that five operators corresponding to less,  less
     equal, equal, greater, and greater equal are defined.<p>
     
     We're just about finished. the last thing we need to do
     is to update the <CODE>pg_amop</CODE> relation.  To do this, we need
     the following attributes:
     <p>
     
<center>
<table border=1>
              <td>amopid      </td><td> the <CODE>oid</CODE> of the <CODE>pg_am</CODE> instance for  B-tree 
                                        (== 403, see above) </td>
<tr>
</tr>
              <td>amopclaid   </td><td> the   <CODE>oid</CODE>   of   the 
                                        <CODE>pg_opclass</CODE>  instance for <CODE>int4_abs_ops</CODE> (== 
                                        whatever you got instead  of <CODE>17314</CODE>, see above)</td>
<tr>
</tr>
              <td>amopopr     </td><td> the <CODE>oid</CODE>s of the  operators  for the opclass  (which  we'll 
                                        get in just  a minute) </td>
<tr>
</tr>
              <td>amopselect, amopnpages </td><td> cost functions.</td>
</tr>
</table>
</center>
<p>
     The cost functions are used by the query  optimizer  to
     decide  whether  or not to use a given index in a scan.
     Fortunately, these already exist.   The  two  functions
     we'll use are <CODE>btreesel</CODE>, which estimates the selectivity
     of the <B>B-tree</B>, and <CODE>btreenpage</CODE>, which estimates the 
     number of pages a search will touch in the tree.<p>
     
     So  we  need the <CODE>oid</CODE>s of the operators we just defined.
     We'll look up the names of all the operators that  take
     two <CODE>int4</CODE>s, and pick ours out:
     
<pre>         SELECT o.oid AS opoid, o.oprname
         INTO TABLE complex_ops_tmp
         FROM pg_operator o, pg_type t
         WHERE o.oprleft = t.oid and o.oprright = t.oid
              and t.typname = 'complex';

     which returns:

         +------+---------+
         |oid   | oprname |
         +------+---------+
         |17321 | &lt;       |
         +------+---------+
         |17322 | &lt;=      |
         +------+---------+
         |17323 |  =      |
         +------+---------+
         |17324 | &gt;=      |
         +------+---------+
         |17325 | &gt;       |
         +------+---------+
</pre>

     (Again,  some of your <CODE>oid</CODE> numbers will almost certainly
     be different.)  The operators we are interested in  are
     those  with  <CODE>oid</CODE>s  17321 through 17325.  The values you
     get will probably be different, and you should  
     substitute  them  for  the  values below.  We can look at the
     operator names and pick out the ones we just added.<p>
     
     Now we're ready to update <CODE>pg_amop</CODE> with our new operator
     class.  The most important thing in this entire 
     discussion is that the operators are ordered, from less equal
     through   greater   equal,  in  <CODE>pg_amop</CODE>.   We  add  the
     instances we need:

<pre>          INSERT INTO pg_amop (amopid, amopclaid, amopopr, amopstrategy,
                              amopselect, amopnpages)
              SELECT am.oid, opcl.oid, c.opoid, 3,
                 'btreesel'::regproc, 'btreenpage'::regproc
              FROM pg_am am, pg_opclass opcl, complex_ops_tmp c
              WHERE amname = 'btree' and opcname = 'complex_abs_ops'
                 and c.oprname = '=';
</pre>

     Note the order: "less than" is 1, "less than or  equal"
     is  2,  "equal" is 3, "greater than or equal" is 4, and
     "greater than" is 5.<p>
     
     The last step (finally!) is registration of  the  
     "support routine" previously described in our discussion of
     <CODE>pg_am</CODE>.  The <CODE>oid</CODE> of this support routine  is  stored  in
     the <CODE>pg_amproc</CODE> class, keyed by the access method <CODE>oid</CODE> and
     the operator class <CODE>oid</CODE>.  First, we need to register the
     function  in  POSTGRES  (recall  that we put the <B>C</B> code
     that implements this routine in the bottom of the  file
     in which we implemented the operator routines):

<pre>         CREATE FUNCTION int4_abs_cmp(int4, int4)
               RETURNS int4
               AS '/usr/local/postgres95/tutorial/obj/complex.so'
               LANGUAGE 'c';

         SELECT oid, proname FROM pg_proc WHERE prname = 'int4_abs_cmp';

         +------+--------------+
         |oid   | proname      |
         +------+--------------+
         |17328 | int4_abs_cmp |
         +------+--------------+
</pre>
     (Again,  your <CODE>oid</CODE> number will probably be different and
     you should substitute the value you see for  the  value
     below.)   Recalling  that  the <B>B-tree</B> instance's oid is
     403 and that of <CODE>int4_abs_ops</CODE> is 17314, we can  add  the
     new instance as follows:
     
<pre>         INSERT INTO pg_amproc (amid, amopclaid, amproc, amprocnum)
              VALUES ('403'::oid,        -- btree oid
                      '17314'::oid,      --  pg_opclass tuple
                      '17328'::oid,      -- new pg_proc oid
                      '1'::int2);
</pre>
<p>
<HR>
<A NAME="8"><B>[8]</B></A>  Strictly  speaking, this routine can return a negative
number (&lt; 0), 0, or a non-zero positive number (&gt; 0).
<HR>
<font size=-1>
<A HREF="pg95user.html">[ TOC ]</A> 
<A HREF="xaggr.html">[ Previous ]</A> 
<A HREF="libpq.html">[ Next ]</A> 
</font>
</BODY>
</HTML>