mirror of https://github.com/postgres/postgres
ISpell info updated
This commit is contained in:
parent ef38ca9b3d
commit 38e2bf6283
@@ -1,17 +1,13 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head><title>tsearch-v2-intro</title>

<link type="text/css" rel="stylesheet" href="tsearch-V2-intro_files/tsearch.txt"></head>

<html>
<head>
<title>tsearch-v2-intro</title>
<link type="text/css" rel="stylesheet" href="/~megera/postgres/gist/tsearch/tsearch.css">
</head>

<body>
<div class="content">
<h2>Tsearch2 - Introduction</h2>

<p><a href=
"http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html">
<p><a href="http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html">
[Online version]</a> of this document is available.</p>

<p>The tsearch2 module is available to add as an extension to
@@ -38,13 +34,11 @@

<p>The README.tsearch2 file included in the contrib/tsearch2
directory contains a brief overview and history behind tsearch.
This can also be found online <a href=
"http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/">[right
This can also be found online <a href="http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/">[right
here]</a>.</p>

<p>Further in depth documentation such as a full function
reference, and user guide can be found online at the <a href=
"http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/">[tsearch
reference, and user guide can be found online at the <a href="http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/docs/">[tsearch
documentation home]</a>.</p>

<h3>ACKNOWLEDGEMENTS</h3>
@@ -105,11 +99,9 @@

<p>Step one is to download the tsearch V2 module :</p>

<p><a href=
"http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/">[http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/]</a>
<p><a href="http://www.sai.msu.su/%7Emegera/postgres/gist/tsearch/V2/">[http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/]</a>
(check Development History for latest stable version !)</p>
<pre>
tar -zxvf tsearch-v2.tar.gz
<pre> tar -zxvf tsearch-v2.tar.gz
mv tsearch2 PGSQL_SRC/contrib/
cd PGSQL_SRC/contrib/tsearch2
</pre>
@@ -121,18 +113,15 @@

<p>Then continue with the regular building and installation
process</p>
<pre>
gmake
<pre> gmake
gmake install
gmake installcheck
</pre>

<p>That is pretty much all you have to do, unless of course you
get errors. However if you get those, you better go check with
the mailing lists over at <a href=
"http://www.postgresql.org">http://www.postgresql.org</a> or
<a href=
"http://openfts.sourceforge.net/">http://openfts.sourceforge.net/</a>
the mailing lists over at <a href="http://www.postgresql.org/">http://www.postgresql.org</a> or
<a href="http://openfts.sourceforge.net/">http://openfts.sourceforge.net/</a>
since its never failed for me.</p>

<p>The directory in the contib/ and the directory from the
@@ -151,15 +140,13 @@
<p>We should create a database to use as an example for the
remainder of this file. We can call the database "ftstest". You
can create it from the command line like this:</p>
<pre>
#createdb ftstest
<pre> #createdb ftstest
</pre>

<p>If you thought installation was easy, this next bit is even
easier. Change to the PGSQL_SRC/contrib/tsearch2 directory and
type:</p>
<pre>
psql ftstest < tsearch2.sql
<pre> psql ftstest < tsearch2.sql
</pre>

<p>The file "tsearch2.sql" holds all the wonderful little
@@ -170,8 +157,7 @@
pg_ts_cfgmap are added.</p>

<p>You can check out the tables if you like:</p>
<pre>
#psql ftstest
<pre> #psql ftstest
ftstest=# \d
List of relations
Schema | Name | Type | Owner
@@ -188,8 +174,7 @@
<p>The first thing we can do is try out some of the types that
are provided for us. Lets look at the tsvector type provided
for us:</p>
<pre>
SELECT 'Our first string used today'::tsvector;
<pre> SELECT 'Our first string used today'::tsvector;
tsvector
---------------------------------------
'Our' 'used' 'first' 'today' 'string'
@@ -199,8 +184,7 @@
<p>The results are the words used within our string. Notice
they are not in any particular order. The tsvector type returns
a string of space separated words.</p>
<pre>
SELECT 'Our first string used today first string'::tsvector;
<pre> SELECT 'Our first string used today first string'::tsvector;
tsvector
-----------------------------------------------
'Our' 'used' 'again' 'first' 'today' 'string'
@@ -217,8 +201,7 @@
by the tsearch2 module.</p>

<p>The function to_tsvector has 3 possible signatures:</p>
<pre>
to_tsvector(oid, text);
<pre> to_tsvector(oid, text);
to_tsvector(text, text);
to_tsvector(text);
</pre>
@@ -228,8 +211,7 @@
the searchable text is broken up into words (Stemming process).
Right now we will specify the 'default' configuration. See the
section on TSEARCH2 CONFIGURATION to learn more about this.</p>
<pre>
SELECT to_tsvector('default',
<pre> SELECT to_tsvector('default',
'Our first string used today first string');
to_tsvector
--------------------------------------------
@@ -259,8 +241,7 @@
<p>If you want to view the output of the tsvector fields
without their positions, you can do so with the function
"strip(tsvector)".</p>
<pre>
SELECT strip(to_tsvector('default',
<pre> SELECT strip(to_tsvector('default',
'Our first string used today first string'));
strip
--------------------------------
@@ -270,8 +251,7 @@
<p>If you wish to know the number of unique words returned in
the tsvector you can do so by using the function
"length(tsvector)"</p>
<pre>
SELECT length(to_tsvector('default',
<pre> SELECT length(to_tsvector('default',
'Our first string used today first string'));
length
--------
@@ -282,15 +262,13 @@
<p>Lets take a look at the function to_tsquery. It also has 3
signatures which follow the same rational as the to_tsvector
function:</p>
<pre>
to_tsquery(oid, text);
<pre> to_tsquery(oid, text);
to_tsquery(text, text);
to_tsquery(text);
</pre>

<p>Lets try using the function with a single word :</p>
<pre>
SELECT to_tsquery('default', 'word');
<pre> SELECT to_tsquery('default', 'word');
to_tsquery
-----------
'word'
@@ -303,8 +281,7 @@

<p>Lets attempt to use the function with a string of multiple
words:</p>
<pre>
SELECT to_tsquery('default', 'this is many words');
<pre> SELECT to_tsquery('default', 'this is many words');
ERROR: Syntax error
</pre>

@@ -313,8 +290,7 @@
"tsquery" used for searching a tsvector field. What we need to
do is search for one to many words with some kind of logic (for
now simple boolean).</p>
<pre>
SELECT to_tsquery('default', 'searching|sentence');
<pre> SELECT to_tsquery('default', 'searching|sentence');
to_tsquery
----------------------
'search' | 'sentenc'
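<p>The same function understands the & (AND) and ! (NOT)
operators as well. As a rough illustration (the exact lexems you
get back depend on the dictionary in your configuration), a query
along these lines combines a stemmed word with a negated one:</p>
<pre> SELECT to_tsquery('default', 'searching & !quickly');
</pre>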
@@ -328,8 +304,7 @@
<p>You can not use words defined as being a stop word in your
configuration. The function will not fail ... you will just get
no result, and a NOTICE like this:</p>
<pre>
SELECT to_tsquery('default', 'a|is&not|!the');
<pre> SELECT to_tsquery('default', 'a|is&not|!the');
NOTICE: Query contains only stopword(s)
or doesn't contain lexem(s), ignored
to_tsquery
@@ -348,8 +323,7 @@
<p>The next stage is to add a full text index to an existing
table. In this example we already have a table defined as
follows:</p>
<pre>
CREATE TABLE tblMessages
<pre> CREATE TABLE tblMessages
(
intIndex int4,
strTopic varchar(100),
@@ -362,8 +336,7 @@
test strings for a topic, and a message. here is some test data
I inserted. (yes I know it's completely useless stuff ;-) but
it will serve our purpose right now).</p>
<pre>
INSERT INTO tblMessages
<pre> INSERT INTO tblMessages
VALUES ('1', 'Testing Topic', 'Testing message data input');
INSERT INTO tblMessages
VALUES ('2', 'Movie', 'Breakfast at Tiffany\'s');
@@ -400,8 +373,7 @@
<p>The next stage is to create a special text index which we
will use for FTI, so we can search our table of messages for
words or a phrase. We do this using the SQL command:</p>
<pre>
ALTER TABLE tblMessages ADD idxFTI tsvector;
<pre> ALTER TABLE tblMessages ADD COLUMN idxFTI tsvector;
</pre>

<p>Note that unlike traditional indexes, this is actually a new
@@ -411,8 +383,7 @@

<p>The general rule for the initial insertion of data will
follow four steps:</p>
<pre>
1. update table
<pre> 1. update table
2. vacuum full analyze
3. create index
4. vacuum full analyze
@@ -426,8 +397,7 @@
the index has been created on the table, vacuum full analyze is
run again to update postgres's statistics (ie having the index
take effect).</p>
<pre>
UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
<pre> UPDATE tblMessages SET idxFTI=to_tsvector('default', strMessage);
VACUUM FULL ANALYZE;
</pre>

@@ -436,8 +406,7 @@
information stored, you should instead do the following, which
effectively concatenates the two fields into one before being
inserted into the table:</p>
<pre>
UPDATE tblMessages
<pre> UPDATE tblMessages
SET idxFTI=to_tsvector('default',coalesce(strTopic,'') ||' '|| coalesce(strMessage,''));
VACUUM FULL ANALYZE;
</pre>
@@ -451,8 +420,7 @@
Full Text INDEXINGi ;-)), so don't worry about any indexing
overhead. We will create an index based on the gist function.
GiST is an index structure for Generalized Search Tree.</p>
<pre>
CREATE INDEX idxFTI_idx ON tblMessages USING gist(idxFTI);
<pre> CREATE INDEX idxFTI_idx ON tblMessages USING gist(idxFTI);
VACUUM FULL ANALYZE;
</pre>

@@ -464,15 +432,13 @@
<p>The last thing to do is set up a trigger so every time a row
in this table is changed, the text index is automatically
updated. This is easily done using:</p>
<pre>
CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
<pre> CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxFTI, strMessage);
</pre>

<p>Or if you are indexing both strMessage and strTopic you
should instead do:</p>
<pre>
CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
<pre> CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
FOR EACH ROW EXECUTE PROCEDURE
tsearch2(idxFTI, strTopic, strMessage);
</pre>
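<p>If you want to convince yourself the trigger is firing, an
insert followed by a quick select along these lines should show
idxFTI being filled in automatically (the values used here are
just an example):</p>
<pre> INSERT INTO tblMessages
   VALUES ('3', 'Trigger check', 'Checking the tsearch2 trigger');
 SELECT idxFTI FROM tblMessages WHERE intIndex = 3;
</pre>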
@@ -490,15 +456,13 @@
the tsearch2 function. Lets say we want to create a function to
remove certain characters (like the @ symbol from all
text).</p>
<pre>
CREATE FUNCTION dropatsymbol(text)
<pre> CREATE FUNCTION dropatsymbol(text)
RETURNS text AS 'select replace($1, \'@\', \' \');' LANGUAGE SQL;
</pre>

<p>Now we can use this function within the tsearch2 function on
the trigger.</p>
<pre>
DROP TRIGGER tsvectorupdate ON tblmessages;
<pre> DROP TRIGGER tsvectorupdate ON tblmessages;
CREATE TRIGGER tsvectorupdate BEFORE UPDATE OR INSERT ON tblMessages
FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxFTI, dropatsymbol, strMessage);
INSERT INTO tblmessages VALUES (69, 'Attempt for dropatsymbol', 'Test@test.com');
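<p>You can also call the function on its own to check that the @
symbol really is replaced (assuming the dropatsymbol function
created above):</p>
<pre> SELECT dropatsymbol('Test@test.com');
  dropatsymbol
 ---------------
  Test test.com
 (1 row)
</pre>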
@@ -513,8 +477,7 @@
locale of the server. All you have to do is change your default
configuration, or add a new one for your specific locale. See
the section on TSEARCH2 CONFIGURATION.</p>
<pre class="real">
SELECT * FROM tblmessages WHERE intindex = 69;
<pre class="real"> SELECT * FROM tblmessages WHERE intindex = 69;

intindex | strtopic | strmessage | idxfti
----------+--------------------------+---------------+-----------------------
@@ -540,8 +503,7 @@ in the tsvector column.
<p>Lets search the indexed data for the word "Test". I indexed
based on the the concatenation of the strTopic, and the
strMessage:</p>
<pre>
SELECT intindex, strtopic FROM tblmessages
<pre> SELECT intindex, strtopic FROM tblmessages
WHERE idxfti @@ 'test'::tsquery;
intindex | strtopic
----------+---------------
@@ -553,8 +515,7 @@ in the tsvector column.
"Testing Topic". Notice that the word I search for was all
lowercase. Let's see what happens when I query for uppercase
"Test".</p>
<pre>
SELECT intindex, strtopic FROM tblmessages
<pre> SELECT intindex, strtopic FROM tblmessages
WHERE idxfti @@ 'Test'::tsquery;
intindex | strtopic
----------+----------
@@ -570,8 +531,7 @@ in the tsvector column.
<p>Most likely the best way to query the field is to use the
to_tsquery function on the right hand side of the @@ operator
like this:</p>
<pre>
SELECT intindex, strtopic FROM tblmessages
<pre> SELECT intindex, strtopic FROM tblmessages
WHERE idxfti @@ to_tsquery('default', 'Test | Zeppelin');
intindex | strtopic
----------+--------------------
@@ -592,8 +552,7 @@ in the tsvector column.
a way around which doesn't appear to have a significant impact
on query time, and that is to use a query such as the
following:</p>
<pre>
SELECT intindex, strTopic FROM tblmessages
<pre> SELECT intindex, strTopic FROM tblmessages
WHERE idxfti @@ to_tsquery('default', 'gettysburg & address')
AND strMessage ~* '.*men are created equal.*';
intindex | strtopic
@@ -626,8 +585,7 @@ in the tsvector column.
english stemming. We could edit the file
:'/usr/local/pgsql/share/english.stop' and add a word to the
list. I edited mine to exclude my name from indexing:</p>
<pre>
- Edit /usr/local/pgsql/share/english.stop
<pre> - Edit /usr/local/pgsql/share/english.stop
- Add 'andy' to the list
- Save the file.
</pre>
@@ -638,16 +596,14 @@ in the tsvector column.
connected to the DB while editing the stop words, you will need
to end the current session and re-connect. When you re-connect
to the database, 'andy' is no longer indexed:</p>
<pre>
SELECT to_tsvector('default', 'Andy');
<pre> SELECT to_tsvector('default', 'Andy');
to_tsvector
------------
(1 row)
</pre>

<p>Originally I would get the result :</p>
<pre>
SELECT to_tsvector('default', 'Andy');
<pre> SELECT to_tsvector('default', 'Andy');
to_tsvector
------------
'andi':1
@@ -660,8 +616,7 @@ in the tsvector column.
'simple', the results would be different. There are no stop
words for the simple dictionary. It will just convert to lower
case, and index every unique word.</p>
<pre>
SELECT to_tsvector('simple', 'Andy andy The the in out');
<pre> SELECT to_tsvector('simple', 'Andy andy The the in out');
to_tsvector
-------------------------------------
'in':5 'out':6 'the':3,4 'andy':1,2
@@ -672,8 +627,7 @@ in the tsvector column.
into the actual configuration of tsearch2. In the examples in
this document the configuration has always been specified when
using the tsearch2 functions:</p>
<pre>
SELECT to_tsvector('default', 'Testing the default config');
<pre> SELECT to_tsvector('default', 'Testing the default config');
SELECT to_tsvector('simple', 'Example of simple Config');
</pre>

@@ -682,8 +636,7 @@ in the tsvector column.
contains both the 'default' configurations based on the 'C'
locale. And the 'simple' configuration which is not based on
any locale.</p>
<pre>
SELECT * from pg_ts_cfg;
<pre> SELECT * from pg_ts_cfg;
ts_name | prs_name | locale
-----------------+----------+--------------
default | default | C
@@ -706,8 +659,7 @@ in the tsvector column.
configuration or just use one that already exists. If I do not
specify which configuration to use in the to_tsvector function,
I receive the following error.</p>
<pre>
SELECT to_tsvector('learning tsearch is like going to school');
<pre> SELECT to_tsvector('learning tsearch is like going to school');
ERROR: Can't find tsearch config by locale
</pre>

@@ -716,8 +668,7 @@ in the tsvector column.
into the pg_ts_cfg table. We will call the configuration
'default_english', with the default parser and use the locale
'en_US'.</p>
<pre>
INSERT INTO pg_ts_cfg (ts_name, prs_name, locale)
<pre> INSERT INTO pg_ts_cfg (ts_name, prs_name, locale)
VALUES ('default_english', 'default', 'en_US');
</pre>

@@ -732,15 +683,14 @@ in the tsvector column.
tsearch2.sql</p>

<p>Lets take a first look at the pg_ts_dict table</p>
<pre>
ftstest=# \d pg_ts_dict
<pre> ftstest=# \d pg_ts_dict
Table "public.pg_ts_dict"
Column | Type | Modifiers
-----------------+---------+-----------
dict_name | text | not null
dict_init | oid |
dict_initoption | text |
dict_lemmatize | oid | not null
dict_lexize | oid | not null
dict_comment | text |
Indexes: pg_ts_dict_idx unique btree (dict_name)
</pre>
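<p>If you want to see which dictionaries are already registered,
a query along these lines will list them (only dict_name and
dict_comment are selected here to keep the output readable):</p>
<pre> SELECT dict_name, dict_comment FROM pg_ts_dict;
</pre>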
@@ -763,28 +713,57 @@ in the tsvector column.
ISpell. We will assume you have ISpell installed on you
machine. (in /usr/local/lib)</p>

<p>First lets register the dictionary(ies) to use from ISpell.
We will use the english dictionary from ISpell. We insert the
paths to the relevant ISpell dictionary (*.hash) and affixes
(*.aff) files. There seems to be some question as to which
ISpell files are to be used. I installed ISpell from the latest
sources on my computer. The installation installed the
dictionary files with an extension of *.hash. Some
installations install with an extension of *.dict As far as I
know the two extensions are equivilant. So *.hash ==
*.dict.</p>
<p>There has been some confusion in the past as to which files
are used from ISpell. ISpell operates using a hash file. This
is a binary file created by the ISpell command line utility
"buildhash". This utility accepts a file containing the words
from the dictionary, and the affixes file and the output is the
hash file. The default installation of ISPell installs the
english hash file english.hash, which is the exact same file as
american.hash. ISpell uses this as the fallback dictionary to
use.</p>

<p>We will also continue to use the english word stop file that
<p>This hash file is not what tsearch2 requires as the ISpell
interface. The file(s) needed are those used to create the
hash. Tsearch uses the dictionary words for morphology, so the
listing is needed not spellchecking. Regardless, these files
are included in the ISpell sources, and you can use them to
integrate into tsearch2. This is not complicated, but is not
very obvious to begin with. The tsearch2 ISpell interface needs
only the listing of dictionary words, it will parse and load
those words, and use the ISpell dictionary for lexem
processing.</p>

<p>I found the ISPell make system to be very finicky. Their
documentation actually states this to be the case. So I just
did things the command line way. In the ISpell source tree
under langauges/english there are several files in this
directory. For a complete description, please read the ISpell
README. Basically for the english dictionary there is the
option to create the small, medium, large and extra large
dictionaries. The medium dictionary is recommended. If the make
system is configured correctly, it would build and install the
english.has file from the medium size dictionary. Since we are
only concerned with the dictionary word listing ... it can be
created from the /languages/english directory with the
following command:</p>
<pre> sort -u -t/ +0f -1 +0 -T /usr/tmp -o english.med english.0 english.1
</pre>

<p>This will create a file called english.med. You can copy
this file to whever you like. I place mine in /usr/local/lib so
it coincides with the ISpell hash files. You can now add the
tsearch2 configuration entry for the ISpell english dictionary.
We will also continue to use the english word stop file that
was installed for the en_stem dictionary. You could use a
different one if you like. The ISpell configuration is based on
the "ispell_template" dictionary installed by default with
tsearch2. We will use the OIDs to the stored procedures from
the row where the dict_name = 'ispell_template'.</p>
<pre>
INSERT INTO pg_ts_dict
<pre> INSERT INTO pg_ts_dict
(SELECT 'en_ispell',
dict_init,
'DictFile="/usr/local/lib/english.hash",'
'DictFile="/usr/local/lib/english.med",'
'AffFile="/usr/local/lib/english.aff",'
'StopFile="/usr/local/pgsql/share/english.stop"',
dict_lexize
@@ -792,6 +771,50 @@ in the tsvector column.
WHERE dict_name = 'ispell_template');
</pre>

<p>Now that we have a dictionary we can specify it's use in a
query to get a lexem. For this we will use the lexize function.
The lexize function takes the name of the dictionary to use as
an argument. Just as the other tsearch2 functions operate.</p>
<pre> SELECT lexize('en_ispell', 'program');
lexize
-----------
{program}
(1 row)
</pre>

<p>If you wanted to always use the ISpell english dictionary
you have installed, you can configure tsearch2 to always use a
specific dictionary.</p>
<pre> SELCECT set_curdict('en_ispell');
</pre>

<p>Lexize is meant to turn a word into a lexem. It is possible
to receive more than one lexem returned for a single word.</p>
<pre> SELECT lexize('en_ispell', 'conditionally');
lexize
-----------------------------
{conditionally,conditional}
(1 row)
</pre>

<p>The lexize function is not meant to take a full string as an
argument to return lexems for. If you passed in an entire
sentence, it attempts to find that entire sentence in the
dictionary. SInce the dictionary contains only words, you will
receive an empty result set back.</p>
<pre> SELECT lexize('en_ispell', 'This is a senctece to lexize');
lexize
--------

(1 row)

If you parse a lexem from a word not in the dictionary, then you
will receive an empty result. This makes sense because the word
"tsearch" is not int the english dictionary. You can create your
own additions to the dictionary if you like. This may be useful
for scientific or technical glossaries that need to be indexed.
 SELECT lexize('en_ispell', 'tsearch');
 lexize
--------

(1 row)
</pre>

<p>This is not to say that tsearch will be ignored when adding
text information to the the tsvector index column. This will be
explained in greater detail with the table pg_ts_cfgmap.</p>

<p>Next we need to set up the configuration for mapping the
dictionay use to the lexxem parsings. This will be done by
altering the pg_ts_cfgmap table. We will insert several rows,
@@ -799,8 +822,7 @@ in the tsvector column.
configured for use within tsearch2. There are several type of
lexims we would be concerned with forcing the use of the ISpell
dictionary.</p>
<pre>
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
<pre> INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
VALUES ('default_english', 'lhword', '{en_ispell,en_stem}');
INSERT INTO pg_ts_cfgmap (ts_name, tok_alias, dict_name)
VALUES ('default_english', 'lpart_hword', '{en_ispell,en_stem}');
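<p>To double check the mappings inserted so far, you can query
the pg_ts_cfgmap table for the new configuration (assuming the
'default_english' configuration used above):</p>
<pre> SELECT * FROM pg_ts_cfgmap WHERE ts_name = 'default_english';
</pre>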
@@ -818,8 +840,7 @@ in the tsvector column.
<p>There are several other lexem types used that we do not need
to specify as using the ISpell dictionary. We can simply insert
values using the 'simple' stemming process dictionary.</p>
<pre>
INSERT INTO pg_ts_cfgmap
<pre> INSERT INTO pg_ts_cfgmap
VALUES ('default_english', 'url', '{simple}');
INSERT INTO pg_ts_cfgmap
VALUES ('default_english', 'host', '{simple}');
@@ -857,8 +878,7 @@ in the tsvector column.
complete. We have successfully created a new tsearch2
configuration. At the same time we have also set the new
configuration to be our default for en_US locale.</p>
<pre>
SELECT to_tsvector('default_english',
<pre> SELECT to_tsvector('default_english',
'learning tsearch is like going to school');
to_tsvector
--------------------------------------------------
@@ -870,12 +890,37 @@ in the tsvector column.
(1 row)
</pre>

<p>Notice here that words like "tsearch" are still parsed and
indexed in the tsvector column. There is a lexem returned for
the word becuase in the configuration mapping table, we specify
words to be used from the 'en_ispell' dictionary first, but as
a fallback to use the 'en_stem' dictionary. Therefore a lexem
is not returned from en_ispell, but is returned from en_stem,
and added to the tsvector.</p>
<pre> SELECT to_tsvector('learning tsearch is like going to computer school');
to_tsvector
---------------------------------------------------------------------------
'go':5 'like':4 'learn':1 'school':8 'compute':7 'tsearch':2 'computer':7
(1 row)
</pre>

<p>Notice in this last example I added the word "computer" to
the text to be converted into a tsvector. Because we have setup
our default configuration to use the ISpell english dictionary,
the words are lexized, and computer returns 2 lexems at the
same position. 'compute':7 and 'computer':7 are now both
indexed for the word computer.</p>

<p>You can create additional dictionarynlists, or use the extra
large dictionary from ISpell. You can read through the ISpell
documents, and source tree to make modifications as you see
fit.</p>

<p>In the case that you already have a configuration set for
the locale, and you are changing it to your new dictionary
configuration. You will have to set the old locale to NULL. If
we are using the 'C' locale then we would do this:</p>
<pre>
UPDATE pg_ts_cfg SET locale=NULL WHERE locale = 'C';
<pre> UPDATE pg_ts_cfg SET locale=NULL WHERE locale = 'C';
</pre>

<p>That about wraps up the configuration of tsearch2. There is
@@ -917,38 +962,32 @@ in the tsvector column.
<p>1) Backup any global database objects such as users and
groups (this step is usually only necessary when you will be
restoring to a virgin system)</p>
<pre>
pg_dumpall -g > GLOBALobjects.sql
<pre> pg_dumpall -g > GLOBALobjects.sql
</pre>

<p>2) Backup the full database schema using pg_dump</p>
<pre>
pg_dump -s DATABASE > DATABASEschema.sql
<pre> pg_dump -s DATABASE > DATABASEschema.sql
</pre>

<p>3) Backup the full database using pg_dump</p>
<pre>
pg_dump -Fc DATABASE > DATABASEdata.tar
<pre> pg_dump -Fc DATABASE > DATABASEdata.tar
</pre>

<p>To Restore a PostgreSQL database that uses the tsearch2
module:</p>

<p>1) Create the blank database</p>
<pre>
createdb DATABASE
<pre> createdb DATABASE
</pre>

<p>2) Restore any global database objects such as users and
groups (this step is usually only necessary when you will be
restoring to a virgin system)</p>
<pre>
psql DATABASE < GLOBALobjects.sql
<pre> psql DATABASE < GLOBALobjects.sql
</pre>

<p>3) Create the tsearch2 objects, functions and operators</p>
<pre>
psql DATABASE < tsearch2.sql
<pre> psql DATABASE < tsearch2.sql
</pre>

<p>4) Edit the backed up database schema and delete all SQL
@@ -957,13 +996,11 @@ in the tsvector column.
tsvector types. If your not sure what these are, they are the
ones listed in tsearch2.sql. Then restore the edited schema to
the database</p>
<pre>
psql DATABASE < DATABASEschema.sql
<pre> psql DATABASE < DATABASEschema.sql
</pre>

<p>5) Restore the data for the database</p>
<pre>
pg_restore -N -a -d DATABASE DATABASEdata.tar
<pre> pg_restore -N -a -d DATABASE DATABASEdata.tar
</pre>

<p>If you get any errors in step 4, it will most likely be
@@ -971,5 +1008,4 @@ in the tsvector column.
tsearch2.sql. Any errors in step 5 will mean the database
schema was probably restored wrongly.</p>
</div>
</body>
</html>
</body></html>