Some more tsearch docs work --- sync names with CVS-tip reality, some
minor rewording, some markup fixups. Lots left to do here ...
This commit is contained in:
parent a13cefafb1
commit 52a0830c40
@@ -210,9 +210,9 @@ SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::ts
 'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
 </programlisting>
 
-Each lexeme position also can be labeled as <literal>'A'</literal>,
-<literal>'B'</literal>, <literal>'C'</literal>, <literal>'D'</literal>,
-where <literal>'D'</literal> is the default. These labels can be used to group
+Each lexeme position also can be labeled as <literal>A</literal>,
+<literal>B</literal>, <literal>C</literal>, <literal>D</literal>,
+where <literal>D</literal> is the default. These labels can be used to group
 lexemes into different <emphasis>importance</emphasis> or
 <emphasis>rankings</emphasis>, for example to reflect document structure.
 Actual values can be assigned at search time and used during the calculation
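
A hedged aside (not part of the commit) illustrating the weight labels this hunk describes: a tsvector literal can carry the A/B/C/D markers directly, and the default D label is not displayed.

-- Positions 1 and 3 carry explicit weights A and B; position 5 keeps the
-- default weight D, so no label is printed for it.
SELECT 'cat:1A sat:3B mat:5'::tsvector;
-- expected output (lexemes are printed in sorted order):
--  'cat':1A 'mat':5 'sat':3B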
@@ -668,9 +668,9 @@ setweight(<replaceable class="PARAMETER">vector</replaceable> TSVECTOR, <replace
 <listitem>
 <para>
 This function returns a copy of the input vector in which every location
-has been labeled with either the letter <literal>'A'</literal>,
-<literal>'B'</literal>, or <literal>'C'</literal>, or the default label
-<literal>'D'</literal> (which is the default for new vectors
+has been labeled with either the letter <literal>A</literal>,
+<literal>B</literal>, or <literal>C</literal>, or the default label
+<literal>D</literal> (which is the default for new vectors
 and as such is usually not displayed). These labels are retained
 when vectors are concatenated, allowing words from different parts of a
 document to be weighted differently by ranking functions.
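
A short, hedged example of the setweight() behavior described above (illustrative only; the text values are made up):

-- Label every position of a title vector with weight 'A', then concatenate
-- it with an unlabeled (weight 'D') body vector so that ranking functions
-- can weight title matches more heavily than body matches.
SELECT setweight(to_tsvector('english', 'The Brightest Supernovae'), 'A') ||
       to_tsvector('english', 'a catalog of observations');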
@@ -807,13 +807,12 @@ to be made.
 
 <varlistentry>
 <indexterm zone="textsearch-tsvector">
-<primary>stat</primary>
+<primary>ts_stat</primary>
 </indexterm>
 
 <term>
 <synopsis>
-stat(<optional><replaceable class="PARAMETER">sqlquery</replaceable> text </optional>, <optional>weight text </optional>) returns SETOF statinfo
-<!-- TODO I guess that not both of the arguments are optional? -->
+ts_stat(<replaceable class="PARAMETER">sqlquery</replaceable> text <optional>, <replaceable class="PARAMETER">weights</replaceable> text </optional>) returns SETOF statinfo
 </synopsis>
 </term>
 
@@ -821,27 +820,27 @@ stat(<optional><replaceable class="PARAMETER">sqlquery</replaceable> text </opti
 <para>
 Here <type>statinfo</type> is a type, defined as:
 <programlisting>
-CREATE TYPE statinfo AS (word text, ndoc int4, nentry int4);
+CREATE TYPE statinfo AS (word text, ndoc integer, nentry integer);
 </programlisting>
-and <replaceable>sqlquery</replaceable> is a query which returns a
-<type>tsvector</type> column's contents. <function>stat</> returns
-statistics about a <type>tsvector</type> column, i.e., the number of
-documents, <literal>ndoc</>, and the total number of words in the
-collection, <literal>nentry</>. It is useful for checking your
-configuration and to find stop word candidates. For example, to find
-the ten most frequent words:
+and <replaceable>sqlquery</replaceable> is a text value containing a SQL query
+which returns a single <type>tsvector</type> column. <function>ts_stat</>
+executes the query and returns statistics about the resulting
+<type>tsvector</type> data, i.e., the number of documents, <literal>ndoc</>,
+and the total number of words in the collection, <literal>nentry</>. It is
+useful for checking your configuration and to find stop word candidates. For
+example, to find the ten most frequent words:
 
 <programlisting>
-SELECT * FROM stat('SELECT vector from apod')
+SELECT * FROM ts_stat('SELECT vector from apod')
 ORDER BY ndoc DESC, nentry DESC, word
 LIMIT 10;
 </programlisting>
 
-Optionally, one can specify <replaceable>weight</replaceable> to obtain
+Optionally, one can specify <replaceable>weights</replaceable> to obtain
 statistics about words with a specific <replaceable>weight</replaceable>:
 
 <programlisting>
-SELECT * FROM stat('SELECT vector FROM apod','a')
+SELECT * FROM ts_stat('SELECT vector FROM apod','a')
 ORDER BY ndoc DESC, nentry DESC, word
 LIMIT 10;
 </programlisting>
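
A hedged companion example for the renamed ts_stat() (the apod table and vector column come from the hunk above; treating the weights string as able to hold more than one letter is an assumption based on released PostgreSQL):

-- Restrict the statistics to lexemes that were labeled 'A' or 'B',
-- for example by an earlier setweight() call.
SELECT word, ndoc, nentry
FROM ts_stat('SELECT vector FROM apod', 'ab')
ORDER BY nentry DESC, ndoc DESC, word
LIMIT 10;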
@@ -1146,9 +1145,9 @@ topic.
 </para>
 
 <para>
-The <function>rewrite()</function> function changes the original query by
+The <function>ts_rewrite()</function> function changes the original query by
 replacing part of the query with some other string of type <type>tsquery</type>,
-as defined by the rewrite rule. Arguments to <function>rewrite()</function>
+as defined by the rewrite rule. Arguments to <function>ts_rewrite()</function>
 can be names of columns of type <type>tsquery</type>.
 </para>
 
@@ -1161,20 +1160,20 @@ INSERT INTO aliases VALUES('a', 'c');
 <varlistentry>
 
 <indexterm zone="textsearch-tsquery">
-<primary>rewrite - 1</primary>
+<primary>ts_rewrite</primary>
 </indexterm>
 
 <term>
 <synopsis>
-rewrite (<replaceable class="PARAMETER">query</replaceable> TSQUERY, <replaceable class="PARAMETER">target</replaceable> TSQUERY, <replaceable class="PARAMETER">sample</replaceable> TSQUERY) returns TSQUERY
+ts_rewrite (<replaceable class="PARAMETER">query</replaceable> TSQUERY, <replaceable class="PARAMETER">target</replaceable> TSQUERY, <replaceable class="PARAMETER">sample</replaceable> TSQUERY) returns TSQUERY
 </synopsis>
 </term>
 
 <listitem>
 <para>
 <programlisting>
-SELECT rewrite('a & b'::tsquery, 'a'::tsquery, 'c'::tsquery);
-  rewrite
+SELECT ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'c'::tsquery);
+ ts_rewrite
 -----------
  'b' & 'c'
 </programlisting>
@@ -1184,21 +1183,17 @@ SELECT rewrite('a & b'::tsquery, 'a'::tsquery, 'c'::tsquery);
 
 <varlistentry>
 
-<indexterm zone="textsearch-tsquery">
-<primary>rewrite - 2</primary>
-</indexterm>
-
 <term>
 <synopsis>
-rewrite(ARRAY[<replaceable class="PARAMETER">query</replaceable> TSQUERY, <replaceable class="PARAMETER">target</replaceable> TSQUERY, <replaceable class="PARAMETER">sample</replaceable> TSQUERY]) returns TSQUERY
+ts_rewrite(ARRAY[<replaceable class="PARAMETER">query</replaceable> TSQUERY, <replaceable class="PARAMETER">target</replaceable> TSQUERY, <replaceable class="PARAMETER">sample</replaceable> TSQUERY]) returns TSQUERY
 </synopsis>
 </term>
 
 <listitem>
 <para>
 <programlisting>
-SELECT rewrite(ARRAY['a & b'::tsquery, t,s]) FROM aliases;
-  rewrite
+SELECT ts_rewrite(ARRAY['a & b'::tsquery, t,s]) FROM aliases;
+ ts_rewrite
 -----------
  'b' & 'c'
 </programlisting>
@@ -1208,21 +1203,17 @@ SELECT rewrite(ARRAY['a & b'::tsquery, t,s]) FROM aliases;
 
 <varlistentry>
 
-<indexterm zone="textsearch-tsquery">
-<primary>rewrite - 3</primary>
-</indexterm>
-
 <term>
 <synopsis>
-rewrite (<replaceable class="PARAMETER">query</> TSQUERY,<literal>'SELECT target ,sample FROM test'</literal>::text) returns TSQUERY
+ts_rewrite (<replaceable class="PARAMETER">query</> TSQUERY,<literal>'SELECT target ,sample FROM test'</literal>::text) returns TSQUERY
 </synopsis>
 </term>
 
 <listitem>
 <para>
 <programlisting>
-SELECT rewrite('a & b'::tsquery, 'SELECT t,s FROM aliases');
-  rewrite
+SELECT ts_rewrite('a & b'::tsquery, 'SELECT t,s FROM aliases');
+ ts_rewrite
 -----------
  'b' & 'c'
 </programlisting>
@@ -1246,12 +1237,12 @@ SELECT * FROM aliases;
 </programlisting>
 This ambiguity can be resolved by specifying a sort order:
 <programlisting>
-SELECT rewrite('a & b', 'SELECT t, s FROM aliases ORDER BY t DESC');
- rewrite
+SELECT ts_rewrite('a & b', 'SELECT t, s FROM aliases ORDER BY t DESC');
+ ts_rewrite
 ---------
  'cc'
-SELECT rewrite('a & b', 'SELECT t, s FROM aliases ORDER BY t ASC');
-  rewrite
+SELECT ts_rewrite('a & b', 'SELECT t, s FROM aliases ORDER BY t ASC');
+ ts_rewrite
 -----------
  'b' & 'c'
 </programlisting>
@@ -1263,7 +1254,7 @@ Let's consider a real-life astronomical example. We'll expand query
 <programlisting>
 CREATE TABLE aliases (t tsquery primary key, s tsquery);
 INSERT INTO aliases VALUES(to_tsquery('supernovae'), to_tsquery('supernovae|sn'));
-SELECT rewrite(to_tsquery('supernovae'), 'SELECT * FROM aliases') && to_tsquery('crab');
+SELECT ts_rewrite(to_tsquery('supernovae'), 'SELECT * FROM aliases') && to_tsquery('crab');
             ?column?
 ---------------------------------
  ( 'supernova' | 'sn' ) & 'crab'
@@ -1271,7 +1262,7 @@ SELECT rewrite(to_tsquery('supernovae'), 'SELECT * FROM aliases') && to
 Notice, that we can change the rewriting rule online<!-- TODO maybe use another word for "online"? -->:
 <programlisting>
 UPDATE aliases SET s=to_tsquery('supernovae|sn & !nebulae') WHERE t=to_tsquery('supernovae');
-SELECT rewrite(to_tsquery('supernovae'), 'SELECT * FROM aliases') && to_tsquery('crab');
+SELECT ts_rewrite(to_tsquery('supernovae'), 'SELECT * FROM aliases') && to_tsquery('crab');
                   ?column?
 ---------------------------------------------
  ( 'supernova' | 'sn' & !'nebula' ) & 'crab'
@@ -1288,10 +1279,10 @@ for a possible hit. To filter out obvious non-candidate rules there are containm
 operators for the <type>tsquery</type> type. In the example below, we select only those
 rules which might contain the original query:
 <programlisting>
-SELECT rewrite(ARRAY['a & b'::tsquery, t,s])
+SELECT ts_rewrite(ARRAY['a & b'::tsquery, t,s])
 FROM aliases
 WHERE 'a & b' @> t;
- rewrite
+ ts_rewrite
 -----------
  'b' & 'c'
 </programlisting>
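
The containment filtering in the hunk just above can be backed by an index on the tsquery column; a hedged sketch (the tsquery_ops GiST operator class name is an assumption based on released PostgreSQL, not something this commit adds):

-- Speeds up WHERE 'a & b' @> t style rule filtering on the aliases table.
CREATE INDEX aliases_t_idx ON aliases USING gist (t tsquery_ops);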
@@ -1525,7 +1516,7 @@ SELECT * FROM ts_parse('default','123 - a number');
 
 <varlistentry>
 <indexterm zone="textsearch-parser">
-<primary>token_type</primary>
+<primary>ts_token_type</primary>
 </indexterm>
 
 <term>
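
A hedged usage note for the renamed ts_token_type() (its argument and result shape are assumptions based on released PostgreSQL, since the synopsis itself lies outside this hunk):

-- List the token types the default parser can emit (lword, nlword, url, ...).
SELECT * FROM ts_token_type('default');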
@@ -1894,11 +1885,13 @@ configuration <replaceable>config_name</replaceable><!-- TODO I don't get this -
 <title>Dictionaries</title>
 
 <para>
-Dictionaries are used to specify words that should not be considered in
-a search and for the normalization of words to allow the user to use any
-derived form of a word in a query. Also, normalization can reduce the size of
-<type>tsvector</type>. Normalization does not always have linguistic
-meaning and usually depends on application semantics.
+Dictionaries are used to eliminate words that should not be considered in a
+search (<firstterm>stop words</>), and to <firstterm>normalize</> words so
+that different derived forms of the same word will match. Aside from
+improving search quality, normalization and removal of stop words reduce the
+size of the <type>tsvector</type> representation of a document, thereby
+improving performance. Normalization does not always have linguistic meaning
+and usually depends on application semantics.
 </para>
 
 <para>
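
A small, hedged illustration of the stop-word removal and normalization the rewritten paragraph describes (the result shown is what the standard english configuration produces in released PostgreSQL; treat it as indicative):

SELECT to_tsvector('english', 'The cats are running');
-- expected: 'cat':2 'run':4
-- (stop words dropped, remaining words stemmed, original positions kept)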
@@ -1954,10 +1947,6 @@ a void array if the dictionary knows the lexeme, but it is a stop word
 <literal>NULL</literal> if the dictionary does not recognize the input lexeme
 </para></listitem>
 </itemizedlist>
-
-<emphasis>WARNING:</emphasis>
-Data files used by dictionaries should be in the <varname>server_encoding</varname>
-so all encodings are consistent across databases.
 </para>
 
 <para>
@@ -1987,7 +1976,8 @@ recognizes everything. For example, for an astronomy-specific search
 terms, a general English dictionary and a <application>snowball</> English
 stemmer:
 <programlisting>
-ALTER TEXT SEARCH CONFIGURATION astro_en ADD MAPPING FOR lword WITH astrosyn, en_ispell, en_stem;
+ALTER TEXT SEARCH CONFIGURATION astro_en
+    ADD MAPPING FOR lword WITH astrosyn, english_ispell, english_stem;
 </programlisting>
 </para>
 
@@ -1995,7 +1985,7 @@ ALTER TEXT SEARCH CONFIGURATION astro_en ADD MAPPING FOR lword WITH astrosyn, en
 Function <function>ts_lexize</function> can be used to test dictionaries,
 for example:
 <programlisting>
-SELECT ts_lexize('en_stem', 'stars');
+SELECT ts_lexize('english_stem', 'stars');
 ts_lexize
 -----------
 {star}
@@ -2068,6 +2058,15 @@ SELECT ts_lexize('public.simple_dict','The');
 </programlisting>
 </para>
 
+<caution>
+<para>
+Most types of dictionaries rely on configuration files, such as files of stop
+words. These files <emphasis>must</> be stored in UTF-8 encoding. They will
+be translated to the actual database encoding, if that is different, when they
+are read into the server.
+</para>
+</caution>
+
 </sect2>
 
 
@@ -2080,23 +2079,25 @@ word with a synonym. Phrases are not supported (use the thesaurus
 dictionary (<xref linkend="textsearch-thesaurus">) for that). A synonym
 dictionary can be used to overcome linguistic problems, for example, to
 prevent an English stemmer dictionary from reducing the word 'Paris' to
-'pari'. In that case, it is enough to have a <literal>Paris
-paris</literal> line in the synonym dictionary and put it before the
-<literal>en_stem</> dictionary:
+'pari'. It is enough to have a <literal>Paris paris</literal> line in the
+synonym dictionary and put it before the <literal>english_stem</> dictionary:
 <programlisting>
 SELECT * FROM ts_debug('english','Paris');
 Alias | Description | Token | Dictionaries | Lexized token
--------+-------------+-------+--------------+-----------------
- lword | Latin word | Paris | {english} | english: {pari}
+-------+-------------+-------+----------------+----------------------
+ lword | Latin word | Paris | {english_stem} | english_stem: {pari}
 (1 row)
 
+CREATE TEXT SEARCH DICTIONARY synonym
+    (TEMPLATE = synonym, SYNONYMS = my_synonyms);
+
 ALTER TEXT SEARCH CONFIGURATION english
-    ADD MAPPING FOR lword WITH synonym, en_stem;
+    ALTER MAPPING FOR lword WITH synonym, english_stem;
 
 SELECT * FROM ts_debug('english','Paris');
 Alias | Description | Token | Dictionaries | Lexized token
--------+-------------+-------+-------------------+------------------
- lword | Latin word | Paris | {synonym,en_stem} | synonym: {paris}
+-------+-------------+-------+------------------------+------------------
+ lword | Latin word | Paris | {synonym,english_stem} | synonym: {paris}
 (1 row)
 </programlisting>
 </para>
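
A hedged check of the synonym setup added in this hunk (it assumes the my_synonyms file contains the "Paris paris" line discussed above):

-- The synonym dictionary should now return the protected form directly,
-- before the stemmer ever sees the word.
SELECT ts_lexize('synonym', 'Paris');
-- expected: {paris}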
@@ -2119,25 +2120,27 @@ preferred term and, optionally, preserves them for indexing. Thesauruses
 are used during indexing so any change in the thesaurus <emphasis>requires</emphasis>
 reindexing. The current implementation of the thesaurus
 dictionary is an extension of the synonym dictionary with added
-<emphasis>phrase</emphasis> support. A thesaurus is a plain file of the
-following format:
+<emphasis>phrase</emphasis> support. A thesaurus dictionary requires
+a configuration file of the following format:
 <programlisting>
 # this is a comment
 sample word(s) : indexed word(s)
-...............................
+more sample word(s) : more indexed word(s)
+...
 </programlisting>
-where the colon (<symbol>:</symbol>) symbol acts as a delimiter.
+where the colon (<symbol>:</symbol>) symbol acts as a delimiter between a
+a phrase and its replacement.
 </para>
 
 <para>
 A thesaurus dictionary uses a <emphasis>subdictionary</emphasis> (which
-should be defined in the full text configuration) to normalize the
-thesaurus text. It is only possible to define one dictionary. Notice that
-the <emphasis>subdictionary</emphasis> will produce an error if it can
-not recognize a word. In that case, you should remove the definition of
-the word or teach the <emphasis>subdictionary</emphasis> to about it.
-Use an asterisk (<symbol>*</symbol>) at the beginning of an indexed word to
-skip the subdictionary. It is still required that sample words are known.
+is defined in the dictionary's configuration) to normalize the input text
+before checking for phrase matches. It is only possible to select one
+subdictionary. An error is reported if the subdictionary fails to
+recognize a word. In that case, you should remove the use of the word or teach
+the subdictionary about it. Use an asterisk (<symbol>*</symbol>) at the
+beginning of an indexed word to skip the subdictionary. It is still required
+that sample words are known.
 </para>
 
 <para>
@@ -2149,16 +2152,16 @@ Stop words recognized by the subdictionary are replaced by a 'stop word
 placeholder' to record their position. To break possible ties the thesaurus
 uses the last definition. To illustrate this, consider a thesaurus (with
 a <parameter>simple</parameter> subdictionary) with pattern
-<literal>'swsw'</>, where <literal>'s'</> designates any stop word and
-<literal>'w'</>, any known word:
+<replaceable>swsw</>, where <replaceable>s</> designates any stop word and
+<replaceable>w</>, any known word:
 <programlisting>
 a one the two : swsw
 the one a two : swsw2
 </programlisting>
-Words <literal>'a'</> and <literal>'the'</> are stop words defined in the
-configuration of a subdictionary. The thesaurus considers <literal>'the
-one the two'</literal> and <literal>'that one then two'</literal> as equal
-and will use definition 'swsw2'.
+Words <literal>a</> and <literal>the</> are stop words defined in the
+configuration of a subdictionary. The thesaurus considers <literal>the
+one the two</literal> and <literal>that one then two</literal> as equal
+and will use definition <replaceable>swsw2</>.
 </para>
 
 <para>
@@ -2186,7 +2189,7 @@ For example:
 CREATE TEXT SEARCH DICTIONARY thesaurus_simple (
     TEMPLATE = thesaurus,
     DictFile = mythesaurus,
-    Dictionary = pg_catalog.en_stem
+    Dictionary = pg_catalog.english_stem
 );
 </programlisting>
 Here:
@@ -2201,10 +2204,10 @@ where <literal>$SHAREDIR</> means the installation shared-data directory,
 often <filename>/usr/local/share</>).
 </para></listitem>
 <listitem><para>
-<literal>pg_catalog.en_stem</literal> is the dictionary (snowball
+<literal>pg_catalog.english_stem</literal> is the dictionary (Snowball
 English stemmer) to use for thesaurus normalization. Notice that the
-<literal>en_stem</> dictionary has its own configuration (for example,
-stop words).
+<literal>english_stem</> dictionary has its own configuration (for example,
+stop words), which is not shown here.
 </para></listitem>
 </itemizedlist>
 
@@ -2235,10 +2238,10 @@ an astronomical thesaurus and english stemmer:
 CREATE TEXT SEARCH DICTIONARY thesaurus_astro (
     TEMPLATE = thesaurus,
     DictFile = thesaurus_astro,
-    Dictionary = en_stem
+    Dictionary = english_stem
 );
 ALTER TEXT SEARCH CONFIGURATION russian
-    ADD MAPPING FOR lword, lhword, lpart_hword WITH thesaurus_astro, en_stem;
+    ADD MAPPING FOR lword, lhword, lpart_hword WITH thesaurus_astro, english_stem;
 </programlisting>
 Now we can see how it works. Note that <function>ts_lexize</function> cannot
 be used for testing the thesaurus (see description of
@@ -2266,7 +2269,7 @@ SELECT to_tsquery('''supernova star''');
 </programlisting>
 Notice that <literal>supernova star</literal> matches <literal>supernovae
 stars</literal> in <literal>thesaurus_astro</literal> because we specified the
-<literal>en_stem</literal> stemmer in the thesaurus definition.
+<literal>english_stem</literal> stemmer in the thesaurus definition.
 </para>
 <para>
 To keep an original phrase in full text indexing just add it to the right part
@@ -2308,15 +2311,15 @@ conjugations of the search term <literal>bank</literal>, e.g.
 <literal>banking</>, <literal>banked</>, <literal>banks</>,
 <literal>banks'</>, and <literal>bank's</>.
 <programlisting>
-SELECT ts_lexize('en_ispell','banking');
+SELECT ts_lexize('english_ispell','banking');
 ts_lexize
 -----------
 {bank}
-SELECT ts_lexize('en_ispell','bank''s');
+SELECT ts_lexize('english_ispell','bank''s');
 ts_lexize
 -----------
 {bank}
-SELECT ts_lexize('en_ispell','banked');
+SELECT ts_lexize('english_ispell','banked');
 ts_lexize
 -----------
 {bank}
@@ -2330,7 +2333,7 @@ To create an ispell dictionary one should use the built-in
 parameters.
 </para>
 <programlisting>
-CREATE TEXT SEARCH DICTIONARY en_ispell (
+CREATE TEXT SEARCH DICTIONARY english_ispell (
     TEMPLATE = ispell,
     DictFile = english,
     AffFile = english,
@@ -2386,13 +2389,13 @@ The <application>Snowball</> dictionary template is based on the project
 of Martin Porter, inventor of the popular Porter's stemming algorithm
 for the English language and now supported in many languages (see the <ulink
 url="http://snowball.tartarus.org">Snowball site</ulink> for more
-information). Full text searching contains a large number of stemmers for
+information). The Snowball project supplies a large number of stemmers for
 many languages. A Snowball dictionary requires a language parameter to
 identify which stemmer to use, and optionally can specify a stopword file name.
-For example,
+For example, there is a built-in definition equivalent to
 <programlisting>
-ALTER TEXT SEARCH DICTIONARY en_stem (
-    StopWords = english-utf8, Language = english
+CREATE TEXT SEARCH DICTIONARY english_stem (
+    TEMPLATE = snowball, Language = english, StopWords = english
 );
 </programlisting>
 </para>
@@ -2400,7 +2403,8 @@ ALTER TEXT SEARCH DICTIONARY en_stem (
 <para>
 The <application>Snowball</> dictionary recognizes everything, so it is best
 to place it at the end of the dictionary stack. It it useless to have it
-before any other dictionary because a lexeme will not pass through its stemmer.
+before any other dictionary because a lexeme will never pass through it to
+the next dictionary.
 </para>
 
 </sect2>
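
A hedged illustration of the "recognizes everything" point in the hunk above (the word used is made up on purpose):

-- An ispell dictionary returns NULL for a word it does not know, but the
-- Snowball stemmer always returns a (stemmed) lexeme, which is why it must
-- be the last dictionary in the stack.
SELECT ts_lexize('english_stem', 'frobnicating');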
@@ -2420,7 +2424,7 @@ The <function>ts_lexize</> function facilitates dictionary testing:
 
 <term>
 <synopsis>
-ts_lexize(<optional> <replaceable class="PARAMETER">dict_name</replaceable> text</optional>, <replaceable class="PARAMETER">lexeme</replaceable> text) returns text[]
+ts_lexize(<replaceable class="PARAMETER">dict_name</replaceable> text, <replaceable class="PARAMETER">lexeme</replaceable> text) returns text[]
 </synopsis>
 </term>
 
@@ -2432,11 +2436,11 @@ array if the lexeme is known to the dictionary but it is a stop word, or
 <literal>NULL</literal> if it is an unknown word.
 </para>
 <programlisting>
-SELECT ts_lexize('en_stem', 'stars');
+SELECT ts_lexize('english_stem', 'stars');
 ts_lexize
 -----------
 {star}
-SELECT ts_lexize('en_stem', 'a');
+SELECT ts_lexize('english_stem', 'a');
 ts_lexize
 -----------
 {}
@@ -2457,9 +2461,9 @@ SELECT ts_lexize('thesaurus_astro','supernovae stars') is null;
 ----------
 t
 </programlisting>
-Thesaurus dictionary <literal>thesaurus_astro</literal> does know
-<literal>supernovae stars</literal>, but ts_lexize fails since it does not
-parse the input text and considers it as a single lexeme. Use
+The thesaurus dictionary <literal>thesaurus_astro</literal> does know
+<literal>supernovae stars</literal>, but <function>ts_lexize</> fails since it
+does not parse the input text and considers it as a single lexeme. Use
 <function>plainto_tsquery</> and <function>to_tsvector</> to test thesaurus
 dictionaries:
 <programlisting>
@@ -2541,25 +2545,14 @@ CREATE TEXT SEARCH DICTIONARY pg_dict (
 
 <para>
 Then register the <productname>ispell</> dictionary
-<literal>en_ispell</literal> using the <literal>ispell</literal> template:
+<literal>english_ispell</literal> using the <literal>ispell</literal> template:
 
 <programlisting>
-CREATE TEXT SEARCH DICTIONARY en_ispell (
+CREATE TEXT SEARCH DICTIONARY english_ispell (
     TEMPLATE = ispell,
-    DictFile = english-utf8,
-    AffFile = english-utf8,
-    StopWords = english-utf8
-);
-</programlisting>
-</para>
-
-<para>
-We can use the same stop word list for the <application>Snowball</> stemmer
-<literal>en_stem</literal>, which is available by default:
-
-<programlisting>
-ALTER TEXT SEARCH DICTIONARY en_stem (
-    StopWords = english-utf8
+    DictFile = english,
+    AffFile = english,
+    StopWords = english
 );
 </programlisting>
 </para>
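
Once the english_ispell dictionary above is in place, it can be sanity-checked the same way the earlier hunks do; a hedged example (the expected result mirrors the banked/{bank} examples elsewhere in this diff):

SELECT ts_lexize('english_ispell', 'banked');
-- expected: {bank}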
@@ -2570,7 +2563,7 @@ Now modify mappings for Latin words for configuration <literal>pg</>:
 <programlisting>
 ALTER TEXT SEARCH CONFIGURATION pg
     ALTER MAPPING FOR lword, lhword, lpart_hword
-    WITH pg_dict, en_ispell, en_stem;
+    WITH pg_dict, english_ispell, english_stem;
 </programlisting>
 </para>
 
@@ -2759,10 +2752,10 @@ the transitive containment relation <!-- huh --> is realized by
 superimposed coding (Knuth, 1973) of signatures, i.e., a parent is the
 result of 'OR'-ing the bit-strings of all children. This is a second
 factor of lossiness. It is clear that parents tend to be full of
-<literal>'1'</>s (degenerates) and become quite useless because of the
+<literal>1</>s (degenerates) and become quite useless because of the
 limited selectivity. Searching is performed as a bit comparison of a
 signature representing the query and an <literal>RD-tree</literal> entry.
-If all <literal>'1'</>s of both signatures are in the same position we
+If all <literal>1</>s of both signatures are in the same position we
 say that this branch probably matches the query, but if there is even one
 discrepancy we can definitely reject this branch.
 </para>
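
The RD-tree discussion above describes the GiST text-search index; a hedged sketch of how such an index is created and used (the documents table and vector column are hypothetical):

-- Lossy signature (GiST) index over a tsvector column.
CREATE INDEX documents_vector_idx ON documents USING gist (vector);

-- Queries of this form can then use the index; branches flagged by the
-- signatures are rechecked against the actual tsvector values.
SELECT * FROM documents WHERE vector @@ to_tsquery('supernovae & crab');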
@@ -2870,13 +2863,15 @@ The current limitations of Full Text Searching are:
 
 <para>
 For comparison, the <productname>PostgreSQL</productname> 8.1 documentation
-consists of 10,441 unique words, a total of 335,420 words, and the most frequent word
-'postgresql' is mentioned 6,127 times in 655 documents.
+contained 10,441 unique words, a total of 335,420 words, and the most frequent
+word <quote>postgresql</> was mentioned 6,127 times in 655 documents.
 </para>
 
+<!-- TODO we need to put a date on these numbers? -->
 <para>
-Another example - the <productname>PostgreSQL</productname> mailing list archives
-consists of 910,989 unique words with 57,491,343 lexemes in 461,020 messages.
+Another example — the <productname>PostgreSQL</productname> mailing list
+archives contained 910,989 unique words with 57,491,343 lexemes in 461,020
+messages.
 </para>
 
 </sect1>
@@ -2942,28 +2937,27 @@ names and object names. The following examples illustrate this:
 => \dF+ russian
 Configuration "pg_catalog.russian"
 Parser name: "pg_catalog.default"
-Locale: 'ru_RU.UTF-8' (default)
 Token | Dictionaries
 --------------+-------------------------
 email | pg_catalog.simple
 file | pg_catalog.simple
 float | pg_catalog.simple
 host | pg_catalog.simple
-hword | pg_catalog.ru_stem_utf8
+hword | pg_catalog.russian_stem
 int | pg_catalog.simple
 lhword | public.tz_simple
 lpart_hword | public.tz_simple
 lword | public.tz_simple
-nlhword | pg_catalog.ru_stem_utf8
-nlpart_hword | pg_catalog.ru_stem_utf8
-nlword | pg_catalog.ru_stem_utf8
+nlhword | pg_catalog.russian_stem
+nlpart_hword | pg_catalog.russian_stem
+nlword | pg_catalog.russian_stem
 part_hword | pg_catalog.simple
 sfloat | pg_catalog.simple
 uint | pg_catalog.simple
 uri | pg_catalog.simple
 url | pg_catalog.simple
 version | pg_catalog.simple
-word | pg_catalog.ru_stem_utf8
+word | pg_catalog.russian_stem
 </programlisting>
 </para>
 </listitem>
@@ -3112,43 +3106,43 @@ play with the standard <literal>english</literal> configuration.
 <programlisting>
 CREATE TEXT SEARCH CONFIGURATION public.english ( COPY = pg_catalog.english );
 
-CREATE TEXT SEARCH DICTIONARY en_ispell (
+CREATE TEXT SEARCH DICTIONARY english_ispell (
     TEMPLATE = ispell,
-    DictFile = english-utf8,
-    AffFile = english-utf8,
+    DictFile = english,
+    AffFile = english,
     StopWords = english
 );
 
 ALTER TEXT SEARCH CONFIGURATION public.english
-    ALTER MAPPING FOR lword WITH en_ispell, en_stem;
+    ALTER MAPPING FOR lword WITH english_ispell, english_stem;
 </programlisting>
 
 <programlisting>
 SELECT * FROM ts_debug('public.english','The Brightest supernovaes');
 Alias | Description | Token | Dicts list | Lexized token
 -------+---------------+-------------+---------------------------------------+---------------------------------
-lword | Latin word | The | {public.en_ispell,pg_catalog.en_stem} | public.en_ispell: {}
+lword | Latin word | The | {public.english_ispell,pg_catalog.english_stem} | public.english_ispell: {}
 blank | Space symbols | | |
-lword | Latin word | Brightest | {public.en_ispell,pg_catalog.en_stem} | public.en_ispell: {bright}
+lword | Latin word | Brightest | {public.english_ispell,pg_catalog.english_stem} | public.english_ispell: {bright}
 blank | Space symbols | | |
-lword | Latin word | supernovaes | {public.en_ispell,pg_catalog.en_stem} | pg_catalog.en_stem: {supernova}
+lword | Latin word | supernovaes | {public.english_ispell,pg_catalog.english_stem} | pg_catalog.english_stem: {supernova}
 (5 rows)
 </programlisting>
 <para>
-In this example, the word <literal>'Brightest'</> was recognized by a
+In this example, the word <literal>Brightest</> was recognized by a
 parser as a <literal>Latin word</literal> (alias <literal>lword</literal>)
-and came through the dictionaries <literal>public.en_ispell</> and
-<literal>pg_catalog.en_stem</literal>. It was recognized by
-<literal>public.en_ispell</literal>, which reduced it to the noun
+and came through the dictionaries <literal>public.english_ispell</> and
+<literal>pg_catalog.english_stem</literal>. It was recognized by
+<literal>public.english_ispell</literal>, which reduced it to the noun
 <literal>bright</literal>. The word <literal>supernovaes</literal> is unknown
-by the <literal>public.en_ispell</literal> dictionary so it was passed to
+by the <literal>public.english_ispell</literal> dictionary so it was passed to
 the next dictionary, and, fortunately, was recognized (in fact,
-<literal>public.en_stem</literal> is a stemming dictionary and recognizes
+<literal>public.english_stem</literal> is a stemming dictionary and recognizes
 everything; that is why it was placed at the end of the dictionary stack).
 </para>
 
 <para>
-The word <literal>The</literal> was recognized by <literal>public.en_ispell</literal>
+The word <literal>The</literal> was recognized by <literal>public.english_ispell</literal>
 dictionary as a stop word (<xref linkend="textsearch-stopwords">) and will not be indexed.
 </para>
 
@@ -3159,11 +3153,11 @@ SELECT "Alias", "Token", "Lexized token"
 FROM ts_debug('public.english','The Brightest supernovaes');
 Alias | Token | Lexized token
 -------+-------------+---------------------------------
-lword | The | public.en_ispell: {}
+lword | The | public.english_ispell: {}
 blank | |
-lword | Brightest | public.en_ispell: {bright}
+lword | Brightest | public.english_ispell: {bright}
 blank | |
-lword | supernovaes | pg_catalog.en_stem: {supernova}
+lword | supernovaes | pg_catalog.english_stem: {supernova}
 (5 rows)
 </programlisting>
 </para>