Adjust text search documentation for recent commits.
Fix some now-obsolete statements that were overlooked in commits 6734a1cac, 3dbbd0f02, 028350f61. Document the behavior of <0>. Also do a little bit of rearranging and copy-editing for clarity.
parent 8dee039fa1
commit 4242a715c3
@@ -3885,12 +3885,12 @@ SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
 
 <para>
 It is important to understand that the
-<type>tsvector</type> type itself does not perform any normalization;
-it assumes the words it is given are normalized appropriately
-for the application. For example,
+<type>tsvector</type> type itself does not perform any word
+normalization; it assumes the words it is given are normalized
+appropriately for the application. For example,
 
 <programlisting>
-select 'The Fat Rats'::tsvector;
+SELECT 'The Fat Rats'::tsvector;
 tsvector
 --------------------
 'Fat' 'Rats' 'The'
@@ -3929,12 +3929,20 @@ SELECT to_tsvector('english', 'The Fat Rats');
 <literal><-></> (FOLLOWED BY). There is also a variant
 <literal><<replaceable>N</>></literal> of the FOLLOWED BY
 operator, where <replaceable>N</> is an integer constant that
-specifies a maximum distance between the two lexemes being searched
+specifies the distance between the two lexemes being searched
 for. <literal><-></> is equivalent to <literal><1></>.
 </para>
 
 <para>
-Parentheses can be used to enforce grouping of the operators:
+Parentheses can be used to enforce grouping of these operators.
+In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
+<literal><-></literal> (FOLLOWED BY) next most tightly, then
+<literal>&</literal> (AND), with <literal>|</literal> (OR) binding
+the least tightly.
+</para>
+
+<para>
+Here are some examples:
 
 <programlisting>
 SELECT 'fat & rat'::tsquery;
@@ -3951,17 +3959,21 @@ SELECT 'fat & rat & ! cat'::tsquery;
 tsquery
 ------------------------
 'fat' & 'rat' & !'cat'
+
+SELECT '(fat | rat) <-> cat'::tsquery;
+tsquery
+-----------------------------------
+'fat' <-> 'cat' | 'rat' <-> 'cat'
 </programlisting>
 
-In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
-and <literal>&</literal> (AND) and <literal><-></literal> (FOLLOWED BY)
-both bind more tightly than <literal>|</literal> (OR).
+The last example demonstrates that <type>tsquery</type> sometimes
+rearranges nested operators into a logically equivalent formulation.
 </para>
 
 <para>
 Optionally, lexemes in a <type>tsquery</type> can be labeled with
 one or more weight letters, which restricts them to match only
-<type>tsvector</> lexemes with matching weights:
+<type>tsvector</> lexemes with one of those weights:
 
 <programlisting>
 SELECT 'fat:ab & cat'::tsquery;
@@ -3981,25 +3993,7 @@ SELECT 'super:*'::tsquery;
 'super':*
 </programlisting>
 This query will match any word in a <type>tsvector</> that begins
-with <quote>super</>. Note that prefixes are first processed by
-text search configurations, which means this comparison returns
-true:
-<programlisting>
-SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
-?column?
-----------
-t
-(1 row)
-</programlisting>
-because <literal>postgres</> gets stemmed to <literal>postgr</>:
-<programlisting>
-SELECT to_tsquery('postgres:*');
-to_tsquery
-------------
-'postgr':*
-(1 row)
-</programlisting>
-which then matches <literal>postgraduate</>.
+with <quote>super</>.
 </para>
 
 <para>
@@ -4015,6 +4009,24 @@ SELECT to_tsquery('Fat:ab & Cats');
 ------------------
 'fat':AB & 'cat'
 </programlisting>
+
+Note that <function>to_tsquery</> will process prefixes in the same way
+as other words, which means this comparison returns true:
+
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
+?column?
+----------
+t
+</programlisting>
+because <literal>postgres</> gets stemmed to <literal>postgr</>:
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
+to_tsvector | to_tsquery
+---------------+------------
+'postgradu':1 | 'postgr':*
+</programlisting>
+which will match the stemmed form of <literal>postgraduate</>.
 </para>
 
 </sect2>
@@ -322,8 +322,7 @@ text @@ text
 match. Similarly, the <literal>|</literal> (OR) operator specifies that
 at least one of its arguments must appear, while the <literal>!</> (NOT)
 operator specifies that its argument must <emphasis>not</> appear in
-order to have a match. Parentheses can be used to control nesting of
-these operators.
+order to have a match.
 </para>
 
 <para>
@@ -346,10 +345,10 @@ SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal <-> error');
 
 There is a more general version of the FOLLOWED BY operator having the
 form <literal><<replaceable>N</>></literal>,
-where <replaceable>N</> is an integer standing for the exact distance
-allowed between the matching lexemes. <literal><1></literal> is
+where <replaceable>N</> is an integer standing for the difference between
+the positions of the matching lexemes. <literal><1></literal> is
 the same as <literal><-></>, while <literal><2></literal>
-allows one other lexeme to appear between the matches, and so
+allows exactly one other lexeme to appear between the matches, and so
 on. The <literal>phraseto_tsquery</> function makes use of this
 operator to construct a <literal>tsquery</> that can match a multi-word
 phrase when some of the words are stop words. For example:
@@ -366,9 +365,17 @@ SELECT phraseto_tsquery('the cats ate the rats');
 'cat' <-> 'ate' <2> 'rat'
 </programlisting>
 </para>
+
 <para>
-The precedence of tsquery operators is as follows: <literal>|</literal>, <literal>&</literal>,
-<literal><-></literal>, <literal>!</literal>.
+A special case that's sometimes useful is that <literal><0></literal>
+can be used to require that two patterns match the same word.
+</para>
+
+<para>
+Parentheses can be used to control nesting of the <type>tsquery</>
+operators. Without parentheses, <literal>|</literal> binds least tightly,
+then <literal>&</literal>, then <literal><-></literal>,
+and <literal>!</literal> most tightly.
 </para>
 </sect2>
 
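The reworded paragraphs in the hunk above can be tried out with a few queries. This is a minimal sketch and not part of the commit: it assumes a server that includes the referenced changes (PostgreSQL 9.6 or later) and uses the built-in `simple` configuration so that no stop words are dropped and word positions are easy to predict. The sample strings are made up for illustration.

```sql
-- Assumption: 'simple' keeps every word, so positions are cat:1 ate:2 rat:3.

-- <2> requires the position difference to be exactly 2, so this should match.
SELECT to_tsvector('simple', 'cat ate rat') @@ to_tsquery('simple', 'cat <2> rat');

-- Here the difference is 1: <2> should not match, while <-> (same as <1>) should.
SELECT to_tsvector('simple', 'cat rat') @@ to_tsquery('simple', 'cat <2> rat');
SELECT to_tsvector('simple', 'cat rat') @@ to_tsquery('simple', 'cat <-> rat');

-- <0> asks both patterns to match the same word (a prefix and a full lexeme here).
SELECT to_tsvector('simple', 'supernova') @@ to_tsquery('simple', 'super:* <0> supernova');

-- Precedence: per the documented rules, the unparenthesized form should be
-- parsed the same way as the fully parenthesized one.
SELECT 'fat | rat & cat <-> dog'::tsquery;
SELECT 'fat | (rat & (cat <-> dog))'::tsquery;
```

Comparing the printed form of the last two queries is a quick way to check how a given server groups the operators.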
@@ -1423,9 +1430,10 @@ FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank
 lacks any position or weight information. The result is usually much
 smaller than an unstripped vector, but it is also less useful.
 Relevance ranking does not work as well on stripped vectors as
-unstripped ones. Also, when given stripped input,
+unstripped ones. Also,
 the <literal><-></> (FOLLOWED BY) <type>tsquery</> operator
-effectively degenerates to a simple <literal>&</> (AND) test.
+will never match stripped input, since it cannot determine the
+distance between lexeme occurrences.
 </para>
 </listitem>
 
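The corrected statement about stripped vectors can be checked along these lines. Again this is only a sketch outside the commit itself, assuming a server with the referenced changes and the built-in `simple` configuration. The input text is invented for the example.

```sql
-- With positions intact, the phrase query should match.
SELECT to_tsvector('simple', 'fatal error') @@ to_tsquery('simple', 'fatal <-> error');

-- strip() discards positions, so FOLLOWED BY has nothing to measure distance with
-- and, per the new wording, should not match.
SELECT strip(to_tsvector('simple', 'fatal error')) @@ to_tsquery('simple', 'fatal <-> error');

-- A plain AND needs no positions and should still match the stripped vector.
SELECT strip(to_tsvector('simple', 'fatal error')) @@ to_tsquery('simple', 'fatal & error');
```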