140 lines
5.7 KiB
HTML
140 lines
5.7 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<TITLE>Linux Cross-Reference</TITLE>
|
|
</HEAD>
|
|
<BODY BGCOLOR=WHITE>
|
|
<H1 ALIGN=CENTER>
|
|
Help doing searches<BR>
|
|
<A HREF="http:/source/">
|
|
<I>Browse the code</I></A></H1>
|
|
|
|
|
|
<I> This text is directly stolen from the Glimpse manual page. I have
|
|
tried to remove things that do not apply to the lxr searcher, but
|
|
beware, some things might have slipped through. I'll try to put
|
|
together something better when I get the time. For more information on
|
|
glimpse go to the <A HREF="http://glimpse.cs.arizona.edu">Glimpse
|
|
homepage</A>.</I>
|
|
|
|
<A NAME="Patterns"></A><H2>Patterns</H2>
|
|
<P>
|
|
glimpse supports a large variety of patterns, including simple
|
|
strings, strings with classes of characters, sets of strings,
|
|
wild cards, and regular expressions (see <A HREF="#Limitations">Limitations</A>).
|
|
|
|
<P> <H3>Strings</H3>
|
|
|
|
Strings are any sequence of characters, including the special symbols
|
|
`^' for beginning of line and `$' for end of line. The following
|
|
special characters (`$', `^', `*', `[', `^', `|', `(', `)', `!', and
|
|
`\' ) as well as the following meta characters special to glimpse (and
|
|
agrep): `;', `,', `#', `>', `<', `-', and `.', should be preceded by
|
|
`\\' if they are to be matched as regular characters. For example,
|
|
\\^abc\\\\ corresponds to the string ^abc\\, whereas ^abc corresponds
|
|
to the string abc at the beginning of a line.
|
|
|
|
<P> <H3>Classes of characters</H3>
|
|
a list of characters inside [] (in order) corresponds to any character
|
|
from the list. For example, [a-ho-z] is any character between a and h
|
|
or between o and z. The symbol `^' inside [] complements the list.
|
|
For example, [^i-n] denote any character in the character set except
|
|
character 'i' to 'n'.
|
|
The symbol `^' thus has two meanings, but this is consistent with
|
|
egrep.
|
|
The symbol `.' (don't care) stands for any symbol (except for the
|
|
newline symbol).
|
|
|
|
<P> <H3>Boolean operations</H3>
|
|
Glimpse
|
|
supports an `AND' operation denoted by the symbol `;'
|
|
an `OR' operation denoted by the symbol `,',
|
|
a limited version of a 'NOT' operation (starting at version 4.0B1)
|
|
denoted by the symbol `~',
|
|
or any combination.
|
|
For example, pizza;cheeseburger' will output all lines containing
|
|
both patterns.
|
|
'{political,computer};science' will match 'political science'
|
|
or 'science of computers'.
|
|
|
|
|
|
<P><H3>Wild cards</H3>
|
|
The symbol '#' is used to denote a sequence
|
|
of any number (including 0)
|
|
of arbitrary characters (see <A HREF="#Limitations">Limitations</A>).
|
|
The symbol # is equivalent to .* in egrep.
|
|
In fact, .* will work too, because it is a valid regular expression
|
|
(see below), but unless this is part of an actual regular expression,
|
|
# will work faster.
|
|
(Currently glimpse is experiencing some problems with #.)
|
|
|
|
|
|
<P><H3>Combination of exact and approximate matching</H3>
|
|
Any pattern inside angle brackets <> must match the text exactly even
|
|
if the match is with errors. For example, <mathemat>ics matches
|
|
mathematical with one error (replacing the last s with an a), but
|
|
mathe<matics> does not match mathematical no matter how many errors are
|
|
allowed. (This option is buggy at the moment.)
|
|
|
|
|
|
<H3>Regular expressions</H3>
|
|
Since the index is word based, a regular expression must match words
|
|
that appear in the index for glimpse to find it. Glimpse first strips
|
|
the regular expression from all non-alphabetic characters, and
|
|
searches the index for all remaining words. It then applies the
|
|
regular expression matching algorithm to the files found in the index.
|
|
For example, glimpse 'abc.*xyz' will search the index for all files
|
|
that contain both 'abc' and 'xyz', and then search directly for
|
|
'abc.*xyz' in those files. (If you use glimpse -w 'abc.*xyz', then
|
|
'abcxyz' will not be found, because glimpse will think that abc and
|
|
xyz need to be matches to whole words.) The syntax of regular
|
|
expressions in glimpse is in general the same as that for agrep. The
|
|
union operation `|', Kleene closure `*', and parentheses () are all
|
|
supported. Currently '+' is not supported. Regular expressions are
|
|
currently limited to approximately 30 characters (generally excluding
|
|
meta characters). The maximal number of errors
|
|
for regular expressions that use '*' or '|' is 4.
|
|
|
|
<P>
|
|
<A NAME="Limitations"></A><H2>Limitations</H2>
|
|
The index of glimpse is word based. A pattern that contains more than
|
|
one word cannot be found in the index. The way glimpse overcomes this
|
|
weakness is by splitting any multi-word pattern into its set of words
|
|
and looking for all of them in the index.
|
|
For example, <I>'linear programming'</I> will first consult the index
|
|
to find all files containing both <I>linear</I> and <I>programming</I>,
|
|
and then apply agrep to find the combined pattern.
|
|
This is usually an effective solution, but it can be slow for
|
|
cases where both words are very common, but their combination is not.
|
|
|
|
<P>
|
|
As was mentioned in the section on <A HREF="#Patterns">Patterns</A> above, some characters
|
|
serve as meta characters for glimpse and need to be
|
|
preceded by '\\' to search for them. The most common
|
|
examples are the characters '.' (which stands for a wild card),
|
|
and '*' (the Kleene closure).
|
|
So, "glimpse ab.de" will match abcde, but "glimpse ab\\.de"
|
|
will not, and "glimpse ab*de" will not match ab*de, but
|
|
"glimpse ab\\*de" will.
|
|
The meta character - is translated automatically to a hypen
|
|
unless it appears between [] (in which case it denotes a range of
|
|
characters).
|
|
|
|
<P>
|
|
There is no size limit for simple patterns and simple patterns
|
|
within Boolean expressions.
|
|
More complicated patterns, such as regular expressions,
|
|
are currently limited to approximately 30 characters.
|
|
Lines are limited to 1024 characters.
|
|
|
|
<P>
|
|
<HR>
|
|
|
|
<ADDRESS>
|
|
<A HREF="mailto:lxr@linux.no">
|
|
Arne Georg Gleditsch and Per Kristian Gjermshus</A>
|
|
</ADDRESS>
|
|
|
|
</BODY>
|
|
</HTML>
|