Commit Graph

40 Commits

Author SHA1 Message Date
abhinav
357f7b44ef Encapsulate all the arguments required by the query callback function in a struct.
If we want to add or remove arguments from the callback functions, it requires
changing the callback interface all over the place. By letting the callback simply
expect a single struct argument, it would clean things up a bit.

ok christos
2017-11-25 14:29:38 +00:00
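
A minimal sketch of the idea in this commit, not the actual apropos code: the callback receives one struct, so adding a field later does not change the callback signature. All names here (`query_callback_args`, `run_fake_query`, `count_results`) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical argument bundle; the field names are illustrative only. */
typedef struct query_callback_args {
	const char *section;	/* manual section, e.g. "3" */
	const char *name;	/* page name */
	const char *name_desc;	/* one-line description */
	void *other_data;	/* caller-supplied state */
} query_callback_args;

/* The callback takes a single pointer, whatever fields are added later. */
typedef int (*query_callback)(query_callback_args *);

/* Example callback: counts the rows it is handed. */
int
count_results(query_callback_args *args)
{
	size_t *count = args->other_data;

	(*count)++;
	return 0;
}

/* Stand-in for a query loop: invokes the callback once per name. */
int
run_fake_query(const char *const *names, size_t n, query_callback cb,
    void *data)
{
	for (size_t i = 0; i < n; i++) {
		query_callback_args args = {
			.section = "3",
			.name = names[i],
			.name_desc = "",
			.other_data = data,
		};
		if (cb(&args) != 0)
			return -1;
	}
	return 0;
}
```

Adding, say, a snippet field to the struct would leave `count_results` and every other existing callback untouched.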
abhinav
f56c37233e Don't use the custom tokenizer when compiled with debugging on
Using the custom tokenizer means one cannot interactively query the database
through the SQLite shell, thus thwarting the purpose of the debug build option.

Thanks to leot@ for reporting it.

(While there, change the debug macro from DEBUG to APROPOS_DEBUG.)
2017-08-01 16:16:32 +00:00
abhinav
188f922ddf Add a custom tokenizer which does not stem certain keywords.
Which keywords should not be stemmed is specified in the nostem.txt file.
(Right now I have taken all the man page names, split them if they had
underscores, removed common English words and converted everything to
lowercase.)

The tokenizer itself is based on the Porter stemming tokenizer shipped with
SQLite. The code in custom_apropos_tokenizer.c is a copy of that code with
some modifications to prevent stemming keywords specified in nostem.txt.

Additionally, it now also uses underscore `_' as a token delimiter. Therefore,
it is now possible to query for `lwp' and all `_lwp_*' man page names
will be matched. Or the query can be `unconst' and `__UNCONST' will be matched.
This was not possible earlier, because underscore was not a delimiter and therefore
the index would have __UNCONST as a key rather than UNCONST.

The tokenizer needs the fts3_tokenizer.h file, which is not shipped with the
amalgamation build of SQLite, therefore it needs to be added here (unless
we decide there is a better place for it).

To enforce using the new tokenizer, a schema version bump is needed.

Since tokenization is done both at indexing time (via makemandb) and
at query time (via apropos or whatis), the schema version will need to be
bumped every time nostem.txt is modified. Otherwise the
index will consist of old tokens and the desired changes will not be seen with
apropos.

This should also fix the issue reported in PR bin/46255. A similar suggestion was
also made on tech-userlevel@ recently:
<http://mail-index.netbsd.org/tech-userlevel/2017/06/08/msg010620.html>

Thanks to christos@ for multiple rounds of reviews of the tokenizer code.
2017-06-18 16:24:10 +00:00
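
The tokenization behaviour described above can be sketched as follows. This is a simplified illustration, not the code in custom_apropos_tokenizer.c: it treats every non-alphanumeric byte (including `_`) as a delimiter, lowercases tokens, and checks them against a small hard-coded list standing in for nostem.txt. The function names and the word list are assumptions, and the stemming step itself is omitted.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the contents of nostem.txt. */
static const char *const nostem_words[] = { "lwp", "unconst" };

/* Return 1 if the token must not be stemmed, 0 otherwise. */
int
is_nostem(const char *token)
{
	size_t n = sizeof(nostem_words) / sizeof(nostem_words[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(nostem_words[i], token) == 0)
			return 1;
	return 0;
}

/*
 * Copy the next token of input, starting at offset *pos, into out in
 * lowercase. Underscore counts as a delimiter because it is not
 * alphanumeric. Returns the token length, or 0 when input is exhausted.
 * Tokens longer than outsz - 1 bytes are truncated.
 */
size_t
next_token(const char *input, size_t *pos, char *out, size_t outsz)
{
	size_t i = *pos, len = 0;

	/* Skip leading delimiters. */
	while (input[i] != '\0' && !isalnum((unsigned char)input[i]))
		i++;
	/* Copy token characters, lowercased. */
	while (input[i] != '\0' && isalnum((unsigned char)input[i])) {
		if (len + 1 < outsz)
			out[len++] = (char)tolower((unsigned char)input[i]);
		i++;
	}
	if (outsz > 0)
		out[len] = '\0';
	*pos = i;
	return len;
}
```

With this splitting, `__UNCONST` yields the single token `unconst`, and `_lwp_create` yields `lwp` followed by `create`, which is why a query for `lwp` can match the `_lwp_*` page names.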
abhinav
1373f782a3 Simplify handling of the section arguments in apropos(1).
Earlier, a whitespace-separated string was generated containing all the section
numbers passed through command line arguments. Later on that would have to be
tokenized and processed. Instead of that, use a NULL-terminated array of strings.

Thanks to christos@ for reviewing and suggesting further improvements.
2017-05-01 05:28:00 +00:00
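
A small sketch of why a NULL-terminated array is simpler to consume than a whitespace-separated string: a membership test becomes a plain loop with no tokenizing. The function name `section_selected` is hypothetical, and treating a NULL array as "no restriction" is an assumption made for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if section appears in the NULL-terminated array sections,
 * 0 otherwise. A NULL array is taken to mean "all sections" (an
 * assumption of this sketch).
 */
int
section_selected(const char *const *sections, const char *section)
{
	if (sections == NULL)
		return 1;
	for (size_t i = 0; sections[i] != NULL; i++)
		if (strcmp(sections[i], section) == 0)
			return 1;
	return 0;
}
```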
abhinav
c08af49426 Simplify 2017-04-30 16:56:30 +00:00
abhinav
b8c9b20183 Instead of dereferencing the pointer passed in as function argument, use a
temporary local buffer. Saves the cost of pointer dereferencing at so many places.
2017-04-30 15:27:24 +00:00
abhinav
ba948c919e Update the comment to be in sync with the code. 2017-04-30 14:53:58 +00:00
abhinav
3c0134393a Use sqlite3_mprintf() to generate SQL query instead of asprintf. 2017-04-30 14:49:26 +00:00
abhinav
231f71fb95 Disable the database compression if DEBUG is defined.
When debugging makemandb(8), it helps to be able to view the text being
stored in the database.
2017-04-27 08:02:24 +00:00
abhinav
e70b83fc18 Better handle MLINKS in apropos(1).
apropos(1) only indexes the first .Nm entry from the NAME section in the full
text index. Rest of the .Nm entries are stored in a separate table: mandb_links.

Until now apropos(1) did not use the mandb_links table. So whenever a query
was made for one of the man page links, such as realloc(3), it showed
malloc(3) in the results but not as the first result. The result would also
show up as malloc(3), rather than realloc(3) (which can be confusing).

With this change, for single keyword queries, apropos(1) would now utilise the
mandb_links table as well. If the query is for one of the links of a man page,
it would show as the first result. Also, the result would show up as the name
of the link rather than the original man page name. For example, if the query
was for realloc, the output would be realloc(3), rather than malloc(3).

The following example queries show the difference in the output before and
after this change:

#Before changes
$ apropos -n 5 -M realloc
reallocarr (3)    reallocate array
reallocarray (3)  reallocate memory for an array of elements checking
for overflow
fgetwln (3)       get a line of wide characters from a stream
fgetln (3)        get a line from a stream
posix_memalign (3)        aligned memory allocation

#After changes
$ ./apropos -n 5 -M realloc
realloc (3)       general memory allocation operations
realloc (3)       general purpose memory allocation functions
realloc (9)       general-purpose kernel memory allocator
reallocarr (3)    reallocate array
reallocarray (3)  reallocate memory for an array of elements checking
for overflow

#Before changes
$ apropos -n 5 -M TAILQ_REMOVE
SLIST_HEAD (3) implementations of singly-linked lists, lists, simple
queues, tail queues, and singly-linked tail queues

#After changes
$ ./apropos -n 5 -M TAILQ_REMOVE
TAILQ_REMOVE (3)  implementations of singly-linked lists, lists,
simple queues, tail queues, and singly-linked tail queues

#Before changes
$ apropos -n 5 -M falloc
filedesc (9)      file descriptor tables and operations
file (9)  operations on file entries

#After changes
$ ./apropos -n 5 -M falloc
falloc (9)        file descriptor tables and operations
file (9)  operations on file entries

ok christos@
2017-04-23 13:52:57 +00:00
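
The renaming rule described in this commit can be illustrated with an in-memory table. This is only a sketch: the real lookup goes through the mandb_links table in SQLite, while the array, struct, and function names below are hypothetical. The three rows are taken from the examples in the commit message.

```c
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical in-memory stand-in for mandb_links. */
struct man_link {
	const char *link;	/* name of the link, e.g. "realloc" */
	const char *target;	/* page indexed under its first .Nm */
	const char *section;
};

static const struct man_link links[] = {
	{ "realloc", "malloc", "3" },
	{ "TAILQ_REMOVE", "SLIST_HEAD", "3" },
	{ "falloc", "filedesc", "9" },
};

/* Return the link entry whose name equals query, or NULL. */
const struct man_link *
find_link(const char *query)
{
	for (size_t i = 0; i < sizeof(links) / sizeof(links[0]); i++)
		if (strcmp(links[i].link, query) == 0)
			return &links[i];
	return NULL;
}

/*
 * Name to print for a result page: the link name if the query is a
 * link to that page, otherwise the page's own name.
 */
const char *
display_name(const char *query, const char *page)
{
	const struct man_link *l = find_link(query);

	if (l != NULL && strcmp(l->target, page) == 0)
		return l->link;
	return page;
}
```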
kamil
2fe964ca6f Include <unistd.h> for R_OK W_OK STDOUT_FILENO access(2)
These symbols are undefined after switch to new zlib.
2017-01-10 04:34:07 +00:00
abhinav
1c4ff59f37 Mark the section and md5_hash columns as unindexed in the FTS table, as they are not used for search 2016-10-03 13:36:35 +00:00
abhinav
e4c681d955 Fix an off by one issue when concatenating strings. 2016-07-06 18:03:27 +00:00
abhinav
84549e3f9b Fix possible buffer overflow when concatenating strings.
Patch from christos@
2016-07-06 06:57:40 +00:00
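
The two string-concatenation fixes above concern a classic pitfall: forgetting the byte needed for the terminating NUL. The following is a generic sketch of a bounds-checked append, not the patch that was committed; `concat_checked` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst, whose total capacity
 * is dstsize bytes including the terminator. Returns 0 on success, or
 * -1 (leaving dst unchanged) if the result would not fit.
 */
int
concat_checked(char *dst, size_t dstsize, const char *src)
{
	size_t dlen = strlen(dst);
	size_t slen = strlen(src);

	/*
	 * dstsize - dlen bytes remain, one of which is reserved for the
	 * NUL. Using ">" instead of ">=" here would be the off-by-one.
	 */
	if (dlen >= dstsize || slen >= dstsize - dlen)
		return -1;
	memcpy(dst + dlen, src, slen + 1);	/* +1 copies the NUL */
	return 0;
}
```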
abhinav
4647c1ec31 Refactor the function for executing the search SQL query into two parts.
One part is responsible for generating the SQL query
The other part is responsible for executing the generated query.

While there, also remove a comment which is not valid anymore.
And, don't call the snippet function when doing legacy mode search as we are
not using the full text feature there.
2016-06-01 15:59:18 +00:00
christos
2c6689d2dc CID 1358675: Wrong variable test 2016-04-24 18:11:43 +00:00
christos
5e64704ab9 PR/51062: Abhinav Upadhyay: Allow non numeric sections to be indexed and
searched by apropos(1).
Fold long lines.
2016-04-13 11:48:29 +00:00
christos
90f8d04e63 PR/51038: Abhinav Upadhyay: check for access permissions to the sqlite database 2016-04-13 01:37:50 +00:00
christos
b7d6e6d52a PR/51025: Abhinav Upadhyay: Remove unused includes from apropos-utils.c 2016-03-31 20:16:58 +00:00
christos
751d5fc660 PR/51004: Abhinav Upadhyay: apropos html mode doesn't handle special
characters in the short description
2016-03-24 16:07:13 +00:00
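
For context, handling special characters in HTML output generally means replacing `<`, `>`, `&` and `"` with entities. The sketch below shows that technique in isolation; it is not the code from this commit, and `html_escape` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, replacing <, >, & and " with HTML entities.
 * Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 if dst is too small.
 */
size_t
html_escape(const char *src, char *dst, size_t dstsize)
{
	size_t used = 0;

	if (dstsize == 0)
		return (size_t)-1;
	for (; *src != '\0'; src++) {
		char one[2] = { *src, '\0' };
		const char *rep;
		size_t rlen;

		switch (*src) {
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		case '&': rep = "&amp;"; break;
		case '"': rep = "&quot;"; break;
		default:  rep = one; break;
		}
		rlen = strlen(rep);
		/* Keep one byte free for the terminating NUL. */
		if (rlen >= dstsize - used)
			return (size_t)-1;
		memcpy(dst + used, rep, rlen);
		used += rlen;
	}
	dst[used] = '\0';
	return used;
}
```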
christos
533b5973e2 PR/50460: Abhinav Upadhyay: Fix legacy apropos query to match both the name
and the one line description and delete extra args.
2016-03-20 17:31:09 +00:00
christos
48e922c8f8 CID 1341551: Don't bother formatting if ti == NULL 2015-12-03 21:01:50 +00:00
christos
62025e09ce PR/50344: Stephen Fisher: apropos shows formatting on console with vt100 term
type. Can't print terminfo sequences directly; need to process them with
ti_puts() to handle padding. This removes the padding delays, and strictly
could break on slow terminal hardware, but the way the code is structured
makes it impossible to fix properly (since the formatting strings are
passed in the query).
XXX: pullup-7
2015-11-23 22:34:00 +00:00
snj
f0a7346d21 src is too big these days to tolerate superfluous apostrophes. It's
"its", people!
2014-10-18 08:33:23 +00:00
wiz
fc7115c3f4 Fix an off by one bug in apropos.
The bug is in the html output where some garbage characters are
seen in the context match output.

From Abhinav Upadhyay in PR 49058.
2014-08-01 12:55:00 +00:00
christos
910ecac4db instead of having a format and no format flag, and exposing various formatters,
provide a format enum and expose html formatting too.
2013-04-02 17:16:50 +00:00
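
As a rough illustration of replacing a format/no-format flag with an enum, the sketch below dispatches on a single format value. The enum and function names are hypothetical and only two formats are shown; the terminal and pager cases are left out because their escape sequences are not given here.

```c
#include <stdio.h>

/* Hypothetical output formats; the names are illustrative only. */
typedef enum output_format {
	FORMAT_NONE,	/* plain text */
	FORMAT_HTML	/* HTML markup */
} output_format;

/*
 * Write name into buf, decorated for the requested format.
 * Returns the value of snprintf(3).
 */
int
format_name(output_format fmt, const char *name, char *buf, size_t bufsize)
{
	switch (fmt) {
	case FORMAT_HTML:
		return snprintf(buf, bufsize, "<b>%s</b>", name);
	case FORMAT_NONE:
	default:
		return snprintf(buf, bufsize, "%s", name);
	}
}
```

A new output format then becomes one more enumerator and one more `case`, rather than another boolean flag and another exported formatter.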
christos
82fc5158a9 fix legacy mode in pager filter. (don't ul format if we are not formatting). 2013-03-29 20:46:07 +00:00
christos
2b42c8b2ee - Fix legacy mode to use like instead of match. This loses ranking.
- default to unlimited lines
- fix formatting of legacy mode
2013-03-29 20:37:00 +00:00
christos
cb0641eb5a - If the stdout is not a tty, prevent formatting unless forced with -i
- Don't ever page unless asked for with -p
- Introduce "legacy mode" (-l)
  1. searches only name and name_desc, prints name(section) - name_description
  2. turns off escape formatting (can be forced on with -i)
  3. turns off context printing (can be forced on with -c)
- Parse the environment $APROPOS variable as an argument vector.

With these changes one can simply 'export APROPOS=-l' and get the old apropos
behavior.
2013-03-29 20:07:31 +00:00
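
Treating an environment variable as an argument vector amounts to splitting its value on whitespace. The sketch below shows one way to do that in place; it is an assumption about the general technique, not the parsing code in apropos(1), and `split_args` is a hypothetical helper that does no quote handling.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Split str in place on whitespace and store pointers to the pieces in
 * argv, followed by a NULL terminator. At most maxargs - 1 arguments
 * are stored. Returns the number of arguments.
 */
size_t
split_args(char *str, char **argv, size_t maxargs)
{
	size_t argc = 0;

	if (maxargs == 0)
		return 0;
	while (*str != '\0' && argc + 1 < maxargs) {
		/* Skip whitespace before the next argument. */
		while (isspace((unsigned char)*str))
			str++;
		if (*str == '\0')
			break;
		argv[argc++] = str;
		/* Advance to the end of the argument and terminate it. */
		while (*str != '\0' && !isspace((unsigned char)*str))
			str++;
		if (*str != '\0')
			*str++ = '\0';
	}
	argv[argc] = NULL;
	return argc;
}
```

A caller would copy the value returned by getenv(3) into a writable buffer before splitting it, since the function modifies its input.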
christos
9d0d34e51f add -r flag to elide tty formatting 2013-02-10 23:58:27 +00:00
christos
9d8fe63b1b remove trailing whitespace 2013-02-10 23:24:18 +00:00
christos
6265ee0d3c - move the terminal handling into apropos-utils.c since html and pager are also
handled there.
- underline the name, section, and description so that it is prettier.
- change the terminal highlighting to bold to match less
2013-01-14 21:26:25 +00:00
christos
cc03b84d06 Since mdocml decided to name headers that conflict with system ones (term.h)
move the header inclusion one up.
2013-01-14 18:01:59 +00:00
wiz
b1203a9851 Make mandb path configurable. makemandb (and related tools) use
the path from the _mandb variable from man.conf now.

Set _mandb in man.conf to same value as was used before.

From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.
2012-10-06 15:33:59 +00:00
joerg
52222de363 KNF 2012-05-10 15:36:09 +00:00
wiz
d099c69274 PR 46419 by Abhinav Upadhyay using his updated patch:
Clean up after removing man page aliases.
2012-05-07 11:18:16 +00:00
wiz
f41e473d4b Handle pages with slashes in their names better.
From Abhinav Upadhyay in private mail.
2012-04-15 15:56:52 +00:00
apb
d0663c218f Add the result from sqlite3_errmsg() to some error messages.
Now we can get "apropos: Unable to query schema version: database is locked"
instead of just "apropos: Unable to query schema version".
2012-04-07 10:44:58 +00:00
joerg
329b37d502 Fix C&P error with $NetBSD$ 2012-02-07 19:17:16 +00:00
joerg
410d0f4380 Import the new apropos/whatis.
This code has been developed by Abhinav Upadhyay as part of Google's Summer
of Code 2011. It uses libmandoc to parse man pages and builds a Full
Text Index in a SQLite database. The combination of indexing the full
manual page, filtering out stop words and ranking individual matches
based on the section gives a much improved user experience.

The old makewhatis and friends are kept under MKMAKEMANDB=no for now.
2012-02-07 19:13:24 +00:00