Commit Graph

140 Commits

Author SHA1 Message Date
abhinav
b0ca50fb4d PR bin/54343: We want the callback_args.machine to be NULL if it is not present in the DB.
The previous commit fixed the problem of allowing apropos to not crash and
produce output even if the database is missing values for certain mandatory
fields, such as name, section etc. Normally we don't expect those values
to be missing in the database but in case of parsing errors it can happen.

However, the machine architecture is an optional field since not all man pages
are hardware specific so that should be allowed to be set to NULL if not
present in the database.
2019-08-18 09:14:30 +00:00
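
The distinction between mandatory and optional fields described above can be sketched roughly as follows. This is only an illustration: the struct and function names are hypothetical and not taken from the apropos sources. The "*?*" marker is the one mentioned in the previous commit (PR/54343).

```c
#include <stddef.h>

/* Hypothetical subset of the fields a result callback receives. */
struct result_fields {
	const char *name;    /* mandatory */
	const char *section; /* mandatory */
	const char *machine; /* optional: NULL when the page is not hardware specific */
};

/* Return the value, or the "*?*" marker for a missing mandatory field. */
const char *
mandatory_or_marker(const char *value)
{
	return value != NULL ? value : "*?*";
}

/* Fill the struct from raw column values, any of which may be NULL. */
void
fill_fields(struct result_fields *out, const char *name,
    const char *section, const char *machine)
{
	out->name = mandatory_or_marker(name);
	out->section = mandatory_or_marker(section);
	out->machine = machine; /* deliberately left as NULL when absent */
}
```

Code that prints the result can then test machine against NULL to decide whether to show an architecture prefix at all.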
christos
2d0aa66b2f PR/54343: Prevent NULL pointers in callback strings; use "*?*" for now to
identify them.
2019-08-15 10:29:07 +00:00
leot
a5fb0c00f0 Properly free section_clause. 2019-06-07 16:43:58 +00:00
leot
2710d0dc9d Document name_desc attribute of mandb_links.
Discussed with <abhinav> via PR misc/54213, thanks!
2019-05-18 10:38:04 +00:00
leot
83d9007765 Reintroduce case insensitive comparison of name accidentally lost in last
revision.

Discussed with <abhinav> via PR misc/54213, thanks!
2019-05-18 10:28:57 +00:00
abhinav
933b5da267 PR misc/54213: Fix performance of whatis(1) when no matches are found
In revision 1.6 of whatis.c the query was modified to return matches for names found
in MLINKS of the man pages as well. However it was slow. The reason probably being that it
required a join. But more importantly the where condition on an FTS virtual table column
is very slow. To avoid the join and the expensive where condition on the virtual table,
add the name_desc column to the mandb_links table as well. This improves the performance
of whatis(1) to the original level at the expense of slight data duplication.

Bump the schema version to force a database rebuild that accounts for the new column.
2019-05-18 07:56:43 +00:00
kamil
0c003f5999 Add a C99 symbol to libm: nexttowardl
It's an alias for an already existing symbol nextafterl.

Patch obtained from <mgorny>

Detected by the LLVM buildbot breakage in tests.
2019-04-27 23:04:31 +00:00
abhinav
6947938705 Memory allocated by sqlite3_mprintf should be free'd by sqlite3_free
This was causing memory corruption thus making apropos(1) fail in some cases.
Specifically, the following were broken and should be fixed by this commit:

-n option was causing a core dump
apropos was giving warning when using -l and any of the section numbers as options
as reported by paulg on current-users.
2019-04-19 20:35:13 +00:00
abhinav
496b8ce373 Set the snippet_length field of the callback_args
Because this field was not set, apropos failed to show the snippet when piped to a pager
or when used with the -p argument.
2019-04-14 07:59:56 +00:00
christos
7f6ee53058 remove unneeded header. 2019-03-11 00:31:36 +00:00
christos
059d37ece9 adjust to the new mandoc api 2019-03-11 00:14:44 +00:00
christos
e6b2ce53d6 fix memory allocation problems detected by jemalloc... 2019-03-07 22:08:59 +00:00
abhinav
50d4d47f30 Adjust makemandb for the latest mandoc
ok christos@
2018-08-24 16:01:57 +00:00
kre
6a7c82e6b2 In the latest mandoc (mdocml) the signature (prototype) of
mparse_alloc() altered - update the call here to compensate.

This fixes the build (of makemandb), but I am not sure that
the changed version is what is desired - someone who knows
something about all of this should validate ... I just copied
the invocation from mandoc's demandoc.c (which seems likely
to be at least a similar kind of usage).
2018-08-16 05:07:22 +00:00
abhinav
357f7b44ef Encapsulate all the arguments required by the query callback function in a struct.
If we want to add or remove arguments from the callback functions, it requires
changing the callback interface all over the place. By letting the callback simply
expect a single struct argument, it would clean things up a bit.

ok christos
2017-11-25 14:29:38 +00:00
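
A minimal sketch of the single-struct callback pattern this commit describes, assuming hypothetical names (callback_args_sketch, emit_result, count_result). The real struct in the apropos sources may have different fields.

```c
#include <stddef.h>

/* Hypothetical bundle of everything the per-result callback needs. */
struct callback_args_sketch {
	const char *section;
	const char *name;
	const char *name_desc;
	void *other_data; /* caller-supplied context */
};

typedef int (*result_callback)(const struct callback_args_sketch *);

/* Example callback: counts results through the caller-supplied context. */
int
count_result(const struct callback_args_sketch *args)
{
	int *counter = args->other_data;

	(*counter)++;
	return 0;
}

/*
 * Invoke the callback for one row. Adding a field to the struct later
 * does not change the callback's signature.
 */
int
emit_result(result_callback cb, const char *name, const char *section,
    const char *name_desc, void *other_data)
{
	struct callback_args_sketch args = {
		.section = section,
		.name = name,
		.name_desc = name_desc,
		.other_data = other_data,
	};

	return cb(&args);
}
```

With this shape, a new argument only touches the struct definition and the callbacks that actually read it.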
abhinav
b0184879c2 Casting a variable of type int * to size_t * may cause
alignment issues on some platforms (e.g. Sparc64).
So use a temporary variable to avoid the cast.

Thanks to Martin@ for noticing the issue and also suggesting the fix.
Fixes PR bin/52678
2017-10-31 10:14:27 +00:00
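
The temporary-variable approach can be illustrated as below. get_length() is a hypothetical stand-in for any API that writes through a size_t pointer; it is not a function from the apropos sources.

```c
#include <stddef.h>

/* Hypothetical API that reports a length through a size_t pointer. */
void
get_length(size_t *len)
{
	*len = 42;
}

/* Store the length in an int without casting int * to size_t *. */
void
length_as_int(int *out)
{
	size_t tmp; /* correctly sized and aligned for the callee */

	get_length(&tmp);
	*out = (int)tmp; /* narrow explicitly, after the call */
}
```

On platforms where size_t is wider than int, the cast version would also let the callee write past the end of the int, so the temporary avoids more than just the alignment problem.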
jmcneill
1385e4296e Make the 'no results found' message sound less harsh.
Changes "try using better keywords" to "try using different keywords".
2017-08-02 12:52:18 +00:00
abhinav
f56c37233e Don't use the custom tokenizer when compiled with debugging on
Using the custom tokenizer means one cannot interactively query the database
through the SQLite shell, thus thwarting the purpose of the debug build option.

Thanks to leot@ for reporting it.

(While there change the debug macro from DEBUG to APROPOS_DEBUG)
2017-08-01 16:16:32 +00:00
abhinav
188f922ddf Add a custom tokenizer which does not stem certain keywords.
Which keywords should not be stemmed is specified in the nostem.txt file.
(Right now I have taken all the man page names, split them if they had
underscores, removed common English words and converted everything to
lowercase.)

The tokenizer itself is based on the Porter stemming tokenizer shipped with
Sqlite. The code in custom_apropos_tokenizer.c is a copy of that code with
some modifications to prevent stemming keywords specified in nostem.txt.

Additionally, it now also uses underscore `_' as a token delimiter. Therefore,
it's now possible to query for `lwp' and all `_lwp_*' man page names
will be matched. Or the query can be `unconst' and `__UNCONST' will be matched.
This was not possible earlier, because underscore was not a delimiter and therefore
the index would have __UNCONST as a key rather than UNCONST.

The tokenizer needs fts3_tokenizer.h file, which is not shipped with the
amalgamation build of Sqlite, therefore it needs to be added here (unless
we decide there is a better place for it).

To enforce using the new tokenizer, a schema version bump is needed.

Since the tokenization is done both at indexing time (via makemandb) and
at query time (via apropos or whatis), the schema version will need to be
bumped every time nostem.txt is modified. Otherwise the
index will consist of old tokens and desired changes will not be seen with
apropos.

This should also fix the issue reported in PR bin/46255. Similar suggestion was
also made on tech-userlevel@ recently:
<http://mail-index.netbsd.org/tech-userlevel/2017/06/08/msg010620.html>

Thanks to christos@ for multiple rounds of reviews of the tokenizer code.
2017-06-18 16:24:10 +00:00
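
The effect of treating underscore as a delimiter can be sketched as follows. This only illustrates the delimiter handling and lowercasing. It performs no stemming, does not consult nostem.txt, and is not the SQLite tokenizer interface; next_token() is a hypothetical helper.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Copy the next lowercased token from input (starting at *pos) into out.
 * Any non-alphanumeric byte, including '_', is treated as a delimiter.
 * Returns 1 if a token was produced, 0 at end of input.
 */
int
next_token(const char *input, size_t *pos, char *out, size_t outsz)
{
	size_t i = *pos, n = 0;

	if (outsz == 0)
		return 0;

	/* Skip leading delimiters such as the "__" in "__UNCONST". */
	while (input[i] != '\0' && !isalnum((unsigned char)input[i]))
		i++;
	if (input[i] == '\0') {
		*pos = i;
		return 0;
	}

	/* Copy the token; characters that do not fit in out are dropped. */
	while (input[i] != '\0' && isalnum((unsigned char)input[i])) {
		if (n + 1 < outsz)
			out[n++] = (char)tolower((unsigned char)input[i]);
		i++;
	}
	out[n] = '\0';
	*pos = i;
	return 1;
}
```

Because the same splitting runs at indexing time and at query time, `__UNCONST' is indexed under the key `unconst', and a query for `unconst' produces the same key.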
abhinav
2f6fb75f1b Make the name comparison case insensitive.
(The old whatis(1) also used to do case insensitive string comparisons).
2017-05-23 15:27:54 +00:00
riastradh
ef315f7931 Remove MKCRYPTO option.
Originally, MKCRYPTO was introduced because the United States
classified cryptography as a munition and restricted its export.  The
export controls were substantially relaxed fifteen years ago, and are
essentially irrelevant for software with published source code.

In the intervening time, nobody bothered to remove the option after
its motivation -- the US export restriction -- was eliminated.  I'm
not aware of any other operating system that has a similar option; I
expect it is mainly out of apathy for churn that we still have it.
Today, cryptography is an essential part of modern computing -- you
can't use the internet responsibly without cryptography.

The position of the TNF board of directors is that TNF makes no
representation that MKCRYPTO=no satisfies any country's cryptography
regulations.

My personal position is that the availability of cryptography is a
basic human right; that any local laws restricting it to a privileged
few are fundamentally immoral; and that it is wrong for developers to
spend effort crippling cryptography to work around such laws.

As proposed on tech-crypto, tech-security, and tech-userlevel to no
objections:

https://mail-index.netbsd.org/tech-crypto/2017/05/06/msg000719.html
https://mail-index.netbsd.org/tech-security/2017/05/06/msg000928.html
https://mail-index.netbsd.org/tech-userlevel/2017/05/06/msg010547.html

P.S.  Reviewing all the uses of MKCRYPTO in src revealed a lot of
*bad* crypto that was conditional on it, e.g. DES in telnet...  That
should probably be removed too, but on the grounds that it is bad,
not on the grounds that it is (nominally) crypto.
2017-05-21 15:28:36 +00:00
abhinav
a46498cbc0 Get rid of unnecessary variable. 2017-05-10 12:09:52 +00:00
abhinav
0b6c27b077 We do need to copy the return value from dirname(3), since it is a static
buffer and can be overwritten between calls. I overzealously removed this in one
of my previous commits.
2017-05-02 13:54:08 +00:00
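
A minimal sketch of copying the dirname(3) result before it can be overwritten. dirname_copy() and dup_string() are hypothetical helpers, not the code from makemandb.

```c
#include <libgen.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate a string with malloc; returns NULL on allocation failure. */
static char *
dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy != NULL)
		memcpy(copy, s, len);
	return copy;
}

/*
 * Return a heap-allocated copy of the directory part of path.
 * The caller frees the result.
 */
char *
dirname_copy(const char *path)
{
	char *scratch, *result;

	/* dirname(3) may modify its argument, so work on a scratch copy. */
	scratch = dup_string(path);
	if (scratch == NULL)
		return NULL;

	/* Copy the result before scratch is freed or dirname is called again. */
	result = dup_string(dirname(scratch));
	free(scratch);
	return result;
}
```

The copy is needed whether a given implementation returns a pointer into a static buffer or into its argument, since neither outlives the next call or the scratch buffer.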
abhinav
520f86ec72 Avoid dereferencing pointer at multiple places, instead use a local variable. 2017-05-01 06:56:00 +00:00
abhinav
05f4872247 Remove the table name parameter from the check_md5 function.
There is only one table storing the md5 checksums, so we can hardcode the table
name instead of passing it as a function argument.
2017-05-01 06:43:56 +00:00
abhinav
1d50c960ff Avoid copying strings where it is not needed. 2017-05-01 05:52:33 +00:00
abhinav
1373f782a3 Simplify handling of the section arguments in apropos(1).
Earlier, a white space separated string was generated containing all the section
numbers passed through command line arguments. Later on that would have to be
tokenized and processed. Instead of that, use a NULL terminated array of strings.

Thanks to christos@ for reviewing and suggesting further improvements.
2017-05-01 05:28:00 +00:00
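
The NULL-terminated array representation might look roughly like this. The helper names and the fixed capacity are assumptions made for illustration.

```c
#include <stddef.h>

/* Count entries in a NULL-terminated array of section strings. */
size_t
count_sections(const char **sections)
{
	size_t n = 0;

	while (sections[n] != NULL)
		n++;
	return n;
}

/*
 * Append section to the array, keeping it NULL terminated.
 * capacity is the total number of slots, including the terminator.
 * Returns 0 on success, -1 if there is no room.
 */
int
add_section(const char **sections, size_t capacity, const char *section)
{
	size_t n = count_sections(sections);

	if (n + 1 >= capacity) /* need room for the entry and the terminator */
		return -1;
	sections[n] = section;
	sections[n + 1] = NULL;
	return 0;
}
```

Consumers can then loop until they reach the NULL entry, with no tokenizing step in between.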
abhinav
c08af49426 Simplify 2017-04-30 16:56:30 +00:00
abhinav
b8c9b20183 Instead of dereferencing the pointer passed in as function argument, use a
temporary local buffer. Saves the cost of pointer dereferencing at so many places.
2017-04-30 15:27:24 +00:00
abhinav
ba948c919e Update the comment to be in sync with the code. 2017-04-30 14:53:58 +00:00
abhinav
3c0134393a Use sqlite3_mprintf() to generate SQL query instead of asprintf. 2017-04-30 14:49:26 +00:00
abhinav
e62bbc5df1 Avoid a call to strncmp when comparing only the first character of the string. 2017-04-30 08:41:18 +00:00
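
As an illustration of the idea (the commit does not say which character is compared, so the '-' here is an assumption), a one-character strncmp can be replaced by a direct comparison:

```c
#include <stdbool.h>

/* True if arg starts with '-'; equivalent to strncmp(arg, "-", 1) == 0. */
bool
starts_with_dash(const char *arg)
{
	return arg[0] == '-';
}
```

The direct comparison is also safe for the empty string, because arg[0] is then the terminating NUL.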
abhinav
116a5447e2 Bring the comment in sync with code (after changes brought by the last commit). 2017-04-29 16:49:51 +00:00
abhinav
c376a38e5b Don't parse Nm macro when it occurs anywhere outside the NAME section.
mandoc(3) already generates the text node representing the value for the .Nm macro.
Doing our own parsing for .Nm on top of that leads to large duplication of text
in the database. This gets especially bad for man pages with large NAME sections,
such as queue(3).
2017-04-29 14:43:09 +00:00
abhinav
231f71fb95 Disable the database compression if DEBUG is defined.
When debugging makemandb(8), it helps to be able to view the text being
stored in the database.
2017-04-27 08:02:24 +00:00
abhinav
b2c6ef38f4 Teach whatis(1) to handle MLINKS
Similar to apropos(1), whatis did not utilise the mandb_links table till now.
Therefore, if it was asked about one of the links to a man page, it would
error out. This change teaches whatis(1) to look up both the FTS table
as well as the links table, thus ensuring that it is able to answer queries
about MLINKS as well.

Comparison of the output before and after this change:

#Before change
$ whatis realloc
realloc: not found

#after change
$ ./whatis realloc
realloc(3) - general memory allocation operations
realloc(3) - general purpose memory allocation functions
realloc(9) - general-purpose kernel memory allocator
2017-04-23 16:56:49 +00:00
abhinav
e70b83fc18 Better handle MLINKS in apropos(1).
apropos(1) only indexes the first .Nm entry from the NAME section in the full
text index. Rest of the .Nm entries are stored in a separate table: mandb_links.

Till now apropos(1) did not use the mandb_links table. So whenever a query
was being made for one of the man page links, such as realloc(3), it was showing
malloc(3) in the results but not as the first result. And, also the result would
show up as malloc(3), rather than realloc(3) (which can be confusing).

With this change, for single keyword queries, apropos(1) would now utilise the
mandb_links table as well. If the query is for one of the links of a man page,
it would show as the first result. Also, the result would show up as the name
of the link rather than the original man page name. For example, if the query
was for realloc, the output would be realloc(3), rather than malloc(3).

Following are some example queries showing difference in the output before this
change and after this change:

#Before changes
$ apropos -n 5 -M realloc
reallocarr (3)    reallocate array
reallocarray (3)  reallocate memory for an array of elements checking
for overflow
fgetwln (3)       get a line of wide characters from a stream
fgetln (3)        get a line from a stream
posix_memalign (3)        aligned memory allocation

#After changes
$ ./apropos -n 5 -M realloc
realloc (3)       general memory allocation operations
realloc (3)       general purpose memory allocation functions
realloc (9)       general-purpose kernel memory allocator
reallocarr (3)    reallocate array
reallocarray (3)  reallocate memory for an array of elements checking
for overflow

#Before changes
$ apropos -n 5 -M TAILQ_REMOVE
SLIST_HEAD (3) implementations of singly-linked lists, lists, simple
queues, tail queues, and singly-linked tail queues

#After changes
$ ./apropos -n 5 -M TAILQ_REMOVE
TAILQ_REMOVE (3)  implementations of singly-linked lists, lists,
simple queues, tail queues, and singly-linked tail queues

#Before changes
$ apropos -n 5 -M falloc
filedesc (9)      file descriptor tables and operations
file (9)  operations on file entries

#After changes
$ ./apropos -n 5 -M falloc
falloc (9)        file descriptor tables and operations
file (9)  operations on file entries

ok christos@
2017-04-23 13:52:57 +00:00
christos
dec46a9666 libarchive now needs crypto 2017-04-21 23:07:45 +00:00
joerg
c57cca78b1 Use libarchive 3.x interface and not obsolete 2.x versions. 2017-04-20 13:11:35 +00:00
kamil
2fe964ca6f Include <unistd.h> for R_OK W_OK STDOUT_FILENO access(2)
These symbols are undefined after switch to new zlib.
2017-01-10 04:34:07 +00:00
abhinav
01c3d3dc7f Escape hyphen when parsing .Nd 2016-12-19 14:10:57 +00:00
abhinav
e4137a4e3a Don't ignore symlinks.
There can be symlinks which are pointing to man pages not installed in
one of the _default locations mentioned in man.conf or MANPATH. For example
there are man pages in /usr/pkg/man which are symlinked to pages in
/usr/pkg/lib/perl5/man. If we ignore symlinks, we would not be able to
index such pages installed outside the default set of directories.

(Also, the symlink test was incorrect, so we never noticed this issue)

Ok christos@, wiz@
2016-12-17 17:04:38 +00:00
abhinav
e108642273 We don't need to parse the sections we don't index, so stop early. Saves a few
instructions.
2016-10-03 16:11:11 +00:00
abhinav
150a47b73e With the latest release of mandoc, makemandb(8) started to parse some
sections multiple times. This started to happen because pmdoc_Sh(), the handler function
responsible for parsing the Sh macros, used to recursively go through all the child
nodes and then the next nodes, starting from the top level Sh block node.
Now, once it has processed all the child nodes of the top level block node,
it moves to the next node, which is the top level block node of the next section, and
in this way one call to pmdoc_Sh() was causing a complete pass through the
man page. Since mandoc(3) calls pmdoc_Sh() for each .Sh macro in the man
page, this resulted in some of the sections being parsed multiple times.
This never happened with the previous versions of mandoc, so we never noticed.

I've fixed this by starting the parse sequence of the Sh macro from its body, which guarantees
that we will stop once that section ends.

ok christos@
2016-10-03 13:53:39 +00:00
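
The difference between walking from the section block and walking from its body can be sketched with a simplified tree. The node type below is a hypothetical stand-in, not the mandoc(3) node structure, and the section body is modelled simply as the block's child list.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a parsed man page node. */
struct node {
	struct node *child; /* first child */
	struct node *next;  /* next sibling */
	int is_text;        /* 1 for text nodes, 0 otherwise */
};

/* Count text nodes reachable from n through child and next links. */
int
count_text(const struct node *n)
{
	int total = 0;

	for (; n != NULL; n = n->next) {
		total += n->is_text;
		total += count_text(n->child);
	}
	return total;
}

/*
 * Walk one section only: start below the section block, so the block's
 * own next pointer (the following section) is never followed.
 */
int
count_section_text(const struct node *section_block)
{
	return count_text(section_block->child);
}
```

Calling count_text() on the first section block also visits every later section, which is the repeated-parsing behaviour described above. count_section_text() stops where the section ends.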
abhinav
1c4ff59f37 Mark the section and md5_hash columns as unindexed in the FTS table, as they are not used for search 2016-10-03 13:36:35 +00:00
christos
330a03324f Add -lz to makefile to fix the build. 2016-07-21 12:24:54 +00:00
abhinav
ee829d24f5 Use deroff() from mandoc(3) to directly parse the Nd macro rather
than parsing it by hand.

With the latest mandoc(3), the .Nd macro was getting parsed twice. This fixes
that problem and cleans up the code as well.

ok christos@
2016-07-17 15:56:14 +00:00
abhinav
19584ea1f8 Fix extraction of section number and machine architecture from man pages.
With the latest API, mdoc_validate()/man_validate() needs to be called before
reading the roff_man.meta field, otherwise it is NULL.

Also, if a man page doesn't specify machine architecture, don't default to '?';
let it be stored as null in the db. Otherwise, the output of apropos(1) shows
the names of the results as \?/<title>
2016-07-17 12:18:12 +00:00
christos
00e4117929 Sync with API changes. 2016-07-15 19:41:33 +00:00
abhinav
e4c681d955 Fix an off by one issue when concatenating strings. 2016-07-06 18:03:27 +00:00
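
The commit does not say where the off-by-one was. As a generic illustration of the kind of accounting involved, a concatenation with a separator needs room for both strings, the separator, and the terminating NUL. join_with_space() is a hypothetical helper.

```c
#include <stdlib.h>
#include <string.h>

/* Join a and b with a single space; the caller frees the result. */
char *
join_with_space(const char *a, const char *b)
{
	size_t alen = strlen(a), blen = strlen(b);
	/* One extra byte for the space and one for the terminating NUL. */
	char *out = malloc(alen + 1 + blen + 1);

	if (out == NULL)
		return NULL;
	memcpy(out, a, alen);
	out[alen] = ' ';
	memcpy(out + alen + 1, b, blen);
	out[alen + 1 + blen] = '\0';
	return out;
}
```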