410d0f4380
This code has been developed by Abhinav Upadhyay as part of Google's Summer of Code 2011. It uses libmandoc to parse man pages and builds a Full Text Index in a SQLite database. The combination of indexing the full manual page, filtering out stop words and ranking individual matches based on the section gives a much improved user experience. The old makewhatis and friends are kept under MKMAKEMANDB=no for now.
53 lines
2.1 KiB
Plaintext
53 lines
2.1 KiB
Plaintext
There are three tables in the database at present:
|
|
|
|
(1) mandb:
|
|
This is the main FTS table which contains all the content from
|
|
man pages
|
|
|
|
COLUMN NAME DESCRIPTION
|
|
1. section Section number of the page
|
|
2. name The name of the page
|
|
3. name_desc The one line description from the NAME section
|
|
4. desc Contains the content from DESCRIPTION
|
|
section and any other section which
|
|
is not indexed in a separate column.
|
|
5. lib The LIBRARY section
|
|
6. return_vals RETURN VALUES section
|
|
7. env ENVIRONMENT section
|
|
8. files FILES section
|
|
9. exit_status EXIT STATUS section
|
|
10. diagnostics DIAGNOSTICS section
|
|
11. errors ERRORS section
|
|
12. md5_hash md5 hash of the man page (UNIQUE)
|
|
13. machine The machine architecture (if any) for which this page is
|
|
relevant.
|
|
|
|
(2) mandb_meta:
|
|
This table contains md5 hashes of all the indexed man
|
|
pages. This is there to make sure we do not index
|
|
the same man page twice, i.e. to prevent indexing
|
|
hardlinks.
|
|
|
|
COLUMN NAME DESCRIPTION
|
|
1. device (dev_t)Logical device number from stat(2)
|
|
2. inode (ino_t)Inode number from stat(2)
|
|
3. mtime Last modification time from stat(2)
|
|
4. file Absolute path name (UNIQUE constraint)
|
|
5. md5_hash MD5 Hash of the man page (UNIQUE)
|
|
6. id A unique integer ID for the page, which
|
|
refers to the docid column in mandb (Implicit foreign
|
|
key?) PRIMARY KEY
|
|
{device, inode} form a UNIQUE index
|
|
|
|
(3) mandb_links:
|
|
This is an index of all the hard/soft links of the man
|
|
pages. The list of the linked pages is usually obtained
|
|
from the multiple .Nm entries in the page.
|
|
|
|
COLUMN NAME DESCRIPTION
|
|
1. link The name of the hard/soft link (index)
|
|
2. target Name of the target page to which it points
|
|
3. section The section number
|
|
4. machine The machine architecture (if any) for which
|
|
the page is relevant
|