NetBSD/usr.sbin/makemandb/Makefile

# $NetBSD: Makefile,v 1.12 2020/11/12 17:53:43 nia Exp $

.include <bsd.own.mk>

MDOCDIR=${NETBSDSRCDIR}/external/bsd/mdocml
MANCONFDIR=${NETBSDSRCDIR}/usr.bin/man

PROGS=			makemandb apropos whatis
SRCS.makemandb=		makemandb.c apropos-utils.c manconf.c custom_apropos_tokenizer.c
SRCS.apropos=	apropos.c apropos-utils.c manconf.c custom_apropos_tokenizer.c
SRCS.whatis=	whatis.c apropos-utils.c manconf.c custom_apropos_tokenizer.c
MAN.makemandb=	makemandb.8
MAN.apropos=	apropos.1
MAN.whatis=	whatis.1

BINDIR.apropos=		/usr/bin
BINDIR.makemandb=	/usr/sbin
BINDIR.whatis=		/usr/bin

.PATH: ${MANCONFDIR}

CPPFLAGS+= -I${MDOCDIR} -I${MANCONFDIR} -I${.OBJDIR}

MDOCMLOBJDIR!=	cd ${MDOCDIR}/lib/libmandoc && ${PRINTOBJDIR}
MDOCMLLIB=	${MDOCMLOBJDIR}/libmandoc.a

DPADD.makemandb+= 	${MDOCMLLIB} ${LIBARCHIVE} ${LIBBZ2} ${LIBLZMA} ${LIBZ}
LDADD.makemandb+= 	-L${MDOCMLOBJDIR} -lmandoc -larchive -lbz2 -llzma -lz
LDADD.makemandb+=	-lcrypto
DPADD.makemandb+=	${LIBCRYPTO}

DPADD+=		${LIBSQLITE3} ${LIBM} ${LIBZ} ${LIBTERMLIB} ${LIBUTIL}
LDADD+=		-lsqlite3 -lm -lz -ltermlib -lutil

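# Generate stopwords.c: a perfect hash function (stopwords_hash) from nbperf,
# followed by a C array with one quoted entry per line of stopwords.txt.
stopwords.c: stopwords.txt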
	( set -e; ${TOOL_NBPERF} -n stopwords_hash -s -p ${.ALLSRC};	\
	echo 'static const char *stopwords[] = {';			\
	${TOOL_SED} -e 's|^\(.*\)$$|	"\1",|' ${.ALLSRC};		\
	echo '};'							\
	) > ${.TARGET}

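# Generate nostem.c the same way from nostem.txt (keywords that must not be
# stemmed by the custom tokenizer).
nostem.c: nostem.txt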
	( set -e; ${TOOL_NBPERF} -n nostem_hash -s -p ${.ALLSRC};	\
	echo 'static const char *nostem[] = {';			\
	${TOOL_SED} -e 's|^\(.*\)$$|	"\1",|' ${.ALLSRC};		\
	echo '};'							\
	) > ${.TARGET}

DPSRCS+=	stopwords.c nostem.c
CLEANFILES+=	stopwords.c nostem.c

.include <bsd.prog.mk>
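
The array-generating half of the `stopwords.c` and `nostem.c` rules above can be tried outside of make. The following is a minimal sketch, not the build's actual recipe: the `gen_array` helper and the three-word input are hypothetical, the `nbperf` step is left out because the tool may not be installed, and the `$$` from the Makefile is written as a single `$` because make performs that substitution before the shell sees the command.

```sh
#!/bin/sh
# gen_array NAME FILE (hypothetical helper): print a C array called NAME
# with one quoted, tab-indented entry per line of FILE.
gen_array() {
	echo "static const char *$1[] = {"
	# Same sed expression as in the Makefile, with make's $$ reduced to $.
	sed -e 's|^\(.*\)$|	"\1",|' "$2"
	echo '};'
}

# Hypothetical sample input; the real stopwords.txt is not reproduced here.
words=$(mktemp)
printf '%s\n' a about the > "$words"
gen_array stopwords "$words"
rm -f "$words"
```

Each input line becomes one array element, so the sample input yields a five-line C fragment: the opening line, three quoted entries, and the closing `};`.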
Blame history (each commit listed once, oldest first, with the Makefile lines it last touched)

2012-02-07 23:13:24 +04:00
Import the new apropos/whatis. This code has been developed by Abhinav Upadhyay as part of Google's Summer of Code 2011. It uses libmandoc to parse man pages and builds a Full Text Index in a SQLite database. The combination of indexing the full manual page, filtering out stop words and ranking individual matches based on the section gives a much improved user experience. The old makewhatis and friends are kept under MKMAKEMANDB=no for now.
Lines: `.include <bsd.own.mk>`, `MDOCDIR=`, `PROGS=`, `MAN.makemandb=`, `MAN.apropos=`, `MAN.whatis=`, `BINDIR.apropos=`, `BINDIR.makemandb=`, `BINDIR.whatis=`, `MDOCMLOBJDIR!=`, `MDOCMLLIB=`, the `stopwords.c` rule, `.include <bsd.prog.mk>`

2012-10-06 19:33:59 +04:00
Make mandb path configurable. makemandb (and related tools) use the path from the _mandb variable from man.conf now. Set _mandb in man.conf to same value as was used before. From Abhinav Upadhyay <er.abhinav.upadhyay@gmail.com>.
Lines: `MANCONFDIR=`, `.PATH: ${MANCONFDIR}`

2013-01-14 22:01:59 +04:00
Since mdocml decided to name headers that conflict with system ones (term.h) move the header inclusion one up.
Lines: `CPPFLAGS+= -I${MDOCDIR} -I${MANCONFDIR} -I${.OBJDIR}`

2016-07-21 15:24:54 +03:00
Add -lz to makefile to fix the build.
Lines: `DPADD.makemandb+= ${MDOCMLLIB} ${LIBARCHIVE} ${LIBBZ2} ${LIBLZMA} ${LIBZ}`, `LDADD.makemandb+= -L${MDOCMLOBJDIR} -lmandoc -larchive -lbz2 -llzma -lz`

2017-04-22 02:07:45 +03:00
libarchive now needs crypto
Lines: `LDADD.makemandb+= -lcrypto`, `DPADD.makemandb+= ${LIBCRYPTO}`

2017-06-18 19:24:10 +03:00
Add a custom tokenizer which does not stem certain keywords. Which keywords should not be stemmed is specified in the nostem.txt file. (Right now I have taken all the man page names, split them if they had underscores, removed common English words and converted everything to lowercase.)

The tokenizer itself is based on the Porter stemming tokenizer shipped with Sqlite. The code in custom_apropos_tokenizer.c is a copy of that code with some modifications to prevent stemming keywords specified in nostem.txt. Additionally, it now uses underscore `_' also as a token delimiter. Therefore, now it's possible to do query for `lwp' and all `_lwp_*' man page names will be matched. Or the query can be `unconst' and `__UNCONST' will be matched. This was not possible earlier, because underscore was not a delimiter and therefore the index would have __UNCONST as a key rather than UNCONST.

The tokenizer needs fts3_tokenizer.h file, which is not shipped with the amalgamation build of Sqlite, therefore it needs to be added here (unless we decide there is a better place for it).

To enforce using the new tokenizer, a schema version bump is needed. Since the tokenization is done both at the indexing time (via makemandb) and also while query time (via apropos or whatis), it will be needed to bump the schema version every time nostem.txt is modified. Otherwise the index will consist of old tokens and desired changes will not be seen with apropos.

This should also fix the issue reported in PR bin/46255. Similar suggestion was also made on tech-userlevel@ recently: <http://mail-index.netbsd.org/tech-userlevel/2017/06/08/msg010620.html>

Thanks to christos@ for multiple rounds of reviews of the tokenizer code.
Lines: `SRCS.makemandb=`, `SRCS.apropos=`, `SRCS.whatis=`, the `nostem.c` rule, `DPSRCS+=`, `CLEANFILES+=`

2020-11-12 20:53:43 +03:00
Revert addition of pthread dependency on sqlite. It is less trivial than expected and introduced some surprising breakage.
Lines: `# $NetBSD: Makefile,v 1.12 2020/11/12 17:53:43 nia Exp $`, `DPADD+= ${LIBSQLITE3} ${LIBM} ${LIBZ} ${LIBTERMLIB} ${LIBUTIL}`, `LDADD+= -lsqlite3 -lm -lz -ltermlib -lutil`
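
The effect of treating underscore as a token delimiter, as the 2017-06-18 commit message describes, can be illustrated with a rough shell sketch. It is not the code in custom_apropos_tokenizer.c: the `tokenize` helper is hypothetical, it only splits on underscores and folds case (the case folding is inferred from the `unconst` / `__UNCONST` example in the message), and it performs no stemming and no nostem.txt lookup.

```sh
#!/bin/sh
# tokenize WORD (hypothetical helper): lowercase WORD, split it on
# underscores, and print one non-empty token per line.
tokenize() {
	printf '%s\n' "$1" |
	    tr '[:upper:]' '[:lower:]' |
	    tr -s '_' '\n' |
	    sed '/^$/d'
}

tokenize '__UNCONST'     # prints: unconst
tokenize '_lwp_create'   # prints: lwp, then create (one per line)
```

With tokens produced this way, a query for `lwp` has a matching index key for `_lwp_create`, which is the behaviour the commit message reports for `_lwp_*` page names.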