From d0f052be516f0a8e033333b7880f6e256981534d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?J=C3=A9r=C3=B4me=20Duval?= Date: Thu, 14 Apr 2005 16:10:58 +0000 Subject: [PATCH] added PolyglotMan 3.2 from http://polyglotman.sourceforge.net/ ; added rules to produce XML from man pages. Still to be sharpened: the XML files could be plain build objects, but at the moment they are located in the distro. Users don't read XML AFAIK, so the final documentation is to be reworked. git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@12392 a95241bf-73f2-0310-859d-f6bbb57e9c96 --- src/tools/Jamfile | 1 + src/tools/rman/CHANGES | 118 + src/tools/rman/Jamfile | 45 + src/tools/rman/MANIFEST | 25 + src/tools/rman/Makefile | 160 ++ src/tools/rman/README-rman.txt | 52 + src/tools/rman/rman.1 | 273 ++ src/tools/rman/rman.c | 4843 ++++++++++++++++++++++++++++++++ src/tools/rman/rman.html | 342 +++ 9 files changed, 5859 insertions(+) create mode 100644 src/tools/rman/CHANGES create mode 100644 src/tools/rman/Jamfile create mode 100644 src/tools/rman/MANIFEST create mode 100644 src/tools/rman/Makefile create mode 100644 src/tools/rman/README-rman.txt create mode 100644 src/tools/rman/rman.1 create mode 100644 src/tools/rman/rman.c create mode 100644 src/tools/rman/rman.html diff --git a/src/tools/Jamfile b/src/tools/Jamfile index 63d5b3cef0..6291ccc96d 100644 --- a/src/tools/Jamfile +++ b/src/tools/Jamfile @@ -7,6 +7,7 @@ SubInclude OBOS_TOP src tools gensyscalls ; SubInclude OBOS_TOP src tools hey ; SubInclude OBOS_TOP src tools rc ; SubInclude OBOS_TOP src tools resattr ; +SubInclude OBOS_TOP src tools rman ; SubInclude OBOS_TOP src tools translation ; SubInclude OBOS_TOP src tools unflatten ; diff --git a/src/tools/rman/CHANGES b/src/tools/rman/CHANGES new file mode 100644 index 0000000000..9118abad78 --- /dev/null +++ b/src/tools/rman/CHANGES @@ -0,0 +1,118 @@ +1993 + 1 Apr as bs2tk posted to comp.lang.tcl (126 lines) + 2 bullets, change bars, copyright symbol + 5 boldface, other SGI nicks + 7 skip unrecognized escape codes
+10 small caps +13 underscores considered uppercase so show up in default small caps font + screen out Ultrix junk (code getting pretty tangled now) +14 until Tk text has better tab support, replace tabs by spaces until get to next tab stop (for Ultrix); -t gives tabstop spacing +20 Solaris support (Larry Tsui) + 3 Jun section subheading parsing (Per-Erik Martin) +28 hyphenated man pages in SEE ALSO show up correctly in Links (Mike Steele) +13 Jul under FILES, fully qualified path names are added to Links, but this taken out immediately because not useful +14 option to keep changebars on right (Warren Jessop) + 5 Aug search for header, footer dynamically--no need to edit or search large list of patterns +11 -m kicks in man page formatting beyond nroff backspace kludges +27 handle double digit numbers better by trying again relative to end of line +19 Sep -T gives Tk extras (otherwise ASCII only) + -H gives headers only (implies -T off) +10 Oct -r reverse compiles to [tn]roff source (as Geoff Collyer's nam and fontch, but leveraging existing analysis so only addition of ~60 lines) (The code is device-driver obscure now--obfuscated C contest next.) +13 header and footer optionally available at bottom in Tk view (Marty Leisner) +19 "reflected" odd and even page headers&footers zapped +20 keep count of sections and subsections, using smaller font for larger numbers + 1 Nov reverse compiles to Ensemble, except for character ranges + 4 started rman rewrite for cleaner support of multiple output targets, including: plain ascii, headers only, TkMan, [nt]roff, Ensemble, SGML, HTML + 5 line filtering separated from other logic despite greater sophistication, RosettaMan faster than bs2tk (!) 
+28 Dec man page reference recognition (Michael Harrison) + + +1994 + 1 Jan identify descriptive lists by comparing scnt2 with s_avg + 3 tail-end table of contents in HTML documents + 5 -f and LaTeX output mode +24 proof-of-concept RTF output mode +26 handle man pages that don't have a header on the first page +28 parse "handwritten" man pages +22 Feb alpha version released + 6 Mar various bug fixes +10 beta version released +13 Jun fixed surious generation on <BR>'s (the existence of which pointed out by David Sibley) +22 Jul table recognition experiment. works reasonably well, except for tables with centered headers + 3 Aug allow for off-by-one (and -two) in identification of header and footer + fixed problem with recurrent/leftover text with OSF/1 bold bullets (yeesh) +12 Sep 2.0gamma released +13 check for *third* header, possibly centered, possibly after blank lines (Charles Anderson) + fixed tag ranges for lines following blank lines (just \n) of pages with global indentation (Owen Rees) +19 fixed two small problems with LaTeX (^ => \^, \bullet => $\bullet$) (Neal Becker) +24 simple check for erroneously being fed roff source +26 deal with bold +- as in ksh (ugh) +30 2.0delta released + 9 Oct special check for OSF to guard against section head interpreted as footer + 8 Nov Perl pod output format (result still needs work, but not much) + 7 Dec 2.0epsilon released (last one before final 2.0) +22 Happy Winter Solstice! 2.0 released + deprecated gets() replaced (Robert Withrow) +25 TkMan module's $w.show => $t, saving about 9% in generated characters + + +1995 + 1 Jan experiment with TkMan output to take advantage of my hack to Tk text (i.e., $t insert end "text" => $t insert end "text1" tag1 "text2" tag2 ...); results => output size reduced about 25%, time reduced about 12-15% +25 Mar back to old mark command for Tk module +8 May hyphens in SEE ALSO section would confuse link-finder, so re-linebreak if necessary(!) (Greg Earle & Uri Guttman) + 4 Aug put formats and options into tables (inspired by Steve Maguire's Writing Solid Code) +19 -V accepts colon-separated list of valid volume names (Dag Nygren) +22 MIME output format that's usable in Emacs 19.29 (just three hours at hacking HTML module) (Neal Becker) + 9 Sep nits in HTML and better Solaris code snippets (Drazen Kacar) +13 Nov Macintosh port by Matthias Neeracher +18 Dec adapted to LaTeX2e, null manRef yields italicized man ref (H. 
Palme) +28 allow long option names (Larry Schwimmer) + + +1996 +22 Jan fixed problem with hyphenation supression and tabs in man page--sheesh! (H. Palme) +23 May split TkMan format into Tk and TkMan (which calls Tk) +25 in TkMan format, initial spaces converted to tabs +24 Sep experiment with formatting from source's macros, for better transcription short of full nroff interpreter +27 commented out Ensemble output format, which nobody used + 2 Oct >4000 lines +11 + 8 Nov release 3.0 alpha. source code parsing works well for Solaris, SunOS, HP-UX; in generating HTML +25 recognize URLs (Mic Campanel) + + +1997 +19 Mar bug fixes, more special characters (roff expert Larry Jones) + 8 Aug renamed to PolyglotMan + 4 Nov TkMan module: Rebus and NoteMark identification for paragraph lengths and command line options taken over from Tcl (still have search, highlight in Tcl, necessarily) (Chad Loder) + >5000 lines, or nearly 40X the lines of code of version 1.0 (which just supported TkMan) + + +1998 +20 Mar automatic detection of Tcl/Tk source (within automatic detection of source; already had within formatted) + centralize casifying SH and SS for formatted and source + group exception table by category +17 Apr in source translation, pass along comment lines too +20 incorporate (RCS) versioning diff's, for HTML + + +2000 +22 Jun eliminate last of encumbered code, release as Open Source + release version 3.0.9 + + +2002 +24 Aug used in Apple's OS X 10.2 (Jaguar) to convert manual pages for display in Project Builder + + +2003 +28 Mar updated for groff 1.18's new escape codes + remove Ensemble output format, which is obsolete + remove Texinfo output format, which is not useful + HTML tags in lowercase + released version 3.1 + 5 Jun applied Aaron Hawley's patches for DocBook XML + 6 Jul assume HTML browsers support full set of entity references + discontinue support for Mac OS 9 and earlier (compiles out of the box on OS X) +25 tags well nested for troff source input (at last!) 
+26 release version 3.2 diff --git a/src/tools/rman/Jamfile b/src/tools/rman/Jamfile new file mode 100644 index 0000000000..ebc0c89b40 --- /dev/null +++ b/src/tools/rman/Jamfile @@ -0,0 +1,45 @@ +SubDir OBOS_TOP src tools rman ; + +NotFile doc_files ; +Depends files : doc_files ; + +SubDirCcFlags -w ; + +BinCommand rman : rman.c : ; + + +rule Man2Doc +{ + local source = [ FGristFiles $(2) ] ; + local binary = $(1) ; + + SEARCH on $(source) = $(SEARCH_SOURCE) ; + + MakeLocate $(binary) : [ FDirName $(OBOS_DISTRO_TARGET) beos documentation Shell_Tools ] ; + + Depends $(binary) : $(source) rman ; + + LocalDepends doc_files : $(binary) ; + Man2Doc1 $(binary) : rman $(source) ; + LocalClean clean : $(binary) ; +} + +actions Man2Doc1 +{ + $(2[1]) -f XML "$(2[2])" > "$(1)" ; +} + +rule Man2Docs +{ + # Man2Docs ; + local source ; + for source in [ FGristFiles $(1) ] + { + local target = $(source:S=.xml) ; + + Man2Doc $(target) : $(source) ; + } +} + + +Man2Docs rman.1 ; diff --git a/src/tools/rman/MANIFEST b/src/tools/rman/MANIFEST new file mode 100644 index 0000000000..b7599bd4a3 --- /dev/null +++ b/src/tools/rman/MANIFEST @@ -0,0 +1,25 @@ +gcksum crc length name +---------- ------ ---- +719617051 139770 rman.c +3816320114 2260 README-rman.txt +696804442 4302 Makefile +2527388467 11935 rman.1 +2005445684 13618 site/rman.html +1619945447 6260 CHANGES +3825874035 4647 contrib/README-contrib +953283015 911 contrib/authried.txt +641079878 13609 contrib/bennett.txt +1827405570 1220 contrib/gzip.patch +4230791356 250 contrib/hman.cgi +1743159239 359 contrib/hman.ksh +2919099478 8005 contrib/hman.pl +1821945783 4978 contrib/http-rman.c +3091246851 661 contrib/http-rman.html +2646244070 350 contrib/lewis.pl +2860919314 2049 contrib/man2html +1315989744 6596 contrib/rman.pl +3466647040 7531 contrib/rman_html_split +75383086 2482 contrib/rman_html_split.1 +2032677288 150 contrib/sco-wrapper.sh +884546947 3806 contrib/sutter.txt +3112317992 5086 contrib/youki.pl diff --git 
a/src/tools/rman/Makefile b/src/tools/rman/Makefile new file mode 100644 index 0000000000..4d65c10f83 --- /dev/null +++ b/src/tools/rman/Makefile @@ -0,0 +1,160 @@ +# +# Makefile for PolyglotMan +# It's helpful to read the README-rman.txt file first. +# You should read over all parts of this file, +# down to the "you shouldn't modify" line +# +# Tom Phelps (phelps@ACM.org) +# + + +### you need to localize the paths on these lines + +# The executable `rman' is placed in BINDIR. +# If you're also installing TkMan (available separately--see README-rman.txt), +# this must be a directory that's in your bin PATH. +# MANDIR holds the man page. + +BINDIR = /opt/local/bin +#BINDIR = /usr/local/bin +#BINDIR = //C/bin +MANDIR = /usr/local/man/man1 +# popular alternative +#BINDIR = /opt/local/bin +#MANDIR = /opt/local/man/man1 + + +### if you have GNU gcc, use these definitions +CC = gcc +CFLAGS = -O2 -finline-functions + +### if you just have a standard UNIX, use these instead of GNU. +### CC must be an ANSI C compiler + +#CC = cc +#CFLAGS = -O + +# Solaris and SysV people may need this +#CFLAGS = -O2 -finline-functions + +# For HP-UX +#CC = cc +#CFLAGS = -Aa -O +# HP-UX 10.20 +#CFLAGS = -Ae -O + +# DEC Alpha and Ultrix, -std1 needed to conform to ANSI C +#CC = cc +#CFLAGS = -std1 -O3 -Olimit 1000 + + +# list of valid volume numbers and letters +# you can also set these at runtime with -V +VOLLIST = "1:2:3:4:5:6:7:8:9:o:l:n:p" +# SCO Unix has expanded set of volume letters +#VOLLIST = "1:2:3:4:5:6:7:8:9:o:l:n:p:C:X:S:L:M:F:G:A:H" +# SGI and UnixWare 2.0 +#VOLLIST = "1:2:3:4:5:6:7:8:9:o:l:n:p:D" + + +# the printf strings used to set the HTML and +# to set URL hyperlinks to referenced manual pages +# can be defined at runtime. The defaults are defined below. +# The first %s parameter is the manual page name, +# the second the volume/section number. 
+# you can set these at runtime with -l and -r, respectively + +MANTITLEPRINTF = "%s(%s) manual page" +# relative link to pregenerated file in same directory +MANREFPRINTF = "%s.%s" +# on-the-fly through a cgi-bin script +#MANREFPRINTF = "/cgi-bin/man2html?%s&%s" +#MANREFPRINTF = "/cgi-bin/man2html?m=%s&n=%s" + + +# # # these lines are probably fine + +CP = cp +# or you can use GNU's cp and backup files that are about to be overwritten +#CP = cp -b +RM = rm + + +#-------------------------------------------------- +# +# you shouldn't modify anything below here +# +#-------------------------------------------------- + +version = 3.2 +rman = rman-$(version) +srcs = rman.c +objs = rman +defs = -DVOLLIST='$(VOLLIST)' -DMANTITLEPRINTF='$(MANTITLEPRINTF)' -DMANREFPRINTF='$(MANREFPRINTF)' +libs = +aux = README-rman.txt Makefile rman.1 site/rman.html CHANGES +distrib = $(srcs) $(libs) $(aux) contrib + + +all: rman + @echo 'Files made in current directory.' + @echo 'You should "make install".' + +# everyone but me zaps assertions with the -DNDEBUG flag +rman: rman.c Makefile + $(CC) -DNDEBUG $(defs) -DPOLYGLOTMANVERSION=\"$(version)\" $(CFLAGS) -o rman rman.c + + +debug: + $(CC) $(defs) -DDEBUG -DPOLYGLOTMANVERSION=\"debug\" -g -Wall -o rman rman.c + +prof: + quantify -cache-dir=/home/orodruin/h/bair/phelps/spine/rman/cache $(CC) -DNDEBUG $(defs) -DPOLYGLOTMANVERSION=\"QUANTIFY\" -g -o rman rman.c + +install: rman +# $(INSTALL) -s rman $(BINDIR) + $(RM) -f $(BINDIR)/rman + $(CP) rman $(BINDIR) + $(RM) -f $(MANDIR)/rman.1 + $(CP) rman.1 $(MANDIR) + +# test version includes assertions +# ginstall rman $(BINDIR)/`arch` +test: rman.c Makefile + $(CC) $(defs) -DPOLYGLOTMANVERSION=\"$(version)\" $(CFLAGS) -Wall -ansi -pedantic -o rman rman.c + ls -l rman + ginstall rman $(BINDIR) + rman -v + rman --help + @echo 'Assertion checks:' + rman -f html weirdman/hp-tbl.1 > /dev/null + rman -f html weirdman/Pnews.1 > /dev/null + nroff -man rman.1 | rman -f html > /dev/null + +sww: + rm -f 
rman $(wildcard ~/bin/{sun4,snake,alpha}/rman) + rman + +clean: + rm -f $(objs) + +dist: + rm -rf $(rman)* + mkdir $(rman) + $(CP) -RH $(distrib) $(rman) +# expand -4 rman.c > $(rman)/rman.c + rm -f $(rman)/contrib/*~ + @echo 'gcksum crc length name' > MANIFEST + @echo '---------- ------ ----' >> MANIFEST + @cksum $(filter-out contrib, $(filter-out %~, $(distrib) $(wildcard contrib/*))) | tee -a MANIFEST + mv MANIFEST $(rman) + tar chvf $(rman).tar $(rman) + gzip -9v $(rman).tar + rm -rf $(rman) +# ANNOUNCE-rman rman.1 + @echo "*** Did you remember to ci -l first?" + +uu: tar + gznew $(rman).tar.Z + echo 'uudecode, gunzip (from GNU), untar' > $(rman).tar.gz.uu + uuencode $(rman).tar.gz $(rman).tar.gz >> $(rman).tar.gz.uu diff --git a/src/tools/rman/README-rman.txt b/src/tools/rman/README-rman.txt new file mode 100644 index 0000000000..4089a5d134 --- /dev/null +++ b/src/tools/rman/README-rman.txt @@ -0,0 +1,52 @@ +The home location for PolyglotMan is polyglotman.sourceforge.net + + +*** INSTALLING *** + +Set BINDIR in the Makefile to where you keep your binaries and MANDIR +to where you keep your man pages (in their source form). (If you're +using PolyglotMan with TkMan, BINDIR needs to be a component of your +bin PATH.) After properly editing the Makefile, type `make install'. +Thereafter (perhaps after a `rehash') type `rman' to invoke PolyglotMan. +PolyglotMan requires an ANSI C compiler. To compile on a Macintosh +under MPW, use Makefile.mac. + +If you send me bug reports and/or suggestions for new features, +include the version of PolyglotMan (available by typing `rman -v'). 
+PolyglotMan doesn't parse every aspect of every man page perfectly, but +if it blows up spectacularly where it doesn't seem like it should, you +can send me the man page (or a representative man page if it blows up +on a class of man pages) in BOTH: (1) [tn]roff source form, from +.../man/manX and (2) formatted form (as formatted by `nroff -man'), +uuencoded to preserve the control characters, from .../man/catX. + +If you discover a bug and you obtained PolyglotMan at some other site, +check the home site to see if a newer version has already fixed the problem. + +Be sure to look in the contrib directory for WWW server interfaces, +a batch converter, and a wrapper for SCO. + +-------------------------------------------------- + +*** NOTES ON CURRENT VERSION *** + +Help! I'm looking for people to help with the following projects. +(1) Better RTF output format. The current one works, but could be +made better. (2) Extending the macro sets for source recognition. If +you write an output format or otherwise improve PolyglotMan, please +send in your code so that I may share the wealth in future releases. +(3) Fixing output for various (accented?) characters in the Latin1 +character set. + +-------------------------------------------------- + + +License + +This software is distributed under the Artistic License (see +http://www.opensource.org/licenses/artistic-license.html). + +(This version of PolyglotMan represents a complete rewrite of bs2tk, +which was packaged with TkMan in 1993, which is copyrighted by the +Regents of the University of California, and therefore is not under +their jurisdiction.) 
diff --git a/src/tools/rman/rman.1 b/src/tools/rman/rman.1 new file mode 100644 index 0000000000..91e4c1e01f --- /dev/null +++ b/src/tools/rman/rman.1 @@ -0,0 +1,273 @@ +.TH PolyglotMan 1 +.SH "NAME " +PolyglotMan, rman - reverse compile man pages from formatted +form to a number of source formats +.SH "SYNOPSIS " +rman [ \fIoptions \fR] [ \fIfile \fR] +.SH "DESCRIPTION " +Up-to-date instructions can be found at +http://polyglotman.sourceforge.net/rman.html + +.PP +\fIPolyglotMan \fR takes man pages from most of the popular flavors +of UNIX and transforms them into any of a number of text source +formats. PolyglotMan was formerly known as RosettaMan. The name +of the binary is still called \fIrman \fR, for scripts that depend +on that name; mnemonically, just think "reverse man". Previously \fI +PolyglotMan \fR required pages to be formatted by nroff prior +to its processing. With version 3.0, it \fIprefers [tn]roff source \fR +and usually produces results that are better yet. And source +processing is the only way to translate tables. Source format +translation is not as mature as formatted, however, so try formatted +translation as a backup. +.PP +In parsing [tn]roff source, one could implement an arbitrarily +large subset of [tn]roff, which I did not and will not do, so +the results can be off. I did implement a significant subset +of those use in man pages, however, including tbl (but not eqn), +if tests, and general macro definitions, so usually the results +look great. If they don't, format the page with nroff before +sending it to PolyglotMan. If PolyglotMan doesn't recognize a +key macro used by a large class of pages, however, e-mail me +the source and a uuencoded nroff-formatted page and I'll see +what I can do. When running PolyglotMan with man page source +that includes or redirects to other [tn]roff source using the .so (source +or inclusion) macro, you should be in the parent directory of +the page, since pages are written with this assumption. 
For example, +if you are translating /usr/man/man1/ls.1, first cd into /usr/man. +.PP +\fIPolyglotMan \fR accepts man pages from: SunOS, Sun Solaris, +Hewlett-Packard HP-UX, AT&T System V, OSF/1 aka Digital UNIX, +DEC Ultrix, SGI IRIX, Linux, FreeBSD, SCO. Source processing +works for: SunOS, Sun Solaris, Hewlett-Packard HP-UX, AT&T System +V, OSF/1 aka Digital UNIX, DEC Ultrix. It can produce printable +ASCII-only (control characters stripped), section headers-only, +Tk, TkMan, [tn]roff (traditional man page source), SGML, HTML, +MIME, LaTeX, LaTeX2e, RTF, Perl 5 POD. A modular architecture +permits easy addition of additional output formats. +.PP +The latest version of PolyglotMan is available from \fI +http://polyglotman.sourceforge.net/ \fR. +.SH "OPTIONS " +The following options should not be used with any others and +exit PolyglotMan without processing any input. +.TP 15 +-h|--help +Show list of command line options and exit. +.TP 15 +-v|--version +Show version number and exit. +.PP +\fIYou should specify the filter first, as this sets a number +of parameters, and then specify other options. +.TP 15 +-f|--filter <ASCII|roff|TkMan|Tk|Sections|HTML|SGML|MIME|LaTeX|LaTeX2e|RTF|POD> +Set the output filter. Defaults to ASCII. +.TP 15 +-S|--source +PolyglotMan tries to automatically determine whether its input +is source or formatted; use this option to declare source input. +.TP 15 +-F|--format|--formatted +PolyglotMan tries to automatically determine whether its input +is source or formatted; use this option to declare formatted +input. +.TP 15 +-l|--title \fIprintf-string \fR +In HTML mode this sets the <TITLE> of the man pages, given the +same parameters as \fI-r \fR. +.TP 15 +-r|--reference|--manref \fIprintf-string \fR +In HTML and SGML modes this sets the URL form by which to retrieve +other man pages. The string can use two supplied parameters: +the man page name and its section. (See the Examples section.) 
+If the string is null (as if set from a shell by "-r ''"), `-' +or `off', then man page references will not be HREFs, just set +in italics. If your printf supports XPG3 positions specifier, +this can be quite flexible. +.TP 15 +-V|--volumes \fI<colon-separated list> \fR +Set the list of valid volumes to check against when looking for +cross-references to other man pages. Defaults to \fI1:2:3:4:5:6:7:8:9:o:l:n:p \fR(volume +names can be multicharacter). If an non-whitespace string in +the page is immediately followed by a left parenthesis, then +one of the valid volumes, and ends with optional other characters +and then a right parenthesis--then that string is reported as +a reference to another manual page. If this -V string starts +with an equals sign, then no optional characters are allowed +between the match to the list of valids and the right parenthesis. (This +option is needed for SCO UNIX.) +.PP +The following options apply only when formatted pages are given +as input. They do not apply or are always handled correctly with +the source. +.TP 15 +-b|--subsections +Try to recognize subsection titles in addition to section titles. +This can cause problems on some UNIX flavors. +.TP 15 +-K|--nobreak +Indicate manual pages don't have page breaks, so don't look for +footers and headers around them. (Older nroff -man macros always +put in page breaks, but lately some vendors have realized that +printout are made through troff, whereas nroff -man is used to +format pages for reading on screen, and so have eliminated page +breaks.) \fIPolyglotMan \fR usually gets this right even without +this flag. +.TP 15 +-k|--keep +Keep headers and footers, as a canonical report at the end of +the page. changeleft +Move changebars, such as those found in the Tcl/Tk manual pages, +to the left. --> notaggressive +\fIDisable \fR aggressive man page parsing. Aggressive manual, +which is on by default, page parsing elides headers and footers, +identifies sections and more. 
--> +.TP 15 +-n|--name \fIname \fR +Set name of man page (used in roff format). If the filename is +given in the form " \fIname \fR. \fIsection \fR", the name and +section are automatically determined. If the page is being parsed +from [tn]roff source and it has a .TH line, this information +is extracted from that line. +.TP 15 +-p|--paragraph +paragraph mode toggle. The filter determines whether lines should +be linebroken as they were by nroff, or whether lines should +be flowed together into paragraphs. Mainly for internal use. +.TP 15 +-s|section \fI# \fR +Set volume (aka section) number of man page (used in roff format). +tables +Turn on aggressive table parsing. --> +.TP 15 +-t|--tabstops \fI# \fR +For those macros sets that use tabs in place of spaces where +possible in order to reduce the number of characters used, set +tabstops every \fI# \fR columns. Defaults to 8. +.SH "NOTES ON FILTER TYPES " +.SS "ROFF " +Some flavors of UNIX ship man page without [tn]roff source, making +one's laser printer little more than a laser-powered daisy wheel. +This filer tries to intuit the original [tn]roff directives, +which can then be recompiled by [tn]roff. +.SS "TkMan " +TkMan, a hypertext man page browser, uses \fIPolyglotMan \fR +to show man pages without the (usually) useless headers and footers +on each pages. It also collects section and (optionally) subsection +heads for direct access from a pulldown menu. TkMan and Tcl/Tk, +the toolkit in which it's written, are available via anonymous +ftp from \fIftp://ftp.smli.com/pub/tcl/ \fR +.SS "Tk " +This option outputs the text in a series of Tcl lists consisting +of text-tags pairs, where tag names roughly correspond to HTML. +This output can be inserted into a Tk text widget by doing an \fI +eval <textwidget> insert end <text> \fR. This format should be +relatively easily parsible by other programs that want both the +text and the tags. Also see ASCII. 
+.SS "ASCII " +When printed on a line printer, man pages try to produce special +text effects by overstriking characters with themselves (to produce +bold) and underscores (underlining). Other text processing software, +such as text editors, searchers, and indexers, must counteract +this. The ASCII filter strips away this formatting. Piping nroff +output through \fIcol -b \fR also strips away this formatting, +but it leaves behind unsightly page headers and footers. Also +see Tk. +.SS "Sections " +Dumps section and (optionally) subsection titles. This might +be useful for another program that processes man pages. +.SS "HTML " +With a simple extention to an HTTP server for Mosaic or other +World Wide Web browser, \fIPolyglotMan \fR can produce high quality +HTML on the fly. Several such extensions and pointers to several +others are included in \fIPolyglotMan \fR's \fIcontrib \fR directory. +.SS "SGML " +This is appoaching the Docbook DTD, but I'm hoping that someone +that someone with a real interest in this will polish the tags +generated. Try it to see how close the tags are now. +.SS "MIME " +MIME (Multipurpose Internet Mail Extensions) as defined by RFC 1563, +good for consumption by MIME-aware e-mailers or as Emacs (>=19.29) +enriched documents. +.SS "LaTeX and LaTeX2e " +Why not? +.SS "RTF " +Use output on Mac or NeXT or whatever. Maybe take random man +pages and integrate with NeXT's documentation system better. +Maybe NeXT has own man page macros that do this. +.SS "PostScript and FrameMaker " +To produce PostScript, use \fIgroff \fR or \fIpsroff \fR. To +produce FrameMaker MIF, use FrameMaker's builtin filter. In both +cases you need \fI[tn]roff \fR source, so if you only have a +formatted version of the manual page, use \fIPolyglotMan \fR's +roff filter first. 
+.SH "EXAMPLES " +To convert the \fIformatted \fR man page named \fIls.1 \fR back +into [tn]roff source form: +.PP +\fIrman -f roff /usr/local/man/cat1/ls.1 > /usr/local/man/man1/ls.1 \fR +.br +.PP +Long man pages are often compressed to conserve space (compression +is especially effective on formatted man pages as many of the +characters are spaces). As it is a long man page, it probably +has subsections, which we try to separate out (some macro sets +don't distinguish subsections well enough for \fIPolyglotMan \fR +to detect them). Let's convert this to LaTeX format: +.br +.PP +\fIpcat /usr/catman/a_man/cat1/automount.z | rman -b -n automount -s 1 -f +latex > automount.man \fR +.br +.PP +Alternatively, \fIman 1 automount | rman -b -n automount -s 1 -f +latex > automount.man \fR +.br +.PP +For HTML/Mosaic users, \fIPolyglotMan \fR can, without modification +of the source code, produce HTML links that point to other HTML +man pages either pregenerated or generated on the fly. First +let's assume pregenerated HTML versions of man pages stored in \fI/usr/man/html \fR. +Generate these one-by-one with the following form: +.br +\fIrman -f html -r 'http:/usr/man/html/%s.%s.html' /usr/man/cat1/ls.1 > /usr/man/html/ls.1.html \fR +.br +.PP +If you've extended your HTML client to generate HTML on the fly +you should use something like: +.br +\fIrman -f html -r 'http:~/bin/man2html?%s:%s' /usr/man/cat1/ls.1 \fR +.br +when generating HTML. +.SH "BUGS/INCOMPATIBILITIES " +\fIPolyglotMan \fR is not perfect in all cases, but it usually +does a good job, and in any case reduces the problem of converting +man pages to light editing. +.PP +Tables in formatted pages, especially H-P's, aren't handled very +well. Be sure to pass in source for the page to recognize tables. +.PP +The man pager \fIwoman \fR applies its own idea of formatting +for man pages, which can confuse \fIPolyglotMan \fR. 
Bypass \fI +woman \fR by passing the formatted manual page text directly +into \fIPolyglotMan \fR. +.PP +The [tn]roff output format uses fB to turn on boldface. If your +macro set requires .B, you'll have to a postprocess the \fIPolyglotMan \fR +output. +.SH "SEE ALSO " +\fItkman(1) \fR, \fIxman(1) \fR, \fIman(1) \fR, \fIman(7) \fR +or \fIman(5) \fR depending on your flavor of UNIX +.SH "AUTHOR " +PolyglotMan +.br +by Thomas A. Phelps ( \fIphelps@ACM.org \fR) +.br +developed at the +.br +University of California, Berkeley +.br +Computer Science Division +.PP +Manual page last updated on $Date: 1998/07/13 09:47:28 $ diff --git a/src/tools/rman/rman.c b/src/tools/rman/rman.c new file mode 100644 index 0000000000..d09e547301 --- /dev/null +++ b/src/tools/rman/rman.c @@ -0,0 +1,4843 @@ +static char cvsid[] = "$Header: /Users/phelps/cvs/prj/RosettaMan/rman.c,v 1.154 2003/07/26 19:00:48 phelps Exp $"; + +/* + PolyglotMan by Thomas A. Phelps (phelps@ACM.org) + + accept man pages as formatted by (10) + Hewlett-Packard HP-UX, AT&T System V, SunOS, Sun Solaris, OSF/1, + DEC Ultrix, SGI IRIX, Linux, FreeBSD, SCO + + output as (9) + printable ASCII, section headers only, TkMan, [tn]roff, HTML, + LaTeX, LaTeX2e, RTF, Perl pod, MIME, DocBook XML + + written March 24, 1993 + bs2tk generalized into RosettaMan November 4-5, 1993 + source interpretation added September 24, 1996 + renamed PolyglotMan due to lawsuit by Rosetta, Inc. August 8, 1997 +*/ + +#include <unistd.h> +#include <stdio.h> +#include <string.h> +#include <ctype.h> +#include <stdlib.h> +#include <assert.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <fcntl.h> + + +/*** make #define's into consts? => can't because compilers not smart enough ***/ +/* maximum number of tags per line */ +#define MAXTAGS 50*100 +#define MAXBUF 2*5000 +#define MAXLINES 20000 +#define MAXTOC 500 +#define xputchar(c) (fcharout? putchar(c): (c)) +#define sputchar(c) (fcharout? 
plain[sI++]=(char)c: (char)(c)) +#define stagadd(tag) tagadd(tag,sI,0) +enum { c_dagger='\xa7', c_bullet='\xb7', c_plusminus='\xb1' }; + + +/*** tag management ***/ + +enum tagtype { NOTAG, TITLE, ITALICS, BOLD, SYMBOL, SMALLCAPS, BOLDITALICS, MONO, MANREF }; /* MANREF last */ +struct { enum tagtype type; int first; int last; } tags[MAXTAGS], tagtmp; +int tagc=0; +struct { char *text; int type; int line; } toc[MAXTOC]; +int tocc=0; + + +/* characters in this list automatically prefixed by a backslash (set in output format function */ +char *escchars=""; +char *vollist = VOLLIST; +const char *manvalid = "._-+:"; /* in addition to alphanumerics, valid characters to find in a man page name */ +char *manrefname; +char *manrefsect; + +enum command { + + /*BEGINCHARTAGS,*/ + CHARTAB='\t', + CHARPERIOD='.', CHARLSQUOTE='`', CHARRSQUOTE='\'', CHARGT='>', CHARLT='<', + CHARAMP='&', CHARBACKSLASH='\\', CHARDASH='-', CHARHAT='^', CHARVBAR='|', + CHARNBSP=0xa0, CHARCENT=0xa2, CHARSECT=0xa7, CHARCOPYR=0xa9, CHARNOT=0xac, + CHARDAGGER=0xad, CHARREGTM=0xae, CHARDEG=0xb0, CHARPLUSMINUS=0xb1, + CHARACUTE=0xb4, CHARBULLET=0xb7, CHAR14=0xbc, CHAR12=0xbd, CHAR34=0xbe, + CHARMUL=0xd7, CHARDIV=0xf7, + CHANGEBAR=0x100, CHARLQUOTE, CHARRQUOTE, + /*ENDCHARTAGS,*/ + + /*BEGINFONTTAGS,*/ + BEGINBOLD, ENDBOLD, BEGINITALICS, ENDITALICS, BEGINBOLDITALICS, ENDBOLDITALICS, + BEGINSC, ENDSC, BEGINY, ENDY, BEGINCODE, ENDCODE, BEGINMANREF, ENDMANREF, + FONTSIZE, + /*ENDFONTTAGS*/ + + /*BEGINLAYOUTTAGS,*/ + ITAB, BEGINCENTER, ENDCENTER, HR, + /*ENDLAYOUTTAGS,*/ + + /*BEGINSTRUCTTAGS,*/ + BEGINDOC, ENDDOC, BEGINCOMMENT, ENDCOMMENT, COMMENTLINE, BEGINBODY, ENDBODY, + BEGINHEADER, ENDHEADER, BEGINFOOTER, ENDFOOTER, BEGINLINE, ENDLINE, SHORTLINE, + BEGINSECTION, ENDSECTION, BEGINSUBSECTION, ENDSUBSECTION, + BEGINSECTHEAD, ENDSECTHEAD, BEGINSUBSECTHEAD, ENDSUBSECTHEAD, + BEGINBULPAIR, ENDBULPAIR, BEGINBULLET, ENDBULLET, BEGINBULTXT, ENDBULTXT, + BEGINTABLE, ENDTABLE, BEGINTABLELINE, ENDTABLELINE, 
BEGINTABLEENTRY, ENDTABLEENTRY, + BEGININDENT, ENDINDENT, BEGINCODEBLOCK, ENDCODEBLOCK, + + BEGINDIFFA, ENDDIFFA, BEGINDIFFD, ENDDIFFD + /*,*//*ENDSTRUCTTAGS,*/ +}; + +const char *tcltkOP[] = { "Command-Line Name", "Database Name", "Database Class" }; + + +/* characters that need special handling in any output format, *more than just a backslash* */ +/* characters in this list need a corresponding case statement in each output format */ +/*char *trouble="\t.`'><&\\^|-\xa7\xb7\xb1";*/ +const unsigned char trouble[]= { CHARTAB, CHARPERIOD, CHARLSQUOTE, CHARRSQUOTE, + CHARGT, CHARLT, CHARAMP, CHARBACKSLASH, CHARDASH, CHARHAT, CHARVBAR, CHARCENT, + CHARSECT, CHARCOPYR, CHARNOT, CHARDAGGER, CHARREGTM, CHARDEG, CHARPLUSMINUS, + CHARACUTE, CHARBULLET, CHAR14, CHAR12, CHAR34, CHARMUL, CHARDIV, + 0 }; + + +enum command tagbeginend[][2] = { /* parallel to enum tagtype */ + { -1,-1 }, + { -1,-1 }, + { BEGINITALICS, ENDITALICS }, + { BEGINBOLD, ENDBOLD }, + { BEGINY, ENDY }, + { BEGINSC, ENDSC }, + { BEGINBOLDITALICS, ENDBOLDITALICS }, + { -1,-1 }, + { BEGINMANREF, ENDMANREF } +}; + +void (*fn)(enum command) = NULL; +enum command prevcmd = BEGINDOC; + + +/*** globals ***/ + +int fSource=-1; /* -1 => not determined yet */ +int finlist=0; +int fDiff=0; +FILE *difffd; +char diffline[MAXBUF]; +char diffline2[MAXBUF]; +char *message = NULL; +int fontdelta=0; +int intArg; + +int fPara=0; /* line or paragraph groupings of text */ +int fSubsections=0; /* extract subsection titles too? */ +int fChangeleft=0; /* move change bars to left? (-1 => delete them) */ +int fReflow=0; +int fURL=0; /* scan for URLs too? */ +/*int fMan=1; /* invoke agressive man page filtering? */ +int fQS=0; /* squeeze out spaces (scnt and interword)? */ +int fIQS=0; /* squeeze out initial spaces (controlled separately from fQS) */ +int fILQS=0; /* squeeze out spaces for usual indent */ +int fHeadfoot=0; /* show canonical header and footer at bottom? 
*/ +int falluc=0; +int itabcnt=0; +int fQuiet=0; +int fTclTk=0; + +/* patterns observed in section heads that don't conform to first-letter-uppercase-rest-lowercase pattern (stay all uc, or go all lc, or have subsequent uc) */ +int lcexceptionslen = -1; /* computed by system */ +char *lcexceptions[] = { +/* new rule: double/all consonants == UC? */ + /* articles, verbs, conjunctions, prepositions, pronouns */ + "a", "an", "the", + "am", "are", "is", "were", + "and", "or", + "by", "for", "from", "in", "into", "it", "of", "on", "to", "with", + "that", "this", + + /* terms */ + "API", "CD", "GUI", "UI", /*I/O=>I/O already*/ "ID", "IDs", "OO", + "IOCTLs", "IPC", "RPC", + + /* system names */ + "AWK", "cvs", "rcs", "GL", "vi", "PGP", "QuickTime", "DDD", "XPG/3", + "NFS", "NIS", "NIS+", "AFS", + "UNIX", "SysV", + "XFree86", "ICCCM", + "MH", "MIME", + "TeX", "LaTeX", "PicTeX", + "PostScript", "EPS", "EPSF", "EPSI", + "HTML", "URL", "WWW", + + /* institution names */ + "ANSI", "CERN", "GNU", "ISO", "NCSA", + + /* Sun-specific */ + "MT-Level", "SPARC", + + NULL +}; + + +int TabStops=8; +int hanging=0; /* location of hanging indent (if ==0, none) */ +enum { NAME, SYNOPSIS, DESCRIPTION, SEEALSO, FILES, AUTHOR, RANDOM/*last!*/ }; +char *sectheadname[] = { + "NAME:NOMBRE", "SYNOPSIS", "DESCRIPTION:INTRODUCTION", "SEE ALSO:RELATED INFORMATION", "FILES", "AUTHOR:AUTHORS", "RANDOM" +}; +int sectheadid = RANDOM; +int oldsectheadid = RANDOM; + +int fCodeline=0; +int fNOHY=0; /* re-linebreak so no words are hyphenated; not used by TkMan, but gotta keep for people converting formatted text */ +int fNORM=0; /* normalize? 
initial space => tabs, no changebars, exactly one blank line between sections */ +const char TABLEOFCONTENTS[] = "Table of Contents"; +const char HEADERANDFOOTER[] = "Header and Footer"; +char manName[80] = "man page"; +char manSect[10] = "1"; +const char PROVENANCE[] = + "manual page source format generated by PolyglotMan v" POLYGLOTMANVERSION; +const char HOME[] = "available at http://polyglotman.sourceforge.net/"; +const char horizontalrule[] = "------------------------------------------------------------"; + +const int LINEBREAK = 70; +int linelen = 0; /* length of result in plain[] */ +int spcsqz; /* number of spaces squeezed out */ +int ccnt = 0; /* # of changebars */ +int scnt, scnt2; /* counts of initial spaces in line */ +int s_sum, s_cnt; +int bs_sum, bs_cnt; +int ncnt=0, oncnt=0; /* count of interline newlines */ +int CurLine=1; +int AbsLine=1-1; /* absolute line number */ +int indent=0; /* global indentation */ +int lindent=0; /* usual local indent */ +int auxindent=0; /* aux indent */ +int I; /* index into line/paragraph */ +int fcharout=1; /* show text or not */ +char lookahead; +/*int tabgram[MAXBUF];*/ /* histogram of first character positions */ +char buf[MAXBUF]; +char plain[MAXBUF]; /* current text line with control characters stripped out */ +char hitxt[MAXBUF]; /* highlighted text (available at time of BEGIN<highlight> signal */ + +char header[MAXBUF]; /* complete line */ +char header2[MAXBUF]; /* SGIs have two lines of headers and footers */ +char header3[MAXBUF]; /* GNU and some others have a third! 
*/ +char footer[MAXBUF]; +char footer2[MAXBUF]; +#define CRUFTS 5 +char *cruft[CRUFTS] = { header, header2, header3, footer, footer2 }; + +char *File, *in; /* File = pointer to full file contents, in = current file pointer */ +char *argv0; +int finTable=0; +char tableSep='\0'; /*\t';*/ +/*int fTable=0; +int fotable=0;*/ +char *tblcellformat; +int tblcellspan; +/*int tblspanmax;*/ +int listtype=-1; /* current list type bogus to begin with */ +enum listtypes { DL, OL, UL }; + +int fIP=0; + + + +/*** utility functions ***/ + + +/* case insensitive versions of strcmp and strncmp */ + +int +stricmp(const char *s1, const char *s2) { + assert(s1!=NULL && s2!=NULL); + /*strincmp(s1, s2, strlen(s1)+1);*/ + + while (tolower(*s1)==tolower(*s2)) { + if (*s1=='\0' /*&& *s2=='\0'*/) return 0; + s1++; s2++; + } + + if (tolower(*s1)<tolower(*s2)) return -1; + else return 1; +} + +int lcexceptionscmp(const char **a, const char **b) { return stricmp(*a, *b); } + +int +strincmp(const char *s1, const char *s2, size_t n) { + assert(s1!=NULL && s2!=NULL && n>0); + + while (n>0 && tolower(*s1)==tolower(*s2)) { + n--; s1++; s2++; + } + if (n==0) return 0; + else if (tolower(*s1)<tolower(*s2)) return -1; + else return 1; +} + +/* compare string and a colon-separated list of strings */ +int +strcoloncmp2(char *candidate, int end, const char *list, int sen) { + const char *l = list; + char *c,c2; + + assert(candidate!=NULL && list!=NULL); + assert(end>=-1 && end<=255); + assert(sen==0 || sen==1); + + if (*l==':') l++; /* tolerate a leading colon */ + + /* invariant: c and v point to start of strings to compare */ + while (*l) { + assert(l==list || l[-1]==':'); + for (c=candidate; *c && *l; c++,l++) + if ((sen && *c!=*l) || (!sen && tolower(*c)!=tolower(*l))) + break; + + /* if candidate matches a valid one as far as valid goes, it's a keeper */ + if ((*l=='\0' || *l==':') && (*c==end || end==-1)) { + if (*c=='\b') { + c2 = c[-1]; + while (*c=='\b' && c[1]==c2) c+=2; + } + /* no volume 
qualifiers with digits */ + if (!isdigit(*c)) return 1; + } + + /* bump to start of next valid */ + while (*l && *l++!=':') /* nada */; + } + + return 0; +} + +int +strcoloncmp(char *candidate, int end, const char *list) { + int sen=1; + const char *l = list; + + assert(candidate!=NULL && list!=NULL); + assert(end>=-1 && end<=255); + + if (*l=='=') l++; else end=-1; + if (*l=='i') { sen=0; l++; } + + return strcoloncmp2(candidate, end, l, sen); +} + +/* strdup not universally available */ +char * +mystrdup(char *p) { + char *q; + + if (p==NULL) return NULL; + + q = malloc(strlen(p)+1); /* +1 gives space for \0 that is not reported by strlen */ + if (q!=NULL) strcpy(q,p); + return q; +} + + +/* given line of text, return "casified" version in place: + if word in exceptions list, return exception conversion + else uc first letter, lc rest +*/ +void casify(char *p) { + char tmpch, *q, **exp; + int fuc; + + for (fuc=1; *p; p++) { + if (isspace(*p) || strchr("&/",*p)!=NULL) fuc=1; + else if (fuc) { + /* usually */ + if (p[1] && isupper(p[1]) /*&& p[2] && isupper(p[2])*/) fuc=0; + /* check for exceptions */ + for (q=p; *q && !isspace(*q); q++) /*nada*/; + tmpch = *q; *q='\0'; + exp = (char **)bsearch(&p, lcexceptions, lcexceptionslen, sizeof(char *), lcexceptionscmp); + *q = tmpch; + if (exp!=NULL) { + for (q=*exp; *q; q++) *p++=*q; + fuc = 1; + } + } else *p=tolower(*p); + } +} + + +/* add an attribute tag to a range of characters */ + +void +tagadd(int /*enum tagtype--abused in source parsing*/ type, int first, int last) { + assert(type!=NOTAG); + + if (tagc<MAXTAGS) { + tags[tagc].type = type; + tags[tagc].first = first; + tags[tagc].last = last; + tagc++; + } +} + + +/* + collect all saves to string table in one place, so that + if decide to go with string table instead of multiple malloc, it's easy + (probably few enough malloc's that more sophistication is unnecessary) +*/ + +void +tocadd(char *text, enum command type, int line) { + char *r; + + assert(text!=NULL 
&& strlen(text)>0); + assert(type==BEGINSECTION || type==BEGINSUBSECTION); + + if (tocc<MAXTOC) { + r = malloc(strlen(text)+1); if (r==NULL) return; + strcpy(r,text); + toc[tocc].text = r; + toc[tocc].type = type; + toc[tocc].line = line; + tocc++; + } +} + + + +char *manTitle = MANTITLEPRINTF; +char *manRef = MANREFPRINTF; +char *href; +int fmanRef=1; /* make 'em links or just show 'em? */ + +void +manrefextract(char *p) { + char *p0; + static char *nonhref = "\">'"; + + while (*p==' ') p++; + if (strincmp(p,"http",4)==0) { + href="%s"; manrefname = p; + p+=4; + while (*p && !isspace(*p) && !strchr(nonhref,*p)) p++; + } else { + href = manRef; + + manrefname = p; + while (*p && *p!=' ' && *p!='(') p++; *p++='\0'; + while (*p==' ' || *p=='(') p++; p0=p; + while (*p && *p!=')') p++; + manrefsect = p0; + } + *p='\0'; +} + + + + +/* + * OUTPUT FORMATS + */ + +void +formattedonly(void) { + fprintf(stderr, "The output formats for Tk and TkMan require nroff-formatted input\n"); + exit(1); +} + + +/* + * DefaultFormat -- in weak OO inheritance, top of hierarchy for everybody + */ +void +DefaultFormat(enum command cmd) { + int i; + + switch (cmd) { + case ITAB: + for (i=0; i<itabcnt; i++) putchar('\t'); + break; + default: + /* nada */ + break; + } +} + + +/* + * DefaultLine -- in weak OO inheritance, top of hierarchy for line-based formats + * for output format to "inherit", have "default: DefaultLine(cmd)" and override case statement "methods" + */ + +void +DefaultLine(enum command cmd) { + switch (cmd) { + default: + DefaultFormat(cmd); + } +} + + +/* + * DefaultPara -- top of hierarchy for output formats that are formatted by their viewers + */ + +void +DefaultPara(enum command cmd) { + switch (cmd) { + default: + DefaultFormat(cmd); + } +} + + + +/* + * Tk -- just emit list of text-tags pairs + */ + +void +Tk(enum command cmd) { + static int skip=0; /* skip==1 when line has no text */ + int i; + + if (fSource) formattedonly(); + + /* invariant: always ready to insert 
text */ + + switch (cmd) { + case BEGINDOC: + I=0; CurLine=1; + escchars = "\"[]$"; + printf(/*$t insert end */ "\""); + break; + case ENDDOC: + if (fHeadfoot) { +/* grr, should have +mark syntax for Tk text widget! -- maybe just just +sect#, +subsect# + printf("\\n\\n\" {} \"%s\\n\" {+headfoot h2}\n", HEADERANDFOOTER); +*/ + printf("\\n\\n\" {} \"%s\\n\" h2\n",HEADERANDFOOTER); + /*printf("$t mark set headfoot %d.0\n", CurLine);*/ + CurLine++; + + for (i=0; i<CRUFTS; i++) { + if (*cruft[i]) { + printf(/*$t insert end */"{%s} sc \\n\n", cruft[i]); + CurLine++; + } + } + } else printf("\"\n"); + break; + + case COMMENTLINE: printf("# "); break; + + case BEGINLINE: + /*I=0; -- need to do this at end of line so set for filterline() */ + /* nothing to do at start of line except catch up on newlines */ + for (i=0; i<ncnt; i++) printf("\\n"); + CurLine+=ncnt; + /*if (fSource) for (i=0; i<indent; i++) putchar('\t');*/ + break; + case ENDLINE: + /*if (!fSource) {*/ + if (!skip) /*if (ncnt)*/ printf("\\n"); /*else xputchar(' ');*/ + skip=0; + CurLine++; I=0; + /* + } else { + putchar(' '); I++; + } + */ + break; + + case ENDSECTHEAD: + printf("\\n\" h2 \""); + tagc=0; + skip=1; + break; + case ENDSUBSECTHEAD: + printf("\\n\" h3 \""); /* add h3? 
*/ + tagc=0; + skip=1; + break; + case HR: /*printf("\\n%s\\n", horizontalrule); CurLine+=2; I=0;*/ break; + case BEGINTABLEENTRY: + /*if (fSource) putchar('\t');*/ + break; + case BEGINTABLELINE: + case ENDTABLEENTRY: + break; + case ENDTABLELINE: + printf("\" tt \""); + /*tagadd(MONO, 0, I);*/ + break; + + case CHANGEBAR: putchar('|'); I++; break; + case CHARLQUOTE: + case CHARRQUOTE: + putchar('\\'); putchar('"'); I++; + break; + case CHARLSQUOTE: + case CHARRSQUOTE: + case CHARPERIOD: + case CHARTAB: + case CHARDASH: + case CHARLT: + case CHARGT: + case CHARHAT: + case CHARVBAR: + case CHARAMP: + case CHARPLUSMINUS: + case CHARNBSP: + case CHARCENT: + case CHARSECT: + case CHARCOPYR: + case CHARNOT: + case CHARREGTM: + case CHARDEG: + case CHARACUTE: + case CHAR14: + case CHAR12: + case CHAR34: + case CHARMUL: + case CHARDIV: + putchar(cmd); I++; break; + case CHARDAGGER: + putchar('+'); I++; break; + case CHARBACKSLASH: printf("\\\\"); I++; break; + case CHARBULLET: printf("\" {} %c symbol \"",c_bullet); I++; break; + + + case BEGINSECTHEAD: + case BEGINSUBSECTHEAD: + /*if (fSource && sectheadid!=NAME) { printf("\\n\\n"); CurLine+=2; I=0; }*/ + tagc=0; /* section and subsection formatting controlled descriptively */ + /* no break;*/ + + case BEGINBOLD: + case BEGINITALICS: + case BEGINBOLDITALICS: + case BEGINCODE: + case BEGINY: + case BEGINSC: + case BEGINMANREF: + /* end text, begin attributed text */ + printf("\" {} \""); + break; + + /* rely on the fact that no more than one tag per range of text */ + case ENDBOLD: printf("\" b \""); break; + case ENDITALICS: printf("\" i \""); break; + case ENDBOLDITALICS: printf("\" bi \""); break; + case ENDCODE: printf("\" tt \""); break; + case ENDY: printf("\" symbol \""); break; + case ENDSC: printf("\" sc \""); break; + case ENDMANREF: printf("\" manref \""); break; + /* presentation attributes dealt with at end of line */ + + case BEGINBODY: + /*if (fSource) { printf("\\n\\n"); CurLine+=2; I=0; }*/ + break; + 
case SHORTLINE: + /*if (fSource) { printf("\\n"); CurLine++; I=0; }*/ + break; + case ENDBODY: + case BEGINBULPAIR: case ENDBULPAIR: + /*if (fSource) { printf("\\n"); CurLine++; I=0; }*/ + break; + case BEGINBULTXT: + /*if (fSource) putchar('\t');*/ + break; + case BEGINBULLET: case ENDBULLET: + case ENDBULTXT: + case BEGINSECTION: case ENDSECTION: + case BEGINSUBSECTION: case ENDSUBSECTION: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case BEGINTABLE: case ENDTABLE: + case FONTSIZE: + case BEGININDENT: case ENDINDENT: + /* no action */ + break; + default: + DefaultLine(cmd); + } +} + + + + +/* + * TkMan -- Tk format wrapped with commands + */ + +int linetabcnt[MAXLINES]; /* don't want to bother with realloc */ +int clocnt=0, clo[MAXLINES]; +int paracnt=0, para[MAXLINES]; +int rebuscnt=0, rebus[MAXLINES]; +int rebuspatcnt=0, rebuspatlen[25]; +char *rebuspat[25]; + +void +TkMan(enum command cmd) { + static int lastscnt=-1; + static int lastlinelen=-1; + static int lastsect=0; + /*static int coalese=0;*/ + static int finflow=0; + int i; + char c,*p; + + /* invariant: always ready to insert text */ + + switch (cmd) { + case BEGINDOC: + printf("$t insert end "); /* opening quote supplied in Tk() below */ + Tk(cmd); + break; + case ENDDOC: + Tk(ENDLINE); + + if (fHeadfoot) { +/* grr, should have +mark syntax for Tk text widget! 
+ printf("\\n\\n\" {} \"%s\\n\" {+headfoot h2}\n", HEADERANDFOOTER); +*/ + printf("\\n\\n\" {} \"%s\\n\" h2\n", HEADERANDFOOTER); +/* printf("$t mark set headfoot end-2l\n");*/ + CurLine++; + + for (i=0; i<CRUFTS; i++) { + if (*cruft[i]) { + printf("$t insert end {%s} sc \\n\n", cruft[i]); + CurLine++; + } + } + } else printf("\"\n"); + +/* + printf("$t insert 1.0 {"); + for (i=0; i<MAXBUF; i++) if (tabgram[i]) printf("%d=%d, ", i, tabgram[i]); + printf("\\n\\n}\n"); +*/ + + printf("set manx(tabcnts) {"); for (i=1; i<CurLine; i++) printf("%d ", linetabcnt[i]); printf("}\n"); + printf("set manx(clo) {"); for (i=0; i<clocnt; i++) printf("%d ", clo[i]); printf("}\n"); + printf("set manx(para) {"); for (i=0; i<paracnt; i++) printf("%d ", para[i]); printf("}\n"); + printf("set manx(reb) {"); for (i=0; i<rebuscnt; i++) printf("%d ", rebus[i]); printf("}\n"); + + break; + + case BEGINCOMMENT: fcharout=0; break; + case ENDCOMMENT: fcharout=1; break; + case COMMENTLINE: break; + + case ENDSECTHEAD: + case ENDSUBSECTHEAD: + lastsect=1; + Tk(cmd); + break; + + case BEGINLINE: + Tk(cmd); + linetabcnt[CurLine] = itabcnt; + /* old pattern for command line options "^\\|*\[ \t\]+-\[^-\].*\[^ \t\]" */ + c = plain[0]; + if (linelen>=2 && ((c=='-' || c=='%' || c=='\\' || c=='$' /**/ /* not much talk of money in man pages so reasonable */) && (isalnum(plain[1]) /*<= plain[1]!='-'*//*no dash*/ || ncnt/*GNU long option*/) && plain[1]!=' ') ) clo[clocnt++] = CurLine; + /* + would like to require second letter to be a capital letter to cut down on number of matches, + but command names usually start with lowercase letter + maybe use a uppercase requirement as secondary strategy, but probably not + */ + if ((ncnt || lastsect) && linelen>0 && scnt>0 && scnt<=7/*used to be <=5 until groff spontaneously started putting in 7*/) para[paracnt++] = CurLine; + lastsect=0; + + + /* rebus too, instead of search through whole Tk widget */ + if (rebuspatcnt && scnt>=5 /* not sect or subsect heads */) 
{ + for (p=plain; *p && *p!=' '; p++) /*empty*/; /* never first word */ + while (*p) { + for (i=0; i<rebuspatcnt; i++) { + if (tolower(*p) == tolower(*rebuspat[i]) && strincmp(p, rebuspat[i], rebuspatlen[i])==0) { + /* don't interfere with man page refs */ + for (; *p && !isspace(*p); p++) if (*p=='(') continue; + rebus[rebuscnt++] = CurLine; + p=""; /* break for outer */ + break; /* just locating any line with any rebus, not exact positions */ + } + } + /* just check start of words, though doesn't have to be full word (if did, could use strlen rather than strnlen) */ + while (*p && *p!=' ') p++; + while (*p && *p==' ') p++; + } + } + + + if (fReflow && !ncnt && (finflow || lastlinelen>50) && (abs(scnt-lastscnt)<=1 || abs(scnt-hanging)<=1)) { + finflow=1; + putchar(' '); + } else { + Tk(ENDLINE); + /*if ((CurLine&0x3f)==0x3f) printf("\"\nupdate idletasks\n$t insert end \""); blows up some Tk text buffer, apparently, on long lines*/ + if ((CurLine&0x1f)==0x1f) printf("\"\nupdate idletasks\n$t insert end \""); + finflow=0; + + /*if (fCodeline) printf("CODE");*/ + } + lastlinelen=linelen; lastscnt=scnt; + break; + + case ENDLINE: + /* don't call Tk(ENDLINE) */ + break; + + default: /* if not caught above, it's the same as Tk */ + Tk(cmd); + } +} + + + + +/* + * ASCII + */ + +void +ASCII(enum command cmd) { + int i; + + switch (cmd) { + case ENDDOC: + if (fHeadfoot) { + printf("\n%s\n", HEADERANDFOOTER); + for (i=0; i<CRUFTS; i++) if (*cruft[i]) printf("%s\n", cruft[i]); + } + break; + case CHARRQUOTE: + case CHARLQUOTE: + putchar('"'); + break; + case CHARLSQUOTE: + putchar('`'); + break; + case CHARRSQUOTE: + case CHARACUTE: + putchar('\''); + break; + case CHARPERIOD: + case CHARTAB: + case CHARDASH: + case CHARLT: + case CHARAMP: + case CHARBACKSLASH: + case CHARGT: + case CHARHAT: + case CHARVBAR: + case CHARNBSP: + putchar(cmd); break; + case CHARDAGGER: putchar('+'); break; + case CHARBULLET: putchar('*'); break; + case CHARPLUSMINUS: printf("+-"); break; + case 
CHANGEBAR: putchar('|'); break; + case CHARCENT: putchar('c'); break; + case CHARSECT: putchar('S'); break; + case CHARCOPYR: printf("(C)"); break; + case CHARNOT: putchar('~'); break; + case CHARREGTM: printf("(R)"); break; + case CHARDEG: putchar('o'); break; + case CHAR14: printf("1/4"); break; + case CHAR12: printf("1/2"); break; + case CHAR34: printf("3/4"); break; + case CHARMUL: putchar('X'); break; + case CHARDIV: putchar('/'); break; + case HR: printf("\n%s\n", horizontalrule); break; + + case BEGINLINE: + for (i=0; i<ncnt; i++) putchar('\n'); + break; + case BEGINBODY: + case SHORTLINE: + if (!fSource) break; + case ENDLINE: + putchar('\n'); + CurLine++; + break; + + case BEGINDOC: + case ENDBODY: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case BEGINSECTION: case ENDSECTION: + case BEGINSECTHEAD: case ENDSECTHEAD: + case BEGINSUBSECTHEAD: case ENDSUBSECTHEAD: + case BEGINBULPAIR: case ENDBULPAIR: + case BEGINBULLET: case ENDBULLET: + case BEGINBULTXT: case ENDBULTXT: + case BEGINSUBSECTION: case ENDSUBSECTION: + + case BEGINTABLE: case ENDTABLE: + case BEGINTABLELINE: case ENDTABLELINE: case BEGINTABLEENTRY: case ENDTABLEENTRY: + case BEGININDENT: case ENDINDENT: + case FONTSIZE: + case BEGINBOLD: case ENDBOLD: + case BEGINCODE: case ENDCODE: + case BEGINITALICS: case ENDITALICS: + case BEGINMANREF: case ENDMANREF: + case BEGINBOLDITALICS: case ENDBOLDITALICS: + case BEGINY: case ENDY: + case BEGINSC: case ENDSC: + /* nothing */ + break; + default: + DefaultLine(cmd); + } +} + + + +/* + * Perl 5 pod ("plain old documentation") + */ + +void +pod(enum command cmd) { + static int curindent=0; + int i; + + if (hanging==-1) { + if (curindent) hanging=curindent; else hanging=5; + } + + + if (cmd==BEGINBULPAIR) { + /* want to have multiply indented text */ + if (curindent && hanging!=curindent) printf("\n=back\n\n"); + if (hanging!=curindent) printf("\n=over %d\n\n",hanging); + curindent=hanging; + } else if (cmd==ENDBULPAIR) { + 
/* nothing--wait until next command */ + } else if (cmd==BEGINLINE && !scnt) { + if (curindent) printf("\n=back\n\n"); + curindent=0; + } else if (cmd==BEGINBODY) { + if (curindent) { + printf("\n=back\n\n"); + curindent=0; + auxindent=0; + } + } +/* + case BEGINBULPAIR: + printf("=over %d\n\n", hanging); + break; + case ENDBULPAIR: + printf("\n=back\n\n"); + break; +*/ + switch (cmd) { + case BEGINDOC: I=0; break; + + case BEGINCOMMENT: fcharout=0; break; + case ENDCOMMENT: fcharout=1; break; + case COMMENTLINE: break; + + case CHARRQUOTE: + case CHARLQUOTE: + putchar('"'); + break; + case CHARLSQUOTE: + putchar('`'); + break; + case CHARRSQUOTE: + case CHARACUTE: + putchar('\''); + break; + case CHARPERIOD: + case CHARTAB: + case CHARDASH: + case CHARLT: + case CHARAMP: + case CHARBACKSLASH: + case CHARGT: + case CHARHAT: + case CHARVBAR: + case CHARNBSP: + putchar(cmd); break; + case CHARDAGGER: putchar('+'); break; + case CHARPLUSMINUS: printf("+-"); break; + case CHANGEBAR: putchar('|'); break; + case CHARCENT: putchar('c'); break; + case CHARSECT: putchar('S'); break; + case CHARCOPYR: printf("(C)"); break; + case CHARNOT: putchar('~'); break; + case CHARREGTM: printf("(R)"); break; + case CHARDEG: putchar('o'); break; + case CHAR14: printf("1/4"); break; + case CHAR12: printf("1/2"); break; + case CHAR34: printf("3/4"); break; + case CHARMUL: putchar('X'); break; + case CHARDIV: putchar('/'); break; + case HR: printf("\n%s\n", horizontalrule); break; + case CHARBULLET: putchar('*'); break; + + case BEGINLINE: + for (i=0; i<ncnt; i++) putchar('\n'); + CurLine+=ncnt; + break; + case ENDLINE: + putchar('\n'); + CurLine++; + I=0; + break; + + case BEGINSECTHEAD: printf("=head1 "); break; + case BEGINSUBSECTHEAD: printf("=head2 "); break; + + case ENDSECTHEAD: + case ENDSUBSECTHEAD: + printf("\n"); + break; + + case BEGINCODE: + case BEGINBOLD: printf("B<"); break; + case BEGINITALICS: printf("I<"); break; + case BEGINMANREF: printf("L<"); break; + + case 
ENDBOLD: + case ENDCODE: + case ENDITALICS: + case ENDMANREF: + printf(">"); + break; + + case BEGINBULLET: + printf("\n=item "); + break; + case ENDBULLET: + printf("\n\n"); + fcharout=0; + break; + case BEGINBULTXT: + fcharout=1; + auxindent=hanging; + break; + case ENDBULTXT: + auxindent=0; + break; + + + case ENDDOC: + case BEGINBODY: case ENDBODY: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case BEGINSECTION: case ENDSECTION: + case BEGINSUBSECTION: case ENDSUBSECTION: + case BEGINBULPAIR: case ENDBULPAIR: + + case SHORTLINE: + case BEGINTABLE: case ENDTABLE: + case BEGINTABLELINE: case ENDTABLELINE: case BEGINTABLEENTRY: case ENDTABLEENTRY: + case BEGININDENT: case ENDINDENT: + case FONTSIZE: + case BEGINBOLDITALICS: case ENDBOLDITALICS: + case BEGINY: case ENDY: + case BEGINSC: case ENDSC: + /* nothing */ + break; + default: + DefaultLine(cmd); + } +} + + + +void +Sections(enum command cmd) { + + switch (cmd) { + case ENDSECTHEAD: + case ENDSUBSECTHEAD: + putchar('\n'); + case BEGINDOC: + fcharout=0; + break; + + case BEGINCOMMENT: fcharout=0; break; + case ENDCOMMENT: fcharout=1; break; + case COMMENTLINE: break; + + case BEGINSUBSECTHEAD: + printf(" "); + /* no break */ + case BEGINSECTHEAD: + fcharout=1; + break; + case CHARRQUOTE: + case CHARLQUOTE: + xputchar('"'); + break; + case CHARLSQUOTE: + xputchar('`'); + break; + case CHARRSQUOTE: + case CHARACUTE: + xputchar('\''); + break; + case BEGINTABLE: case ENDTABLE: + case BEGINTABLELINE: case ENDTABLELINE: case BEGINTABLEENTRY: case ENDTABLEENTRY: + case BEGININDENT: case ENDINDENT: + case FONTSIZE: + break; + case CHARPERIOD: + case CHARTAB: + case CHARDASH: + case CHARBACKSLASH: + case CHARLT: + case CHARGT: + case CHARHAT: + case CHARVBAR: + case CHARAMP: + case CHARNBSP: + xputchar(cmd); break; + case CHARDAGGER: xputchar('+'); break; + case CHARBULLET: xputchar('*'); break; + case CHARPLUSMINUS: xputchar('+'); xputchar('-'); break; + case CHARCENT: xputchar('c'); 
break; + case CHARSECT: xputchar('S'); break; + case CHARCOPYR: xputchar('('); xputchar('C'); xputchar(')'); break; + case CHARNOT: xputchar('~'); break; + case CHARREGTM: xputchar('('); xputchar('R'); xputchar(')'); break; + case CHARDEG: xputchar('o'); break; + case CHAR14: xputchar('1'); xputchar('/'); xputchar('4'); break; + case CHAR12: xputchar('1'); xputchar('/'); xputchar('2'); break; + case CHAR34: xputchar('3'); xputchar('/'); xputchar('4'); break; + case CHARMUL: xputchar('X'); break; + case CHARDIV: xputchar('/'); break; + case ITAB: DefaultLine(cmd); break; + + + default: + /* nothing */ + break; + } +} + + + +void +Roff(enum command cmd) { + switch (cmd) { + case BEGINDOC: + I=1; + printf(".TH %s %s \"generated by PolyglotMan\" UCB\n", manName, manSect); + printf(".\\\" %s,\n", PROVENANCE); + printf(".\\\" %s\n", HOME); + CurLine=1; + break; + case BEGINBODY: printf(".LP\n"); break; + + case BEGINCOMMENT: + case ENDCOMMENT: + break; + case COMMENTLINE: printf("'\\\" "); break; + + case BEGINSECTHEAD: printf(".SH "); break; + case BEGINSUBSECTHEAD:printf(".SS "); break; + case BEGINBULPAIR: printf(".IP "); break; + case SHORTLINE: printf("\n.br"); break; + case BEGINBOLD: printf("\\fB"); break; /* \n.B -- grr! 
*/ + case ENDCODE: + case ENDBOLD: printf("\\fR"); break; /* putchar('\n'); */ + case BEGINITALICS: printf("\\fI"); break; + case ENDITALICS: printf("\\fR"); break; + case BEGINCODE: + case BEGINBOLDITALICS:printf("\\f4"); break; + case ENDBOLDITALICS: printf("\\fR"); break; + + case CHARLQUOTE: printf("\\*(rq"); break; + case CHARRQUOTE: printf("\\*(lq"); break; + case CHARNBSP: printf("\\|"); break; + case CHARLSQUOTE: putchar('`'); break; + case CHARRSQUOTE: putchar('\''); break; + case CHARPERIOD: if (I==1) printf("\\&"); putchar('.'); I++; break; + case CHARDASH: printf("\\-"); break; + case CHARTAB: + case CHARLT: + case CHARGT: + case CHARHAT: + case CHARVBAR: + case CHARAMP: + putchar(cmd); break; + case CHARBULLET: printf("\\(bu"); break; + case CHARDAGGER: printf("\\(dg"); break; + case CHARPLUSMINUS: printf("\\(+-"); break; + case CHANGEBAR: putchar('|'); break; + case CHARCENT: printf("\\(ct"); break; + case CHARSECT: printf("\\(sc"); break; + case CHARCOPYR: printf("\\(co"); break; + case CHARNOT: printf("\\(no"); break; + case CHARREGTM: printf("\\(rg"); break; + case CHARDEG: printf("\\(de"); break; + case CHARACUTE: printf("\\(aa"); break; + case CHAR14: printf("\\(14"); break; + case CHAR12: printf("\\(12"); break; + case CHAR34: printf("\\(34"); break; + case CHARMUL: printf("\\(mu"); break; + case CHARDIV: printf("\\(di"); break; + case HR: /*printf("\n%s\n", horizontalrule);*/ break; + case CHARBACKSLASH: printf("\\\\"); break; /* correct? 
 */ + + case BEGINLINE: + /*for (i=0; i<ncnt; i++) putchar('\n');*/ + break; + + case BEGINBULLET: putchar('"'); break; + case ENDBULLET: printf("\"\n"); break; + + case ENDLINE: + CurLine++; + I=1; + /* no break */ + case ENDSUBSECTHEAD: + case ENDSECTHEAD: + case ENDDOC: + putchar('\n'); + break; + + case BEGINCODEBLOCK: printf(".nf\n"); break; + case ENDCODEBLOCK: printf(".fi\n"); break; + + case ENDBODY: + case ENDBULPAIR: + case BEGINBULTXT: case ENDBULTXT: + case BEGINSECTION: case ENDSECTION: + case BEGINSUBSECTION: case ENDSUBSECTION: + case BEGINY: case ENDY: + case BEGINSC: case ENDSC: + case BEGINTABLE: case ENDTABLE: + case BEGINTABLELINE: case ENDTABLELINE: case BEGINTABLEENTRY: case ENDTABLEENTRY: + case BEGININDENT: case ENDINDENT: + case FONTSIZE: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case BEGINMANREF: case ENDMANREF: + /* nothing */ + break; + default: + DefaultPara(cmd); + } +} + + + +/* + * HTML + */ + +void +HTML(enum command cmd) { + static int pre=0; + int i; + int lasttoc; + + /* always respond to these signals */ + switch (cmd) { + case CHARNBSP: printf("&nbsp;"); I++; break; + case CHARTAB: printf("<tt> </tt>&nbsp;<tt> </tt>&nbsp;"); break; + case CHARLQUOTE: printf("&ldquo;"); break; + case CHARRQUOTE: printf("&rdquo;"); break; + case CHARLSQUOTE: printf("&lsquo;"); break; + case CHARRSQUOTE: printf("&rsquo;"); break; + case CHARPERIOD: + case CHARDASH: + case CHARBACKSLASH: + case CHARVBAR: /*printf("&brvbar;"); -- broken bar no good */ + case CHARHAT: + putchar(cmd); + break; + case CHARDAGGER: printf("&dagger;"); break; + case CHARBULLET: if (I>0 || !finlist) printf("&#183;"/*"&middot;"*//*&sect;--middot hardly visible*/); + break; + case CHARPLUSMINUS: printf("&plusmn;"); break; + case CHARGT: printf("&gt;"); break; + case CHARLT: printf("&lt;"); break; + case CHARAMP: printf("&amp;"); break; + case CHARCENT: printf("&cent;"); break; + case CHARSECT: printf("&sect;"); break; + case CHARCOPYR: printf("&copy;"); break; + case CHARNOT: printf("&not;"); break; + case CHARREGTM: printf("&reg;"); break; + case CHARDEG: 
printf("°"); break; + case CHARACUTE: printf("´"); break; + case CHAR14: printf("¼"); break; + case CHAR12: printf("½"); break; + case CHAR34: printf("¾"); break; + case CHARMUL: printf("×"); break; + case CHARDIV: printf("÷"); break; + default: + break; + } + + /* while in pre mode... */ + if (pre) { + switch (cmd) { + case ENDLINE: I=0; CurLine++; if (!fPara && scnt) printf("<br>"); printf("\n"); break; + case ENDTABLE: + if (fSource) { + printf("</table>\n"); + } else { + printf("</pre><br>\n"); pre=0; fQS=fIQS=fPara=1; + } + break; + case ENDCODEBLOCK: printf("</pre>"); pre=0; break; + case SHORTLINE: + case ENDBODY: + printf("\n"); + break; + default: + /* nothing */ + break; + } + return; + } + + /* usual operation */ + switch (cmd) { + case BEGINDOC: + /* escchars = ... => HTML doesn't backslash-quote metacharacters */ + printf("<!-- %s, -->\n", PROVENANCE); + printf("<!-- %s -->\n\n", HOME); + printf("<html>\n<head>\n"); +/* printf("<isindex>\n");*/ + /* better title possible? */ + printf("<title>"); printf(manTitle, manName, manSect); printf("\n"); + printf("\n\n"); + printf("%s

\n", TABLEOFCONTENTS); + I=0; + break; + case ENDDOC: + /* header and footer wanted? */ + printf("

\n"); + if (fHeadfoot) { + printf("


%s

\n", HEADERANDFOOTER); + for (i=0; i\n", cruft[i]); + } + + if (!tocc) { + /*printf("\n

ERROR: Empty man page

\n");*/ + } else { + printf("\n

\n"); + printf("%s

\n", TABLEOFCONTENTS); + printf("

    \n"); + for (i=0, lasttoc=BEGINSECTION; i\n"); + else printf("
\n"); + } + printf("
  • %s
  • \n", i, i, toc[i].text); + } + if (lasttoc==BEGINSUBSECTION) printf(""); + printf("\n"); + } + printf("\n\n"); + break; + case BEGINBODY: + printf("

    \n"); + break; + case ENDBODY: break; + + case BEGINCOMMENT: printf("\n\n"); break; + case COMMENTLINE: break; + + case BEGINSECTHEAD: + printf("\n

    ", tocc, tocc); + break; + case ENDSECTHEAD: + printf("

    \n"); + /* useful extraction from FILES, ENVIRONMENT? */ + break; + case BEGINSUBSECTHEAD: + printf("\n

    ", tocc, tocc); + break; + case ENDSUBSECTHEAD: + printf("

    \n"); + break; + case BEGINSECTION: break; + case ENDSECTION: + if (sectheadid==NAME && message!=NULL) printf("%s", message); + break; + case BEGINSUBSECTION: break; + case ENDSUBSECTION: break; + + case BEGINBULPAIR: + if (listtype==OL) printf("\n
      \n"); + else if (listtype==UL) printf("\n
        \n"); + else printf("\n
        \n"); + break; + case ENDBULPAIR: + if (listtype==OL) printf("\n
    \n"); + else if (listtype==UL) printf("\n\n"); + else printf("
    \n"); + break; + case BEGINBULLET: + if (listtype==OL || listtype==UL) fcharout=0; + else printf("\n
    "); + break; + case ENDBULLET: + if (listtype==OL || listtype==UL) fcharout=1; + else printf("
    "); + break; + case BEGINBULTXT: + if (listtype==OL || listtype==UL) printf("
  • "); + else printf("\n
    "); + break; + case ENDBULTXT: + if (listtype==OL || listtype==UL) printf("
  • "); + else printf("\n"); + break; + + case BEGINLINE: + /* if (ncnt) printf("

    \n"); -- if haven't already generated structural tag */ + if (ncnt) printf("\n

    "); + + /* trailing spaces already trimmed off, so look for eol now */ + if (fCodeline) { + printf(""); + for (i=0; i
    "); fCodeline=0; } + I=0; CurLine++; if (!fPara && scnt) printf("
    "); printf("\n"); + break; + + case SHORTLINE: + if (fCodeline) { printf("
    "); fCodeline=0; } + if (!fIP) printf("
    \n"); + break; + + + case BEGINTABLE: + if (fSource) { + /*printf("

    \n");*/ + printf("
    \n"); + } else { + printf("
    \n"); pre=1; fQS=fIQS=fPara=0;
    +		}
    +		break;
    +	   case ENDTABLE:
    +		if (fSource) {
    +		  printf("
    \n"); + } else { + printf("
    \n"); pre=0; fQS=fIQS=fPara=1; + } + break; + case BEGINTABLELINE: printf(""); break; + case ENDTABLELINE: printf("\n"); break; + case BEGINTABLEENTRY: + printf("1) printf(" colspan=%d", tblcellspan); + printf("'>"); + break; + case ENDTABLEENTRY: + printf(""); + break; + + /* something better with CSS */ + case BEGININDENT: printf("
    "); break; + case ENDINDENT: printf("
    \n"); break; + + case FONTSIZE: + /* HTML font step sizes are bigger than troff's */ + if ((fontdelta+=intArg)!=0) printf("", (intArg>0)?'+':'-'); else printf("\n"); + break; + + case BEGINBOLD: printf(""); break; + case ENDBOLD: printf(""); break; + case BEGINITALICS: printf(""); break; + case ENDITALICS: printf(""); break; + case BEGINBOLDITALICS: + case BEGINCODE: printf(""); break; + case ENDBOLDITALICS: + case ENDCODE: printf(""); break; + case BEGINCODEBLOCK: printf("
    "); pre=1; break;	/* wrong for two-column lists in kermit.1, pine.1, perl4.1 */
    +	   case ENDCODEBLOCK:	printf("
    "); pre=0; break; + case BEGINCENTER: printf("
    "); break; + case ENDCENTER: printf("
    "); break; + case BEGINMANREF: + manrefextract(hitxt); + if (fmanRef) { printf(""); } + else printf(""); + break; + case ENDMANREF: + if (fmanRef) printf("\n"); else printf(""); + break; + case HR: printf("\n
    \n"); break; + + /* U (was B, I), strike -- all temporary until HTML 4.0's INS and DEL widespread */ + case BEGINDIFFA: printf(""); break; + case ENDDIFFA: printf(""); break; + case BEGINDIFFD: printf(""); break; + case ENDDIFFD: printf(""); break; + + case BEGINSC: case ENDSC: + case BEGINY: case ENDY: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case CHANGEBAR: + /* nothing */ + break; + default: + DefaultPara(cmd); + } +} + + + +/* + * DocBook XML + * improvements by Aaron Hawley applied 2003 June 5 + * + * N.B. The framework for XML is in place but not done. If you + * are familiar with the DocBook DTD, however, it shouldn't be + * too difficult to finish it. If you do so, please send your + * code to me so that I may share the wealth in the next release. + */ + +const char *DOCBOOKPATH = "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd"; + +void +XML(enum command cmd) { + static int pre=0; + int i; + int lasttoc; + char *p; + static int fRefEntry=0; + static int fRefPurpose=0; + /*static char *bads => XML doesn't backslash-quote metacharacters */ + +/* +*/ + + /* always respond to these signals */ + switch (cmd) { + case CHARLQUOTE: case CHARRQUOTE: printf("""); break; + case CHARBULLET: printf("•"); break; + case CHARDAGGER: printf("†"); break; + case CHARPLUSMINUS: printf("±"); break; + case CHARCOPYR: printf("©"); break; + case CHARNOT: printf("¬"); break; + case CHARMUL: printf("×"); break; + case CHARDIV: printf("÷"); break; + case CHARAMP: printf("&"); break; + case CHARDASH: + if (sectheadid==NAME && !fRefPurpose) { + printf(""); + fRefPurpose=1; + } else putchar('-'); + break; + case CHARBACKSLASH: putchar('\\'); break; + case CHARGT: printf(">"); break; + case CHARLT: printf("<"); break; + case CHARLSQUOTE: + case CHARRSQUOTE: + case CHARPERIOD: + case CHARTAB: + case CHARHAT: + case CHARVBAR: + case CHARNBSP: + case CHARCENT: + case CHARSECT: + case CHARREGTM: + case CHARDEG: + case CHARACUTE: + case CHAR14: 
+ case CHAR12: + case CHAR34: + putchar(cmd); + break; + default: + break; + } + + /* while in pre mode... */ + if (pre) { + switch (cmd) { + case ENDLINE: I=0; CurLine++; if (!fPara && scnt) putchar(' '); break; + case ENDTABLE: + if (fSource) printf("\n"); + else { printf("\n"); pre=0; fQS=fIQS=fPara=1; } + break; + default: + /* nothing */ + break; + } + return; + } + + /* usual operation */ + switch (cmd) { + case BEGINDOC: + printf("\n\n", DOCBOOKPATH); + + printf("\n"); + + printf("\n\n",HOME); + /* better title possible? */ + for (p=manName; *p; p++) *p = tolower(*p); + printf("\n", manName, manSect); + printf("\n%s\n", manName); + printf("%s\n\n\n", manSect); + + I=0; + break; + + case ENDDOC: + /* header and footer wanted? */ + if (fHeadfoot) { + printf("\n\n%s\n", HEADERANDFOOTER); + for (i=0; i%s\n", cruft[i]); + printf("\n"); + } + + /* table of contents, such as found in HTML, can be generated automatically by XML software */ + + printf("\n"); + break; + case BEGINBODY: + if (fPara) printf("\n"); + printf(""); fPara = 1; + break; + case ENDBODY: + if (fRefPurpose) { printf(""); fRefPurpose=0; } + else { printf("\n"); fPara=0; } + break; + + case BEGINCOMMENT: printf("\n\n"); break; + case COMMENTLINE: break; + + case BEGINSECTHEAD: + case BEGINSUBSECTHEAD: + if (sectheadid != NAME && sectheadid != SYNOPSIS) printf(""); + break; + case ENDSECTHEAD: + case ENDSUBSECTHEAD: + if (sectheadid == NAME) printf("<refname>"); + else if (sectheadid == SYNOPSIS) {} + else { printf("\n"); fPara=1; } + break; + + case BEGINSECTION: + if (sectheadid==NAME) printf("\n"); + /*printf(""); -- do lotsa parsing here for RefName, RefPurpose*/ + else if (sectheadid==SYNOPSIS) printf("\n\n"); + else printf("\n\n"); + break; + case ENDSECTION: + if (sectheadid==NAME) { + if (fRefPurpose) { printf(""); fRefPurpose=0; } + printf("\n\n\n"); + } else if (sectheadid==SYNOPSIS) printf("\n\n\n"); + else { + if (fPara) { printf("\n"); fPara=0; } + printf("\n\n"); + } + break; + + case 
BEGINSUBSECTION: printf("\n"); break; + case ENDSUBSECTION: printf("\n"); break; + + /* need to update this for enumerated and plain lists */ + case BEGINBULPAIR: printf("\n"); break; + case ENDBULPAIR: printf("\n"); break; + case BEGINBULLET: printf(""); break; + case ENDBULLET: printf("\n"); break; + case BEGINBULTXT: printf("\n"); break; + case ENDBULTXT: printf("\n\n"); break; + + case BEGINLINE: + /* remember, get BEGINBODY call at start of paragraph */ + if (fRefEntry) { + if (fRefPurpose) { + for (p=plain; *p!='-'; p++) { + /* nothing?! */ + } + } + } + + break; + + case ENDLINE: + /*if (fCodeline) { fIQS=1; fCodeline=0; }*/ + if (fCodeline) { printf(""); fCodeline=0; } /* */ + I=0; CurLine++; if (!fPara && scnt) printf(""); else putchar(' '); + break; + + case SHORTLINE: + if (fCodeline) { printf(""); fCodeline=0; } + if (!fIP && !fPara) printf("\n"); + break; + + case BEGINTABLE: + if (fSource) printf("\n"); + else { printf("\n"); pre=1; fQS=fIQS=fPara=0; } + break; + case ENDTABLE: + if (fSource) printf("
    \n"); + else { printf("\n"); pre=0; fQS=fIQS=fPara=1; } + break; + case BEGINTABLELINE: printf(""); break; + case ENDTABLELINE: printf("\n"); break; + case BEGINTABLEENTRY: printf(""); break; + case ENDTABLEENTRY: printf(""); break; + + case BEGININDENT: case ENDINDENT: + case FONTSIZE: + break; + + /* have to make some guess about bold and italics */ + case BEGINBOLD: printf(""); break; + case ENDBOLD: printf(""); break; + case BEGINITALICS: printf(""); break; /* could be literal or arg */ + case ENDITALICS: printf(""); break; + case BEGINBOLDITALICS: case BEGINCODE: printf(""); break; + case ENDBOLDITALICS: case ENDCODE: printf(""); break; + case BEGINMANREF: + manrefextract(hitxt); + if (fmanRef) { printf(""); } + break; + case ENDMANREF: + if (fmanRef) printf(""); + break; + + case HR: + case BEGINSC: case ENDSC: + case BEGINY: case ENDY: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case CHANGEBAR: + /* nothing */ + break; + default: + DefaultPara(cmd); + } +} + + + +/* generates MIME compliant to RFC 1563 */ + +void +MIME(enum command cmd) { + static int pre=0; + int i; + + /* always respond to these signals */ + switch (cmd) { + case CHARDASH: + case CHARAMP: + case CHARPERIOD: + case CHARTAB: + putchar(cmd); break; + case CHARLSQUOTE: putchar('`'); break; + case CHARACUTE: + case CHARRSQUOTE: putchar('\''); break; + case CHARBULLET: putchar('*'); break; + case CHARDAGGER: putchar('|'); break; + case CHARPLUSMINUS: printf("+-"); break; + case CHARNBSP: putchar(' '); break; + case CHARCENT: putchar('c'); break; + case CHARSECT: putchar('S'); break; + case CHARCOPYR: printf("(C)"); break; + case CHARNOT: putchar('~'); break; + case CHARREGTM: printf("(R)"); break; + case CHARDEG: putchar('o'); break; + case CHAR14: printf("1/4"); break; + case CHAR12: printf("1/2"); break; + case CHAR34: printf("3/4"); break; + case CHARMUL: putchar('X'); break; + case CHARDIV: putchar('/'); break; + case CHARLQUOTE: + case CHARRQUOTE: + 
putchar('"'); + break; + case CHARBACKSLASH: /* these should be caught as escaped chars */ + case CHARGT: + case CHARLT: + assert(1); + break; + default: + break; + } + + /* while in pre mode... */ + if (pre) { + switch (cmd) { + case ENDLINE: I=0; CurLine++; if (!fPara && scnt) printf("\n\n"); break; + case ENDTABLE: printf("\n\n"); pre=0; fQS=fIQS=fPara=1; break; + default: + /* nothing */ + break; + } + return; + } + + /* usual operation */ + switch (cmd) { + case BEGINDOC: + printf("Content-Type: text/enriched\n"); + printf("Text-Width: 60\n"); + escchars = "<>\\"; + + I=0; + break; + case ENDDOC: + /* header and footer wanted? */ + printf("\n\n"); + if (fHeadfoot) { + printf("\n"); + MIME(BEGINSECTHEAD); printf("%s",HEADERANDFOOTER); MIME(ENDSECTHEAD); + for (i=0; i\n"); + printf("%s\n%s\n", PROVENANCE, HOME); + printf("\n\n"); +*/ + +/* + printf("\n

    \n"); + printf("%s

    \n", TABLEOFCONTENTS); + printf("

      \n"); + for (i=0, lasttoc=BEGINSECTION; i\n"); + else printf("
    \n"); + } + printf("
  • %s
  • \n", i, i, toc[i].text); + } + if (lasttoc==BEGINSUBSECTION) printf(""); + printf("\n"); + printf("\n"); +*/ + break; + case BEGINBODY: + printf("\n\n"); + break; + case ENDBODY: break; + + case BEGINCOMMENT: fcharout=0; break; + case ENDCOMMENT: fcharout=1; break; + case COMMENTLINE: break; + + case BEGINSECTHEAD: + printf("\n"); + /*A NAME=\"sect%d\" HREF=\"#toc%d\">

    ", tocc, tocc);*/ + break; + case ENDSECTHEAD: + printf("\n\n"); + /* useful extraction from files, environment? */ + break; + case BEGINSUBSECTHEAD: + printf(""); + /*\n

    ", tocc, tocc);*/ + break; + case ENDSUBSECTHEAD: + printf("\n\n"); + break; + case BEGINSECTION: + case BEGINSUBSECTION: + break; + case ENDSECTION: + case ENDSUBSECTION: + printf("\n"); + break; + + case BEGINBULPAIR: break; + case ENDBULPAIR: break; + case BEGINBULLET: printf(""); break; + case ENDBULLET: printf("\t"); break; + case BEGINBULTXT: + case BEGININDENT: + printf(""); + break; + case ENDBULTXT: + case ENDINDENT: + printf("\n"); + break; + + case FONTSIZE: + if ((fontdelta+=intArg)==0) { + if (intArg>0) printf(""); else printf(""); + } else { + if (intArg>0) printf(""); else printf(""); + } + break; + + case BEGINLINE: /*if (ncnt) printf("\n\n");*/ break; + case ENDLINE: I=0; CurLine++; printf("\n"); break; + case SHORTLINE: if (!fIP) printf("\n\n"); break; + case BEGINTABLE: printf("\n"); pre=1; fQS=fIQS=fPara=0; break; + case ENDTABLE: printf("\n"); pre=0; fQS=fIQS=fPara=1; break; + case BEGINTABLELINE: case ENDTABLELINE: case BEGINTABLEENTRY: case ENDTABLEENTRY: + break; + /* could use a new list type */ + + case BEGINBOLD: printf(""); break; + case ENDBOLD: printf(""); break; + case BEGINITALICS: printf(""); break; + case ENDITALICS: printf(""); break; + case BEGINCODE: + case BEGINBOLDITALICS:printf(""); break; + case ENDCODE: + case ENDBOLDITALICS: printf(""); break; + case BEGINMANREF: + printf("blue"); +/* how to make this hypertext? + manrefextract(hitxt); + if (fmanRef) { printf("\n"); } + else printf(""); + break; +*/ + break; + case ENDMANREF: + printf(""); + break; + + case HR: printf("\n\n%s\n\n", horizontalrule); break; + + case BEGINSC: case ENDSC: + case BEGINY: case ENDY: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case CHANGEBAR: + /* nothing */ + break; + default: + DefaultPara(cmd); + } +} + + + +/* + * LaTeX + */ + +void +LaTeX(enum command cmd) { + + switch (cmd) { + case BEGINDOC: + escchars = "$&%#_{}"; /* and more to come? 
*/ + printf("%% %s,\n", PROVENANCE); + printf("%% %s\n\n", HOME); + /* definitions */ + printf( + "\\documentstyle{article}\n" + "\\def\\thefootnote{\\fnsymbol{footnote}}\n" + "\\setlength{\\parindent}{0pt}\n" + "\\setlength{\\parskip}{0.5\\baselineskip plus 2pt minus 1pt}\n" + "\\begin{document}\n" + ); + I=0; + break; + case ENDDOC: + /* header and footer wanted? */ + printf("\n\\end{document}\n"); + + break; + case BEGINBODY: + printf("\n\n"); + break; + case ENDBODY: break; + + case BEGINCOMMENT: + case ENDCOMMENT: + break; + case COMMENTLINE: printf("%% "); break; + + + case BEGINSECTION: break; + case ENDSECTION: break; + case BEGINSECTHEAD: printf("\n\\section{"); tagc=0; break; + case ENDSECTHEAD: + printf("}"); +/* + if (CurLine==1) printf("\\footnote{" + "\\it conversion to \\LaTeX\ format by PolyglotMan " + "available via anonymous ftp from {\\tt ftp.berkeley.edu:/ucb/people/phelps/tcltk}}" + ); +*/ + /* useful extraction from files, environment? */ + printf("\n"); + break; + case BEGINSUBSECTHEAD:printf("\n\\subsection{"); break; + case ENDSUBSECTHEAD: + printf("}"); + break; + case BEGINSUBSECTION: break; + case ENDSUBSECTION: break; + case BEGINBULPAIR: printf("\\begin{itemize}\n"); break; + case ENDBULPAIR: printf("\\end{itemize}\n"); break; + case BEGINBULLET: printf("\\item ["); break; + case ENDBULLET: printf("] "); break; + case BEGINLINE: /*if (ncnt) printf("\n\n");*/ break; + case ENDLINE: I=0; putchar('\n'); CurLine++; break; + case BEGINTABLE: printf("\\begin{verbatim}\n"); break; + case ENDTABLE: printf("\\end{verbatim}\n"); break; + case BEGINTABLELINE: case ENDTABLELINE: case BEGINTABLEENTRY: case ENDTABLEENTRY: + break; + case BEGININDENT: case ENDINDENT: + case FONTSIZE: + break; + case SHORTLINE: if (!fIP) printf("\n\n"); break; + case BEGINBULTXT: break; + case ENDBULTXT: putchar('\n'); break; + + case CHARLQUOTE: printf("``"); break; + case CHARRQUOTE: printf("''"); break; + case CHARLSQUOTE: + case CHARRSQUOTE: + case CHARPERIOD: + 
case CHARTAB: + case CHARDASH: + case CHARNBSP: + putchar(cmd); break; + case CHARBACKSLASH: printf("$\\backslash$"); break; + case CHARGT: printf("$>$"); break; + case CHARLT: printf("$<$"); break; + case CHARHAT: printf("$\\char94{}$"); break; + case CHARVBAR: printf("$|$"); break; + case CHARAMP: printf("\\&"); break; + case CHARBULLET: printf("$\\bullet$ "); break; + case CHARDAGGER: printf("\\dag "); break; + case CHARPLUSMINUS: printf("\\pm "); break; + case CHARCENT: printf("\\hbox{\\rm\\rlap/c}"); break; + case CHARSECT: printf("\\S "); break; + case CHARCOPYR: printf("\\copyright "); break; + case CHARNOT: printf("$\\neg$"); break; + case CHARREGTM: printf("(R)"); break; + case CHARDEG: printf("$^\\circ$"); break; + case CHARACUTE: putchar('\''); break; + case CHAR14: printf("$\\frac{1}{4}$"); break; + case CHAR12: printf("$\\frac{1}{2}$"); break; + case CHAR34: printf("$\\frac{3}{4}$"); break; + case CHARMUL: printf("\\times "); break; + case CHARDIV: printf("\\div "); break; + + case BEGINCODE: + case BEGINBOLD: printf("{\\bf "); break; + case BEGINSC: printf("{\\sc "); break; + case BEGINITALICS: printf("{\\it "); break; + case BEGINBOLDITALICS:printf("{\\bf\\it "); break; + case BEGINMANREF: printf("{\\sf "); break; + case ENDCODE: + case ENDBOLD: + case ENDSC: + case ENDITALICS: + case ENDBOLDITALICS: + case ENDMANREF: + putchar('}'); + break; + case HR: /*printf("\n%s\n", horizontalrule);*/ break; + + case BEGINY: case ENDY: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case CHANGEBAR: + /* nothing */ + break; + default: + DefaultPara(cmd); + } +} + + +void +LaTeX2e(enum command cmd) { + + switch (cmd) { + /* replace selected commands ... 
*/ + case BEGINDOC: + escchars = "$&%#_{}"; + printf("%% %s,\n", PROVENANCE); + printf("%% %s\n\n", HOME); + /* definitions */ + printf( + "\\documentclass{article}\n" + "\\def\\thefootnote{\\fnsymbol{footnote}}\n" + "\\setlength{\\parindent}{0pt}\n" + "\\setlength{\\parskip}{0.5\\baselineskip plus 2pt minus 1pt}\n" + "\\begin{document}\n" + ); + I=0; + break; + case BEGINCODE: + case BEGINBOLD: printf("\\textbf{"); break; + case BEGINSC: printf("\\textsc{"); break; + case BEGINITALICS: printf("\\textit{"); break; + case BEGINBOLDITALICS:printf("\\textbf{\\textit{"); break; + case BEGINMANREF: printf("\\textsf{"); break; + case ENDBOLDITALICS: printf("}}"); break; + + /* ... rest same as old LaTeX */ + default: + LaTeX(cmd); + } +} + + + +/* + * Rich Text Format (RTF) + */ + +/* RTF could use more work */ + +void +RTF(enum command cmd) { + + switch (cmd) { + case BEGINDOC: + escchars = "{}"; + /* definitions */ + printf( + /* fonts */ + "{\\rtf1\\deff2 {\\fonttbl" + "{\\f20\\froman Times;}{\\f150\\fnil I Times Italic;}" + "{\\f151\\fnil B Times Bold;}{\\f152\\fnil BI Times BoldItalic;}" + "{\\f22\\fmodern Courier;}{\\f23\\ftech Symbol;}" + "{\\f135\\fnil I Courier Oblique;}{\\f136\\fnil B Courier Bold;}{\\f137\\fnil BI Courier BoldOblique;}" + "{\\f138\\fnil I Helvetica Oblique;}{\\f139\\fnil B Helvetica Bold;}}" + "\n" + + /* style sheets */ + "{\\stylesheet{\\li720\\sa120 \\f20 \\sbasedon222\\snext0 Normal;}" + "{\\s2\\sb200\\sa120 \\b\\f3\\fs20 \\sbasedon0\\snext2 section head;}" + "{\\s3\\li180\\sa120 \\b\\f20 \\sbasedon0\\snext3 subsection head;}" + "{\\s4\\fi-1440\\li2160\\sa240\\tx2160 \\f20 \\sbasedon0\\snext4 detailed list;}}" + "\n" + +/* more header to come--do undefined values default to nice values? */ + ); + I=0; + break; + case ENDDOC: + /* header and footer wanted? 
*/ + printf("\\par{\\f150 %s,\n%s}", PROVENANCE, HOME); + printf("}\n"); + break; + case BEGINBODY: + printf("\n\n"); + break; + case ENDBODY: + CurLine++; + printf("\\par\n"); + tagc=0; + break; + + case BEGINCOMMENT: fcharout=0; break; + case ENDCOMMENT: fcharout=1; break; + case COMMENTLINE: break; + + case BEGINSECTION: break; + case ENDSECTION: printf("\n\\par\n"); break; + case BEGINSECTHEAD: printf("{\\s2 "); tagc=0; break; + case ENDSECTHEAD: + printf("}\\par"); + /* useful extraction from files, environment? */ + printf("\n"); + break; + case BEGINSUBSECTHEAD:printf("{\\s3 "); break; + case ENDSUBSECTHEAD: + printf("}\\par\n"); + break; + case BEGINSUBSECTION: break; + case ENDSUBSECTION: break; + case BEGINLINE: /*if (ncnt) printf("\n\n");*/ break; + case ENDLINE: I=0; putchar(' '); /*putchar('\n'); CurLine++;*/ break; + case SHORTLINE: if (!fIP) printf("\\line\n"); break; + case BEGINBULPAIR: printf("{\\s4 "); break; + case ENDBULPAIR: printf("}\\par\n"); break; + case BEGINBULLET: break; + case ENDBULLET: printf("\\tab "); fcharout=0; break; + case BEGINBULTXT: fcharout=1; break; + case ENDBULTXT: break; + + case CHARLQUOTE: printf("``"); break; + case CHARRQUOTE: printf("''"); break; + case CHARLSQUOTE: + case CHARRSQUOTE: + case CHARPERIOD: + case CHARTAB: + case CHARDASH: + case CHARBACKSLASH: + case CHARGT: + case CHARLT: + case CHARHAT: + case CHARVBAR: + case CHARAMP: + case CHARNBSP: + case CHARCENT: + case CHARSECT: + case CHARCOPYR: + case CHARNOT: + case CHARREGTM: + case CHARDEG: + case CHARACUTE: + case CHAR14: + case CHAR12: + case CHAR34: + case CHARMUL: + case CHARDIV: + putchar(cmd); break; + case CHARBULLET: printf("\\bullet "); break; + case CHARDAGGER: printf("\\dag "); break; + case CHARPLUSMINUS: printf("\\pm "); break; + + case BEGINCODE: + case BEGINBOLD: printf("{\\b "); break; + case BEGINSC: printf("{\\fs20 "); break; + case BEGINITALICS: printf("{\\i "); break; + case BEGINBOLDITALICS:printf("{\\b \\i "); break; + case 
BEGINMANREF: printf("{\\f22 "); break; + case ENDBOLD: + case ENDCODE: + case ENDSC: + case ENDITALICS: + case ENDBOLDITALICS: + case ENDMANREF: + putchar('}'); + break; + case HR: printf("\n%s\n", horizontalrule); break; + + case BEGINY: case ENDY: + case BEGINHEADER: case ENDHEADER: + case BEGINFOOTER: case ENDFOOTER: + case BEGINTABLE: case ENDTABLE: + case BEGINTABLELINE: case ENDTABLELINE: case BEGINTABLEENTRY: case ENDTABLEENTRY: + case BEGININDENT: case ENDINDENT: + case FONTSIZE: + case CHANGEBAR: + /* nothing */ + break; + default: + DefaultPara(cmd); + } +} + + + +/* + * pointers to existing tools + */ + +void +PostScript(enum command cmd) { + fprintf(stderr, "Use groff or psroff to generate PostScript.\n"); + exit(1); +} + + +void +FrameMaker(enum command cmd) { + fprintf(stderr, "FrameMaker comes with filters that convert from roff to MIF.\n"); + exit(1); +} + + + + +/* + * Utilities common to both parses + */ + + +/* + level 0: DOC - need match + level 1: SECTION - need match + level 2: SUBSECTION | BODY | BULLETPAIR + level 3: BODY (within SUB) | BULLETPAIR (within SUB) | BULTXT (within BULLETPAIR) + level 4: BULTXT (within BULLETPAIR within SUBSECTION) + + never see: SECTHEAD, SUBSECTHEAD, BULLET +*/ + +int Psect=0, Psub=0, Pbp=0, Pbt=0, Pb=0, Pbul=0; + +void +pop(enum command cmd) { + assert(cmd==ENDINDENT || cmd==BEGINBULLET || cmd==BEGINBULTXT || cmd==BEGINBULPAIR || cmd==BEGINBODY || cmd==BEGINSECTION || cmd==BEGINSUBSECTION || cmd==ENDDOC); +/* + int i; + int p; + int match; + + p=cmdp-1; + for (i=cmdp-1;i>=0; i--) + if (cmd==cmdstack[i]) { match=i; break; } +*/ + + /* if match, pop off all up to and including match */ + /* otherwise, pop off one level*/ + + if (Pbul) { + (*fn)(ENDBULLET); Pbul=0; + if (cmd==BEGINBULLET) return; + } /* else close off ENDBULTXT */ + + if (Pbt) { (*fn)(ENDBULTXT); Pbt=0; } + if (cmd==BEGINBULTXT || cmd==BEGINBULLET) return; + + if (Pb && cmd==BEGINBULPAIR) { (*fn)(ENDBODY); Pb=0; } /* special */ + if (Pbp) { 
(*fn)(ENDBULPAIR); Pbp=0; } + if (cmd==BEGINBULPAIR || cmd==ENDINDENT) return; + + if (Pb) { (*fn)(ENDBODY); Pb=0; } + if (cmd==BEGINBODY) return; + + if (Psub) { (*fn)(ENDSUBSECTION); Psub=0; } + if (cmd==BEGINSUBSECTION) return; + + if (Psect) { (*fn)(ENDSECTION); Psect=0; } + if (cmd==BEGINSECTION) return; +} + + +void +poppush(enum command cmd) { + assert(cmd==ENDINDENT || cmd==BEGINBULLET || cmd==BEGINBULTXT || cmd==BEGINBULPAIR || cmd==BEGINBODY || cmd==BEGINSECTION || cmd==BEGINSUBSECTION); + + pop(cmd); + + switch (cmd) { + case BEGINBULLET: Pbul=1; break; + case BEGINBULTXT: Pbt=1; break; + case BEGINBULPAIR: Pbp=1; break; + case BEGINBODY: Pb=1; break; + case BEGINSECTION: Psect=1; break; + case BEGINSUBSECTION: Psub=1; break; + default: + if (!fQuiet) fprintf(stderr, "poppush: unrecognized code %d\n", cmd); + } + + (*fn)(cmd); + prevcmd = cmd; +} + + + +/* + * PREFORMATTED PAGES PARSING + */ + +/* wrapper for getchar() that expands tabs, and sends maximum of n=40 consecutive spaces */ + +int +getchartab(void) { + static int tabexp = 0; + static int charinline = 0; + static int cspccnt = 0; + char c; + + c = lookahead; + if (tabexp) tabexp--; + else if (c=='\n') { + charinline=0; + cspccnt=0; + } else if (c=='\t') { + tabexp = TabStops-(charinline%TabStops); if (tabexp==TabStops) tabexp=0; + lookahead = c = ' '; + } else if (cspccnt>=40) { + if (*in==' ') { + while (*in==' '||*in=='\t') in++; + in--; + } + cspccnt=0; + } + + if (!tabexp && lookahead) lookahead = *in++; + if (c=='\b') charinline--; else charinline++; + if (c==' ') cspccnt++; + return c; +} + + +/* replace gets. handles hyphenation too */ +char * +la_gets(char *buf) { + static char la_buf[MAXBUF]; /* can lookahead a full line, but nobody does now */ + static int fla=0, hy=0; + char *ret,*p; + int c,i; + + assert(buf!=NULL); + + if (fla) { + /* could avoid copying if callers used return value */ + strcpy(buf,la_buf); fla=0; + ret=buf; /* correct? 
*/ + } else { + /*ret=gets(buf); -- gets is deprecated (since it can read too much?) */ + /* could do this... + ret=fgets(buf, MAXBUF, stdin); + buf[strlen(buf)-1]='\0'; + ... but don't want to have to rescan line with strlen, so... */ + + i=0; p=buf; + + /* recover spaces if re-linebreaking */ + for (; hy; hy--) { *p++=' '; i++; } + + while (lookahead && (c=getchartab())!='\n' && ibuf && p[-1]=='-' && isspace(lookahead)) { + p--; /* zap hyphen */ + /* zap boldfaced hyphens, gr! */ + while (p[-1]=='\b' && p[-2]=='-') p-=2; + + /* start getting next line, spaces first ... */ + while (lookahead && isspace(lookahead) && lookahead!='\n') { getchartab(); hy++; } + + /* ... append next nonspace string to previous ... */ + while (lookahead && !isspace(lookahead) && i++=3 spaces) */ +int phraselen; + +void +filterline(char *buf, char *plain) { + char *p,*q,*r; + char *ph; + int iq; + int i,j; + int hl=-1, hl2=-1; + int iscnt=0; /* interword space count */ + int tagci; + int I0; + int etype; + int efirst; + enum tagtype tag = NOTAG; + int esccode; + + assert(buf!=NULL && plain!=NULL); + + etype=NOTAG; + efirst=-1; + tagci=tagc; + ph=phrase; phraselen=0; + scnt=scnt2=0; + s_sum=s_cnt=0; + bs_sum=bs_cnt=0; + ccnt=0; + spcsqz=0; + + /* strip only certain \x1b's and only at very beginning of line */ + for (p=buf; *p=='\x1b' && (p[1]=='8'||p[1]=='9'); p+=2) + /* nop */; + + strcpy(plain,p); + q=&plain[strlen(p)]; + + /*** spaces and change bars ***/ + for (scnt=0,p=plain; *p==' '; p++) scnt++; /* initial space count */ + if (scnt>200) scnt=130-(q-p); + + assert(*q=='\0'); + q--; + if (fChangeleft) + for (; q-40>plain && *q=='|'; q--) { /* change bars */ + if (fChangeleft!=-1) ccnt++; + while (q-2>=plain && q[-1]=='\b' && q[-2]=='|') q-=2; /* boldface changebars! */ + } + + /*if (q!=&plain[scnt-1])*/ /* zap trailing spaces */ + for (; *q==' ' && q>plain; q--) /* nop */; + + /* second changebar way out east! 
HACK HACK HACK */ + if (q-plain>100 && *q=='|') { + while (*q=='|' && q>plain) { q--; if (fChangeleft!=-1) ccnt++; } + while ((*q==' ' || *q=='_' || *q=='-') && q>plain) q--; + } + + for (r=q; (*r&0xff)==CHARDAGGER; r--) *r='-'; /* convert daggers at end of line to hyphens */ + + if (q-plain < scnt) scnt = q-plain+1; + q[1]='\0'; + + /* set I for tags below */ + if (indent>=0 && scnt>=indent) scnt-=indent; + if (!fPara && !fIQS) { + if (fChangeleft) I+=(scnt>ccnt)?scnt:ccnt; + else I+=scnt; + } + I0=I; + + /*** tags and filler spaces ***/ + + iq=0; falluc=1; + for (q=plain; *p; p++) { + + iscnt=0; + if (*p==' ') { + for (r=p; *r==' '; r++) { iscnt++; spcsqz++; } + s_sum+=iscnt; s_cnt++; + if (iscnt>1 && !scnt2 && *p==' ') scnt2=iscnt; + if (iscnt>2) { bs_cnt++; bs_sum+=iscnt; } /* keep track of large gaps */ + iscnt--; /* leave last space for tail portion of loop */ + + /* write out spaces */ + if (fQS && iscnt<3) { p=r-1; iscnt=0; } /* reduce strings of <3 spaces to 1 */ + /* else if (fQS && iscnt>=3) { replace with tab? } */ + else { + for (i=0; iplain && q[-1]=='+') { + /* bold plus/minus(!) 
*/ + q[-1]=c_plusminus; + while (*p=='\b' && p[1]=='_') p+=2; + continue; + } else if ((*p=='_' && p[1]=='\b' && p[2]!='_' && p[3]!='\b') + || (*p=='\b' && p[1]=='_')) { + /* italics */ + if (tag!=ITALICS && hl>=0) { tagadd(tag, hl, I+iq); hl=-1; } + if (hl==-1) hl=I+iq; + tag=ITALICS; + p+=2; + } else if (*p=='_' && p[2]==p[4] && p[1]=='\b' && p[3]=='\b' && p[2]!='_') { + /* bold italics (for Solaris) */ + for (p+=2; *p==p[2] && p[1]=='\b';) p+=2; + if (tag!=BOLDITALICS && hl>=0) { tagadd(tag, hl, I+iq); hl=-1; } + if (hl==-1) hl=I+iq; + tag=BOLDITALICS; + } else if (*p==p[2] && p[1]=='\b') { + /* boldface */ + while (*p==p[2] && p[1]=='\b') p+=2; + if (tag!=BOLD && hl>=0) { tagadd(tag, hl, I+iq); hl=-1; } + if (hl==-1) hl=I+iq; + tag=BOLD; + } else if (p[1]=='\b' && + ((*p=='o' && p[2]=='+') || + (*p=='+' && p[2]=='o')) ) { + /* bullets */ + p+=2; + while (p[1]=='\b' && (*p=='o' || p[2]=='+') ) p+=2; /* bold bullets(!) */ + *q++=c_bullet; iq++; + continue; + } else if (*p=='\b' && p>plain && p[-1]=='o' && p[1]=='+') { + /* OSF bullets */ + while (*p=='\b' && p[1]=='+') p+=2; /* bold bullets(!) */ + q[-1]=c_bullet; p--; + continue; + } else if (p[1]=='\b' && *p=='+' && p[2]=='_') { + /* plus/minus */ + p+=2; + *q++=c_plusminus; iq++; + continue; + } else if (p[1]=='\b' && *p=='|' && p[2]=='-') { + /* dagger */ + *q++=c_dagger; iq++; + p+=2; continue; + } else if (*p=='\b') { + /* suppress unattended backspaces */ + continue; + } else if (*p=='\x1b') { + p++; + if (*p=='[' && isdigit(p[1])) { /* 0/1/22/24/.../8/9/... 
*/ + esccode=0; for (p++; isdigit(*p); p++) esccode = esccode * 10 + *p - '0'; + + if (efirst>=0 /*&& (esccode==0 || esccode==1 || esccode==4 || esccode==22 || esccode==24) /*&& hl>=0 && hl2==-1 && tags[MAXTAGS].first=0 || isupper(p[1]) || (p[1]=='_' && p[2]!='\b') || p[1]=='&')) { + if (hl==-1 && efirst==-1) { hl=I+iq; tag=SMALLCAPS; } + } else { + /* end of tag, one way or another */ + /* collect tags in this pass, interspersed later if need be */ + /* can't handle overlapping tags */ + if (hl>=0) { + if (hl2==-1) tagadd(tag, hl, I+iq); + hl=-1; + } + } + + /** non-backspace related filtering **/ + /* case statement here in place of if chain? */ +/* Tk 3.x's text widget tabs too crazy + if (*p==' ' && strncmp(" ",p,5)==0) { + xputchar('\t'); i+=5-1; ci++; continue; + } else +*/ +/* copyright symbol: too much work for so little + if (p[i]=='o' && (strncmp("opyright (C) 19",&p[i],15)==0 + || strncmp("opyright (c) 19",&p[i],15)==0)) { + printf("opyright \xd3 19"); + tagadd(SYMBOL, ci+9, ci+10); + i+=15-1; ci+=13; continue; + } else +*/ + if (*p=='(' && q>plain && (isalnum(q[-1])||strchr(manvalid/*"._-+"*/,q[-1])!=NULL) + && strcoloncmp(&p[1],')',vollist) + /* && p[1]!='s' && p[-1]!='`' && p[-1]!='\'' && p[-1]!='"'*/ ) { + hl2=I+iq; + for (r=q-1; r>=plain && (isalnum(*r)||strchr(manvalid/*"._-+:"*/,*r)!=NULL); r--) + hl2--; + /* else ref to a function? 
*/ + /* maybe save position of opening paren so don't highlight it later */ + } else if (*p==')' && hl2!=-1) { + /* don't overlap tags on man page references */ + while (tagc>0 && tags[tagc-1].last>hl2) tagc--; + tagadd(MANREF, hl2, I+iq+1); + hl2=hl=-1; + } else if (hl2!=-1) { + /* section names are alphanumeric or '+' for C++ */ + if (!isalnum(*p) && *p!='+') hl2=-1; + } + + + /*assert(*p!='\0');*/ + if (!*p) break; /* not just safety check -- check out sgmls.1 */ + + *q++=*p; +/* falluc = falluc && (isupper(*p) || isspace(*p) || isdigit(*p) || strchr("-+&_'/()?!.,;",*p)!=NULL);*/ + falluc = falluc && !islower(*p); + if (!scnt2) { *ph++=*p; phraselen++; } + iq+=iscnt+1; + } + if (hl>=0) tagadd(tag, hl, I+iq); + else if (efirst>=0) tagadd(etype, efirst, I+iq); + *q=*ph='\0'; + linelen=iq+ccnt; + + + /* special case for Solaris: + if line has ONLY tags AND they SPAN line, convert to one tag */ + fCodeline=0; + if (tagc && tags[0].first==0 && tags[tagc-1].last==linelen) { + fCodeline=1; + j=0; + /* invariant: at start of a tag */ + for (i=0; fCodeline && iNOTAG && tags[i].type<=MANREF); + assert(tags[i].first>=I0 && tags[i].last<=linelen+I0); + assert(tags[i].first<=tags[i].last); + + /* verify for no overlap with other tags */ + for (j=i+1; j=tags[j].last*/); + } + } +} + + +/* + buf[] == input text (read only) + plain[] == output (initial, trailing spaces stripped; tabs=>spaces; + underlines, overstrikes => tag array; spaces squeezed, if requested) + ccnt = count of changebars + scnt = count of initial spaces + linelen = length result in plain[] +*/ + +int fHead=0; +int fFoot=0; + +void +preformatted_filter(void) { + const int MINRM=50; /* minimum column for right margin */ + const int MINMID=20; + const int HEADFOOTSKIP=20; + const int HEADFOOTMAX=25; + int curtag; + char *p,*r; + char head[MAXBUF]=""; /* first "word" */ + char foot[MAXBUF]=""; + int header_m=0, footer_m=0; + int headlen=0, footlen=0; +/* int line=1-1; */ + int i,j,k,l,off; + int 
sect=0,subsect=0,bulpair=0,osubsect=0; + int title=1; + int oscnt=-1; + int empty=0,oempty; + int fcont=0; + int Pnew=0,I0; + float s_avg=0.0; + int spaceout; + int skiplines=0; + int c; + + /* try to keep tabeginend[][] in parallel with enum tagtype */ + assert(tagbeginend[ITALICS][0]==BEGINITALICS); + assert(tagbeginend[MANREF][1]==ENDMANREF); + in++; /* lookahead = current character, in points to following */ + + /* for (i=0; i=2 && bs_cnt<=5 && ((float) bs_sum / (float) bs_cnt)>3.0)); + if (finTable) { + if (!fotable) (*fn)(BEGINTABLE); + } else if (fotable) { + (*fn)(ENDTABLE); + I=I0; tagc=0; filterline(buf,plain); /* rescan first line out of table */ + } +#endif + + s_avg=(float) s_sum; + if (s_cnt>=2) { + /* don't count large second space gap */ + if (scnt2) s_avg= (float) (s_sum - scnt2) / (float) (s_cnt-1); + else s_avg= (float) (s_sum) / (float) (s_cnt); + } + + p=plain; /* points to current character in plain */ + + /*** determine header and global indentation ***/ + if (/*fMan && (*/!fHead || indent==-1/*)*/) { + if (!linelen) continue; + if (!*header) { + /* check for missing first header--but this doesn't catch subsequent pages */ + if (stricmp(p,"NAME")==0 || stricmp(p,"NOMBRE")==0) { /* works because line already filtered */ + indent=scnt; /*filterline(buf,plain);*/ scnt=0; I=I0; fHead=1; + } else { + fHead=1; + (*fn)(BEGINHEADER); + /* grab header and its first word */ + strcpy(header,p); + if ((header_m=HEADFOOTSKIP)>linelen) header_m=0; + strcpy(head,phrase); headlen=phraselen; + la_gets(buf); filterline(buf,plain); + if (linelen) { + strcpy(header2,plain); + if (strincmp(plain,"Digital",7)==0 || strincmp(plain,"OSF",3)==0) { + fFoot=1; + fSubsections=0; + } + } + (*fn)(ENDHEADER); tagc=0; + continue; + } + } else { + /* some idiot pages have a *third* header line, possibly after a null line */ + if (*header && scnt>MINMID) { strcpy(header3,p); ncnt=0; continue; } + /* indent of first line ("NAME") after header sets global indent */ + /* check 
'<' for Plan 9(?) */ + if (*p!='<') { + indent=scnt; I=I0; scnt=0; + } else continue; + } +/* if (indent==-1) continue;*/ + } + if (!lindent && scnt) lindent=scnt; +/*printf("lindent = %d, scnt=%d\n", lindent,scnt);*/ + + + /**** for each ordinary line... *****/ + + /*** skip over global indentation */ + oempty=empty; empty=(linelen==0); + if (empty) {ncnt++; continue;} + + /*** strip out per-page titles ***/ + + if (/*fMan && (*/scnt==0 || scnt>MINMID/*)*/) { +/*printf("***ncnt = %d, fFoot = %d, line = %d***", ncnt,fFoot,AbsLine);*/ + if (!fFoot && !isspace(*p) && (scnt>5 || (*p!='-' && *p!='_')) && + /* don't add ncnt -- AbsLine gets absolute line number */ + (((ncnt>=2 && AbsLine/*+ncnt*/>=61/*was 58*/ && AbsLine/*+ncnt*/<70) + || (ncnt>=4 && AbsLine/*+ncnt*/>=59 && AbsLine/*+ncnt*/<74) + || (ncnt && AbsLine/*+ncnt*/>=61 && AbsLine/*+ncnt*/<=66)) + && (/*lookahead!=' ' ||*/ (s_cnt>=1 && s_avg>1.1) || !falluc) ) + ) { + (*fn)(BEGINFOOTER); + /* grab footer and its first word */ + strcpy(footer,p); +/* if ((footer_m=linelen-HEADFOOTSKIP)<0) footer_m=0;*/ + if ((footer_m=HEADFOOTSKIP)>linelen) footer_m=0; + /*grabphrase(p);*/ strcpy(foot,phrase); footlen=phraselen; + /* permit variations at end, as for SGI "Page N", but keep minimum length */ + if (footlen>3) footlen--; + la_gets(buf); filterline(buf,plain); if (linelen) strcpy(footer2,plain); + title=1; + (*fn)(ENDFOOTER); tagc=0; + + /* if no header on first page, try again after first footer */ + if (!fFoot && *header=='\0') fHead=0; /* this is dangerous */ + fFoot=1; + continue; + } else + /* a lot of work, but only for a few lines (about 4%) */ + if (fFoot && (scnt==0 || scnt+indent>MINMID) && + ( (headlen && strncmp(head,p,headlen)==0) + || strcmp(header2,p)==0 || strcmp(header3,p)==0 + || (footlen && strncmp(foot,p,footlen)==0) + || strcmp(footer2,p)==0 + /* try to recognize lines with dates and page numbers */ + /* skip into line */ + || (header_m && header_mnew paragraph, line mode=>blank lines */ + /* 
need to chop up lines for Roff */ + + /*tabgram[scnt]++;*/ + if (title) ncnt=(scnt!=oscnt || (/*scnt<4 &&*/ isupper(*p))); + itabcnt = scnt/5; + if (CurLine==1) {ncnt=0; tagc=0;} /* gobble all newlines before first text line */ + sect = (scnt==0 && isupper(*p)); + subsect = (fSubsections && (scnt==2||scnt==3)); + if ((sect || subsect) && ncnt>1) ncnt=1; /* single blank line between sections */ + (*fn)(BEGINLINE); + if (/*fPara &&*/ ncnt) Pnew=1; + title=0; /*ncnt=0;--moved down*/ + /*if (finTable) (*fn)(BEGINTABLELINE);*/ + oscnt=scnt; /*fotable=finTable;*/ + +/* let output modules decide what to do at the start of a paragraph + if (fPara && !Pnew && (prevcmd==BEGINBODY || prevcmd==BEGINBULTXT)) { + putchar(' '); I++; + } +*/ + + /*** identify structural sections and notify fn */ + + /*if (fMan) {*/ +/* bulpair = (scnt<7 && (*p==c_bullet || *p=='-'));*/ + /* decode the below */ + bulpair = ((!auxindent || scnt!=lindent+auxindent) /*!bulpair*/ + && ((scnt>=2 && scnt2>5) || scnt>=5 || (tagc>0 && tags[0].first==scnt) ) /* scnt>=2?? 
*/ + && (((*p==c_bullet || strchr("-+.",*p)!=NULL || falluc) && (ncnt || scnt2>4)) || + (scnt2-s_avg>=2 && phrase[phraselen-1]!='.') || + (scnt2>3 && s_cnt==1) + )); + if (bulpair) { + if (tagc>0 && tags[0].first==scnt) { + k=tags[0].last; + for (l=1; l=5 && kccnt)?(scnt-ccnt):0; + if (fILQS) { if (spaceout>=lindent) spaceout-=lindent; else spaceout=0; } + if (auxindent) { if (spaceout>=auxindent) spaceout-=auxindent; else spaceout=0; } + if (fNORM) { + if (itabcnt>0) (*fn)(ITAB); + for (i=0; i<(scnt%5); i++) putchar(' '); + } else printf("%*s",spaceout,""); + } + + + /*** iterate over each character in line, ***/ + /*** handling underlining, tabbing, copyrights ***/ + + off=(!fIQS&&!fPara)?scnt:0; + for (i=0, p=plain, curtag=0, fcont=0; *p; p++,i++,fcont=0) { + /* interspersed presentation signals */ + /* start tags in reverse order of addition (so structural first) */ + if (curtag \-opt */ + if (p==plain || (isspace(p[-1]) && !isspace(p[1]))) { + (*fn)(CHARDASH); fcont=1; + } + break; + } + + /* troublemaker characters */ + c = (*p)&0xff; + if (!fcont && fcharout) { + if (strchr(escchars,c)!=NULL) { + putchar('\\'); putchar(c); I++; + } else if (strchr(trouble,c)!=NULL) { + (*fn)(c); fcont=1; + } else { + putchar(c); I++; + } + } + +/*default:*/ + if (curtag */ + falluc = falluc && !islower(*in); + *p++ = *in++; + } + if (*in) in++; + *p='\0'; + + /* normalize commands */ + p=tmpbuf; q=buf; /* copy from tmpbuf to buf */ + /* no spaces between command-initiating period and command letters */ + if (*p=='\'') { *p='.'; } /* what's the difference? */ + if (*p=='.') { *q++ = *p++; while (isspace(*p)) p++; } + + + /* convert lines with tabs to tables? 
*/ + fsourceTab=0; + + /* if comment at start of line, OK */ + /* dynamically determine iff Tcl/Tk page by scanning comments */ + if (*p=='\\' && *(p+1)=='"') { + if (!fTclTk && strstr(p+1,"supplemental macros used in Tcl/Tk")!=NULL) fTclTk=1; + p+=2; + } + + while (*p) { + if (*p=='\t') fsourceTab++; + if (*p=='\\') { + p++; + if (*p=='n') { + p++; + if (*p=='(') { + p++; name[0]=*p++; name[1]=*p++; name[2]='\0'; + } else { + name[0]=*p++; name[1]='\0'; + } + *q='0'; *(q+1)='\0'; /* defaults to 0, in case doesn't exist */ + for (i=0; ibuf && isspace(*q)) q--; /* trim tailing whitespace */ + q++; *q='\0'; + } else { + /* verbatim character (often a backslash) */ + *q++ = '\\'; /* postpone interpretation (not the right thing but...) */ + *q++ = *p++; + } + } else *q++ = *p++; + } + + /* dumb Digital--later */ + /*if (q-3>plain && q[-1]=='{' && q[-2]=='\\' && q[-3]==' ') q[-3]='\n';*/ + + /* close off buf */ + *q='\0'; + + /*if (q>buf && q[-1]=='\\' && *in=='.') { /* append next line * /} else break;*/ + break; + } + + /*printf("*ret = |%s|\n", ret!=NULL?ret : "NULL");*/ + return ret; +} + + +/* dump characters from buffer, signalling right tags along the way */ +/* all this work to introduce an internal second pass to recognize man page references */ +/* now for HTTP references too */ + +int sI=0; +/* use int linelen from up top */ +int fFlush=1; + +void +source_flush(void) { + int i,j; + char *p,*q,*r; + int c; + int manoff,posn; + + if (!sI) return; + plain[sI] = '\0'; + + /* flush called often enough that all man page references are at end of text to be flushed */ + /* find man page ref */ + if (sI>=4/*+1*/ && (plain[sI-(manoff=1)-1]==')' || plain[sI-(manoff=0)-1]==')')) { + for (q=&plain[sI-manoff-1-1]; q>plain && isalnum(*q) && *q!='('; q--) /* nada */; + if (*q=='(' && strcoloncmp(&q[1],')',vollist)) { + r=q-1; + if (*r==' ' && (sectheadid==SEEALSO || /*single letter volume */ *(q+2)==')' || *(q+3)==')')) r--; /* permitted single intervening space */ + for ( ; 
r>=plain && (isalnum(*r) || strchr(manvalid,*r)!=NULL); r--) /* nada */; + r++; + if (isalpha(*r) && r= posn) tagc--;*/ + + /* add MANREF tags */ + strcpy(hitxt,r); + tagadd(BEGINMANREF, posn, 0); + /* already generated other start tags, so move BEGINMANREF to start in order to be well nested (ugh) */ + tagtmp = tags[tagc-1]; for (j=tagc-1; j>0; j--) tags[j]=tags[j-1]; tags[0]=tagtmp; + tagadd(ENDMANREF, sI-manoff-1+1, 0); + } + } + + /* HTML hyperlinks */ + } else if (fURL && sI>=4 && (p=strstr(plain,"http"))!=NULL) { + i = p-plain; + tagadd(BEGINMANREF, i, 0); tagtmp = tags[tagc-1]; for (j=tagc-1; j>0; j--) tags[j]=tags[j-1]; tags[0]=tagtmp; + for (j=0; i=LINEBREAK && c==' ') { (*fn)(ENDLINE); linelen=0; + } else { /* normal character */ + xputchar(c); + if (fcharout) linelen++; + } + + /*if (linelen>=LINEBREAK && c==' ') { (*fn)(ENDLINE); linelen=0; } -- leaves space at end of line*/ + } + /* dump tags at end */ + /*for ( ; j program code */ + styles[++style] = BOLDITALICS; + stagadd(BEGINBOLDITALICS); + break; + case '1': case '0': case 'R': case 'P': /* back to Roman */ + /*sputchar(' '); -- taken out; not needed, I hope */ + funwind=1; + break; + case '-': + p++; + break; + } + break; + case '(': /* multicharacter macros */ + p++; + for (i=0; i can't because next line might start with a command */ + supresseol=1; + p++; + break; + case '-': /* minus sign */ + sputchar(CHARDASH); + p++; + break; + /*----------------------- + } else if (*p=='^') { + /* end stylings? 
(found in Solaris) * / + p++; + -------------------*/ + default: /* unknown escaped character */ + sputchar(*p++); + } + + } else { /* normal character */ + if (*p) sputchar(*p++); + } + + + /* unwind character formatting stack */ + if (funwind) { + for ( ; style>=0; style--) { + if (styles[style]==BOLD) stagadd(ENDBOLD); + else if (styles[style]==ITALICS) stagadd(ENDITALICS); + else stagadd(ENDBOLDITALICS); + } /* else error */ + assert(style==-1); + + funwind=0; + } + + /* check for man page reference and flush buffer if safe */ + /* postpone check until after following character so catch closing tags */ + if ((sI>=4+1 && plain[sI-1-1]==')') || + /* (plain[sI-1]==' ' && (q=strchr(plain,' '))!=NULL && q<&plain[sI-1])) {*/ + (plain[sI-1]==' ' && !isalnum(plain[sI-1-1]))) { + /* regardless, flush buffer */ + source_flush(); + } + } + + if (*p && *p!=' ') p++; /* skip over end character */ + return p; +} + +/* oh, for function overloading. inlined by compiler, probably */ +char *source_out(char *p) { + return source_out0(p,'\0'); +} + + +char * +source_out_word(char *p) { + char end = ' '; + + while (*p && isspace(*p)) p++; + if (*p=='"' /* || *p=='`' ? 
*/) { + end = *p; + p++; + } + p = source_out0(p,end); + /*while (*p && isspace(*p)) p++;*/ + return p; +} + + +void +source_struct(enum command cmd) { + source_out("\\fR\\s0"); /* don't let run-on stylings run past structural units */ + source_flush(); + if (cmd==SHORTLINE) linelen=0; + (*fn)(cmd); +} + +#define checkcmd(str) strcmp(cmd,str)==0 + +int finnf=0; + +void source_line(char *p); +void +source_subfile(char *newin) { + char *p; + char *oldin = in; + + sublevel++; + + in = newin; + while ((p=source_gets())!=NULL) { + source_line(p); + } + in = oldin; + + sublevel--; +} + +/* have to delay acquisition of list tag */ +void +source_list(void) { + static int oldlisttype; /* OK to have just one because nested lists done with RS/RE */ + char *q; + int i; + + /* guard against empty bullet */ + for (i=0, q=plain; i or other comment closer, but unlikely */ + + /* structural commands */ + } else if (checkcmd("TH")) { + /* sample: .TH CC 1 "Dec 1990" */ + /* overrides command line -- should fix this */ + if (!finitDoc) { + while (isspace(*p)) p++; + if (*p) { + /* name */ + q=strchr(p, ' '); if (q!=NULL) *q++='\0'; + strcpy(manName, p); + /* number */ + p = q; + if (p!=NULL) { + while (isspace(*p)) p++; + if (*p) { q=strchr(p,' '); if (q!=NULL) *q++='\0'; } + } + strcpy(manSect, p!=NULL? p: "?"); + } + sI=0; + finitDoc=1; + (*fn)(BEGINDOC); + /* emit information in .TH line? */ + } /* else complain about multiple definitions? 
*/ + + } else if (checkcmd("SH") || checkcmd("Sh")) { /* section title */ + while (indent) { source_command("RE"); } + source_flush(); + + pop(BEGINSECTION); /* before reset sectheadid */ + + if (*p) { + if (*p=='"') { p++; q=p; while (*q && *q!='"') q++; *q='\0'; } + finnf=0; + for (j=0; (sectheadid=j) leave to output format */ + /* HTML handles tables but not tabs, Tk's text tabs but not tables */ + /* does cause a linebreak */ + stagadd(BEGINBODY); + } else if (checkcmd("ce")) { + /* get line count, recursively filter for that many lines */ + if (sscanf(p, "%d", &i)) { + source_struct(BEGINCENTER); + for (; i>0 && (p=source_gets())!=NULL; i--) source_line(p); + source_struct(ENDCENTER); + } + + /* limited selection of control structures */ + } else if (checkcmd("if") || (checkcmd("ie"))) { /* if cmd, if command and else on next line */ + supresseol=1; + ie = checkcmd("ie"); + mylastif=lastif; + + if (*p=='!') { invcond=1; p++; } + + if (*p=='n') { cond=1; p++; } /* masquerading as nroff the right thing to do? 
*/ + else if (*p=='t') { cond=0; p++; } + else if (*p=='(' || *p=='-' || *p=='+' || isdigit(*p)) { + if (*p=='(') p++; + nif0=atof(p); + if (*p=='-' || *p=='+') p++; while (isdigit(*p)) p++; + op = *p++; /* operator: =, >, < */ + if (op==' ') { + cond = (nif0!=0); + } else { + nif1=atoi(p); + while (isdigit(*p)) p++; + if (*p==')') p++; + if (op=='=') cond = (nif0==nif1); + else if (op=='<') cond = (nif0' -- ignore >=, <= */ cond = (nif0>nif1); + } + } else if (!isalpha(*p)) { /* usually quote, ^G in Digital UNIX */ + /* gobble up comparators between delimiters */ + delim = *p++; + q = if0; while (*p!=delim) { *q++=*p++; } *q='\0'; p++; + q = if1; while (*p!=delim) { *q++=*p++; } *q='\0'; p++; + cond = (strcmp(if0,if1)==0); + } else cond=0; /* a guess, seems to be right bettern than half the time */ + if (invcond) cond=1-cond; + while (isspace(*p)) p++; + + lastif = cond; + if (strncmp(p,"\\{",2)==0) { /* rather than handle groups here, have turn on/off output flag? */ + p+=2; while (isspace(*p)) p++; + while (strncmp(p,".\\}",3)!=0 || strncmp(p,"\\}",2)!=0 /*Solaris*/) { + if (cond) source_line(p); + if ((p=source_gets())==NULL) break; + } + } else if (cond) source_line(p); + + if (ie) source_line(source_gets()); /* do else part with prevailing lastif */ + + lastif=mylastif; + + } else if (checkcmd("el")) { + mylastif=lastif; + + /* should centralize gobbling of groups */ + cond = lastif = !lastif; + if (strncmp(p,"\\{",2)==0) { + p+=2; while (isspace(*p)) p++; + while (strncmp(p,".\\}",3)!=0 || strncmp(p,"\\}",2)!=0 /*Solaris*/) { + if (cond) source_line(p); + if ((p=source_gets())==NULL) break; + } + } else if (cond) source_line(p); + + lastif=mylastif; + + } else if (checkcmd("ig")) { /* "ignore group" */ + strcpy(endig,".."); if (*p) { endig[0]='.'; strcpy(&endig[1],p); } + while ((p=source_gets())!=NULL) { + if (strcmp(p,endig)==0) break; + if (!lastif) source_line(p); /* usually ignore line, except in one weird case */ + } + + + /* macros and substitutions 
*/ + } else if (checkcmd("de")) { + /* grab key */ + q=p; while (*q && !isspace(*q)) q++; *q='\0'; + + /* if already have a macro of that name, override it */ + /* could use a good dictionary class */ + for (insertat=0; insertattblspanmax) tblspanmax=i;*/ + tbl[tblc++][i]=""; /* mark end */ + if (*p=='.') break; + } + tbli=0; + source_struct(BEGINTABLE); + + while ((p=source_gets())!=NULL) { + if (strncmp(p,".TE",3)==0) break; + if (*p=='.') { source_line(p); continue; } + + /* count number of entries on line. if >1, can use to set tableSep */ + insertat=0; for (j=0; *tbl[tbli][j]; j++) if (*tbl[tbli][j]!='s') insertat++; + if (!tableSep && insertat>1) if (fsourceTab) tableSep='\t'; else tableSep='@'; + source_struct(BEGINTABLELINE); + if (strcmp(p,"_")==0 || /* double line */ strcmp(p,"=")==0) { + source_out(" "); + /*stagadd(HR);*/ /* empty row -- need ROWSPAN for HTML */ + continue; + } + + for (i=0; *tbl[tbli][i] && *p; i++) { + tblcellspan=1; + tblcellformat = tbl[tbli][i]; + if (*tblcellformat=='^') { /* vertical span => blank entry */ + tblcellformat="l"; + } else if (*tblcellformat=='|') { + /* stagadd(VBAR); */ + continue; + } else if (strchr("lrcn", *tblcellformat)==NULL) { + tblcellformat="l"; + /*continue;*/ + } + + while (strncmp(tbl[tbli][i+1],"s",1)==0) { tblcellspan++; i++; } + + source_struct(BEGINTABLEENTRY); + if (toupper(tblcellformat[1])=='B') stagadd(BEGINBOLD); + else if (toupper(tblcellformat[1])=='I') stagadd(BEGINITALICS); + /* not supporting DEC's w() */ + + if (strcmp(p,"T{")==0) { /* DEC, HP */ + while (strncmp(p=source_gets(),"T}",2)!=0) source_line(p); + p+=2; if (*p) p++; + } else { + p = source_out0(p, tableSep); + } + if (toupper(tblcellformat[1])=='B') stagadd(ENDBOLD); + else if (toupper(tblcellformat[1])=='I') stagadd(ENDITALICS); + source_struct(ENDTABLEENTRY); + } + if (tbli+1 lines--on infinite scroll */ +} + + +void +source_line(char *p) { + /*stagadd(BEGINLINE);*/ + char *cmd=p; + if (p==NULL) return; /* bug somewhere else, 
but where? */ + + isComment = (/*checkcmd("") ||*/ checkcmd("\\\"") || /*DEC triple dot*/checkcmd("..")); + if (inComment && !isComment) { source_struct(ENDCOMMENT); inComment=0; } /* special case to handle transition */ + +#if 0 + if (*p!='.' && *p!='\'' && !finlist) { + if (fsourceTab && !fosourceTab) { + tblc=1; tbli=0; tableSep='\t'; + tbl[0][0]=tbl[0][1]=tbl[0][2]=tbl[0][3]=tbl[0][4]=tbl[0][5]=tbl[0][6]=tbl[0][7]=tbl[0][8]="l"; + source_struct(BEGINTABLE); finTable=1; + } else if (!fsourceTab && fosourceTab) { + source_struct(ENDTABLE); finTable=0; + } + fosourceTab=fsourceTab; + } +#endif + + if (*p=='.' /*|| *p=='\'' -- normalized */) { /* command == starts with "." */ + p++; + supresseol=1; + source_command(p); + + } else if (!*p) { /* blank line */ + /*source_command("P");*/ + ncnt=1; source_struct(BEGINLINE); ncnt=0; /* empty line => paragraph break */ + +#if 0 + } else if (fsourceTab && !finlist /* && pmode */) { /* can't handle tabs, so try tables */ + source_struct(BEGINTABLE); + tblcellformat = "l"; + do { + source_struct(BEGINTABLELINE); + while (*p) { + source_struct(BEGINTABLEENTRY); + p = source_out0(p, '\t'); + source_struct(ENDTABLEENTRY); + } + source_struct(ENDTABLELINE); + } while ((p=source_gets())!=NULL && fsourceTab); + source_struct(ENDTABLE); + source_line(p); +#endif + + } else { /* otherwise normal text */ + source_out(p); + if (finnf || isspace(*cmd)) source_struct(SHORTLINE); + } + + if (!supresseol && !finnf) { source_out(" "); if (finlist) source_list(); } + supresseol=0; + /*stagadd(ENDLINE);*/ +} + + +void +source_filter(void) { + char *p = in, *q; + char *oldv,*newv,*shiftp,*shiftq,*endq; + int lenp,lenq; + int i,on1,on2,nn1,nn2,first; + int insertcnt=0, deletecnt=0, insertcnt0; + int nextDiffLine=-1; + char diffcmd, tmpc, tmpendq; + + AbsLine=0; + + /* just count length of macro table! 
*/ + for (i=0; macro[i].key!=NULL; i++) /*empty*/; + macrocnt = i; + + /* dumb Digital puts \\} closers on same line */ + for (p=in; (p=strstr(p," \\}"))!=NULL; p+=3) *p='\n'; + + sI=0; + /* (*fn)(BEGINDOC); -- done at .TH or first .SH */ + + + /* was: source_subfile(in); */ + while (fDiff && fgets(diffline, MAXBUF, difffd)!=NULL) { + /* requirements: no context lines, no errors in files, ... + change-command: 8a12,15 or 5,7c8,10 or 5,7d3 + < from-file-line + < from-file-line... + -- + > to-file-line + > to-file-line... + */ + for (q=diffline; ; q++) { diffcmd=*q; if (diffcmd=='a'||diffcmd=='c'||diffcmd=='d') break; } + if (sscanf(diffline, "%d,%d", &on1,&on2)==1) on2=on1-1+(diffcmd=='d'||diffcmd=='c'); + if (sscanf(++q, "%d,%d", &nn1,&nn2)==1) nn2=nn1-1+(diffcmd=='a'||diffcmd=='c'); + + deletecnt = on2-on1+1; + insertcnt = nn2-nn1+1; + + nextDiffLine = nn1; + /*assert(nextDiffLine>=AbsLine); -- can happen if inside a macro? */ + if (nextDiffLine */ + do { + p = oldv = fgets(diffline, MAXBUF, difffd); + p[strlen(p)-1]='\0'; /* fgets's \n ending => \0 */ + deletecnt--; + } while (deletecnt && *p=='.'); /* throw out commands in old version */ + + q = newv = source_gets(); + insertcnt--; + while (insertcnt && *q=='.') { + source_line(q); + insertcnt--; + } + + if (*p=='.' || *q=='.') break; + + + /* make larger chunk for better diff -- but still keep away from commands */ + lenp=strlen(p); lenq=strlen(q); + while (deletecnt && MAXBUF-lenq>80*2) { + fgetc(difffd); fgetc(difffd); /* skip '<' */ + if (ungetc(fgetc(difffd),difffd)=='.') break; + p=&diffline[lenp]; *p++=' '; lenp++; + fgets(p, MAXBUF-lenp, difffd); p[strlen(p)-1]='\0'; lenp+=strlen(p); + deletecnt--; + } + + while (insertcnt && *in!='.' 
&& MAXBUF-lenq>80*2) { + if (newv!=diffline2) { strcpy(diffline2,q); newv=diffline2; } + q=source_gets(); diffline2[lenq]=' '; lenq++; + strcpy(&diffline2[lenq],q); lenq+=strlen(q); + insertcnt--; + } + + /* common endings */ + p = &p[strlen(oldv)]; q=&q[strlen(newv)]; + while (p>oldv && q>newv && p[-1]==q[-1]) { p--; q--; } + if ((p>oldv && p[-1]=='\\') || (q>newv && q[-1]=='\\')) + while (*p && *q && !isspace(*p)) { p++; q++; } /* steer clear of escapes */ + tmpendq=*q; *p=*q='\0'; endq=q; + + p=oldv; q=newv; + while (*p && *q) { + /* common starts */ + newv=q; while (*p && *q && *p==*q) { p++; q++; } + if (q>newv) { + tmpc=*q; *q='\0'; source_line(newv); *q=tmpc; + } + + /* too hard to read */ + /* difference: try to find hunk of p in remainder of q */ + if (strlen(p)<15 || (shiftp=strchr(&p[15],' ') /*|| shiftp-p>30*/)==NULL) break; + shiftp++; /* include the space */ + tmpc=*shiftp; *shiftp='\0'; shiftq=strstr(q,p); *shiftp=tmpc; /* includes space */ + if (shiftq!=NULL) { + /* call that part of q inserted */ + tmpc=*shiftq; *shiftq='\0'; + stagadd(BEGINDIFFA); source_line(q); stagadd(ENDDIFFA); source_line(" "); + *shiftq=tmpc; q=shiftq; + } else { + /* call that part of p deleted */ + shiftp--; *shiftp='\0'; /* squash the trailing space */ + stagadd(BEGINDIFFD); source_line(p); stagadd(ENDDIFFD); source_line(" "); + p=shiftp+1; + } +/*#endif*/ + } + + if (*p) { stagadd(BEGINDIFFD); source_line(p); stagadd(ENDDIFFD); } + if (*q) { stagadd(BEGINDIFFA); source_line(q); stagadd(ENDDIFFA); } + if (tmpendq!='\0') { *endq=tmpendq; source_line(endq); } + source_line(" "); + } + + /* even if diffcmd=='c', could still have remaining old version lines */ + first=1; + while (deletecnt--) { + fgets(diffline, MAXBUF, difffd); + if (diffline[2]!='.') { + if (first) { stagadd(BEGINDIFFD); first=0; } + source_line(&diffline[2]); /* don't do commands; skip initial '<' */ + } + } + if (!first) { stagadd(ENDDIFFD); source_line(" "); } + + /* skip over duplicated from old */ + if 
(diffcmd=='c') while (insertcnt0--) fgets(diffline, MAXBUF, difffd); + + /* even if diffcmd=='c', could still have remaining new version lines */ + first=1; + nextDiffLine = AbsLine + insertcnt; + while (insertcnt--) fgets(diffline, MAXBUF, difffd); /* eat duplicate text of above */ + while (/*insertcnt--*/AbsLine" }, + { 'S', 0, "source", "(ource of man page passed in)" }, /* autodetected */ + { 'F', 0, "formatted:format", "(ormatted man page passed in)" }, /* autodetected */ + + { 'r', 1, "reference:manref:ref", " " }, + { 'l', 1, "title", " " }, + { 'V', 1, "volumes:vol", "(olume) <colon-separated list>" }, + { 'U', 0, "url:urls", "(RLs as hyperlinks)" }, + + /* following options apply to formatted pages only */ + { 'b', 0, "subsections:sub", " (show subsections)" }, + { 'k', 0, "keep:head:foot:header:footer", "(eep head/foot)" }, + { 'n', 1, "name", "(ame of man page) <string>" }, + { 's', 1, "section:sect", "(ection) <string>" }, + { 'p', 0, "paragraph:para", "(aragraph mode toggle)" }, + { 't', 1, "tabstop:tabstops", "(abstops spacing) <number>" }, + { 'N', 0, "normalize:normal", "(ormalize spacing, changebars)" }, + { 'y', 0, "zap:nohyphens", " (zap hyphens toggle)" }, + { 'K', 0, "nobreak", " (declare that page has no breaks)" }, /* autodetected */ + { 'd', 1, "diff", "(iff) <file> (diff of old page source to incorporate)" }, + { 'M', 1, "message", "(essage) <text> (included verbatim at end of Name section)" }, + /*{ 'l', 0, "number lines", "... 
can number lines in a pipe" */ + /*{ 'T', 0, "tables", "(able agressive parsing ON)" },*/ +/* { 'c', 0, "changeleft:changebar", "(hangebarstoleft toggle)" }, -- default is perfect */ + /*{ 'R', 0, "reflow", "(eflow text lines)" },*/ + { 'R', 1, "rebus", "(ebus words for TkMan)" }, + { 'C', 0, "TclTk", " (enable Tcl/Tk formatting)" }, /* autodetected */ + + /*{ 'D', 0, "debug", "(ebugging mode)" }, -- dump unrecognized macros, e.g.*/ + { 'o', 0, "noop", " (no op)" }, + { 'O', 0, "noop", " <arg> (no op with arg)" }, + { 'q', 0, "quiet", "(uiet--don't report warnings)" }, + { 'h', 0, "help", "(elp)" }, + /*{ '?', 0, "help", " (help)" }, -- getopt returns '?' as error flag */ + { 'v', 0, "version", "(ersion)" }, + { '\0', 0, "", NULL } + }; + + /* calculate strgetopt from options list */ + for (i=0,p=strgetopt; option[i].letter!='\0'; i++) { + *p++ = option[i].letter; + /* check for duplicate option letters */ + assert(strchr(strgetopt,option[i].letter)==&p[-1]); + if (option[i].arg) *p++=':'; + } + *p='\0'; + + /* spot check construction of strgetopt */ + assert(p<strgetopt+80); + assert(strlen(strgetopt)>10); + assert(strchr(strgetopt,'f')!=NULL); + assert(strchr(strgetopt,'v')!=NULL); + assert(strchr(strgetopt,':')!=NULL); + + /* count, sort exception strings */ + for (lcexceptionslen=0; (p=lcexceptions[lcexceptionslen])!=NULL; lcexceptionslen++) /*empty*/; + qsort(lcexceptions, lcexceptionslen, sizeof(char*), lcexceptionscmp); + + /* map long option names to single letters for switching */ + /* (GNU probably has a reusable function to do this...) */ + /* deep six getopt in favor of integrated long names + letters? */ + argvch = malloc(argc * sizeof(char*)); + p = argvbuf = malloc(argc*3 * sizeof(char)); /* either -<char>'\0' or no space used */ + for (i=0; i<argc; i++) argvch[i]=argv[i]; /* need argvch[0] for getopt? 
*/ + argv0 = mystrdup(argv[0]); + for (i=1; i<argc; i++) { + if (argv[i][0]=='-' && argv[i][1]=='-') { + if (argv[i][2]=='\0') break; /* end of options */ + for (j=0; option[j].letter!='\0'; j++) { + if (strcoloncmp2(&argv[i][2],'\0',option[j].longnames,0)) { + argvch[i] = p; + *p++ = '-'; *p++ = option[j].letter; *p++ = '\0'; + if (option[j].arg) i++; /* skip arguments of options */ + break; + } + } + if (option[j].letter=='\0') fprintf(stderr, "%s: unknown option %s\n", argv[0], argv[i]); + } + } + + + + /* pass through options to set defaults for chosen format */ + setFilterDefaults("ASCII"); /* default to ASCII (used by TkMan's Glimpse indexing */ + + /* initialize header/footer buffers (save room in binary) */ + for (i=0; i<CRUFTS; i++) { *cruft[i] = '\0'; } /* automatically done, guaranteed? */ + /*for (i=0; i<MAXLINES; i++) { linetabcnt[i] = 0; } */ + + while ((c=getopt(argc,argvch,strgetopt))!=-1) { + + switch (c) { + case 'k': fHeadfoot=1; break; + case 'b': fSubsections=1; break; +/* case 'c': fChangeleft=1; break; -- obsolete */ + /* case 'R': fReflow=1; break;*/ + case 'n': strcpy(manName,optarg); fname=1; break; /* name & section for when using stdin */ + case 's': strcpy(manSect,optarg); break; + /*case 'D': docbookpath = optarg; break;*/ + case 'V': vollist = optarg; break; + case 'l': manTitle = optarg; break; + case 'r': manRef = optarg; + if (strlen(manRef)==0 || strcmp(manRef,"-")==0 || strcmp(manRef,"off")==0) fmanRef=0; + break; + case 't': TabStops=atoi(optarg); break; + /*case 'T': fTable=1; break; -- if preformatted doesn't work, if source automatic */ + case 'p': fPara=!fPara; break; + case 'K': fFoot=1; break; + case 'y': fNOHY=1; break; + case 'N': fNORM=1; break; + + case 'f': /* set format */ + if (setFilterDefaults(optarg)) { + fprintf(stderr, "%s: unknown format: %s\n", argv0, optarg); + exit(1); + } + break; + case 'F': fSource=0; break; + case 'S': fSource=1; break; + + case 'd': + difffd = fopen(optarg, "r"); + if (difffd==NULL) { 
fprintf(stderr, "%s: can't open %s\n", argv0, optarg); exit(1); } +/* read in a line at a time + diff = filesuck(fd); + fclose(fd); +*/ + fDiff=1; + break; + + case 'M': message = optarg; break; + + case 'C': fTclTk=1; break; + case 'R': + p = malloc(strlen(optarg)+1); + strcpy(p, optarg); /* string may not be in writable address space */ + oldp = ""; + for (; *p; oldp=p, p++) { + if (*oldp=='\0') rebuspat[rebuspatcnt++] = p; + if (*p=='|') *p='\0'; + } + for (i=0; i<rebuspatcnt; i++) rebuspatlen[i] = strlen(rebuspat[i]); /* for strnlen() */ + break; + + case 'q': fQuiet=1; break; + case 'o': /*no op*/ break; + case 'O': /* no op with arg */ break; + case 'h': + printf("rman"); helplen=strlen("rman"); + + /* linebreak options */ + assert(helplen>0); + for (i=0; option[i].letter!='\0'; i++) { + desclen = strlen(option[i].desc); + if (helplen+desclen+5 > helpbreak) { printf("\n%*s",helpispace,""); helplen=helpispace; } + printf(" [-%c%s]", option[i].letter, option[i].desc); + helplen += desclen+5; + } + if (helplen>helpispace) printf("\n"); + printf("%*s [<filename>]\n",helpispace,""); + exit(0); + + case 'v': /*case '?':*/ + printf("PolyglotMan v" POLYGLOTMANVERSION " of $Date: 2003/07/26 19:00:48 $\n"); + exit(0); + + default: + fprintf(stderr, "%s: unidentified option -%c (-h for help)\n",argvch[0],c); + exit(2); + } + } + + + + /* read from given file name(s) */ + if (optind<argc) { + processing = argvch[optind]; + + if (!fname) { /* if no name given, create from file name */ + /* take name from tail of path */ + if ((p=strrchr(argvch[optind],'/'))!=NULL) p++; else p=argvch[optind]; + strcpy(manName,p); + + /* search backward from end for final dot. 
split there */ + if ((p=strrchr(manName,'.'))!=NULL) { + strcpy(manSect,p+1); + *p='\0'; + } + } + + strcpy(plain,argvch[optind]); + + if (freopen(argvch[optind], "r", stdin)==NULL) { + fprintf(stderr, "%s: can't open %s\n", argvch[0], argvch[optind]); + exit(1); + } + } + + /* need to read macros, ok if fail; from /usr/lib/tmac/an => needs to be set in Makefile, maybe a searchpath */ + /* + if ((macros=fopen("/usr/lib/tmac/an", "r"))!=NULL) { + in = File = filesuck(macros); + lookahead = File[0]; + source_filter(); + free(File); + } + */ + + /* suck in whole file and just operate on pointers */ + in = File = filesuck(stdin); + + + /* minimal check for roff source: first character dot command or apostrophe comment */ + /* MUST initialize lookahead here, BEFORE first call to la_gets */ + if (fSource==-1) { + lookahead = File[0]; + fSource = (lookahead=='.' || lookahead=='\'' || /*dumb HP*/lookahead=='/' + /* HP needs this too but causes problems || isalpha(lookahead)--use --source flag*/); + } + + if (fDiff && (!fSource || fn!=HTML)) { + fprintf(stderr, "diff incorporation supported for man page source, generating HTML\n"); + exit(1); + } + + if (fSource) source_filter(); else preformatted_filter(); + if (fDiff) fclose(difffd); + /*free(File); -- let system clean up, perhaps more efficiently */ + + return 0; +} diff --git a/src/tools/rman/rman.html b/src/tools/rman/rman.html new file mode 100644 index 0000000000..c21ea16d27 --- /dev/null +++ b/src/tools/rman/rman.html @@ -0,0 +1,342 @@ +<html> +<head> +<title>PolyglotMan Manual Page + + + +

    Name

    + +PolyglotMan, rman - reverse compile man pages from formatted form to a number of source formats + +

    Synopsis

    + +rman [options] [file] + +

    Description

    + +

    PolyglotMan takes man pages from most of the +popular flavors of UNIX and transforms them into any of a number of +text source formats. PolyglotMan was formerly known as RosettaMan. +The binary is still named rman, for scripts +that depend on that name; mnemonically, just think "reverse man". +Previously PolyglotMan required pages to +be formatted by nroff prior to its processing. With version 3.0, it prefers +[tn]roff source and usually produces better results from it. +Source processing is also the only way to translate tables. +Source format translation is not as mature as formatted, however, so +try formatted translation as a backup. + +

    In parsing [tn]roff source, one could implement an arbitrarily +large subset of [tn]roff, which I did not and will not do, so the +results can be off. I did implement a significant subset of those used +in man pages, however, including tbl (but not eqn), if tests, and +general macro definitions, so usually the results look great. If they +don't, format the page with nroff before sending it to PolyglotMan. If +PolyglotMan doesn't recognize a key macro used by a large class of +pages, however, e-mail me the source and a uuencoded nroff-formatted +page and I'll see what I can do. When running PolyglotMan with man +page source that includes or redirects to other [tn]roff source using +the .so (source or inclusion) macro, you should be in the parent +directory of the page, since pages are written with this assumption. +For example, if you are translating /usr/man/man1/ls.1, first cd into +/usr/man. + +

    PolyglotMan accepts formatted man pages from: +

    SunOS, Sun Solaris, Hewlett-Packard HP-UX, +AT&T System V, OSF/1 aka Digital UNIX, DEC Ultrix, SGI IRIX, Linux, +FreeBSD, SCO.
    +Man page source processing works for: +
    SunOS, Sun Solaris, Hewlett-Packard HP-UX, +AT&T System V, OSF/1 aka Digital UNIX, DEC Ultrix.
    +It can produce +
    printable ASCII-only (control characters +stripped), section headers-only, +Tk, TkMan, [tn]roff (traditional man page source), partial DocBook XML, HTML, MIME, +LaTeX, LaTeX2e, RTF, Perl 5 POD.
    +A modular architecture permits easy addition of additional output +formats.

    + +

    Options

    + +

    The following options should not be used with any others and exit PolyglotMan +without processing any input. + +

    +
    -h|--help
    +
    Show list of command line options and exit.
    + +
    -v|--version
    +
    Show version number and exit.
    +
    + + +

    You should specify the filter first, as this sets a number of parameters, +and then specify other options. + +

    +
    -f|--filter <ASCII|roff|TkMan|Tk|Sections|HTML|MIME|LaTeX|LaTeX2e|RTF|POD>
    + +
    Set the output filter. Defaults to ASCII. + +
    + +
    -S|--source
    +
    PolyglotMan tries to automatically determine whether its input is source or formatted; +use this option to declare source input.
    + +
    -F|--format|--formatted
    +
    PolyglotMan tries to automatically determine whether its input is source or formatted; +use this option to declare formatted input.
    + +
    -l|--title printf-string
    +
    In HTML mode this sets the <TITLE> of the man pages, given the same +parameters as -r.
    + +
    -r|--reference|--manref printf-string
    +
    In HTML mode this sets the URL form by which to retrieve other man pages. +The string can use two supplied parameters: the man page name and its section. +(See the Examples section.) If the string is null (as if set from a shell +by "-r ''"), `-' or `off', then man page references will not be HREFs, just set in italics. +If your printf supports XPG3 positional specifiers, this can be quite flexible.
    + +
    -V|--volumes <colon-separated list>
    +
    Set the list of valid volumes to check against when looking for +cross-references to other man pages. Defaults to 1:2:3:4:5:6:7:8:9:o:l:n:p +(volume names can be multicharacter). +If a non-whitespace string in the page is immediately followed by a left +parenthesis, then one of the valid volumes, and ends with optional other +characters and then a right parenthesis--then that string is reported as +a reference to another manual page. If this -V string starts with an equals +sign, then no optional characters are allowed between the match to the list of +valids and the right parenthesis. (This option is needed for SCO UNIX.) +
    + +
    + + +
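The cross-reference rule described for -V can be sketched in C roughly as follows. This is only an illustration of the rule as worded above, not the code in rman.c; the function name looks_like_manref and its calling convention (a pointer to the left parenthesis that follows a non-whitespace string) are assumptions made for the example, as is the choice to treat the "optional other characters" as any run of non-whitespace characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from rman.c). `paren` points at a '(' that
   immediately follows a non-whitespace string; `volumes` is a -V style
   colon-separated list, optionally starting with '=' for strict matching.
   Returns 1 if the parenthesized text looks like a man page volume. */
int looks_like_manref(const char *paren, const char *volumes)
{
    int strict = 0;
    const char *v, *end, *rest;
    size_t len;

    assert(paren != NULL && volumes != NULL);
    if (*paren != '(') return 0;
    paren++;

    /* a leading '=' forbids extra characters after the volume name */
    if (*volumes == '=') { strict = 1; volumes++; }

    for (v = volumes; *v != '\0'; v = (*end == ':') ? end + 1 : end) {
        end = strchr(v, ':');
        if (end == NULL) end = v + strlen(v);
        len = (size_t)(end - v);
        if (len == 0 || strncmp(paren, v, len) != 0) continue;

        rest = paren + len;
        if (strict) {
            if (*rest == ')') return 1;
            continue;               /* a longer volume name may still match */
        }
        /* assumption: optional characters are any non-whitespace run */
        while (*rest != '\0' && *rest != ')' && !isspace((unsigned char)*rest)) rest++;
        if (*rest == ')') return 1;
    }
    return 0;
}
```

With the default list, "(3x)" is accepted because 3 is a valid volume and x counts as an optional character; with "=1:2:3" the same text is rejected.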

    The following options apply only when formatted pages are given as input. +They do not apply or are always handled correctly with the source. + +

    +
    -b|--subsections
    +
    Try to recognize subsection titles in addition to section titles. +This can cause problems on some UNIX flavors.
    + +
    -K|--nobreak
    +
    Indicate manual pages don't have page breaks, so don't look for footers and headers +around them. (Older nroff -man macros always put in page breaks, but lately +some vendors have realized that printouts are made through troff, whereas +nroff -man is used to format pages for reading on screen, and so have eliminated +page breaks.) PolyglotMan usually gets this right even without this flag.
    + +
    -k|--keep
    +
    Keep headers and footers, as a canonical report at the end of the page.
    + + + + + +
    -n|--name name
    +
    Set name of man page (used in roff format). +If the filename is given in the form "name.section", the name +and section are automatically determined. If the page is being parsed from +[tn]roff source and it has a .TH line, this information is extracted from that line.
    + +
    -p|--paragraph
    +
    Paragraph mode toggle. The filter determines whether lines should be linebroken +as they were by nroff, or whether lines should be flowed together into paragraphs. +Mainly for internal use.
    + +
    -s|--section #
    +
    Set volume (aka section) number of man page (used in roff format).
    + + + +
    -t|--tabstops #
    +
    For those macro sets that use tabs in place of spaces where +possible in order to reduce the number of characters used, set +tabstops every # columns. Defaults to 8.
    + + +
    + + +
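As an illustration of what "tabstops every # columns" means for -t, here is a minimal C sketch of tab expansion. It is not taken from rman.c; the name expand_tabs, its signature, and the silent truncation when the buffer fills are all assumptions of the example. It handles a single line, so the output index doubles as the column.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from rman.c). Copies one line from src to dst,
   replacing each tab with spaces up to the next multiple of `tabstop`.
   Output is silently truncated to fit dstsize. Returns the output length. */
size_t expand_tabs(const char *src, char *dst, size_t dstsize, int tabstop)
{
    size_t col = 0;     /* current column == number of characters written */
    size_t ts, next;

    assert(src != NULL && dst != NULL && dstsize > 0 && tabstop > 0);
    ts = (size_t)tabstop;

    for (; *src != '\0'; src++) {
        if (*src == '\t') {
            next = (col / ts + 1) * ts;             /* next tab stop */
            while (col < next && col + 1 < dstsize) dst[col++] = ' ';
        } else if (col + 1 < dstsize) {
            dst[col++] = *src;
        }
    }
    dst[col] = '\0';
    return col;
}
```

With the default of 8, "a" followed by a tab and "b" becomes "a", seven spaces, then "b".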

    Notes on Filter Types

    + +

    ROFF

    +

    Some flavors of UNIX ship man pages without [tn]roff source, making one's laser printer +little more than a laser-powered daisy wheel. This filter tries to intuit +the original [tn]roff directives, which can then be recompiled by [tn]roff.

    + +

    TkMan

    +

    TkMan, a hypertext man page browser, uses PolyglotMan to show +man pages without the (usually) useless headers and footers on each +page. It also collects section and (optionally) subsection heads for +direct access from a pulldown menu. TkMan and Tcl/Tk, the toolkit in +which it's written, are available via anonymous ftp from +ftp://ftp.smli.com/pub/tcl/

    + +

    Tk

    + +

    This option outputs the text in a series of Tcl lists consisting of +text-tags pairs, where tag names roughly correspond to HTML. This +output can be inserted into a Tk text widget by doing an eval +<textwidget> insert end <text>. This format should be relatively +easily parsable by other programs that want both the text and the +tags. Also see ASCII.

    + +

    ASCII

    +

    When printed on a line printer, man pages try to produce special text effects +by overstriking characters with themselves (to produce bold) and underscores +(underlining). Other text processing software, such as text editors, searchers, +and indexers, must counteract this. The ASCII filter strips away this formatting. +Piping nroff output through col -b also strips away this formatting, +but it leaves behind unsightly page headers and footers. Also see Tk.

    + +
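The overstrike convention described for the ASCII filter (a character, a backspace, then the same character for bold; an underscore, a backspace, then a character for underlining) can be undone with a very small loop. The following C sketch is an illustration only, comparable in spirit to col -b; the name strip_overstrikes is an assumption, and it does not attempt the header and footer removal that the real filter also performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from rman.c). Copies src to dst while resolving
   backspace overstrikes: each '\b' erases the previously kept character,
   so the character typed last in a column wins. Output is truncated to fit
   dstsize. Returns the output length. */
size_t strip_overstrikes(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && dstsize > 0);

    for (; *src != '\0'; src++) {
        if (*src == '\b') {
            if (n > 0) n--;             /* back up over the struck character */
        } else if (n + 1 < dstsize) {
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Both the bold form of "NAME" and the underlined form of "foo" reduce to the plain words.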

    Sections

    +

    Dumps section and (optionally) subsection titles. This might be useful for +another program that processes man pages.

    + +

    HTML

    +

    With a simple extension to an HTTP server for Mosaic or other World Wide Web +browser, PolyglotMan can produce high quality HTML on the fly. +Several such extensions and pointers to several others are included in PolyglotMan's +contrib directory.

    + +

    XML

    +

    This is approaching the DocBook DTD, but I'm hoping that someone +with a real interest in this will polish the tags generated. Try it to see +how close the tags are now.

    + +

    Improved by Aaron Hawley, though he still notes: +

    +Output requires human intervention to become proper +DocBook format. This is a result of the fundamental +nature of nroff and DocBook XML. One is marked for +formatting, the other is marked for semantics (defining +what the content is rather than what it should look +like). For instance, italics and bold formatting are +converted to emphasis and command DocBook elements +respectively even though they should probably be marked +up as command, option, literal, arg, option and other +possible DocBook tags. +
    +

    + +

    MIME

    +

    MIME (Multipurpose Internet Mail Extensions) as defined by RFC 1563, +good for consumption by MIME-aware e-mailers or as Emacs (>=19.29) +enriched documents.

    + +

    LaTeX and LaTeX2e

    +Why not? + +

    RTF

    +

    Use output on Mac or NeXT or whatever. Maybe take random man pages +and integrate with NeXT's documentation system better. Maybe NeXT has +its own man page macros that do this.

    + +

    PostScript and FrameMaker

    +

    To produce PostScript, use groff or psroff. To produce FrameMaker MIF, +use FrameMaker's built-in filter. In both cases you need [tn]roff source, +so if you only have a formatted version of the manual page, use PolyglotMan's +roff filter first.

    + + +

    Examples

    + +

    To convert the formatted man page named ls.1 back into +[tn]roff source form:

    + +

    + rman -f roff /usr/local/man/cat1/ls.1 > /usr/local/man/man1/ls.1
    + +

    Long man pages are often compressed to conserve space (compression is +especially effective on formatted man pages as many of the characters +are spaces). As it is a long man page, it probably has subsections, +which we try to separate out (some macro sets don't distinguish +subsections well enough for PolyglotMan to detect them). Let's convert +this to LaTeX format:
    + +

    + pcat /usr/catman/a_man/cat1/automount.z | rman -b -n automount -s 1 -f latex > automount.man
    + +

    Alternatively, + + man 1 automount | rman -b -n automount -s 1 -f latex > automount.man
    + +

    For HTML/Mosaic users, PolyglotMan can, without modification of the +source code, produce HTML links that point to other HTML man pages +either pregenerated or generated on the fly. First let's assume +pregenerated HTML versions of man pages stored in /usr/man/html. +Generate these one-by-one with the following form:
    + + rman -f html -r 'http:/usr/man/html/%s.%s.html' /usr/man/cat1/ls.1 > /usr/man/html/ls.1.html
    + +

    If you've extended your HTML client to generate HTML on the fly you should use +something like:
    + + rman -f html -r 'http:~/bin/man2html?%s:%s' /usr/man/cat1/ls.1
    + +when generating HTML.

    + + +
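The -r patterns used in these examples are printf-style strings with two parameters, the page name and its section. The C sketch below shows how such a pattern expands; it is an illustration under that assumption, not the code rman.c uses, and the helper name format_manref is hypothetical. Because the pattern is passed to snprintf as the format string, it must come from a trusted source (here, the command line of the person running the tool).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from rman.c). Expands a -r style pattern with
   the page name and section. Returns the length written, or -1 if the
   result does not fit in `out` or formatting fails. */
int format_manref(char *out, size_t outsize, const char *pattern,
                  const char *name, const char *section)
{
    int n;

    assert(out != NULL && outsize > 0 && pattern != NULL);
    assert(name != NULL && section != NULL);

    /* the pattern is expected to contain at most two %s conversions */
    n = snprintf(out, outsize, pattern, name, section);
    return (n >= 0 && (size_t)n < outsize) ? n : -1;
}
```

For the page ls in section 1, the pregenerated pattern yields http:/usr/man/html/ls.1.html and the on-the-fly pattern yields http:~/bin/man2html?ls:1.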

    Bugs/Incompatibilities

    + +

    PolyglotMan is not perfect in all cases, but it usually does a +good job, and in any case reduces the problem of converting man pages +to light editing.

    + +

    Tables in formatted pages, especially H-P's, aren't handled very well. +Be sure to pass in source for the page to recognize tables.

    + +

    The man pager woman applies its own idea of formatting for +man pages, which can confuse PolyglotMan. Bypass woman +by passing the formatted manual page text directly into +PolyglotMan.

    + +

    The [tn]roff output format uses \fB to turn on boldface. If your macro set +requires .B, you'll have to postprocess the PolyglotMan output.

    + + + +

    See Also

    + +tkman(1), xman(1), man(1), man(7) or man(5) depending on your flavor of UNIX + +

    GNU groff can now output to HTML. + + +

    Author

    + +

    PolyglotMan
    +Copyright (c) 1994-2003 T.A. Phelps
    + +developed at the
    +University of California, Berkeley
    +Computer Science Division + +

    Manual page last updated on $Date: 2003/03/29 08:09:13 $ + + +

    The latest version of PolyglotMan is available via +http://polyglotman.sourceforge.net/. + + +