Added rules to produce XML from man pages.
Still to be sharpened:
	the XML files could be mere build objects, but they are located in the distro directory at the moment
	users don't read XML as far as I know, so the final documentation format is to be reworked


git-svn-id: file:///srv/svn/repos/haiku/haiku/trunk@12392 a95241bf-73f2-0310-859d-f6bbb57e9c96
Jérôme Duval 2005-04-14 16:10:58 +00:00
parent 6359ecc258
commit d0f052be51
9 changed files with 5859 additions and 0 deletions


@@ -7,6 +7,7 @@ SubInclude OBOS_TOP src tools gensyscalls ;
SubInclude OBOS_TOP src tools hey ;
SubInclude OBOS_TOP src tools rc ;
SubInclude OBOS_TOP src tools resattr ;
SubInclude OBOS_TOP src tools rman ;
SubInclude OBOS_TOP src tools translation ;
SubInclude OBOS_TOP src tools unflatten ;

src/tools/rman/CHANGES Normal file

@@ -0,0 +1,118 @@
1993
1 Apr as bs2tk posted to comp.lang.tcl (126 lines)
2 bullets, change bars, copyright symbol
5 boldface, other SGI nicks
7 skip unrecognized escape codes
10 small caps
13 underscores considered uppercase so show up in default small caps font
screen out Ultrix junk (code getting pretty tangled now)
14 until Tk text has better tab support, replace tabs by spaces until get to next tab stop (for Ultrix); -t gives tabstop spacing
20 Solaris support (Larry Tsui)
3 Jun section subheading parsing (Per-Erik Martin)
28 hyphenated man pages in SEE ALSO show up correctly in Links (Mike Steele)
13 Jul under FILES, fully qualified path names are added to Links, but this taken out immediately because not useful
14 option to keep changebars on right (Warren Jessop)
5 Aug search for header, footer dynamically--no need to edit or search large list of patterns
11 -m kicks in man page formatting beyond nroff backspace kludges
27 handle double digit numbers better by trying again relative to end of line
19 Sep -T gives Tk extras (otherwise ASCII only)
-H gives headers only (implies -T off)
10 Oct -r reverse compiles to [tn]roff source (as Geoff Collyer's nam and fontch, but leveraging existing analysis so only addition of ~60 lines) (The code is device-driver obscure now--obfuscated C contest next.)
13 header and footer optionally available at bottom in Tk view (Marty Leisner)
19 "reflected" odd and even page headers&footers zapped
20 keep count of sections and subsections, using smaller font for larger numbers
1 Nov reverse compiles to Ensemble, except for character ranges
4 started rman rewrite for cleaner support of multiple output targets, including: plain ascii, headers only, TkMan, [nt]roff, Ensemble, SGML, HTML
5 line filtering separated from other logic despite greater sophistication, RosettaMan faster than bs2tk (!)
28 Dec man page reference recognition (Michael Harrison)
1994
1 Jan identify descriptive lists by comparing scnt2 with s_avg
3 tail-end table of contents in HTML documents
5 -f <filter> and LaTeX output mode
24 proof-of-concept RTF output mode
26 handle man pages that don't have a header on the first page
28 parse "handwritten" man pages
22 Feb alpha version released
6 Mar various bug fixes
10 beta version released
13 Jun fixed spurious generation of <DL>'s (the existence of which was pointed out by David Sibley)
22 Jul table recognition experiment. works reasonably well, except for tables with centered headers
3 Aug allow for off-by-one (and -two) in identification of header and footer
fixed problem with recurrent/leftover text with OSF/1 bold bullets (yeesh)
12 Sep 2.0gamma released
13 check for *third* header, possibly centered, possibly after blank lines (Charles Anderson)
fixed tag ranges for lines following blank lines (just \n) of pages with global indentation (Owen Rees)
19 fixed two small problems with LaTeX (^ => \^, \bullet => $\bullet$) (Neal Becker)
24 simple check for erroneously being fed roff source
26 deal with bold +- as in ksh (ugh)
30 2.0delta released
9 Oct special check for OSF to guard against section head interpreted as footer
8 Nov Perl pod output format (result still needs work, but not much)
7 Dec 2.0epsilon released (last one before final 2.0)
22 Happy Winter Solstice! 2.0 released
deprecated gets() replaced (Robert Withrow)
25 TkMan module's $w.show => $t, saving about 9% in generated characters
1995
1 Jan experiment with TkMan output to take advantage of my hack to Tk text (i.e., $t insert end "text" => $t insert end "text1" tag1 "text2" tag2 ...); results => output size reduced about 25%, time reduced about 12-15%
25 Mar back to old mark command for Tk module
8 May hyphens in SEE ALSO section would confuse link-finder, so re-linebreak if necessary(!) (Greg Earle & Uri Guttman)
4 Aug put formats and options into tables (inspired by Steve Maguire's Writing Solid Code)
19 -V accepts colon-separated list of valid volume names (Dag Nygren)
22 MIME output format that's usable in Emacs 19.29 (just three hours at hacking HTML module) (Neal Becker)
9 Sep nits in HTML and better Solaris code snippets (Drazen Kacar)
13 Nov Macintosh port by Matthias Neeracher <neeri@iis.ee.ethz.ch>
18 Dec adapted to LaTeX2e, null manRef yields italicized man ref (H. Palme)
28 allow long option names (Larry Schwimmer)
1996
22 Jan fixed problem with hyphenation suppression and tabs in man page--sheesh! (H. Palme)
23 May split TkMan format into Tk and TkMan (which calls Tk)
25 in TkMan format, initial spaces converted to tabs
24 Sep experiment with formatting from source's macros, for better transcription short of full nroff interpreter
27 commented out Ensemble output format, which nobody used
2 Oct >4000 lines
11
8 Nov release 3.0 alpha. source code parsing works well for Solaris, SunOS, HP-UX; in generating HTML
25 recognize URLs (Mic Campanel)
1997
19 Mar bug fixes, more special characters (roff expert Larry Jones)
8 Aug renamed to PolyglotMan
4 Nov TkMan module: Rebus and NoteMark identification for paragraph lengths and command line options taken over from Tcl (still have search, highlight in Tcl, necessarily) (Chad Loder)
>5000 lines, or nearly 40X the lines of code of version 1.0 (which just supported TkMan)
1998
20 Mar automatic detection of Tcl/Tk source (within automatic detection of source; already had within formatted)
centralize casifying SH and SS for formatted and source
group exception table by category
17 Apr in source translation, pass along comment lines too
20 incorporate (RCS) versioning diff's, for HTML
2000
22 Jun eliminate last of encumbered code, release as Open Source
release version 3.0.9
2002
24 Aug used in Apple's OS X 10.2 (Jaguar) to convert manual pages for display in Project Builder
2003
28 Mar updated for groff 1.18's new escape codes
remove Ensemble output format, which is obsolete
remove Texinfo output format, which is not useful
HTML tags in lowercase
released version 3.1
5 Jun applied Aaron Hawley's patches for DocBook XML
6 Jul assume HTML browsers support full set of entity references
discontinue support for Mac OS 9 and earlier (compiles out of the box on OS X)
25 tags well nested for troff source input (at last!)
26 release version 3.2

src/tools/rman/Jamfile Normal file

@@ -0,0 +1,45 @@
SubDir OBOS_TOP src tools rman ;
NotFile doc_files ;
Depends files : doc_files ;
SubDirCcFlags -w ;
BinCommand rman : rman.c : ;
rule Man2Doc
{
local source = [ FGristFiles $(2) ] ;
local binary = $(1) ;
SEARCH on $(source) = $(SEARCH_SOURCE) ;
MakeLocate $(binary) : [ FDirName $(OBOS_DISTRO_TARGET) beos documentation Shell_Tools ] ;
Depends $(binary) : $(source) rman ;
LocalDepends doc_files : $(binary) ;
Man2Doc1 $(binary) : rman $(source) ;
LocalClean clean : $(binary) ;
}
actions Man2Doc1
{
$(2[1]) -f XML "$(2[2])" > "$(1)" ;
}
rule Man2Docs
{
# Man2Docs <sources> ;
local source ;
for source in [ FGristFiles $(1) ]
{
local target = $(source:S=.xml) ;
Man2Doc $(target) : $(source) ;
}
}
Man2Docs rman.1 ;
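
Read as a shell loop, the Man2Docs rule above maps each man page source to an .xml target and runs rman on it. The sketch below only approximates that flow and does not reproduce Jam's dependency handling: the function names `man2doc_target` and `man2docs_dry_run` are hypothetical, the output directory is a placeholder, and the commands are printed rather than executed so that nothing depends on rman having been built.

```shell
#!/bin/sh
# Hypothetical helper: mimic Jam's $(source:S=.xml), which swaps the
# file suffix for .xml (rman.1 -> rman.xml).
man2doc_target() {
    printf '%s.xml\n' "${1%.*}"
}

# Hypothetical dry run of Man2Docs: print the command the Man2Doc1
# action would run for each source instead of running it.
# $1 is a placeholder output directory; the rest are man page sources.
man2docs_dry_run() {
    outdir=$1
    shift
    for source in "$@"; do
        printf 'rman -f XML "%s" > "%s/%s"\n' \
            "$source" "$outdir" "$(man2doc_target "$source")"
    done
}

man2docs_dry_run Shell_Tools rman.1
```

Printing the command instead of running it keeps the sketch usable on a machine without the built tool; replacing the inner `printf` with the real invocation would be the obvious next step.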

src/tools/rman/MANIFEST Normal file

@@ -0,0 +1,25 @@
gcksum crc length name
---------- ------ ----
719617051 139770 rman.c
3816320114 2260 README-rman.txt
696804442 4302 Makefile
2527388467 11935 rman.1
2005445684 13618 site/rman.html
1619945447 6260 CHANGES
3825874035 4647 contrib/README-contrib
953283015 911 contrib/authried.txt
641079878 13609 contrib/bennett.txt
1827405570 1220 contrib/gzip.patch
4230791356 250 contrib/hman.cgi
1743159239 359 contrib/hman.ksh
2919099478 8005 contrib/hman.pl
1821945783 4978 contrib/http-rman.c
3091246851 661 contrib/http-rman.html
2646244070 350 contrib/lewis.pl
2860919314 2049 contrib/man2html
1315989744 6596 contrib/rman.pl
3466647040 7531 contrib/rman_html_split
75383086 2482 contrib/rman_html_split.1
2032677288 150 contrib/sco-wrapper.sh
884546947 3806 contrib/sutter.txt
3112317992 5086 contrib/youki.pl

src/tools/rman/Makefile Normal file

@@ -0,0 +1,160 @@
#
# Makefile for PolyglotMan
# It's helpful to read the README-rman.txt file first.
# You should read over all parts of this file,
# down to the "you shouldn't modify" line
#
# Tom Phelps (phelps@ACM.org)
#
### you need to localize the paths on these lines
# The executable `rman' is placed in BINDIR.
# If you're also installing TkMan (available separately--see README-rman.txt),
# this must be a directory that's in your bin PATH.
# MANDIR holds the man page.
BINDIR = /opt/local/bin
#BINDIR = /usr/local/bin
#BINDIR = //C/bin
MANDIR = /usr/local/man/man1
# popular alternative
#BINDIR = /opt/local/bin
#MANDIR = /opt/local/man/man1
### if you have GNU gcc, use these definitions
CC = gcc
CFLAGS = -O2 -finline-functions
### if you just have a standard UNIX, use these instead of GNU.
### CC must be an ANSI C compiler
#CC = cc
#CFLAGS = -O
# Solaris and SysV people may need this
#CFLAGS = -O2 -finline-functions
# For HP-UX
#CC = cc
#CFLAGS = -Aa -O
# HP-UX 10.20
#CFLAGS = -Ae -O
# DEC Alpha and Ultrix, -std1 needed to conform to ANSI C
#CC = cc
#CFLAGS = -std1 -O3 -Olimit 1000
# list of valid volume numbers and letters
# you can also set these at runtime with -V
VOLLIST = "1:2:3:4:5:6:7:8:9:o:l:n:p"
# SCO Unix has expanded set of volume letters
#VOLLIST = "1:2:3:4:5:6:7:8:9:o:l:n:p:C:X:S:L:M:F:G:A:H"
# SGI and UnixWare 2.0
#VOLLIST = "1:2:3:4:5:6:7:8:9:o:l:n:p:D"
# the printf strings used to set the HTML <TITLE> and
# to set URL hyperlinks to referenced manual pages
# can be defined at runtime. The defaults are defined below.
# The first %s parameter is the manual page name,
# the second the volume/section number.
# you can set these at runtime with -l and -r, respectively
MANTITLEPRINTF = "%s(%s) manual page"
# relative link to pregenerated file in same directory
MANREFPRINTF = "%s.%s"
# on-the-fly through a cgi-bin script
#MANREFPRINTF = "/cgi-bin/man2html?%s&%s"
#MANREFPRINTF = "/cgi-bin/man2html?m=%s&n=%s"
# # # these lines are probably fine
CP = cp
# or you can use GNU's cp and backup files that are about to be overwritten
#CP = cp -b
RM = rm
#--------------------------------------------------
#
# you shouldn't modify anything below here
#
#--------------------------------------------------
version = 3.2
rman = rman-$(version)
srcs = rman.c
objs = rman
defs = -DVOLLIST='$(VOLLIST)' -DMANTITLEPRINTF='$(MANTITLEPRINTF)' -DMANREFPRINTF='$(MANREFPRINTF)'
libs =
aux = README-rman.txt Makefile rman.1 site/rman.html CHANGES
distrib = $(srcs) $(libs) $(aux) contrib
all: rman
@echo 'Files made in current directory.'
@echo 'You should "make install".'
# everyone but me zaps assertions with the -DNDEBUG flag
rman: rman.c Makefile
$(CC) -DNDEBUG $(defs) -DPOLYGLOTMANVERSION=\"$(version)\" $(CFLAGS) -o rman rman.c
debug:
$(CC) $(defs) -DDEBUG -DPOLYGLOTMANVERSION=\"debug\" -g -Wall -o rman rman.c
prof:
quantify -cache-dir=/home/orodruin/h/bair/phelps/spine/rman/cache $(CC) -DNDEBUG $(defs) -DPOLYGLOTMANVERSION=\"QUANTIFY\" -g -o rman rman.c
install: rman
# $(INSTALL) -s rman $(BINDIR)
$(RM) -f $(BINDIR)/rman
$(CP) rman $(BINDIR)
$(RM) -f $(MANDIR)/rman.1
$(CP) rman.1 $(MANDIR)
# test version includes assertions
# ginstall rman $(BINDIR)/`arch`
test: rman.c Makefile
$(CC) $(defs) -DPOLYGLOTMANVERSION=\"$(version)\" $(CFLAGS) -Wall -ansi -pedantic -o rman rman.c
ls -l rman
ginstall rman $(BINDIR)
rman -v
rman --help
@echo 'Assertion checks:'
rman -f html weirdman/hp-tbl.1 > /dev/null
rman -f html weirdman/Pnews.1 > /dev/null
nroff -man rman.1 | rman -f html > /dev/null
sww:
rm -f rman $(wildcard ~/bin/{sun4,snake,alpha}/rman)
rman
clean:
rm -f $(objs)
dist:
rm -rf $(rman)*
mkdir $(rman)
$(CP) -RH $(distrib) $(rman)
# expand -4 rman.c > $(rman)/rman.c
rm -f $(rman)/contrib/*~
@echo 'gcksum crc length name' > MANIFEST
@echo '---------- ------ ----' >> MANIFEST
@cksum $(filter-out contrib, $(filter-out %~, $(distrib) $(wildcard contrib/*))) | tee -a MANIFEST
mv MANIFEST $(rman)
tar chvf $(rman).tar $(rman)
gzip -9v $(rman).tar
rm -rf $(rman)
# ANNOUNCE-rman rman.1
@echo "*** Did you remember to ci -l first?"
uu: tar
gznew $(rman).tar.Z
echo 'uudecode, gunzip (from GNU), untar' > $(rman).tar.gz.uu
uuencode $(rman).tar.gz $(rman).tar.gz >> $(rman).tar.gz.uu
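
The MANTITLEPRINTF and MANREFPRINTF settings in the Makefile above are ordinary printf format strings: the first `%s` receives the page name and the second the volume/section. A minimal shell sketch of how they expand, assuming the shell's own printf treats `%s` the way C's does; the helper name `expand_manfmt` is hypothetical and is not part of rman.

```shell
#!/bin/sh
# Hypothetical helper: expand a two-parameter printf string the way
# the Makefile comments describe (name first, then volume/section).
expand_manfmt() {
    # $1 = format string, $2 = page name, $3 = section
    # The format is deliberately passed as printf's format argument.
    printf "$1" "$2" "$3"
    printf '\n'
}

expand_manfmt '%s(%s) manual page' ls 1        # default title string
expand_manfmt '%s.%s' ls 1                     # relative link
expand_manfmt '/cgi-bin/man2html?%s&%s' ls 1   # CGI-style link
```

Trying a format string this way before passing it to `-l` or `-r` is a cheap check that the two parameters land where intended.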


@@ -0,0 +1,52 @@
The home location for PolyglotMan is polyglotman.sourceforge.net
*** INSTALLING ***
Set BINDIR in the Makefile to where you keep your binaries and MANDIR
to where you keep your man pages (in their source form). (If you're
using PolyglotMan with TkMan, BINDIR needs to be a component of your
bin PATH.) After properly editing the Makefile, type `make install'.
Thereafter (perhaps after a `rehash') type `rman' to invoke PolyglotMan.
PolyglotMan requires an ANSI C compiler. To compile on a Macintosh
under MPW, use Makefile.mac.
If you send me bug reports and/or suggestions for new features,
include the version of PolyglotMan (available by typing `rman -v').
PolyglotMan doesn't parse every aspect of every man page perfectly, but
if it blows up spectacularly where it doesn't seem like it should, you
can send me the man page (or a representative man page if it blows up
on a class of man pages) in BOTH: (1) [tn]roff source form, from
.../man/manX and (2) formatted form (as formatted by `nroff -man'),
uuencoded to preserve the control characters, from .../man/catX.
If you discover a bug and you obtained PolyglotMan at some other site,
check the home site to see if a newer version has already fixed the problem.
Be sure to look in the contrib directory for WWW server interfaces,
a batch converter, and a wrapper for SCO.
--------------------------------------------------
*** NOTES ON CURRENT VERSION ***
Help! I'm looking for people to help with the following projects.
(1) Better RTF output format. The current one works, but could be
made better. (2) Extending the macro sets for source recognition. If
you write an output format or otherwise improve PolyglotMan, please
send in your code so that I may share the wealth in future releases.
(3) Fixing output for various (accented?) characters in the Latin1
character set.
--------------------------------------------------
License
This software is distributed under the Artistic License (see
http://www.opensource.org/licenses/artistic-license.html).
(This version of PolyglotMan represents a complete rewrite of bs2tk,
which was packaged with TkMan in 1993, which is copyrighted by the
Regents of the University of California, and therefore is not under
their jurisdiction.)

src/tools/rman/rman.1 Normal file

@@ -0,0 +1,273 @@
.TH PolyglotMan 1
.SH "NAME "
PolyglotMan, rman - reverse compile man pages from formatted
form to a number of source formats
.SH "SYNOPSIS "
rman [ \fIoptions \fR] [ \fIfile \fR]
.SH "DESCRIPTION "
Up-to-date instructions can be found at
http://polyglotman.sourceforge.net/rman.html
.PP
\fIPolyglotMan \fR takes man pages from most of the popular flavors
of UNIX and transforms them into any of a number of text source
formats. PolyglotMan was formerly known as RosettaMan. The name
of the binary is still called \fIrman \fR, for scripts that depend
on that name; mnemonically, just think "reverse man". Previously \fI
PolyglotMan \fR required pages to be formatted by nroff prior
to its processing. With version 3.0, it \fIprefers [tn]roff source \fR
and usually produces results that are better yet. And source
processing is the only way to translate tables. Source format
translation is not as mature as formatted, however, so try formatted
translation as a backup.
.PP
In parsing [tn]roff source, one could implement an arbitrarily
large subset of [tn]roff, which I did not and will not do, so
the results can be off. I did implement a significant subset
of those used in man pages, however, including tbl (but not eqn),
if tests, and general macro definitions, so usually the results
look great. If they don't, format the page with nroff before
sending it to PolyglotMan. If PolyglotMan doesn't recognize a
key macro used by a large class of pages, however, e-mail me
the source and a uuencoded nroff-formatted page and I'll see
what I can do. When running PolyglotMan with man page source
that includes or redirects to other [tn]roff source using the .so (source
or inclusion) macro, you should be in the parent directory of
the page, since pages are written with this assumption. For example,
if you are translating /usr/man/man1/ls.1, first cd into /usr/man.
.PP
\fIPolyglotMan \fR accepts man pages from: SunOS, Sun Solaris,
Hewlett-Packard HP-UX, AT&T System V, OSF/1 aka Digital UNIX,
DEC Ultrix, SGI IRIX, Linux, FreeBSD, SCO. Source processing
works for: SunOS, Sun Solaris, Hewlett-Packard HP-UX, AT&T System
V, OSF/1 aka Digital UNIX, DEC Ultrix. It can produce printable
ASCII-only (control characters stripped), section headers-only,
Tk, TkMan, [tn]roff (traditional man page source), SGML, HTML,
MIME, LaTeX, LaTeX2e, RTF, Perl 5 POD. A modular architecture
permits easy addition of additional output formats.
.PP
The latest version of PolyglotMan is available from \fI
http://polyglotman.sourceforge.net/ \fR.
.SH "OPTIONS "
The following options should not be used with any others and
exit PolyglotMan without processing any input.
.TP 15
-h|--help
Show list of command line options and exit.
.TP 15
-v|--version
Show version number and exit.
.PP
\fIYou should specify the filter first, as this sets a number
of parameters, and then specify other options. \fR
.TP 15
-f|--filter <ASCII|roff|TkMan|Tk|Sections|HTML|SGML|MIME|LaTeX|LaTeX2e|RTF|POD>
Set the output filter. Defaults to ASCII.
.TP 15
-S|--source
PolyglotMan tries to automatically determine whether its input
is source or formatted; use this option to declare source input.
.TP 15
-F|--format|--formatted
PolyglotMan tries to automatically determine whether its input
is source or formatted; use this option to declare formatted
input.
.TP 15
-l|--title \fIprintf-string \fR
In HTML mode this sets the <TITLE> of the man pages, given the
same parameters as \fI-r \fR.
.TP 15
-r|--reference|--manref \fIprintf-string \fR
In HTML and SGML modes this sets the URL form by which to retrieve
other man pages. The string can use two supplied parameters:
the man page name and its section. (See the Examples section.)
If the string is null (as if set from a shell by "-r ''"), `-'
or `off', then man page references will not be HREFs, just set
in italics. If your printf supports XPG3 positions specifier,
this can be quite flexible.
.TP 15
-V|--volumes \fI<colon-separated list> \fR
Set the list of valid volumes to check against when looking for
cross-references to other man pages. Defaults to \fI1:2:3:4:5:6:7:8:9:o:l:n:p \fR(volume
names can be multicharacter). If a non-whitespace string in
the page is immediately followed by a left parenthesis, then
one of the valid volumes, and ends with optional other characters
and then a right parenthesis--then that string is reported as
a reference to another manual page. If this -V string starts
with an equals sign, then no optional characters are allowed
between the match to the list of valids and the right parenthesis. (This
option is needed for SCO UNIX.)
.PP
The following options apply only when formatted pages are given
as input. They do not apply or are always handled correctly with
the source.
.TP 15
-b|--subsections
Try to recognize subsection titles in addition to section titles.
This can cause problems on some UNIX flavors.
.TP 15
-K|--nobreak
Indicate manual pages don't have page breaks, so don't look for
footers and headers around them. (Older nroff -man macros always
put in page breaks, but lately some vendors have realized that
printouts are made through troff, whereas nroff -man is used to
format pages for reading on screen, and so have eliminated page
breaks.) \fIPolyglotMan \fR usually gets this right even without
this flag.
.TP 15
-k|--keep
Keep headers and footers, as a canonical report at the end of
the page.
.\" changeleft: Move changebars, such as those found in the Tcl/Tk
.\" manual pages, to the left.
.\" notaggressive: Disable aggressive man page parsing. Aggressive
.\" man page parsing, which is on by default, elides headers and
.\" footers, identifies sections and more.
.TP 15
-n|--name \fIname \fR
Set name of man page (used in roff format). If the filename is
given in the form " \fIname \fR. \fIsection \fR", the name and
section are automatically determined. If the page is being parsed
from [tn]roff source and it has a .TH line, this information
is extracted from that line.
.TP 15
-p|--paragraph
paragraph mode toggle. The filter determines whether lines should
be linebroken as they were by nroff, or whether lines should
be flowed together into paragraphs. Mainly for internal use.
.TP 15
-s|section \fI# \fR
Set volume (aka section) number of man page (used in roff format).
.\" tables: Turn on aggressive table parsing.
.TP 15
-t|--tabstops \fI# \fR
For those macros sets that use tabs in place of spaces where
possible in order to reduce the number of characters used, set
tabstops every \fI# \fR columns. Defaults to 8.
.SH "NOTES ON FILTER TYPES "
.SS "ROFF "
Some flavors of UNIX ship man pages without [tn]roff source, making
one's laser printer little more than a laser-powered daisy wheel.
This filter tries to intuit the original [tn]roff directives,
which can then be recompiled by [tn]roff.
.SS "TkMan "
TkMan, a hypertext man page browser, uses \fIPolyglotMan \fR
to show man pages without the (usually) useless headers and footers
on each page. It also collects section and (optionally) subsection
heads for direct access from a pulldown menu. TkMan and Tcl/Tk,
the toolkit in which it's written, are available via anonymous
ftp from \fIftp://ftp.smli.com/pub/tcl/ \fR
.SS "Tk "
This option outputs the text in a series of Tcl lists consisting
of text-tags pairs, where tag names roughly correspond to HTML.
This output can be inserted into a Tk text widget by doing an \fI
eval <textwidget> insert end <text> \fR. This format should be
relatively easily parsible by other programs that want both the
text and the tags. Also see ASCII.
.SS "ASCII "
When printed on a line printer, man pages try to produce special
text effects by overstriking characters with themselves (to produce
bold) and underscores (underlining). Other text processing software,
such as text editors, searchers, and indexers, must counteract
this. The ASCII filter strips away this formatting. Piping nroff
output through \fIcol -b \fR also strips away this formatting,
but it leaves behind unsightly page headers and footers. Also
see Tk.
.SS "Sections "
Dumps section and (optionally) subsection titles. This might
be useful for another program that processes man pages.
.SS "HTML "
With a simple extension to an HTTP server for Mosaic or other
World Wide Web browser, \fIPolyglotMan \fR can produce high quality
HTML on the fly. Several such extensions and pointers to several
others are included in \fIPolyglotMan \fR's \fIcontrib \fR directory.
.SS "SGML "
This is approaching the DocBook DTD, but I'm hoping that someone
with a real interest in this will polish the tags
generated. Try it to see how close the tags are now.
.SS "MIME "
MIME (Multipurpose Internet Mail Extensions) as defined by RFC 1563,
good for consumption by MIME-aware e-mailers or as Emacs (>=19.29)
enriched documents.
.SS "LaTeX and LaTeX2e "
Why not?
.SS "RTF "
Use output on Mac or NeXT or whatever. Maybe take random man
pages and integrate with NeXT's documentation system better.
Maybe NeXT has own man page macros that do this.
.SS "PostScript and FrameMaker "
To produce PostScript, use \fIgroff \fR or \fIpsroff \fR. To
produce FrameMaker MIF, use FrameMaker's builtin filter. In both
cases you need \fI[tn]roff \fR source, so if you only have a
formatted version of the manual page, use \fIPolyglotMan \fR's
roff filter first.
.SH "EXAMPLES "
To convert the \fIformatted \fR man page named \fIls.1 \fR back
into [tn]roff source form:
.PP
\fIrman -f roff /usr/local/man/cat1/ls.1 > /usr/local/man/man1/ls.1 \fR
.br
.PP
Long man pages are often compressed to conserve space (compression
is especially effective on formatted man pages as many of the
characters are spaces). As it is a long man page, it probably
has subsections, which we try to separate out (some macro sets
don't distinguish subsections well enough for \fIPolyglotMan \fR
to detect them). Let's convert this to LaTeX format:
.br
.PP
\fIpcat /usr/catman/a_man/cat1/automount.z | rman -b -n automount -s 1 -f
latex > automount.man \fR
.br
.PP
Alternatively, \fIman 1 automount | rman -b -n automount -s 1 -f
latex > automount.man \fR
.br
.PP
For HTML/Mosaic users, \fIPolyglotMan \fR can, without modification
of the source code, produce HTML links that point to other HTML
man pages either pregenerated or generated on the fly. First
let's assume pregenerated HTML versions of man pages stored in \fI/usr/man/html \fR.
Generate these one-by-one with the following form:
.br
\fIrman -f html -r 'http:/usr/man/html/%s.%s.html' /usr/man/cat1/ls.1 > /usr/man/html/ls.1.html \fR
.br
.PP
If you've extended your HTML client to generate HTML on the fly
you should use something like:
.br
\fIrman -f html -r 'http:~/bin/man2html?%s:%s' /usr/man/cat1/ls.1 \fR
.br
when generating HTML.
.SH "BUGS/INCOMPATIBILITIES "
\fIPolyglotMan \fR is not perfect in all cases, but it usually
does a good job, and in any case reduces the problem of converting
man pages to light editing.
.PP
Tables in formatted pages, especially H-P's, aren't handled very
well. Be sure to pass in source for the page to recognize tables.
.PP
The man pager \fIwoman \fR applies its own idea of formatting
for man pages, which can confuse \fIPolyglotMan \fR. Bypass \fI
woman \fR by passing the formatted manual page text directly
into \fIPolyglotMan \fR.
.PP
The [tn]roff output format uses \efB to turn on boldface. If your
macro set requires .B, you'll have to postprocess the \fIPolyglotMan \fR
output.
.SH "SEE ALSO "
\fItkman(1) \fR, \fIxman(1) \fR, \fIman(1) \fR, \fIman(7) \fR
or \fIman(5) \fR depending on your flavor of UNIX
.SH "AUTHOR "
PolyglotMan
.br
by Thomas A. Phelps ( \fIphelps@ACM.org \fR)
.br
developed at the
.br
University of California, Berkeley
.br
Computer Science Division
.PP
Manual page last updated on $Date: 1998/07/13 09:47:28 $
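
The ASCII filter notes in the man page above describe nroff's overstriking: bold is a character, a backspace, and the same character again; underlining is an underscore, a backspace, and the character. As a rough illustration of that one step only (rman's ASCII filter also removes page headers and footers, which this does not attempt), the hypothetical `strip_overstrike` function below deletes every character that is immediately followed by a backspace. It is a sketch under those assumptions, not rman's implementation.

```shell
#!/bin/sh
# Hypothetical helper, not rman's code: drop each
# "character + backspace" pair so only the final glyph survives.
strip_overstrike() {
    bs=$(printf '\b')          # a literal backspace character
    sed "s/.${bs}//g"
}

# "NAME" in bold (each letter struck twice), then "ls" underlined.
printf 'N\bNA\bAM\bME\bE\n' | strip_overstrike
printf '_\bl_\bs\n' | strip_overstrike
```

This does roughly what the man page attributes to `col -b`, and like `col -b` it leaves any page headers and footers in place.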

src/tools/rman/rman.c Normal file

File diff suppressed because it is too large

src/tools/rman/rman.html Normal file

@@ -0,0 +1,342 @@
<html>
<head>
<title>PolyglotMan Manual Page</title>
</head>
<body>
<h1>Name</h1>
PolyglotMan, rman - reverse compile man pages from formatted form to a number of source formats
<h1>Synopsis</h1>
rman [<i>options</i>] [<var>file</var>]
<h1>Description</h1>
<p><i>PolyglotMan</i> takes man pages from most of the
popular flavors of UNIX and transforms them into any of a number of
text source formats. PolyglotMan was formerly known as RosettaMan.
The name of the binary is still called <tt>rman</tt>, for scripts
that depend on that name; mnemonically, just think "reverse man".
Previously <i>PolyglotMan</i> required pages to
be formatted by nroff prior to its processing. With version 3.0, it <i>prefers
[tn]roff source</i> and usually produces results that are better yet.
And source processing is the only way to translate tables.
Source format translation is not as mature as formatted, however, so
try formatted translation as a backup.
<p>In parsing [tn]roff source, one could implement an arbitrarily
large subset of [tn]roff, which I did not and will not do, so the
results can be off. I did implement a significant subset of those used
in man pages, however, including tbl (but not eqn), if tests, and
general macro definitions, so usually the results look great. If they
don't, format the page with nroff before sending it to PolyglotMan. If
PolyglotMan doesn't recognize a key macro used by a large class of
pages, however, e-mail me the source and a uuencoded nroff-formatted
page and I'll see what I can do. When running PolyglotMan with man
page source that includes or redirects to other [tn]roff source using
the .so (source or inclusion) macro, you should be in the parent
directory of the page, since pages are written with this assumption.
For example, if you are translating /usr/man/man1/ls.1, first cd into
/usr/man.
<p><i>PolyglotMan</i> accepts <em>formatted</em> man pages from:
<blockquote>SunOS, Sun Solaris, Hewlett-Packard HP-UX,
AT&amp;T System V, OSF/1 aka Digital UNIX, DEC Ultrix, SGI IRIX, Linux,
FreeBSD, SCO.</blockquote>
Man page <em>source</em> processing works for:
<blockquote>SunOS, Sun Solaris, Hewlett-Packard HP-UX,
AT&amp;T System V, OSF/1 aka Digital UNIX, DEC Ultrix.</blockquote>
It can produce
<blockquote>printable ASCII-only (control characters
stripped), section headers-only,
Tk, TkMan, [tn]roff (traditional man page source), partial DocBook XML, HTML, MIME,
LaTeX, LaTeX2e, RTF, Perl 5 POD.</blockquote>
A modular architecture permits easy addition of additional output
formats.</p>
<p>The latest version of PolyglotMan is available via
<a href='http://polyglotman.sourceforge.net/'>http://polyglotman.sourceforge.net/</a>.
<h1>Options</h1>
<p>The following options should not be used with any others and exit PolyglotMan
without processing any input.
<dl>
<dt>-h|--help</dt>
<dd>Show list of command line options and exit.</dd>
<dt>-v|--version</dt>
<dd>Show version number and exit.</dd>
</dl>
<p><em>You should specify the filter first, as this sets a number of parameters,
and then specify other options.</em>
<dl>
<dt>-f|--filter &lt;ASCII|roff|TkMan|Tk|Sections|HTML|MIME|LaTeX|LaTeX2e|RTF|POD&gt;</dt>
<dd>Set the output filter. Defaults to ASCII.
<!-- If you are converting
from formatted roff source, it is recommended that you prevent hyphenation by using
groff, making a file with the contents ".hpm 20", and reading this in
before the roff source, e.g., groff -Tascii -man <hpm-file> <roff-source>.
-->
</dd>
<dt>-S|--source</dt>
<dd>PolyglotMan tries to automatically determine whether its input is source or formatted;
use this option to declare source input.</dd>
<dt>-F|--format|--formatted</dt>
<dd>PolyglotMan tries to automatically determine whether its input is source or formatted;
use this option to declare formatted input.</dd>
<dt>-l|--title <i>printf-string</i></dt>
<dd>In HTML mode this sets the &lt;TITLE&gt; of the man pages, given the same
parameters as <tt>-r</tt>.</dd>
<dt>-r|--reference|--manref <i>printf-string</i></dt>
<dd>In HTML <!--and SGML--> mode this sets the URL form by which to retrieve other man pages.
The string can use two supplied parameters: the man page name and its section.
(See the Examples section.) If the string is null (as if set from a shell
by "-r ''"), `-' or `off', then man page references will not be HREFs, just set in italics.
If your printf supports XPG3 positions specifier, this can be quite flexible.</dd>
<dt>-V|--volumes <i>&lt;colon-separated list&gt;</i></dt>
<dd>Set the list of valid volumes to check against when looking for
cross-references to other man pages. Defaults to <tt>1:2:3:4:5:6:7:8:9:o:l:n:p</tt>
(volume names can be multicharacter).
If a non-whitespace string in the page is immediately followed by a left
parenthesis, then one of the valid volumes, and ends with optional other
characters and then a right parenthesis--then that string is reported as
a reference to another manual page. If this -V string starts with an equals
sign, then no optional characters are allowed between the match to the list of
valids and the right parenthesis. (This option is needed for SCO UNIX.)
</dd>
</dl>
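<p>The cross-reference rule described under <tt>-V</tt> can be sketched as a
regular expression. This is only an illustration of the rule as stated above,
not <i>PolyglotMan</i>'s actual C implementation; the function names
<tt>build_xref_pattern</tt> and <tt>find_xrefs</tt> are hypothetical.</p>

```python
import re

# Default volume list as documented for -V.
DEFAULT_VOLUMES = "1:2:3:4:5:6:7:8:9:o:l:n:p"


def build_xref_pattern(volumes=DEFAULT_VOLUMES):
    """Compile a pattern for name(volume...) references (hypothetical helper)."""
    # A leading '=' forbids extra characters after the matched volume.
    strict = volumes.startswith("=")
    names = [v for v in volumes.lstrip("=").split(":") if v]
    # Try longer volume names first so multicharacter volumes are preferred.
    names.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in names)
    suffix = "" if strict else r"[^\s()]*"
    # name: non-whitespace, non-parenthesis run directly before '('
    return re.compile(r"([^\s()]+)\(((?:%s)%s)\)" % (alternation, suffix))


def find_xrefs(text, volumes=DEFAULT_VOLUMES):
    """Return (name, section) pairs for every reference found in text."""
    return build_xref_pattern(volumes).findall(text)


if __name__ == "__main__":
    sample = "See ls(1), printf(3S) and foo(bar)."
    print(find_xrefs(sample))
    print(find_xrefs(sample, "=1:2:3"))
```

<p>With the default list, <tt>printf(3S)</tt> counts as a reference because
<tt>S</tt> is accepted as optional trailing characters; with a list that starts
with an equals sign it does not.</p>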
<p>The following options apply only when formatted pages are given as input.
With source input they either do not apply or are handled correctly automatically.
<dl>
<dt>-b|--subsections</dt>
<dd>Try to recognize subsection titles in addition to section titles.
This can cause problems on some UNIX flavors.</dd>
<dt>-K|--nobreak</dt>
<dd>Indicate that manual pages don't have page breaks, so don't look for footers and headers
around them. (Older nroff -man macros always put in page breaks, but lately
some vendors have realized that printouts are made through troff, whereas
nroff -man is used to format pages for reading on screen, and so have eliminated
page breaks.) <i>PolyglotMan</i> usually gets this right even without this flag.</dd>
<dt>-k|--keep</dt>
<dd>Keep headers and footers, as a canonical report at the end of the page.</dd>
<!-- this done automatically for Tcl/Tk pages; doesn't apply for others
<dt>-c|--changeleft</dt>
<dd>Move changebars, such as those found in the Tcl/Tk manual pages,
to the left.</dd>
-->
<!-- agressive parsing works so well that this option has been removed
<dt>-m|--notaggressive</dt>
<dd><i>Disable</i> aggressive man page parsing. Aggressive manual,
which is on by default, page parsing elides headers and footers,
identifies sections and more.</dd>
-->
<dt>-n|--name <i>name</i></dt>
<dd>Set name of man page (used in roff format).
If the filename is given in the form "<i>name</i>.<i>section</i>", the name
and section are automatically determined. If the page is being parsed from
[tn]roff source and it has a .TH line, this information is extracted from that line.</dd>
<dt>-p|--paragraph</dt>
<dd>Paragraph mode toggle. The filter determines whether lines should be linebroken
as they were by nroff, or whether lines should be flowed together into paragraphs.
Mainly for internal use.</dd>
<dt>-s|--section <i>#</i></dt>
<dd>Set volume (aka section) number of man page (used in roff format).</dd>
<!-- if in source automatic, if in preformatted really doesn't work
<dt>-T|--tables</dt>
<dd>Turn on aggressive table parsing.</dd>
-->
<dt>-t|--tabstops <i>#</i></dt>
<dd>For those macro sets that use tabs in place of spaces where
possible in order to reduce the number of characters used, set
tabstops every <i>#</i> columns. Defaults to 8.</dd>
</dl>
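<p>To make the effect of <tt>-t</tt> concrete, here is a minimal sketch of
replacing each tab with spaces up to the next tab stop. It illustrates the
general technique only and is not taken from <i>PolyglotMan</i>'s source; the
function name <tt>expand_tabs</tt> is hypothetical.</p>

```python
def expand_tabs(line, tabstop=8):
    """Replace each tab with spaces up to the next multiple of tabstop."""
    out = []
    col = 0  # current output column, counted from zero
    for ch in line:
        if ch == "\t":
            pad = tabstop - (col % tabstop)
            out.append(" " * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return "".join(out)


if __name__ == "__main__":
    # Two characters, then a tab: six spaces reach column 8.
    print(repr(expand_tabs("ab\tc")))
    print(repr(expand_tabs("\tx", tabstop=4)))
```

<p>For plain single-line text this gives the same result as Python's built-in
<tt>str.expandtabs</tt>.</p>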
<h1>Notes on Filter Types</h1>
<h2>ROFF</h2>
<p>Some flavors of UNIX ship man pages without [tn]roff source, making one's laser printer
little more than a laser-powered daisy wheel. This filter tries to intuit
the original [tn]roff directives, which can then be recompiled by [tn]roff.</p>
<h2>TkMan</h2>
<p>TkMan, a hypertext man page browser, uses <i>PolyglotMan</i> to show
man pages without the (usually) useless headers and footers on each
page. It also collects section and (optionally) subsection heads for
direct access from a pulldown menu. TkMan and Tcl/Tk, the toolkit in
which it's written, are available via anonymous ftp from
<tt>ftp://ftp.smli.com/pub/tcl/</tt></p>
<h2>Tk</h2>
<p>This option outputs the text in a series of Tcl lists consisting of
text-tag pairs, where tag names roughly correspond to HTML. This
output can be inserted into a Tk text widget by doing an <tt>eval
&lt;textwidget&gt; insert end &lt;text&gt;</tt>. This format should be relatively
easy to parse for other programs that want both the text and the
tags. Also see ASCII.</p>
<h2>ASCII</h2>
<p>When printed on a line printer, man pages try to produce special text effects
by overstriking characters with themselves (to produce bold) and underscores
(underlining). Other text processing software, such as text editors, searchers,
and indexers, must counteract this. The ASCII filter strips away this formatting.
Piping nroff output through <tt>col -b</tt> also strips away this formatting,
but it leaves behind unsightly page headers and footers. Also see Tk.</p>
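<p>The overstrike convention described above can be undone with a single
substitution. The following is a minimal sketch of the idea, assuming bold is
encoded as <i>character, backspace, character</i> and underlining as
<i>underscore, backspace, character</i>; it is not <i>PolyglotMan</i>'s own code,
it does not remove headers or footers, and the function name
<tt>strip_overstrikes</tt> is hypothetical.</p>

```python
import re

# Any single character that is immediately followed by a backspace (0x08)
# gets overstruck by whatever comes next, so it can be dropped.
OVERSTRIKE = re.compile(r".\x08")


def strip_overstrikes(text):
    """Remove nroff-style bold and underline overstrikes from text."""
    return OVERSTRIKE.sub("", text)


if __name__ == "__main__":
    bold = "N\bNA\bAM\bME\bE"      # "NAME" printed in bold
    underlined = "_\bl_\bs"        # "ls" printed underlined
    print(strip_overstrikes(bold))
    print(strip_overstrikes(underlined))
```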
<h2>Sections</h2>
<p>Dumps section and (optionally) subsection titles. This might be useful for
another program that processes man pages.</p>
<h2>HTML</h2>
<p>With a simple extension to an HTTP server for Mosaic or other World Wide Web
browser, <i>PolyglotMan</i> can produce high quality HTML on the fly.
Several such extensions and pointers to several others are included in <i>PolyglotMan</i>'s
<tt>contrib</tt> directory.</p>
<h2>XML</h2>
<p>This is approaching the DocBook DTD, but I'm hoping that someone
with a real interest in this will polish the tags generated. Try it to see
how close the tags are now.</p>
<p>Improved by Aaron Hawley, who nevertheless notes:
<blockquote>
Output requires human intervention to become proper
DocBook format. This is a result of the fundamental
nature of nroff and DocBook XML. One is marked for
formatting, the other is marked for semantics (defining
what the content is rather than what it should look
like). For instance, italics and bold formatting are
converted to emphasis and command DocBook elements
respectively even though they should probably be marked
up as command, option, literal, arg and other
possible DocBook tags.
</blockquote>
</p>
<h2>MIME</h2>
<p>MIME (Multipurpose Internet Mail Extensions) enriched text as defined by RFC 1563,
good for consumption by MIME-aware e-mailers or as Emacs (&gt;=19.29)
enriched documents.</p>
<h2>LaTeX and LaTeX2e</h2>
Why not?
<h2>RTF</h2>
<p>Use the output on a Mac, a NeXT, or any other system that reads RTF. Man pages
could perhaps be integrated better with NeXT's documentation system this way,
although NeXT may have its own man page macros that do this.</p>
<h2>PostScript and FrameMaker</h2>
<p>To produce PostScript, use <tt>groff</tt> or <tt>psroff</tt>. To produce FrameMaker MIF,
use FrameMaker's built-in filter. In both cases you need <tt>[tn]roff</tt> source,
so if you only have a formatted version of the manual page, use <i>PolyglotMan</i>'s
roff filter first.</p>
<h1>Examples</h1>
<p>To convert the <i>formatted</i> man page named <tt>ls.1</tt> back into
[tn]roff source form:</p>
<p>
<tt>rman -f roff /usr/local/man/cat1/ls.1 > /usr/local/man/man1/ls.1</tt><br>
<p>Long man pages are often compressed to conserve space (compression is
especially effective on formatted man pages as many of the characters
are spaces). A long man page probably also has subsections,
which we try to separate out with <tt>-b</tt> (some macro sets don't distinguish
subsections well enough for <i>PolyglotMan</i> to detect them). Let's convert
such a page to LaTeX format:<br>
<p>
<tt>pcat /usr/catman/a_man/cat1/automount.z | rman -b -n automount -s 1 -f latex > automount.man</tt><br>
<p>Alternatively,
<tt>man 1 automount | rman -b -n automount -s 1 -f latex > automount.man</tt><br>
<p>For HTML/Mosaic users, <i>PolyglotMan</i> can, without modification of the
source code, produce HTML links that point to other HTML man pages
either pregenerated or generated on the fly. First let's assume
pregenerated HTML versions of man pages stored in <i>/usr/man/html</i>.
Generate these one-by-one with the following form:<br>
<tt>rman -f html -r 'http:/usr/man/html/%s.%s.html' /usr/man/cat1/ls.1 > /usr/man/html/ls.1.html</tt><br>
<p>If you've extended your HTML client to generate HTML on the fly you should use
something like:<br>
<tt>rman -f html -r 'http:~/bin/man2html?%s:%s' /usr/man/cat1/ls.1</tt><br>
when generating HTML.</p>
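<p>To see what the <tt>-r</tt> strings in these examples expand to, the
substitution of man page name and section can be mimicked with Python's
printf-style <tt>%</tt> operator. This is a sketch of the documented behavior
only; the function name <tt>manref_href</tt> is hypothetical, and Python's
<tt>%</tt> operator does not support the XPG3 positional specifiers that a C
printf may offer.</p>

```python
def manref_href(template, name, section):
    """Fill a -r style template with a page name and section.

    Returns None when the template is empty, '-' or 'off', the cases in
    which references are documented to be set in italics instead of HREFs.
    """
    if template in ("", "-", "off"):
        return None
    # The two %s slots receive the name first, then the section.
    return template % (name, section)


if __name__ == "__main__":
    print(manref_href("http:/usr/man/html/%s.%s.html", "ls", "1"))
    print(manref_href("http:~/bin/man2html?%s:%s", "ls", "1"))
    print(manref_href("off", "ls", "1"))
```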
<h1>Bugs/Incompatibilities</h1>
<p><i>PolyglotMan</i> is not perfect in all cases, but it usually does a
good job, and in any case reduces the problem of converting man pages
to light editing.</p>
<p>Tables in formatted pages, especially H-P's, aren't handled very well.
Pass in the source for the page instead so that tables are recognized.</p>
<p>The man pager <i>woman</i> applies its own idea of formatting for
man pages, which can confuse <i>PolyglotMan</i>. Bypass <i>woman</i>
by passing the formatted manual page text directly into
<i>PolyglotMan</i>.</p>
<p>The [tn]roff output format uses \fB to turn on boldface. If your macro set
requires .B, you'll have to postprocess the <i>PolyglotMan</i> output.</p>
<h1>See Also</h1>
<tt>tkman(1)</tt>, <tt>xman(1)</tt>, <tt>man(1)</tt>, <tt>man(7)</tt> or <tt>man(5)</tt> depending on your flavor of UNIX
<p>GNU groff can now output to HTML.
<h1>Author</h1>
<p>PolyglotMan<br>
Copyright (c) 1994-2003 T.A. Phelps<br />
developed at the<br>
University of California, Berkeley<br />
Computer Science Division
<p>Manual page last updated on $Date: 2003/03/29 08:09:13 $
</body>
</html>