This file lists the most important changes in the mandoc.bsd.lv distribution.

Changes in version 1.14.4, released on August 8, 2018

--- MAJOR NEW FEATURES ---
* In ASCII output, render mathematical symbols and Greek letters
  as transliterations conveying the characters' meanings rather
  than trying to imitate their shape. Consequently, such characters
  can now be used in portable manual pages. All the same, please
  limit their use to contexts where they really matter, for example
  when showing complicated mathematical formulae.
* First steps towards better support for small screens in HTML
  output (responsive design): avoid most style= attributes, in
  particular all hard-coded indentations and column widths, and
  provide a better mandoc.css style sheet with a @media query,
  using em units throughout, and avoiding redundancy in selectors.
* Better HTML output with some more fitting HTML elements,
  eliminating needless class= attributes, and avoiding various
  HTML syntax errors (element nesting, URL-fragment syntax,
  duplicate id= attributes).

--- MINOR NEW FEATURES ---
* When a man(1) argument contains a slash, imply -l like in man-db.
* Use TIOCGWINSZ to reduce the default -Owidth and -Oindent during
  interactive use on terminals narrower than 79 columns.
* Generated PostScript files are now more than 50% smaller.
* Terminal rendering of eqn(7) is improved in several respects.
* Simplified and nicer output from the mdoc(7) .Lk macro,
  formatting all links in-line, even long ones.
* roff(7) \n+ and \n- numerical register auto-increment and -decrement
* roff(7) .nr optional third argument (auto-increment step size)
* Autodetect in ./configure whether the compiler can use -W and
  -static, making it possible to build on Solaris 10 and 11
  without any configure.local.

--- RELIABILITY BUGFIXES ---
* Only activate UTF-8 output when the user really selected UTF-8,
  not some other multibyte character encoding.
* Prevent excessive .ll arguments from generating infinite output.
* Fix out-of-bounds accesses to parse buffers that could happen when
  using renamed or user-defined macros after roff(7) conditionals.
* Avoid an assertion failure in certain .Bl -column lists.
* Avoid a NULL pointer access on deroff() failure after '.SS ""'.
* Fix a segfault that could be triggered by two invalid .Dt macros.
* Fix two syntax errors in generated PDF files.
* Properly state the page size in generated PostScript files.
* Close a memory leak caused by missing gzclose(3).
* Fix misformatting of man(7) documents lacking .SH macros
  in PostScript and PDF output.
* And many minor bugfixes.

--- THANKS TO ---
* Marc Espie (OpenBSD) for implementing the size reduction of
  PostScript files, one additional patch for code simplification,
  and two bug reports.
* Theo Buehler (OpenBSD) for a bugfix patch,
  and Theo de Raadt (OpenBSD) for checking it.
* John Gardner for more than a dozen suggestions regarding HTML output.
* Mike Williams for teaching me how to use %%DocumentMedia
  and setpagedevice in PostScript files.
* Werner Lemberg (groff) for feedback on mdoc(7) language changes.
* Colin Watson (man-db) for feedback on man-db semantics.
* Jason McIntyre (OpenBSD) for lots of feedback and suggestions
  on diagnostic messages and on the documentation.
* Thomas Klausner (NetBSD) for suggesting two new style messages
  and one new feature, for two bug reports, and for release testing.
* Leah Neukirchen (Void Linux) for suggesting a new style message,
  five bug reports, and release testing.
* Anthony Bentley (OpenBSD) for reporting multiple bugs
  and missing features.
* Paul Irofti (OpenBSD) and Nate Bargmann for suggesting new features.
* Michael Stapelberg (Debian) for bug reports and release testing.
* Christian Weisgerber, Jonathan Gray, Stuart Henderson,
  Ted Unangst (OpenBSD), Takeshi Nakayama (NetBSD), Anton Lazarov,
  Jakub Klinkovsky, Jan Stary, Jesper Wallin, Will Backmam,
  and Wolfgang Mueller for bug reports.
* Sevan Janiyan (NetBSD) for additions to lib.in.
* George Brown for suggesting code simplifications.
* David Coppa, Igor Sobrado (OpenBSD), and Alexander Kuleshov
  for documentation improvements.
* Laura Morales and Raf Czlonka for questions resulting
  in better documentation.
* Yuri Pankov (illumos) for release testing.

Changes in version 1.14.3, released on August 5, 2017

--- BUG FIXES ---
* man(7): Do not crash with out-of-bounds read access to a constant
  array if .sp or a blank line immediately precedes .SS or .SH.
* mdoc(7): Do not crash with out-of-bounds read access to a constant
  array if .sp or a blank line precedes the first .Sh macro.
* tbl(7): Ignore explicitly specified negative column widths rather
  than wrapping around to huge numbers and risking memory exhaustion.
* man(1): No longer use names that only occur in the SYNOPSIS section.
  Gets rid of some surprising behaviour and bogus warnings.

--- THANKS TO ---
Leah Neukirchen (Void Linux), Markus Waldeck (Debian), Peter Bui (nd.edu),
and Yuri Pankov (illumos) for bug reports.

Changes in version 1.14.2, released on July 28, 2017

--- MAJOR NEW FEATURES ---
* New mdoc(7) -Tmarkdown output mode.
* For -Thtml, implement internal hyperlinks pointing to authoritative
  definitions of various syntax elements, similar to the ctags(1)-like
  less(1) :t internal searching in terminal mode.
* Provide a superset of the functionality of the former mdoclint(1)
  utility and a new -Wstyle message level with several new messages,
  including validity checking of .Xr cross references.
* tbl(7): Implement automatic line breaking inside individual table
  cells, and several other formatting improvements.
* eqn(7): Complete rewrite of the lexer, resulting in several bugfixes.
* Continue parser unification, in particular allowing generation
  of syntax tree nodes on the roff(7) level, allowing implementation
  of many additional roff requests.

--- REMOVED FUNCTIONALITY ---
* Delete the manpage(1) utility. It was never enabled in any release.
* Delete the -Txhtml command line option. It has been an obsolete
  alias for the -Thtml output mode for more than two years.

--- MINOR NEW FEATURES ---
* -Tlint now puts parser messages on stdout instead of stderr,
  making commands like "man -l -Tlint *.1" useful.
* mdoc(7): Various .Lk formatting improvements.
* mdoc(7) -Thtml: Better CSS for .Bl lists.
* man(7): Implement the .MT/.ME block macro (mailto hyperlink).
* man(7): Implement the .DT macro (restore default tab positions).
* man(7): Improved support for manuals generated with reStructuredText
  by partial support for the \n[an-margin] number register.
* man(7) -Thtml: Support deep linking to .SH and .SS headers.
* tbl(7): Implement the "allbox" table option.
* tbl(7): Implement the column spacing and the 'w' (minimum column
  width) layout modifiers.
* tbl(7): Significant improvements of the manual page.
* eqn(7): Much improved font selection, including recognition of
  well-known function names, and a few other formatting improvements.
* eqn(7) -Thtml: Use <mn> and <mo> in addition to <mi>.
* roff(7): Implement the .ce (centering), .mc (margin character),
  .rj (right justify), .ta (define tab stops), .ti (temporary indent),
  .als (macro alias), .ec and .eo (escape character control),
  .po (page offset), and .rn (macro rename) requests.
* roff(7) .am: Implement appending to mdoc(7) and man(7) macros.
* roff(7): Implement the \h (horizontal motion), \l (horizontal line
  drawing), and \p (break output line) escape sequences, and also
  several additional character escape sequences.
* roff(7): Implement the 'd' conditional (macro or string defined).
* man.cgi(8) now uses pledge(2), too.
* regress.pl(1): simpler user interface, better summary output,
  simpler code, and no more recursion.

--- THANKS TO ---
* Anthony Bentley (OpenBSD) for the implementation of .MT/.ME,
  reports of many bugs and missing features, and suggestions
  for a number of feature and documentation improvements.
* Sebastien Marie (OpenBSD) for two source code patches
  and for some useful discussions.
* Florian Obser (OpenBSD) for a bugfix patch and a bug report.
* Jonathan Gray (OpenBSD) for several bug reports from afl(1)
  and several more from static analysis tools.
* Theo Buehler (OpenBSD) for several bug reports, most from afl(1).
* Jason McIntyre (OpenBSD) for many useful discussions about
  a wide variety of topics, lots of continuous testing, a number
  of bug reports, and some suggestions for messages and documentation.
* Thomas Klausner (NetBSD) for lots of help while migrating
  mdoclint(1) functionality to mandoc -Tlint, for suggesting
  several useful new messages, and for release testing.
* Reyk Floeter (OpenBSD) and Vsevolod Stakhov (FreeBSD)
  for suggesting a markdown output mode.
* Thomas Guettler for suggesting -Thtml internal hyperlinks.
* Yuri Pankov (Illumos) for inspiring new warning messages
  and for extensive release testing.
* Anton Lindqvist and TJ Townsend (both OpenBSD) and Jan Stary
  for multiple bug reports.
* Leah Neukirchen (Void Linux) for bug reports and release testing.
* Michael Stapelberg (Debian) for suggesting feature improvements
  and for release testing.
* Martin Natano and Theo de Raadt (both OpenBSD), Andreas Voegele,
  Gabriel Guzman, Gonzalo Tornaria, Markus Waldeck, and Raf Czlonka
  for bug reports.
* Antoine Jacoutot (OpenBSD) and Steffen Nurpmeso
  for suggesting feature improvements.
* Dag-Erling Smoergrav (FreeBSD) for inspiring new warning messages.
* Ted Unangst and Marc Espie (OpenBSD) for providing useful ideas.
* Svyatoslav Mishyn (Crux Linux) for release testing.
* Carsten Kunze (Heirloom roff) for help keeping mandoc and groff
  compatible and for committing some of my patches to groff.
368 lines
9.3 KiB
Groff
.\" Id: mandoc_escape.3,v 1.4 2017/07/04 23:40:01 schwarze Exp
.\"
.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd July 4, 2017
.Dt MANDOC_ESCAPE 3
.Os
.Sh NAME
.Nm mandoc_escape
.Nd parse roff escape sequences
.Sh SYNOPSIS
.In sys/types.h
.In mandoc.h
.Ft "enum mandoc_esc"
.Fo mandoc_escape
.Fa "const char **end"
.Fa "const char **start"
.Fa "int *sz"
.Fc
.Sh DESCRIPTION
This function scans a
.Xr roff 7
escape sequence.
.Pp
An escape sequence consists of
.Bl -dash -compact -width 2n
.It
an initial backslash character
.Pq Sq \e ,
.It
a single ASCII character called the escape sequence identifier,
.It
and, with only a few exceptions, an argument.
.El
.Pp
Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below.
The first three forms are called the standard forms.
.Bl -tag -width 2n
.It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
The argument starts after the initial
.Sq \&[ ,
ends before the final
.Sq \&] ,
and the escape sequence ends with the final
.Sq \&] .
.It Two-character argument short form: Ic \&( Ns Ar ar
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
.Ic \&[ Ns Ar ar Ns Ic \&] .
.It One-character argument short form: Ar a
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
.Ic \&[ Ns Ar a Ns Ic \&] .
.It Delimited form: Ar C Ns Ar argument Ns Ar C
The argument starts after the initial delimiter character
.Ar C ,
ends before the next occurrence of the delimiter character
.Ar C ,
and the escape sequence ends with that second
.Ar C .
Some escape sequences allow arbitrary characters
.Ar C
as quoting characters, while others restrict the range of characters
that can be used as quoting characters.
.El
.Pp
Upon function entry,
.Fa end
is expected to point to the escape sequence identifier.
The values passed in as
.Fa start
and
.Fa sz
are ignored and overwritten.
.Pp
By design, this function cannot handle those
.Xr roff 7
escape sequences that require in-place expansion, in particular
user-defined strings
.Ic \e* ,
number registers
.Ic \en ,
width measurements
.Ic \ew ,
and numerical expression control
.Ic \eB .
These are handled by
.Fn roff_res ,
a private preprocessor function called from
.Fn roff_parseln ,
see the file
.Pa roff.c .
.Pp
The function
.Fn mandoc_escape
is used
.Bl -dash -compact -width 2n
.It
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
.It
for error detection internally by the
.Xr roff 7
parser part of the
.Xr mandoc 3
library, see the file
.Pa roff.c ,
.It
above all externally by the
.Xr mandoc 1
formatting modules, in particular
.Fl Tascii
and
.Fl Thtml ,
for formatting purposes, see the files
.Pa term.c
and
.Pa html.c ,
.It
and rarely externally by high-level utilities using the mandoc library,
for example
.Xr makewhatis 8 ,
to purge escape sequences from text.
.El
.Sh RETURN VALUES
Upon function return, the pointer
.Fa end
is set to the character after the end of the escape sequence,
such that the calling higher-level parser can easily continue.
.Pp
For escape sequences taking an argument, the pointer
.Fa start
is set to the beginning of the argument and
.Fa sz
is set to the length of the argument.
For escape sequences not taking an argument,
.Fa start
is set to the character after the end of the sequence and
.Fa sz
is set to 0.
Both
.Fa start
and
.Fa sz
may be
.Dv NULL ;
in that case, the argument and the length are not returned.
.Pp
For sequences taking an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_FONT
The escape sequence
.Ic \ef
taking an argument in standard form:
.Ic \ef[ , \ef( , \ef Ns Ar a .
Two-character arguments starting with the character
.Sq C
are reduced to one-character arguments by skipping the
.Sq C .
More specific values are returned for the most commonly used arguments:
.Bl -column "argument" "ESCAPE_FONTITALIC"
.It argument Ta return value
.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
.It Cm P Ta Dv ESCAPE_FONTPREV
.It Cm BI Ta Dv ESCAPE_FONTBI
.El
.It Dv ESCAPE_SPECIAL
The escape sequence
.Ic \eC
taking an argument delimited with the single quote character
and, as a special exception, the escape sequences
.Em not
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
.Ic \eC' , \e[ , \e( , \e Ns Ar a .
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
.Pp
If the argument matches one of the forms described below under
.Dv ESCAPE_UNICODE ,
that value is returned instead.
.Pp
The
.Dv ESCAPE_SPECIAL
special character escape sequences can be rendered using the functions
.Fn mchars_spec2cp
and
.Fn mchars_spec2str
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_UNICODE
Escape sequences of the same format as described above under
.Dv ESCAPE_SPECIAL ,
but with an argument of the forms
.Ic u Ns Ar XXXX ,
.Ic u Ns Ar YXXXX ,
or
.Ic u10 Ns Ar XXXX
where
.Ar X
and
.Ar Y
are hexadecimal digits and
.Ar Y
is not zero:
.Ic \eC'u , \e[u .
As a special exception,
.Fa start
is set to the character after the
.Ic u ,
and the
.Fa sz
return value does not include the
.Ic u
either.
.Pp
Such Unicode character escape sequences can be rendered using the function
.Fn mchars_num2uc
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_NUMBERED
The escape sequence
.Ic \eN
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
.Dv ESCAPE_IGNORE
is returned.
.Pp
Such ASCII character escape sequences can be rendered using the function
.Fn mchars_num2char
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_OVERSTRIKE
The escape sequence
.Ic \eo
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_IGNORE
.Bl -bullet -width 2n
.It
The escape sequence
.Ic \es
followed by an argument in standard form or by an argument delimited
by the single quote character:
.Ic \es' , \es[ , \es( , \es Ns Ar a .
As a special exception, an optional
.Sq +
or
.Sq \-
character is allowed after the
.Sq s
for all forms.
.It
The escape sequences
.Ic \eF ,
.Ic \eg ,
.Ic \ek ,
.Ic \eM ,
.Ic \em ,
.Ic \en ,
.Ic \eV ,
and
.Ic \eY
followed by an argument in standard form.
.It
The escape sequences
.Ic \eA ,
.Ic \eb ,
.Ic \eD ,
.Ic \eR ,
.Ic \eX ,
and
.Ic \eZ
followed by an argument delimited by an arbitrary character.
.It
The escape sequences
.Ic \eH ,
.Ic \eh ,
.Ic \eL ,
.Ic \el ,
.Ic \eS ,
.Ic \ev ,
and
.Ic \ex
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
.Dv ESCAPE_ERROR
is returned.
.El
.It Dv ESCAPE_ERROR
Escape sequences taking an argument but not matching any of the above patterns.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
.El
.Pp
For sequences that do not take an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_SKIPCHAR
The escape sequence
.Qq \ez .
.It Dv ESCAPE_NOSPACE
The escape sequence
.Qq \ec .
.It Dv ESCAPE_IGNORE
The escape sequences
.Qq \ed
and
.Qq \eu .
.El
.Sh FILES
This function is implemented in
.Pa mandoc.c .
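.Sh EXAMPLES
The following is a minimal sketch, not taken from the mandoc sources,
of how a caller might purge escape sequences from a string,
similar in spirit to the
.Xr makewhatis 8
use case mentioned in the DESCRIPTION.
The function name
.Fn print_purged
is hypothetical, and giving up at the first
.Dv ESCAPE_ERROR
is merely one conservative choice.
The sketch assumes that the program is linked against the mandoc library.
.Bd -literal -offset indent
#include <sys/types.h>
#include <stdio.h>
#include <mandoc.h>

/* Hypothetical helper: print cp with all escape sequences dropped. */
static void
print_purged(const char *cp)
{
	while (*cp != '\e0') {
		if (*cp != '\e\e') {
			/* Ordinary character: copy it through. */
			putchar(*cp++);
			continue;
		}
		/* Skip the backslash; cp now points to the identifier. */
		cp++;
		if (*cp == '\e0')
			break;	/* trailing backslash: nothing to parse */
		/* The argument is not needed, so pass NULL twice. */
		if (mandoc_escape(&cp, NULL, NULL) == ESCAPE_ERROR)
			break;
		/* On success, cp points just after the sequence. */
	}
	putchar('\en');
}
.Ed
.Pp
Because
.Fa start
and
.Fa sz
are
.Dv NULL
in this sketch, only the position after each escape sequence is used;
a caller that wants to render special characters would instead
inspect the return value and the argument.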
.Sh SEE ALSO
.Xr mchars_alloc 3 ,
.Xr mandoc_char 7 ,
.Xr roff 7
.Sh HISTORY
This function has been available since mandoc 1.11.2.
.Sh AUTHORS
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org
.Sh BUGS
The function doesn't cleanly distinguish between sequences that are
valid and supported, valid and ignored, valid and unsupported,
syntactically invalid, or undefined.
For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems
and/or loss of document content.
The function is already rather complicated and still parses some
sequences incorrectly.
.
.ig
For these sequences, the list given below specifies a starting string
and either the length of the argument or an ending character.
The argument starts after the starting string.
In the former case, the sequence ends with the end of the argument.
In the latter case, the argument ends before the ending character,
and the sequence ends with the ending character.
..