Document that tr(1) was written for US-ASCII and may not work as
expected on other character sets which do not share ASCII's properties
(e.g. a symmetric set of capital and lower case characters), per PR 18738

Change all double quotes to nroff macros.
Change "System V" references to the .At macro.
fair 2004-03-24 06:35:53 +00:00
parent afedcd8968
commit c223370599
1 changed files with 86 additions and 28 deletions

@@ -1,4 +1,4 @@
-.\" $NetBSD: tr.1,v 1.13 2003/08/07 11:16:47 agc Exp $
+.\" $NetBSD: tr.1,v 1.14 2004/03/24 06:35:53 fair Exp $
 .\"
 .\" Copyright (c) 1991, 1993
 .\" The Regents of the University of California. All rights reserved.
@@ -32,7 +32,7 @@
 .\"
 .\" @(#)tr.1 8.1 (Berkeley) 6/6/93
 .\"
-.Dd June 6, 1993
+.Dd March 23, 2004
 .Dt TR 1
 .Os
 .Sh NAME
@@ -65,7 +65,12 @@ The following options are available:
 .It Fl c
 Complements the set of characters in
 .Ar string1 ,
-that is ``-c ab'' includes every character except for ``a'' and ``b''.
+that is
+.Qq \&-c \&ab
+includes every character except for
+.Qq \&a
+and
+.Qq \&b .
 .It Fl d
 The
 .Fl d
@@ -184,10 +189,16 @@ Class names are:
 \." and vice-versa) is specified in the same relative position in
 \." .Ar string1 .
 \." .Pp
-With the exception of the ``upper'' and ``lower'' classes, characters
-in the classes are in unspecified order.
-In the ``upper'' and ``lower'' classes, characters are entered in
-ascending order.
+With the exception of the
+.Qq upper
+and
+.Qq lower
+classes, characters in the classes are in unspecified order.
+In the
+.Qq upper
+and
+.Qq lower
+classes, characters are entered in ascending order.
 .Pp
 For specific information as to which ASCII characters are included
 in these classes, see
@@ -197,11 +208,14 @@ and related manual pages.
 Represents all characters or collating (sorting) elements belonging to
 the same equivalence class as
 .Ar equiv .
-If
-there is a secondary ordering within the equivalence class, the characters
-are ordered in ascending sequence.
+If there is a secondary ordering within the equivalence class, the
+characters are ordered in ascending sequence.
 Otherwise, they are ordered after their encoded values.
-An example of an equivalence class might be ``c'' and ``ch'' in Spanish;
+An example of an equivalence class might be
+.Qq \&c
+and
+.Qq \&ch
+in Spanish;
 English has no equivalence classes.
 .It [#*n]
 Represents
@@ -228,38 +242,67 @@ exits 0 on success, and \*[Gt]0 if an error occurs.
 .Sh EXAMPLES
 The following examples are shown as given to the shell:
 .sp
-Create a list of the words in file1, one per line, where a word is taken to
-be a maximal string of letters.
+Create a list of the words in
+.Ar file1 ,
+one per line, where a word is taken to be a maximal string of letters:
 .sp
 .D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q \*[Lt] file1"
 .sp
-Translate the contents of file1 to upper-case.
+Translate the contents of
+.Ar file1
+to upper-case:
 .sp
 .D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q \*[Lt] file1"
 .sp
-Strip out non-printable characters from file1.
+Strip out non-printable characters from
+.Ar file1 :
 .sp
 .D1 Li "tr -cd \*q[:print:]\*q \*[Lt] file1"
 .Sh COMPATIBILITY
-System V has historically implemented character ranges using the syntax
-``[c-c]'' instead of the ``c-c'' used by historic
+.At V
+has historically implemented character ranges using the syntax
+.Qq [c-c]
+instead of the
+.Qq c-c
+used by historic
 .Bx
-implementations and
-standardized by POSIX.
+implementations and standardized by POSIX.
 .At V
 shell scripts should work under this implementation as long as
 the range is intended to map in another range, i.e. the command
-``tr [a-z] [A-Z]'' will work as it will map the ``['' character in
+.Pp
+.Ic "tr [a-z] [A-Z]"
+.Pp
+will work as it will map the
+.Qq \&[
+character in
 .Ar string1
-to the ``['' character in
+to the
+.Qq \&[
+character in
 .Ar string2 .
 However, if the shell script is deleting or squeezing characters as in
-the command ``tr -d [a-z]'', the characters ``['' and ``]'' will be
-included in the deletion or compression list which would not have happened
-under an historic System V implementation.
-Additionally, any scripts that depended on the sequence ``a-z'' to
-represent the three characters ``a'', ``-'' and ``z'' will have to be
-rewritten as ``a\e-z''.
+the command
+.Pp
+.Ic "tr -d [a-z]"
+.Pp
+the characters
+.Qq \&[
+and
+.Qq \&]
+will be included in the deletion or compression list which would
+not have happened under an historic
+.At V
+implementation.
+Additionally, any scripts that depended on the sequence
+.Qq a-z
+to represent the three characters
+.Qq \&a ,
+.Qq \&- ,
+and
+.Qq \&z
+will have to be rewritten as
+.Qq a\e-z .
 .Pp
 The
 .Nm
@@ -290,4 +333,19 @@ has less characters than
 .Ar string1
 is permitted by POSIX but is not required.
 Shell scripts attempting to be portable to other POSIX systems should use
-the ``[#*]'' convention instead of relying on this behavior.
+the
+.Qq [#*]
+convention instead of relying on this behavior.
+.Sh BUGS
+.Nm
+was originally designed to work with
+.Tn US-ASCII .
+Its use with character sets that do not share all the properties of
+.Tn US-ASCII ,
+e.g.
+a symmetric set of upper and lower case characters
+that can be algorithmically converted one to the other,
+may yield unpredictable results.
+.Pp
+.Nm
+should be internationalized.
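
The bracket behaviour described in the COMPATIBILITY text of this diff can be checked with a short shell sketch. The function names below are hypothetical and not part of the commit. The arguments are quoted, unlike in the manual text, so that the shell does not glob them. Setting LC_ALL=C is an assumption made so that ranges follow US-ASCII order, in line with the new BUGS section.

```shell
#!/bin/sh
# Assumption: pin the C locale so that a-z and A-Z are the US-ASCII ranges.
LC_ALL=C
export LC_ALL

# System V style brackets while translating: '[' maps to '[' and ']' to ']',
# so the brackets are harmless here.
upcase_sysv() { tr '[a-z]' '[A-Z]'; }

# System V style brackets while deleting: '[' and ']' are ordinary members
# of the set, so they are deleted along with the lower-case letters.
delete_sysv() { tr -d '[a-z]'; }

# POSIX/BSD range syntax: only the lower-case letters are deleted.
delete_posix() { tr -d 'a-z'; }

printf '%s\n' 'x[1]y' | upcase_sysv    # prints X[1]Y
printf '%s\n' 'x[1]y' | delete_sysv    # prints 1
printf '%s\n' 'x[1]y' | delete_posix   # prints [1]
```

The second and third calls differ only in the brackets, which is the case the manual page warns about for scripts that delete or squeeze characters.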