Document that tr(1) was written for US-ASCII and may not work as
expected on other character sets which do not share ASCII's properties
(e.g. a symmetric set of capital and lower case characters), per PR 18738.

Change all double quotes to nroff macros.
Change "System V" references to the .At macro.
fair 2004-03-24 06:35:53 +00:00
parent afedcd8968
commit c223370599
1 changed files with 86 additions and 28 deletions

@@ -1,4 +1,4 @@
-.\" $NetBSD: tr.1,v 1.13 2003/08/07 11:16:47 agc Exp $
+.\" $NetBSD: tr.1,v 1.14 2004/03/24 06:35:53 fair Exp $
 .\"
 .\" Copyright (c) 1991, 1993
 .\" The Regents of the University of California. All rights reserved.
@@ -32,7 +32,7 @@
 .\"
 .\" @(#)tr.1 8.1 (Berkeley) 6/6/93
 .\"
-.Dd June 6, 1993
+.Dd March 23, 2004
 .Dt TR 1
 .Os
 .Sh NAME
@@ -65,7 +65,12 @@ The following options are available:
 .It Fl c
 Complements the set of characters in
 .Ar string1 ,
-that is ``-c ab'' includes every character except for ``a'' and ``b''.
+that is
+.Qq \&-c \&ab
+includes every character except for
+.Qq \&a
+and
+.Qq \&b .
 .It Fl d
 The
 .Fl d
@@ -184,10 +189,16 @@ Class names are:
 \." and vice-versa) is specified in the same relative position in
 \." .Ar string1 .
 \." .Pp
-With the exception of the ``upper'' and ``lower'' classes, characters
-in the classes are in unspecified order.
-In the ``upper'' and ``lower'' classes, characters are entered in
-ascending order.
+With the exception of the
+.Qq upper
+and
+.Qq lower
+classes, characters in the classes are in unspecified order.
+In the
+.Qq upper
+and
+.Qq lower
+classes, characters are entered in ascending order.
 .Pp
 For specific information as to which ASCII characters are included
 in these classes, see
@@ -197,11 +208,14 @@ and related manual pages.
 Represents all characters or collating (sorting) elements belonging to
 the same equivalence class as
 .Ar equiv .
-If
-there is a secondary ordering within the equivalence class, the characters
-are ordered in ascending sequence.
+If there is a secondary ordering within the equivalence class, the
+characters are ordered in ascending sequence.
 Otherwise, they are ordered after their encoded values.
-An example of an equivalence class might be ``c'' and ``ch'' in Spanish;
+An example of an equivalence class might be
+.Qq \&c
+and
+.Qq \&ch
+in Spanish;
 English has no equivalence classes.
 .It [#*n]
 Represents
@@ -228,38 +242,67 @@ exits 0 on success, and \*[Gt]0 if an error occurs.
 .Sh EXAMPLES
 The following examples are shown as given to the shell:
 .sp
-Create a list of the words in file1, one per line, where a word is taken to
-be a maximal string of letters.
+Create a list of the words in
+.Ar file1 ,
+one per line, where a word is taken to be a maximal string of letters:
 .sp
 .D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q \*[Lt] file1"
 .sp
-Translate the contents of file1 to upper-case.
+Translate the contents of
+.Ar file1
+to upper-case:
 .sp
 .D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q \*[Lt] file1"
 .sp
-Strip out non-printable characters from file1.
+Strip out non-printable characters from
+.Ar file1 :
 .sp
 .D1 Li "tr -cd \*q[:print:]\*q \*[Lt] file1"
 .Sh COMPATIBILITY
-System V has historically implemented character ranges using the syntax
-``[c-c]'' instead of the ``c-c'' used by historic
+.At V
+has historically implemented character ranges using the syntax
+.Qq [c-c]
+instead of the
+.Qq c-c
+used by historic
 .Bx
-implementations and
-standardized by POSIX.
+implementations and standardized by POSIX.
 .At V
 shell scripts should work under this implementation as long as
 the range is intended to map in another range, i.e. the command
-``tr [a-z] [A-Z]'' will work as it will map the ``['' character in
+.Pp
+.Ic "tr [a-z] [A-Z]"
+.Pp
+will work as it will map the
+.Qq \&[
+character in
 .Ar string1
-to the ``['' character in
+to the
+.Qq \&[
+character in
 .Ar string2 .
 However, if the shell script is deleting or squeezing characters as in
-the command ``tr -d [a-z]'', the characters ``['' and ``]'' will be
-included in the deletion or compression list which would not have happened
-under an historic System V implementation.
-Additionally, any scripts that depended on the sequence ``a-z'' to
-represent the three characters ``a'', ``-'' and ``z'' will have to be
-rewritten as ``a\e-z''.
+the command
+.Pp
+.Ic "tr -d [a-z]"
+.Pp
+the characters
+.Qq \&[
+and
+.Qq \&]
+will be included in the deletion or compression list which would
+not have happened under an historic
+.At V
+implementation.
+Additionally, any scripts that depended on the sequence
+.Qq a-z
+to represent the three characters
+.Qq \&a ,
+.Qq \&- ,
+and
+.Qq \&z
+will have to be rewritten as
+.Qq a\e-z .
 .Pp
 The
 .Nm
@@ -290,4 +333,19 @@ has less characters than
 .Ar string1
 is permitted by POSIX but is not required.
 Shell scripts attempting to be portable to other POSIX systems should use
-the ``[#*]'' convention instead of relying on this behavior.
+the
+.Qq [#*]
+convention instead of relying on this behavior.
+.Sh BUGS
+.Nm
+was originally designed to work with
+.Tn US-ASCII .
+Its use with character sets that do not share all the properties of
+.Tn US-ASCII ,
+e.g.
+a symmetric set of upper and lower case characters
+that can be algorithmically converted one to the other,
+may yield unpredictable results.
+.Pp
+.Nm
+should be internationalized.
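
The bracket behavior described in the COMPATIBILITY text, and the [#*]
convention it recommends, can be checked with a short shell sketch. This is
not part of the commit. It assumes a POSIX-style tr (the BSD and GNU
implementations are expected to behave this way) and forces the C locale to
sidestep the character-set caveat from BUGS. The sample inputs and the
variable names are made up for illustration.

```shell
#!/bin/sh
# Illustrative only: sample inputs and variable names are hypothetical.
# Arguments are quoted so the shell cannot expand [a-z] as a glob pattern.

# Mapping: '[' and ']' in string1 line up with '[' and ']' in string2,
# so System V style brackets are harmless here.
mapped=$(printf '%s\n' 'a[b]c' | LC_ALL=C tr '[a-z]' '[A-Z]')
printf '%s\n' "$mapped"      # A[B]C

# Deleting: the brackets are ordinary members of string1, so they are
# removed along with the lowercase letters.
deleted=$(printf '%s\n' 'a[b]c-1' | LC_ALL=C tr -d '[a-z]')
printf '%s\n' "$deleted"     # -1

# Padding string2 explicitly with the [#*] convention instead of relying
# on the implementation to extend a short string2.
padded=$(printf '%s\n' 'abc' | LC_ALL=C tr 'abc' '[x*]')
printf '%s\n' "$padded"      # xxx
```

The second case is the one that differs from an historic System V tr, which
would have left the brackets in the input alone.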