.\" $NetBSD: tr.1,v 1.23 2019/05/29 11:27:34 gutteridge Exp $
.\"
.\" Copyright (c) 1991, 1993
.\" The Regents of the University of California. All rights reserved.
.\"
.\" This code is derived from software contributed to Berkeley by
.\" the Institute of Electrical and Electronics Engineers, Inc.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\" may be used to endorse or promote products derived from this software
.\" without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" @(#)tr.1 8.1 (Berkeley) 6/6/93
.\"
.Dd May 29, 2019
.Dt TR 1
.Os
.Sh NAME
.Nm tr
.Nd translate characters
.Sh SYNOPSIS
.Nm
.Op Fl cs
.Ar string1 string2
.Nm
.Op Fl c
.Fl d
.Ar string1
.Nm
.Op Fl c
.Fl s
.Ar string1
.Nm
.Op Fl c
.Fl ds
.Ar string1 string2
.Sh DESCRIPTION
The
.Nm
utility copies the standard input to the standard output with substitution
or deletion of selected characters.
.Pp
The following options are available:
.Bl -tag -width Ds
.It Fl c
Complements the set of characters in
.Ar string1 ;
that is,
.Fl c Ar \&ab
includes every character except for
.Sq a
and
.Sq b .
.It Fl d
Deletes the characters in
.Ar string1
from the input.
.It Fl s
Squeezes each run of repeated characters listed in the last
operand (either
.Ar string1
or
.Ar string2 )
in the input into a single instance of the character.
This occurs after all deletion and translation is completed.
.El
.Pp
In the first synopsis form, the characters in
.Ar string1
are translated into the characters in
.Ar string2 ,
where the first character in
.Ar string1
is translated into the first character in
.Ar string2 ,
and so on.
If
.Ar string1
is longer than
.Ar string2 ,
the last character found in
.Ar string2
is duplicated until
.Ar string1
is exhausted.
.Pp
In the second synopsis form, the characters in
.Ar string1
are deleted from the input.
.Pp
In the third synopsis form, the characters in
.Ar string1
are compressed as described for the
.Fl s
option.
.Pp
In the fourth synopsis form, the characters in
.Ar string1
are deleted from the input, and the characters in
.Ar string2
are compressed as described for the
.Fl s
option.
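.Pp
As a minimal sketch of the four synopsis forms, the following commands
could be given to the shell.
The input strings are made up for illustration, and the expected output
is shown in the trailing comments:
.Bd -literal -offset indent
# First form: translate l to x and o to y.
echo "hello world" | tr "lo" "xy"        # hexxy wyrxd
# Second form: delete every l and o.
echo "hello world" | tr -d "lo"          # he wrd
# Third form: squeeze runs of a and runs of b.
echo "aabbccdd" | tr -s "ab"             # abccdd
# Fourth form: delete a and b, then squeeze runs of c.
echo "aabbccdd" | tr -ds "ab" "c"        # cdd
.Ed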
.Pp
The following conventions can be used in
.Ar string1
and
.Ar string2
to specify sets of characters:
.Bl -tag -width [:equiv:]
.It character
Any character not described by one of the following conventions
represents itself.
.It \eoctal
A backslash followed by 1, 2, or 3 octal digits represents a character
with that encoded value.
To follow an octal sequence with a digit as a character, left zero-pad
the octal sequence to the full 3 octal digits.
.It \echaracter
A backslash followed by certain special characters maps to special
values.
.sp
.Bl -column cc
.It \ea Ta <alert character>
.It \eb Ta <backspace>
.It \ef Ta <form-feed>
.It \en Ta <newline>
.It \er Ta <carriage return>
.It \et Ta <tab>
.It \ev Ta <vertical tab>
.El
.sp
A backslash followed by any other character maps to that character.
.It c-c
Represents the range of characters between the range endpoints, inclusively.
.It [:class:]
Represents all characters belonging to the defined character class.
Class names are:
.sp
.Bl -column xdigit
.It alnum Ta <alphanumeric characters>
.It alpha Ta <alphabetic characters>
.It blank Ta <blank characters>
.It cntrl Ta <control characters>
.It digit Ta <numeric characters>
.It graph Ta <graphic characters>
.It lower Ta <lower-case alphabetic characters>
.It print Ta <printable characters>
.It punct Ta <punctuation characters>
.It space Ta <space characters>
.It upper Ta <upper-case characters>
.It xdigit Ta <hexadecimal characters>
.El
.Pp
.\" All classes may be used in
.\" .Ar string1 ,
.\" and in
.\" .Ar string2
.\" when both the
.\" .Fl d
.\" and
.\" .Fl s
.\" options are specified.
.\" Otherwise, only the classes ``upper'' and ``lower'' may be used in
.\" .Ar string2
.\" and then only when the corresponding class (``upper'' for ``lower''
.\" and vice-versa) is specified in the same relative position in
.\" .Ar string1 .
.\" .Pp
With the exception of the
.Dq upper
and
.Dq lower
classes, characters in the classes are in unspecified order.
In the
.Dq upper
and
.Dq lower
classes, characters are entered in ascending order.
.Pp
For specific information as to which ASCII characters are included
in these classes, see
.Xr ctype 3
and related manual pages.
.It [=equiv=]
Represents all characters or collating (sorting) elements belonging to
the same equivalence class as
.Ar equiv .
If there is a secondary ordering within the equivalence class, the
characters are ordered in ascending sequence.
Otherwise, they are ordered by their encoded values.
An example of an equivalence class might be
.Dq \&c
and
.Dq \&ch
in Spanish;
English has no equivalence classes.
.It [#*n]
Represents
.Ar n
repeated occurrences of the character represented by
.Ar # .
This expression is only valid when it occurs in
.Ar string2 .
If
.Ar n
is omitted or is zero, it is interpreted as large enough to extend the
.Ar string2
sequence to the length of
.Ar string1 .
If
.Ar n
has a leading zero, it is interpreted as an octal value;
otherwise, it is interpreted as a decimal value.
.El
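.Pp
Several of the conventions above are illustrated in the following sketch.
The input strings are made up for illustration, and the expected output
is shown in the trailing comments:
.Bd -literal -offset indent
# c-c range: map lower-case ASCII letters to upper case.
echo "hello 123" | tr "a-z" "A-Z"        # HELLO 123
# [:class:]: delete every digit.
echo "a1b2c3" | tr -d "[:digit:]"        # abc
# [#*n]: two copies of x in string2, followed by z.
echo "abc" | tr "abc" "[x*2]z"           # xxz
.Ed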
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
The following examples are shown as given to the shell:
.Pp
Create a list of the words in
.Ar file1 ,
one per line, where a word is taken to be a maximal string of letters:
.sp
.D1 Li "tr -cs \*q[:alpha:]\*q \*q\en\*q < file1"
.sp
Translate the contents of
.Ar file1
to upper-case:
.sp
.D1 Li "tr \*q[:lower:]\*q \*q[:upper:]\*q < file1"
.sp
Strip out non-printable characters from
.Ar file1 :
.sp
.D1 Li "tr -cd \*q[:print:]\*q < file1"
.Sh COMPATIBILITY
.At V
has historically implemented character ranges using the syntax
.Dq [c-c]
instead of the
.Dq c-c
used by historic
.Bx
implementations and standardized by POSIX.
.At V
shell scripts should work under this implementation as long as
the range is intended to map onto another range; for example, the command
.Pp
.Ic "tr [a-z] [A-Z]"
.Pp
will work as it will map the
.Sq \&[
character in
.Ar string1
to the
.Sq \&[
character in
.Ar string2 .
However, if the shell script is deleting or squeezing characters as in
the command
.Pp
.Ic "tr -d [a-z]"
.Pp
the characters
.Sq \&[
and
.Sq \&]
will be included in the deletion or compression list, which would
not have happened under a historic
.At V
implementation.
Additionally, any scripts that depended on the sequence
.Dq a-z
to represent the three characters
.Sq \&a ,
.Sq \&- ,
and
.Sq \&z
will have to be rewritten as
.Dq a\e-z .
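.Pp
A brief sketch of the bracket behavior described above, using a made-up
input string; the output in the comment is what a POSIX-style
implementation such as this one is expected to print:
.Bd -literal -offset indent
# The brackets are ordinary characters here, so they are deleted
# together with the lower-case letters.
echo "[abc]XYZ" | tr -d "[a-z]"          # XYZ
.Ed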
.Pp
The
.Nm
utility has historically not permitted the manipulation of NUL bytes in
its input and, additionally, stripped NULs from its input stream.
This implementation treats that behavior as a bug and has removed it.
.Pp
The
.Nm
utility has historically been extremely forgiving of syntax errors;
for example, the
.Fl c
and
.Fl s
options were ignored unless two strings were specified.
This implementation will not permit invalid syntax.
.Sh SEE ALSO
.Xr dd 1 ,
.Xr sed 1 ,
.Xr ctype 3
.Sh STANDARDS
The
.Nm
utility is expected to be
.St -p1003.2
compatible.
Note that duplicating the last character of
.Ar string2
when
.Ar string2
has fewer characters than
.Ar string1
is permitted by POSIX but is not required.
Shell scripts attempting to be portable to other POSIX systems should use
the
.Dq [#*n]
convention instead of relying on this behavior.
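.Pp
For example, a script that wants every remaining character of
.Ar string1
mapped to the same character could spell the padding out explicitly.
This is a sketch with made-up strings; the expected output is shown in
the trailing comment:
.Bd -literal -offset indent
# a maps to x; b, c, and d all map to y.
echo "abcd" | tr "abcd" "x[y*]"          # xyyy
.Ed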
.Sh BUGS
.Nm
was originally designed to work with
.Tn US-ASCII .
Its use with character sets that do not share all the properties of
.Tn US-ASCII ,
e.g., a symmetric set of upper and lower case characters
that can be algorithmically converted one to the other,
may yield unpredictable results.
.Pp
.Nm
should be internationalized.