.\" $Id: awk.1,v 1.2 1993/08/02 17:29:23 mycroft Exp $ -*- nroff -*-
.ds PX \s-1POSIX\s+1
.ds UX \s-1UNIX\s+1
.ds AN \s-1ANSI\s+1
.TH GAWK 1 "Apr 15 1993" "Free Software Foundation" "Utility Commands"
.SH NAME
gawk \- pattern scanning and processing language
.SH SYNOPSIS
.B gawk
[ POSIX or GNU style options ]
.B \-f
.I program-file
[
.B \-\^\-
] file .\^.\^.
.br
.B gawk
[ POSIX or GNU style options ]
[
.B \-\^\-
]
.I program-text
file .\^.\^.
.SH DESCRIPTION
.I Gawk
is the GNU Project's implementation of the AWK programming language.
It conforms to the definition of the language in
the \*(PX 1003.2 Command Language And Utilities Standard.
This version in turn is based on the description in
.IR "The AWK Programming Language" ,
by Aho, Kernighan, and Weinberger,
with the additional features defined in the System V Release 4 version
of \*(UX
.IR awk .
.I Gawk
also provides some GNU-specific extensions.
.PP
The command line consists of options to
.I gawk
itself, the AWK program text (if not supplied via the
.B \-f
or
.B \-\^\-file
options), and values to be made
available in the
.B ARGC
and
.B ARGV
pre-defined AWK variables.
.SH OPTIONS
.PP
.I Gawk
options may be either the traditional \*(PX one letter options,
or the GNU style long options. \*(PX style options start with a single ``\-'',
while GNU long options start with ``\-\^\-''.
GNU style long options are provided for both GNU-specific features and
for \*(PX mandated features. Other implementations of the AWK language
are likely to only accept the traditional one letter options.
.PP
Following the \*(PX standard,
.IR gawk -specific
options are supplied via arguments to the
.B \-W
option. Multiple
.B \-W
options may be supplied, or multiple arguments may be supplied together
if they are separated by commas, or enclosed in quotes and separated
by white space.
Case is ignored in arguments to the
.B \-W
option.
Each
.B \-W
option has a corresponding GNU style long option, as detailed below.
.PP
.I Gawk
accepts the following options.
.TP
.PD 0
.BI \-F " fs"
.TP
.PD
.BI \-\^\-field-separator= fs
Use
.I fs
for the input field separator (the value of the
.B FS
predefined
variable).
.TP
.PD 0
\fB\-v\fI var\fB\^=\^\fIval\fR
.TP
.PD
\fB\-\^\-assign=\fIvar\fB\^=\^\fIval\fR
Assign the value
.IR val ,
to the variable
.IR var ,
before execution of the program begins.
Such variable values are available to the
.B BEGIN
block of an AWK program.
.TP
.PD 0
.BI \-f " program-file"
.TP
.PD
.BI \-\^\-file= program-file
Read the AWK program source from the file
.IR program-file ,
instead of from the first command line argument.
Multiple
.B \-f
(or
.BR \-\^\-file )
options may be used.
.TP \w'\fB\-\^\-copyright\fR'u+1n
.PD 0
.B "\-W compat"
.TP
.PD
.B \-\^\-compat
Run in
.I compatibility
mode. In compatibility mode,
.I gawk
behaves identically to \*(UX
.IR awk ;
none of the GNU-specific extensions are recognized.
See
.BR "GNU EXTENSIONS" ,
below, for more information.
.TP
.PD 0
.B "\-W copyleft"
.TP
.PD 0
.B "\-W copyright"
.TP
.PD 0
.B \-\^\-copyleft
.TP
.PD
.B \-\^\-copyright
Print the short version of the GNU copyright information message on
the error output.
.TP
.PD 0
.B "\-W help"
.TP
.PD 0
.B "\-W usage"
.TP
.PD 0
.B \-\^\-help
.TP
.PD
.B \-\^\-usage
Print a relatively short summary of the available options on
the error output.
.TP
.PD 0
.B "\-W lint"
.TP
.PD
.B \-\^\-lint
Provide warnings about constructs that are
dubious or non-portable to other AWK implementations.
.ig
.\" This option is left undocumented, on purpose.
.TP
.PD 0
.B "\-W nostalgia"
.TP
.PD
.B \-\^\-nostalgia
Provide a moment of nostalgia for long time
.I awk
users.
..
.TP
.PD 0
.B "\-W posix"
.TP
.PD
.B \-\^\-posix
This turns on
.I compatibility
mode, with the following additional restrictions:
.RS
.TP \w'\(bu'u+1n
\(bu
.B \ex
escape sequences are not recognized.
.TP
\(bu
The synonym
.B func
for the keyword
.B function
is not recognized.
.TP
\(bu
The operators
.B **
and
.B **=
cannot be used in place of
.B ^
and
.BR ^= .
.RE
.TP
.PD 0
.BI "\-W source=" program-text
.TP
.PD
.BI \-\^\-source= program-text
Use
.I program-text
as AWK program source code.
This option allows the easy intermixing of library functions (used via the
.B \-f
and
.B \-\^\-file
options) with source code entered on the command line.
It is intended primarily for medium to large size AWK programs used
in shell scripts.
.sp .5
The
.B "\-W source="
form of this option uses the rest of the command line argument for
.IR program-text ;
no other options to
.B \-W
will be recognized in the same argument.
.TP
.PD 0
.B "\-W version"
.TP
.PD
.B \-\^\-version
Print version information for this particular copy of
.I gawk
on the error output.
This is useful mainly for knowing if the current copy of
.I gawk
on your system
is up to date with respect to whatever the Free Software Foundation
is distributing.
.TP
.B \-\^\-
Signal the end of options. This is useful to allow further arguments to the
AWK program itself to start with a ``\-''.
This is mainly for consistency with the argument parsing convention used
by most other \*(PX programs.
.PP
Any other options are flagged as illegal, but are otherwise ignored.
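.PP
As a minimal sketch of the two traditional options described above, the following shell fragment sets the field separator with the one letter F option and passes a value in with the v option. The sample data and the variable name prefix are hypothetical, and the program is invoked as awk on the assumption that a POSIX awk (or gawk under that name) is installed.

```shell
# -F: sets FS before any input is read; -v assigns before BEGIN runs,
# so the value is already visible inside the BEGIN block.
# The data and the variable name "prefix" are made up for illustration.
out=$(printf 'root:x:0:0\ndaemon:x:1:1\n' |
    awk -F: -v prefix="user=" '
        BEGIN { print "prefix is " prefix }
        { print prefix $1 }')
printf '%s\n' "$out"
```

The BEGIN line shows that the assignment has already happened before the program starts; the remaining lines show the first colon-separated field of each record.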
.SH AWK PROGRAM EXECUTION
.PP
An AWK program consists of a sequence of pattern-action statements
and optional function definitions.
.RS
.PP
\fIpattern\fB { \fIaction statements\fB }\fR
.br
\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements\fB }\fR
.RE
.PP
.I Gawk
first reads the program source from the
.IR program-file (s)
if specified, or from the first non-option argument on the command line.
The
.B \-f
option may be used multiple times on the command line.
.I Gawk
will read the program text as if all the
.IR program-file s
had been concatenated together. This is useful for building libraries
of AWK functions, without having to include them in each new AWK
program that uses them. To use a library function in a file from a
program typed in on the command line, specify
.B /dev/tty
as one of the
.IR program-file s,
type your program, and end it with a
.B ^D
(control-d).
.PP
The environment variable
.B AWKPATH
specifies a search path to use when finding source files named with
the
.B \-f
option. If this variable does not exist, the default path is
\fB".:/usr/lib/awk:/usr/local/lib/awk"\fR.
If a file name given to the
.B \-f
option contains a ``/'' character, no path search is performed.
.PP
.I Gawk
executes AWK programs in the following order.
First,
.I gawk
compiles the program into an internal form.
Next, all variable assignments specified via the
.B \-v
option are performed. Then,
.I gawk
executes the code in the
.B BEGIN
block(s) (if any),
and then proceeds to read
each file named in the
.B ARGV
array.
If there are no files named on the command line,
.I gawk
reads the standard input.
.PP
If a filename on the command line has the form
.IB var = val
it is treated as a variable assignment. The variable
.I var
will be assigned the value
.IR val .
(This happens after any
.B BEGIN
block(s) have been run.)
Command line variable assignment
is most useful for dynamically assigning values to the variables
AWK uses to control how input is broken into fields and records. It
is also useful for controlling state if multiple passes are needed over
a single data file.
.PP
If the value of a particular element of
.B ARGV
is empty (\fB""\fR),
.I gawk
skips over it.
.PP
For each line in the input,
.I gawk
tests to see if it matches any
.I pattern
in the AWK program.
For each pattern that the line matches, the associated
.I action
is executed.
The patterns are tested in the order they occur in the program.
.PP
Finally, after all the input is exhausted,
.I gawk
executes the code in the
.B END
block(s) (if any).
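.PP
The execution order described in this section can be observed with a small experiment. The sketch below is only an illustration: the temporary file and the variable name n are hypothetical, and the program is run as awk on the assumption that a POSIX awk is available.

```shell
# A var=val argument is processed when it is reached in ARGV,
# which is after the BEGIN block has already run.
tmp=$(mktemp)
printf 'a\nb\n' > "$tmp"
out=$(awk '
    BEGIN { print "begin: n=" n }        # n is still unset here
    { print "record: n=" n, $0 }         # n was assigned before the file is read
    END { print "end: NR=" NR }' n=7 "$tmp")
rm -f "$tmp"
printf '%s\n' "$out"
```

Had the assignment been made with the v option instead, the BEGIN block would have seen the value as well.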
.SH VARIABLES AND FIELDS
AWK variables are dynamic; they come into existence when they are
first used. Their values are either floating-point numbers or strings,
or both,
depending upon how they are used. AWK also has one-dimensional
arrays; multidimensional arrays may be simulated.
Several pre-defined variables are set as a program
runs; these will be described as needed and summarized below.
.SS Fields
.PP
As each input line is read,
.I gawk
splits the line into
.IR fields ,
using the value of the
.B FS
variable as the field separator.
If
.B FS
is a single character, fields are separated by that character.
Otherwise,
.B FS
is expected to be a full regular expression.
In the special case that
.B FS
is a single blank, fields are separated
by runs of blanks and/or tabs.
Note that the value of
.B IGNORECASE
(see below) will also affect how fields are split when
.B FS
is a regular expression.
.PP
If the
.B FIELDWIDTHS
variable is set to a space-separated list of numbers, each field is
expected to have fixed width, and
.I gawk
will split up the record using the specified widths. The value of
.B FS
is ignored.
Assigning a new value to
.B FS
overrides the use of
.BR FIELDWIDTHS ,
and restores the default behavior.
.PP
Each field in the input line may be referenced by its position,
.BR $1 ,
.BR $2 ,
and so on.
.B $0
is the whole line. The value of a field may be assigned to as well.
Fields need not be referenced by constants:
.RS
.PP
.ft B
n = 5
.br
print $n
.ft R
.RE
.PP
prints the fifth field in the input line.
The variable
.B NF
is set to the total number of fields in the input line.
.PP
References to non-existent fields (i.e. fields after
.BR $NF )
produce the null string. However, assigning to a non-existent field
(e.g.,
.BR "$(NF+2) = 5" )
will increase the value of
.BR NF ,
create any intervening fields with the null string as their value, and
cause the value of
.B $0
to be recomputed, with the fields being separated by the value of
.BR OFS .
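.PP
The field rules above can be sketched with a single input line. This is an illustrative fragment with made-up data, run as awk on the assumption that a POSIX awk is installed.

```shell
# Reference a field through a variable, then assign past the last field.
# Assigning $(NF+2) grows NF, creates an empty $4, and rebuilds $0 using OFS.
out=$(echo 'a b c' | awk '
    BEGIN { OFS = "-" }
    {
        n = 2
        print $n            # second field
        $(NF+2) = 5         # NF was 3, so this sets $5
        print NF
        print $0
    }')
printf '%s\n' "$out"
```

The doubled separator in the last output line marks the empty intervening field.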
.SS Built-in Variables
.PP
AWK's built-in variables are:
.PP
.TP \w'\fBFIELDWIDTHS\fR'u+1n
.B ARGC
The number of command line arguments (does not include options to
.IR gawk ,
or the program source).
.TP
.B ARGIND
The index in
.B ARGV
of the current file being processed.
.TP
.B ARGV
Array of command line arguments. The array is indexed from
0 to
.B ARGC
\- 1.
Dynamically changing the contents of
.B ARGV
can control the files used for data.
.TP
.B CONVFMT
The conversion format for numbers, \fB"%.6g"\fR, by default.
.TP
.B ENVIRON
An array containing the values of the current environment.
The array is indexed by the environment variables, each element being
the value of that variable (e.g., \fBENVIRON["HOME"]\fP might be
.BR /u/arnold ).
Changing this array does not affect the environment seen by programs which
.I gawk
spawns via redirection or the
.B system()
function.
(This may change in a future version of
.IR gawk .)
.\" but don't hold your breath...
.TP
.B ERRNO
If a system error occurs either doing a redirection for
.BR getline ,
during a read for
.BR getline ,
or during a
.BR close ,
then
.B ERRNO
will contain
a string describing the error.
.TP
.B FIELDWIDTHS
A white-space separated list of field widths. When set,
.I gawk
parses the input into fields of fixed width, instead of using the
value of the
.B FS
variable as the field separator.
The fixed field width facility is still experimental; expect the
semantics to change as
.I gawk
evolves over time.
.TP
.B FILENAME
The name of the current input file.
If no files are specified on the command line, the value of
.B FILENAME
is ``\-''.
.TP
.B FNR
The input record number in the current input file.
.TP
.B FS
The input field separator, a blank by default.
.TP
.B IGNORECASE
Controls the case-sensitivity of all regular expression operations. If
.B IGNORECASE
has a non-zero value, then pattern matching in rules,
field splitting with
.BR FS ,
regular expression
matching with
.B ~
and
.BR !~ ,
and the
.BR gsub() ,
.BR index() ,
.BR match() ,
.BR split() ,
and
.B sub()
pre-defined functions will all ignore case when doing regular expression
operations. Thus, if
.B IGNORECASE
is not equal to zero,
.B /aB/
matches all of the strings \fB"ab"\fP, \fB"aB"\fP, \fB"Ab"\fP,
and \fB"AB"\fP.
As with all AWK variables, the initial value of
.B IGNORECASE
is zero, so all regular expression operations are normally case-sensitive.
.TP
.B NF
The number of fields in the current input record.
.TP
.B NR
The total number of input records seen so far.
.TP
.B OFMT
The output format for numbers, \fB"%.6g"\fR, by default.
.TP
.B OFS
The output field separator, a blank by default.
.TP
.B ORS
The output record separator, by default a newline.
.TP
.B RS
The input record separator, by default a newline.
.B RS
is exceptional in that only the first character of its string
value is used for separating records.
(This will probably change in a future release of
.IR gawk .)
If
.B RS
is set to the null string, then records are separated by
blank lines.
When
.B RS
is set to the null string, then the newline character always acts as
a field separator, in addition to whatever value
.B FS
may have.
.TP
.B RSTART
The index of the first character matched by
.BR match() ;
0 if no match.
.TP
.B RLENGTH
The length of the string matched by
.BR match() ;
\-1 if no match.
.TP
.B SUBSEP
The character used to separate multiple subscripts in array
elements, by default \fB"\e034"\fR.
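.PP
As one illustration of the variables listed above, the sketch below sets RS to the null string so that blank lines separate records and every newline also separates fields. The input text is hypothetical, and the program is run as awk on the assumption that a POSIX awk is installed.

```shell
# Paragraph mode: with RS = "", a record is a block of lines ended by a
# blank line, and newline acts as a field separator in addition to FS.
out=$(printf 'one two\nthree\n\nfour\n' | awk '
    BEGIN { RS = "" }
    { print NR ": " NF " fields, last=" $NF }')
printf '%s\n' "$out"
```

The first record spans two input lines yet yields three fields, because the newline inside it is treated as a separator.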
.SS Arrays
.PP
Arrays are subscripted with an expression between square brackets
.RB ( [ " and " ] ).
If the expression is an expression list
.RI ( expr ", " expr " ...)"
then the array subscript is a string consisting of the
concatenation of the (string) value of each expression,
separated by the value of the
.B SUBSEP
variable.
This facility is used to simulate multidimensional
arrays. For example:
.PP
.RS
.ft B
i = "A" ;\^ j = "B" ;\^ k = "C"
.br
x[i, j, k] = "hello, world\en"
.ft R
.RE
.PP
assigns the string \fB"hello, world\en"\fR to the element of the array
.B x
which is indexed by the string \fB"A\e034B\e034C"\fR. All arrays in AWK
are associative, i.e. indexed by string values.
.PP
The special operator
.B in
may be used in an
.B if
or
.B while
statement to see if an array has an index consisting of a particular
value.
.PP
.RS
.ft B
.nf
if (val in array)
    print array[val]
.fi
.ft
.RE
.PP
If the array has multiple subscripts, use
.BR "(i, j) in array" .
.PP
The
.B in
construct may also be used in a
.B for
loop to iterate over all the elements of an array.
.PP
An element may be deleted from an array using the
.B delete
statement.
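.PP
The array operations described above fit into one short BEGIN block. This is a sketch with made-up subscripts and values, run as awk on the assumption that a POSIX awk is installed.

```shell
# Simulated two-dimensional subscript, membership test, iteration, deletion.
out=$(awk 'BEGIN {
    x["A", "B"] = "hello"
    if (("A", "B") in x)
        print "found " x["A", "B"]
    for (key in x) {
        split(key, parts, SUBSEP)      # recover the individual subscripts
        print parts[1] "+" parts[2]
    }
    delete x["A", "B"]
    if (!(("A", "B") in x))
        print "deleted"
}')
printf '%s\n' "$out"
```

Splitting the key on SUBSEP is the usual way to get the separate subscripts back inside a for loop, since the loop variable only ever holds the combined string.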
.SS Variable Typing And Conversion
.PP
Variables and fields
may be (floating point) numbers, or strings, or both. How the
value of a variable is interpreted depends upon its context. If used in
a numeric expression, it will be treated as a number; if used as a string,
it will be treated as a string.
.PP
To force a variable to be treated as a number, add 0 to it; to force it
to be treated as a string, concatenate it with the null string.
.PP
When a string must be converted to a number, the conversion is accomplished
using
.IR atof (3).
A number is converted to a string by using the value of
.B CONVFMT
as a format string for
.IR sprintf (3),
with the numeric value of the variable as the argument.
However, even though all numbers in AWK are floating-point,
integral values are
.I always
converted as integers. Thus, given
.PP
.RS
.ft B
.nf
CONVFMT = "%2.2f"
a = 12
b = a ""
.fi
.ft R
.RE
.PP
the variable
.B b
has a value of \fB"12"\fR and not \fB"12.00"\fR.
.PP
.I Gawk
performs comparisons as follows:
If two variables are numeric, they are compared numerically.
If one value is numeric and the other has a string value that is a
``numeric string,'' then comparisons are also done numerically.
Otherwise, the numeric value is converted to a string and a string
comparison is performed.
Two strings are compared, of course, as strings.
According to the \*(PX standard, even if two strings are
numeric strings, a numeric comparison is performed. However, this is
clearly incorrect, and
.I gawk
does not do this.
.PP
Uninitialized variables have the numeric value 0 and the string value ""
(the null, or empty, string).
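.PP
The conversion and comparison rules above can be checked with a few expressions. The sketch below uses hypothetical variable names and is run as awk on the assumption that a POSIX awk is installed.

```shell
# Integral values ignore CONVFMT; non-integral values use it.
# String constants compare as strings; adding 0 forces a number.
out=$(awk 'BEGIN {
    CONVFMT = "%2.2f"
    a = 12;   b = a ""
    c = 12.5; d = c ""
    print b, d
    x = "3.0" + 0
    print ((x == 3) ? "numeric equal" : "not equal")
    print (("10" < "9") ? "string compare" : "numeric compare")
    print "[" unset "]", unset + 0
}')
printf '%s\n' "$out"
```

The last line shows an uninitialized variable acting as the null string in one context and as 0 in the other.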
.SH PATTERNS AND ACTIONS
AWK is a line-oriented language. The pattern comes first, and then the
action. Action statements are enclosed in
.B {
and
.BR } .
Either the pattern may be missing, or the action may be missing, but,
of course, not both. If the pattern is missing, the action will be
executed for every single line of input.
A missing action is equivalent to
.RS
.PP
.B "{ print }"
.RE
.PP
which prints the entire line.
.PP
Comments begin with the ``#'' character, and continue until the
end of the line.
Blank lines may be used to separate statements.
Normally, a statement ends with a newline; however, this is not the
case for lines ending in
a ``,'', ``{'', ``?'', ``:'', ``&&'', or ``||''.
Lines ending in
.B do
or
.B else
also have their statements automatically continued on the following line.
In other cases, a line can be continued by ending it with a ``\e'',
in which case the newline will be ignored.
.PP
Multiple statements may
be put on one line by separating them with a ``;''.
This applies to both the statements within the action part of a
pattern-action pair (the usual case),
and to the pattern-action statements themselves.
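.PP
A short sketch of the two degenerate rule forms described above: a pattern with no action, and an action with no pattern. The input words are made up, and the program is run as awk on the assumption that a POSIX awk is installed.

```shell
# First rule: pattern only, so matching lines are printed.
# Second rule: action only, so it runs for every line.
out=$(printf 'apple\nbanana\ncherry\n' | awk '
    /an/                        # missing action: same as { print }
    { n++ }                     # missing pattern: runs for every line
    END { print n " lines" }')
printf '%s\n' "$out"
```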
.SS Patterns
AWK patterns may be one of the following:
.PP
.RS
.nf
.B BEGIN
.B END
.BI / "regular expression" /
.I "relational expression"
.IB pattern " && " pattern
.IB pattern " || " pattern
.IB pattern " ? " pattern " : " pattern
.BI ( pattern )
.BI ! " pattern"
.IB pattern1 ", " pattern2
.fi
.RE
.PP
.B BEGIN
and
.B END
are two special kinds of patterns which are not tested against
the input.
The action parts of all
.B BEGIN
patterns are merged as if all the statements had
been written in a single
.B BEGIN
block. They are executed before any
of the input is read. Similarly, all the
.B END
blocks are merged,
and executed when all the input is exhausted (or when an
.B exit
statement is executed).
.B BEGIN
and
.B END
patterns cannot be combined with other patterns in pattern expressions.
.B BEGIN
and
.B END
patterns cannot have missing action parts.
.PP
For
.BI / "regular expression" /
patterns, the associated statement is executed for each input line that matches
the regular expression.
Regular expressions are the same as those in
.IR egrep (1),
and are summarized below.
.PP
A
.I "relational expression"
may use any of the operators defined below in the section on actions.
These generally test whether certain fields match certain regular expressions.
.PP
The
.BR && ,
.BR || ,
and
.B !
operators are logical AND, logical OR, and logical NOT, respectively, as in C.
They do short-circuit evaluation, also as in C, and are used for combining
more primitive pattern expressions. As in most languages, parentheses
may be used to change the order of evaluation.
.PP
The
.B ?\^:
operator is like the same operator in C. If the first pattern is true
then the pattern used for testing is the second pattern, otherwise it is
the third. Only one of the second and third patterns is evaluated.
.PP
The
.IB pattern1 ", " pattern2
form of an expression is called a range pattern.
It matches all input records starting with a line that matches
.IR pattern1 ,
and continuing until a record that matches
.IR pattern2 ,
inclusive. It does not combine with any other sort of pattern expression.
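.PP
The following sketch contrasts a range pattern with a pattern built from relational expressions and a logical operator. The input records are hypothetical, and the program is run as awk on the assumption that a POSIX awk is installed.

```shell
# Rule 1: range pattern, inclusive of both end records.
# Rule 2: relational expressions combined with ||.
out=$(printf '1 x\n2 start\n3 y\n4 stop\n5 z\n' | awk '
    /start/, /stop/         { print "range:", $1 }
    $1 > 4 || $2 == "x"     { print "logic:", $1 }')
printf '%s\n' "$out"
```

Records 2 through 4 fall inside the range; records 1 and 5 are picked up only by the second rule.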
.SS Regular Expressions
Regular expressions are the extended kind found in
.IR egrep .
They are composed of characters as follows:
.TP \w'\fB[^\fIabc...\fB]\fR'u+2n
.I c
matches the non-metacharacter
.IR c .
.TP
.I \ec
matches the literal character
.IR c .
.TP
.B .
matches any character except newline.
.TP
.B ^
matches the beginning of a line or a string.
.TP
.B $
matches the end of a line or a string.
.TP
.BI [ abc... ]
character class, matches any of the characters
.IR abc... .
.TP
.BI [^ abc... ]
negated character class, matches any character except
.I abc...
and newline.
.TP
.IB r1 | r2
alternation: matches either
.I r1
or
.IR r2 .
.TP
.I r1r2
concatenation: matches
.IR r1 ,
and then
.IR r2 .
.TP
.IB r +
matches one or more
.IR r 's.
.TP
.IB r *
matches zero or more
.IR r 's.
.TP
.IB r ?
matches zero or one
.IR r 's.
.TP
.BI ( r )
grouping: matches
.IR r .
.PP
The escape sequences that are valid in string constants (see below)
are also legal in regular expressions.
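.PP
A few of the constructs in the table above, combined in a minimal sketch. The sample words are made up, and the program is run as awk on the assumption that a POSIX awk is installed.

```shell
# Anchors, an optional character, alternation with grouping,
# a character class, and one-or-more repetition.
out=$(printf 'color\ncolour\ncolr\nab12\n' | awk '
    /^colou?r$/         { print "opt: " $0 }
    /^(ab|cd)[0-9]+$/   { print "alt: " $0 }')
printf '%s\n' "$out"
```

The word colr matches neither expression, so it produces no output.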
.SS Actions
Action statements are enclosed in braces,
.B {
and
.BR } .
Action statements consist of the usual assignment, conditional, and looping
statements found in most languages. The operators, control statements,
and input/output statements
available are patterned after those in C.
.SS Operators
.PP
The operators in AWK, in order of increasing precedence, are
.PP
.TP "\w'\fB*= /= %= ^=\fR'u+1n"
.PD 0
.B "= += \-="
.TP
.PD
.B "*= /= %= ^="
Assignment. Both absolute assignment
.BI ( var " = " value )
and operator-assignment (the other forms) are supported.
.TP
.B ?:
The C conditional expression. This has the form
.IB expr1 " ? " expr2 " : " expr3\c
\&. If
.I expr1
is true, the value of the expression is
.IR expr2 ,
otherwise it is
.IR expr3 .
Only one of
.I expr2
and
.I expr3
is evaluated.
.TP
.B ||
Logical OR.
.TP
.B &&
Logical AND.
.TP
.B "~ !~"
Regular expression match, negated match.
.B NOTE:
Do not use a constant regular expression
.RB ( /foo/ )
on the left-hand side of a
.B ~
or
.BR !~ .
Only use one on the right-hand side. The expression
.BI "/foo/ ~ " exp
has the same meaning as \fB(($0 ~ /foo/) ~ \fIexp\fB)\fR.
This is usually
.I not
what was intended.
.TP
.PD 0
.B "< >"
.TP
.PD 0
.B "<= >="
.TP
.PD
.B "!= =="
The regular relational operators.
.TP
.I blank
String concatenation.
.TP
.B "+ \-"
Addition and subtraction.
.TP
.B "* / %"
Multiplication, division, and modulus.
.TP
.B "+ \- !"
Unary plus, unary minus, and logical negation.
.TP
.B ^
Exponentiation (\fB**\fR may also be used, and \fB**=\fR for
the assignment operator).
.TP
.B "++ \-\^\-"
Increment and decrement, both prefix and postfix.
.TP
.B $
Field reference.
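.PP
The precedence table above has a few consequences that are easy to get wrong. The sketch below prints the results of some small expressions; it is run as awk on the assumption that a POSIX awk is installed.

```shell
# Each line exercises one precedence relationship from the table.
out=$(awk 'BEGIN {
    print 1 " " 2 + 3       # + binds tighter than concatenation
    print -2 ^ 2            # ^ binds tighter than unary minus
    print 2 + 3 * 4         # * binds tighter than +
    n = 5; n ^= 2; print n  # operator-assignment form
    print (n > 20 ? "big" : "small")
}')
printf '%s\n' "$out"
```

Parenthesizing mixed concatenation and arithmetic is a reasonable habit, since the blank operator is easy to overlook when reading a program.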
.SS Control Statements
.PP
The control statements are
as follows:
.PP
.RS
.nf
\fBif (\fIcondition\fB) \fIstatement\fR [ \fBelse\fI statement \fR]
\fBwhile (\fIcondition\fB) \fIstatement \fR
\fBdo \fIstatement \fBwhile (\fIcondition\fB)\fR
\fBfor (\fIexpr1\fB; \fIexpr2\fB; \fIexpr3\fB) \fIstatement\fR
\fBfor (\fIvar \fBin\fI array\fB) \fIstatement\fR
\fBbreak\fR
\fBcontinue\fR
\fBdelete \fIarray\^\fB[\^\fIindex\^\fB]\fR
\fBexit\fR [ \fIexpression\fR ]
\fB{ \fIstatements \fB}\fR
.fi
.RE
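.PP
Several of the control statements listed above appear together in the sketch below. The loop bounds and the exit value are arbitrary, and the program is run as awk on the assumption that a POSIX awk is installed.

```shell
# for with continue and break, then do-while, then while, then exit.
out=$(awk 'BEGIN {
    for (i = 1; i <= 5; i++) {
        if (i == 2) continue    # skip 2
        if (i == 5) break       # leave the loop with i == 5
        printf "%d ", i
    }
    do { i-- } while (i > 3)    # body runs before the test; ends with i == 3
    while (i < 6) i += 2        # 3 -> 5 -> 7
    print i
    exit 3                      # the expression becomes the exit status
}'); status=$?
printf '%s (status %s)\n' "$out" "$status"
```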
.SS "I/O Statements"
.PP
The input/output statements are as follows:
.PP
.TP "\w'\fBprintf \fIfmt, expr-list\fR'u+1n"
.BI close( filename )
Close file (or pipe, see below).
.TP
.B getline
Set
.B $0
from next input record; set
.BR NF ,
.BR NR ,
.BR FNR .
.TP
.BI "getline <" file
Set
.B $0
from next record of
.IR file ;
set
.BR NF .
.TP
.BI getline " var"
Set
.I var
from next input record; set
.BR NR ,
.BR FNR .
.TP
.BI getline " var" " <" file
Set
.I var
from next record of
.IR file .
.TP
.B next
Stop processing the current input record. The next input record
is read and processing starts over with the first pattern in the
AWK program. If the end of the input data is reached, the
.B END
block(s), if any, are executed.
.TP
.B "next file"
Stop processing the current input file. The next input record read
comes from the next input file.
.B FILENAME
is updated,
.B FNR
is reset to 1, and processing starts over with the first pattern in the
AWK program. If the end of the input data is reached, the
.B END
block(s), if any, are executed.
.TP
.B print
Prints the current record.
.TP
.BI print " expr-list"
Prints expressions.
.TP
.BI print " expr-list" " >" file
Prints expressions on
.IR file .
.TP
.BI printf " fmt, expr-list"
Format and print.
.TP
.BI printf " fmt, expr-list" " >" file
Format and print on
.IR file .
.TP
.BI system( cmd-line )
Execute the command
.IR cmd-line ,
and return the exit status.
(This may not be available on non-\*(PX systems.)
.PP
Other input/output redirections are also allowed. For
.B print
and
.BR printf ,
.BI >> file
appends output to the
.IR file ,
while
.BI | " command"
writes on a pipe.
In a similar fashion,
.IB command " | getline"
pipes into
.BR getline .
.B Getline
will return 0 on end of file, and \-1 on an error.
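.PP
The return values of getline are what make the usual read loop work. The sketch below reads from a command through a pipe and then tries a file that is assumed not to exist; the command string and the path are hypothetical, and the program is run as awk on the assumption that a POSIX awk is installed.

```shell
# Loop while getline returns a positive value; 0 means end of file
# and -1 means an error, so "> 0" stops on both.
out=$(awk 'BEGIN {
    cmd = "echo a; echo b"
    while ((cmd | getline line) > 0)
        n++
    close(cmd)
    print n " lines, last=" line
    r = (getline x < "/nonexistent/dir/file")   # path assumed not to exist
    print "missing file returns " r
}')
printf '%s\n' "$out"
```

Testing with "> 0" rather than treating the result as a truth value matters, because the error value is non-zero and a bare test would loop forever on an unreadable file.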
.SS The \fIprintf\fP\^ Statement
.PP
The AWK versions of the
.B printf
statement and
.B sprintf()
function
(see below)
accept the following conversion specification formats:
.TP
.B %c
An \s-1ASCII\s+1 character.
If the argument used for
.B %c
is numeric, it is treated as a character and printed.
Otherwise, the argument is assumed to be a string, and only the first
character of that string is printed.
.TP
.B %d
A decimal number (the integer part).
.TP
.B %i
Just like
.BR %d .
.TP
.B %e
A floating point number of the form
.BR [\-]d.ddddddE[+\^\-]dd .
.TP
.B %f
A floating point number of the form
.BR [\-]ddd.dddddd .
.TP
.B %g
Use
.B e
or
.B f
conversion, whichever is shorter, with nonsignificant zeros suppressed.
.TP
.B %o
An unsigned octal number (again, an integer).
.TP
.B %s
A character string.
.TP
.B %x
An unsigned hexadecimal number (an integer).
.TP
.B %X
Like
.BR %x ,
but using
.B ABCDEF
instead of
.BR abcdef .
.TP
.B %%
A single
.B %
character; no argument is converted.
.PP
There are optional, additional parameters that may lie between the
.B %
and the control letter:
.TP
.B \-
The expression should be left-justified within its field.
.TP
.I width
The field should be padded to this width. If the width has a leading
zero, then the field will be padded with zeros.
Otherwise it is padded with blanks.
.TP
.BI . prec
A number indicating the maximum width of strings or digits to the right
of the decimal point.
.PP
The dynamic
.I width
and
.I prec
capabilities of the \*(AN C
.B printf()
routines are supported.
A
.B *
in place of either the
.B width
or
.B prec
specifications will cause their values to be taken from
the argument list to
.B printf
or
.BR sprintf() .
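.PP
The conversion letters and modifiers above are easiest to compare side by side. The sketch below brackets each result so the padding is visible; the values are arbitrary, and the program is run as awk on the assumption that a POSIX awk is installed.

```shell
# One printf per feature: width, left-justify, zero padding,
# precision for numbers and strings, and several conversion letters.
out=$(awk 'BEGIN {
    printf "[%5d]\n", 42
    printf "[%-5d]\n", 42
    printf "[%05d]\n", 42
    printf "[%.2f]\n", 3.14159
    printf "[%.3s]\n", "abcdef"
    printf "[%c%c]\n", 65, "xyz"      # number as character code, string by first character
    printf "[%x %X %o]\n", 255, 255, 8
    printf "[%d%%]\n", 99.9           # %d keeps only the integer part
}')
printf '%s\n' "$out"
```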
|
|
.SS Special File Names
|
|
.PP
|
|
When doing I/O redirection from either
|
|
.B print
|
|
or
|
|
.B printf
|
|
into a file,
|
|
or via
|
|
.B getline
|
|
from a file,
|
|
.I gawk
|
|
recognizes certain special filenames internally. These filenames
|
|
allow access to open file descriptors inherited from
|
|
.IR gawk 's
|
|
parent process (usually the shell).
|
|
Other special filenames provide access information about the running
|
|
.B gawk
|
|
process.
|
|
The filenames are:
|
|
.TP \w'\fB/dev/stdout\fR'u+1n
|
|
.B /dev/pid
|
|
Reading this file returns the process ID of the current process,
|
|
in decimal, terminated with a newline.
|
|
.TP
|
|
.B /dev/ppid
|
|
Reading this file returns the parent process ID of the current process,
|
|
in decimal, terminated with a newline.
|
|
.TP
|
|
.B /dev/pgrpid
|
|
Reading this file returns the process group ID of the current process,
|
|
in decimal, terminated with a newline.
|
|
.TP
|
|
.B /dev/user
|
|
Reading this file returns a single record terminated with a newline.
|
|
The fields are separated with blanks.
|
|
.B $1
|
|
is the value of the
|
|
.IR getuid (2)
|
|
system call,
|
|
.B $2
|
|
is the value of the
|
|
.IR geteuid (2)
|
|
system call,
|
|
.B $3
|
|
is the value of the
|
|
.IR getgid (2)
|
|
system call, and
|
|
.B $4
|
|
is the value of the
|
|
.IR getegid (2)
|
|
system call.
|
|
If there are any additional fields, they are the group IDs returned by
|
|
.IR getgroups (2).
|
|
(Multiple groups may not be supported on all systems.)
|
|
.TP
|
|
.B /dev/stdin
|
|
The standard input.
|
|
.TP
|
|
.B /dev/stdout
|
|
The standard output.
|
|
.TP
|
|
.B /dev/stderr
|
|
The standard error output.
|
|
.TP
|
|
.BI /dev/fd/\^ n
|
|
The file associated with the open file descriptor
|
|
.IR n .
|
|
.PP
|
|
These are particularly useful for error messages. For example:
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
print "You blew it!" > "/dev/stderr"
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
whereas you would otherwise have to use
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
print "You blew it!" | "cat 1>&2"
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
These file names may also be used on the command line to name data files.
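.PP
The following shell sketch shows one way the error-message idiom above might
be used: data lines go to the standard output while a diagnostic goes to
/dev/stderr.
The input lines and the variable names prog, out and err are made up for the
illustration, and the sketch assumes either that awk interprets /dev/stderr
itself or that the system provides that file.

```shell
# Comment lines are reported on standard error; everything else is passed through.
prog='/^#/ { print "skipping comment: " $0 > "/dev/stderr"; next } { print }'

# Capture the two streams separately to show where each line ends up.
out=$(printf 'a\n#b\nc\n' | awk "$prog" 2>/dev/null)
err=$(printf 'a\n#b\nc\n' | awk "$prog" 2>&1 >/dev/null)

printf 'stdout:\n%s\nstderr:\n%s\n' "$out" "$err"
```

Because the diagnostic never reaches the standard output, a later stage of a
pipeline sees only the data lines.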
|
|
.SS Numeric Functions
|
|
.PP
|
|
AWK has the following pre-defined arithmetic functions:
|
|
.PP
|
|
.TP \w'\fBsrand(\^\fIexpr\^\fB)\fR'u+1n
|
|
.BI atan2( y , " x" )
|
|
returns the arctangent of
|
|
.I y/x
|
|
in radians.
|
|
.TP
|
|
.BI cos( expr )
|
|
returns the cosine of \fIexpr\fR, which is in radians.
|
|
.TP
|
|
.BI exp( expr )
|
|
the exponential function.
|
|
.TP
|
|
.BI int( expr )
|
|
truncates to integer.
|
|
.TP
|
|
.BI log( expr )
|
|
the natural logarithm function.
|
|
.TP
|
|
.B rand()
|
|
returns a random number between 0 and 1.
|
|
.TP
|
|
.BI sin( expr )
|
|
returns the sine of \fIexpr\fR, which is in radians.
|
|
.TP
|
|
.BI sqrt( expr )
|
|
the square root function.
|
|
.TP
|
|
.BI srand( expr )
|
|
use
|
|
.I expr
|
|
as a new seed for the random number generator. If no
|
|
.I expr
|
|
is provided, the time of day will be used.
|
|
The return value is the previous seed for the random
|
|
number generator.
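.PP
A small sketch of the arithmetic functions listed above, run from the shell.
The particular values are arbitrary, and the variable names are chosen only
for this illustration.
It relies on srand() returning the previous seed, as described above.

```shell
out=$(awk 'BEGIN {
    pi = atan2(0, -1)                 # y = 0, x = -1 gives pi
    printf "%d %d %.4f %.1f %.1f\n", int(3.9), int(-3.9), pi, sqrt(16), exp(log(8))

    srand(1); prev = srand(2)         # the second call returns the earlier seed
    srand(42); a = rand()
    srand(42); b = rand()             # same seed, so the same first number
    print prev, (a == b ? "repeatable" : "differs")
}')
printf '%s\n' "$out"
```

Note that int() truncates toward zero, so int(\-3.9) is \-3 rather than \-4.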
|
|
.SS String Functions
|
|
.PP
|
|
AWK has the following pre-defined string functions:
|
|
.PP
|
|
.TP "\w'\fBsprintf(\^\fIfmt\fB\^, \fIexpr-list\^\fB)\fR'u+1n"
|
|
\fBgsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
|
|
for each substring matching the regular expression
|
|
.I r
|
|
in the string
|
|
.IR t ,
|
|
substitute the string
|
|
.IR s ,
|
|
and return the number of substitutions.
|
|
If
|
|
.I t
|
|
is not supplied, use
|
|
.BR $0 .
|
|
.TP
|
|
.BI index( s , " t" )
|
|
returns the index of the string
|
|
.I t
|
|
in the string
|
|
.IR s ,
|
|
or 0 if
|
|
.I t
|
|
is not present.
|
|
.TP
|
|
.BI length( s )
|
|
returns the length of the string
|
|
.IR s ,
|
|
or the length of
|
|
.B $0
|
|
if
|
|
.I s
|
|
is not supplied.
|
|
.TP
|
|
.BI match( s , " r" )
|
|
returns the position in
|
|
.I s
|
|
where the regular expression
|
|
.I r
|
|
occurs, or 0 if
|
|
.I r
|
|
is not present, and sets the values of
|
|
.B RSTART
|
|
and
|
|
.BR RLENGTH .
|
|
.TP
|
|
\fBsplit(\fIs\fB, \fIa\fB, \fIr\fB)\fR
|
|
splits the string
|
|
.I s
|
|
into the array
|
|
.I a
|
|
on the regular expression
|
|
.IR r ,
|
|
and returns the number of fields. If
|
|
.I r
|
|
is omitted,
|
|
.B FS
|
|
is used instead.
|
|
.TP
|
|
.BI sprintf( fmt , " expr-list" )
|
|
prints
|
|
.I expr-list
|
|
according to
|
|
.IR fmt ,
|
|
and returns the resulting string.
|
|
.TP
|
|
\fBsub(\fIr\fB, \fIs\fB, \fIt\fB)\fR
|
|
just like
|
|
.BR gsub() ,
|
|
but only the first matching substring is replaced.
|
|
.TP
|
|
\fBsubstr(\fIs\fB, \fIi\fB, \fIn\fB)\fR
|
|
returns the
|
|
.IR n -character
|
|
substring of
|
|
.I s
|
|
starting at
|
|
.IR i .
|
|
If
|
|
.I n
|
|
is omitted, the rest of
|
|
.I s
|
|
is used.
|
|
.TP
|
|
.BI tolower( str )
|
|
returns a copy of the string
|
|
.IR str ,
|
|
with all the upper-case characters in
|
|
.I str
|
|
translated to their corresponding lower-case counterparts.
|
|
Non-alphabetic characters are left unchanged.
|
|
.TP
|
|
.BI toupper( str )
|
|
returns a copy of the string
|
|
.IR str ,
|
|
with all the lower-case characters in
|
|
.I str
|
|
translated to their corresponding upper-case counterparts.
|
|
Non-alphabetic characters are left unchanged.
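.PP
The string functions above can be tried out with a sketch such as the
following.
The sample strings and the names s, n, k and parts are invented for the
illustration.

```shell
out=$(awk 'BEGIN {
    s = "hello world"
    print index(s, "wor"), length(s)     # position of "wor" and length of s
    match(s, /o+/)                       # sets RSTART and RLENGTH
    print RSTART, RLENGTH
    n = gsub(/o/, "0", s)                # replace every "o"; n counts them
    print n, s
    sub(/l+/, "L", s)                    # replace only the first match
    print s, substr(s, 1, 3), substr(s, 7)
    k = split("a:b:c", parts, ":")       # k is the number of pieces
    print k, parts[2], toupper(parts[3]), tolower("ABC")
}')
printf '%s\n' "$out"
```

Both gsub() and sub() modify their third argument in place, which is why the
later lines see the already altered value of s.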
|
|
.SS Time Functions
|
|
.PP
|
|
Since one of the primary uses of AWK programs is processing log files
|
|
that contain time stamp information,
|
|
.I gawk
|
|
provides the following two functions for obtaining time stamps and
|
|
formatting them.
|
|
.PP
|
|
.TP "\w'\fBsystime()\fR'u+1n"
|
|
.B systime()
|
|
returns the current time of day as the number of seconds since the Epoch
|
|
(Midnight UTC, January 1, 1970 on \*(PX systems).
|
|
.TP
|
|
\fBstrftime(\fIformat\fB, \fItimestamp\fB)\fR
|
|
formats
|
|
.I timestamp
|
|
according to the specification in
|
|
.IR format .
|
|
The
|
|
.I timestamp
|
|
should be of the same form as returned by
|
|
.BR systime() .
|
|
If
|
|
.I timestamp
|
|
is missing, the current time of day is used.
|
|
See the specification for the
|
|
.B strftime()
|
|
function in \*(AN C for the format conversions that are
|
|
guaranteed to be available.
|
|
A public-domain version of
|
|
.IR strftime (3)
|
|
and a man page for it are shipped with
|
|
.IR gawk ;
|
|
if that version was used to build
|
|
.IR gawk ,
|
|
then all of the conversions described in that man page are available to
|
|
.IR gawk .
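.PP
A hedged sketch of the two time functions follows.
It prefers gawk when one is installed and otherwise falls back to whatever awk
is on the PATH, on the assumption that it also provides systime() and
strftime(); the shell variable awk_bin is a name made up for the example.
TZ is set to UTC so that the formatted text does not depend on the local time
zone.

```shell
awk_bin=$(command -v gawk || command -v awk)

out=$(TZ=UTC "$awk_bin" 'BEGIN {
    # 86400 seconds is exactly one day after the Epoch.
    print strftime("%Y-%m-%d %H:%M:%S", 86400)
    now = systime()
    print (now > 86400 ? "clock is past 1970-01-02" : "clock looks unset")
}')
printf '%s\n' "$out"
```

Only conversions defined by \*(AN C are used here, so the format string should
not depend on which strftime() the interpreter was built with.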
|
|
.SS String Constants
|
|
.PP
|
|
String constants in AWK are sequences of characters enclosed
|
|
between double quotes (\fB"\fR). Within strings, certain
|
|
.I "escape sequences"
|
|
are recognized, as in C. These are:
|
|
.PP
|
|
.TP \w'\fB\e\^\fIddd\fR'u+1n
|
|
.B \e\e
|
|
A literal backslash.
|
|
.TP
|
|
.B \ea
|
|
The ``alert'' character; usually the \s-1ASCII\s+1 \s-1BEL\s+1 character.
|
|
.TP
|
|
.B \eb
|
|
backspace.
|
|
.TP
|
|
.B \ef
|
|
form-feed.
|
|
.TP
|
|
.B \en
|
|
new line.
|
|
.TP
|
|
.B \er
|
|
carriage return.
|
|
.TP
|
|
.B \et
|
|
horizontal tab.
|
|
.TP
|
|
.B \ev
|
|
vertical tab.
|
|
.TP
|
|
.BI \ex "\^hex digits"
|
|
The character represented by the string of hexadecimal digits following
|
|
the
|
|
.BR \ex .
|
|
As in \*(AN C, all following hexadecimal digits are considered part of
|
|
the escape sequence.
|
|
(This feature should tell us something about language design by committee.)
|
|
E.g., "\ex1B" is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
|
|
.TP
|
|
.BI \e ddd
|
|
The character represented by the 1-, 2-, or 3-digit sequence of octal
|
|
digits. E.g., "\e033" is the \s-1ASCII\s+1 \s-1ESC\s+1 (escape) character.
|
|
.TP
|
|
.BI \e c
|
|
The literal character
|
|
.IR c\^ .
|
|
.PP
|
|
The escape sequences may also be used inside constant regular expressions
|
|
(e.g.,
|
|
.B "/[\ \et\ef\en\er\ev]/"
|
|
matches whitespace characters).
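.PP
The sketch below exercises a few of these escape sequences, both in string
constants and in a constant regular expression.
It assumes an awk that accepts the hexadecimal escape, as gawk does; the
variable names are invented for the illustration.

```shell
awk_bin=$(command -v gawk || command -v awk)

out=$("$awk_bin" 'BEGIN {
    esc_octal = "\033"; esc_hex = "\x1B"     # two spellings of ASCII ESC
    same = (esc_octal == esc_hex) ? "same" : "different"
    print same, length("a\tb"), length("\\")

    # Split on runs of whitespace using escapes inside a bracket expression.
    n = split("one \t two\fthree", w, /[ \t\f\n\r\v]+/)
    print n, w[1], w[2], w[3]
}')
printf '%s\n' "$out"
```

Each escape sequence stands for a single character, so "a\etb" has length 3
and "\e\e" has length 1.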
|
|
.SH FUNCTIONS
|
|
Functions in AWK are defined as follows:
|
|
.PP
|
|
.RS
|
|
\fBfunction \fIname\fB(\fIparameter list\fB) { \fIstatements \fB}\fR
|
|
.RE
|
|
.PP
|
|
Functions are executed when called from within the action parts of regular
|
|
pattern-action statements. Actual parameters supplied in the function
|
|
call are used to instantiate the formal parameters declared in the function.
|
|
Arrays are passed by reference; other variables are passed by value.
|
|
.PP
|
|
Since functions were not originally part of the AWK language, the provision
|
|
for local variables is rather clumsy: They are declared as extra parameters
|
|
in the parameter list. The convention is to separate local variables from
|
|
real parameters by extra spaces in the parameter list. For example:
|
|
.PP
|
|
.RS
|
|
.ft B
|
|
.nf
|
|
function f(p, q,     a, b) {    # a & b are local
        ..... }
|
|
|
|
/abc/ { ... ; f(1, 2) ; ... }
|
|
.fi
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
The left parenthesis in a function call is required
|
|
to immediately follow the function name,
|
|
without any intervening white space.
|
|
This is to avoid a syntactic ambiguity with the concatenation operator.
|
|
This restriction does not apply to the built-in functions listed above.
|
|
.PP
|
|
Functions may call each other and may be recursive.
|
|
Function parameters used as local variables are initialized
|
|
to the null string and the number zero upon function invocation.
|
|
.PP
|
|
The word
|
|
.B func
|
|
may be used in place of
|
|
.BR function .
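.PP
The points above (local variables declared as extra parameters, arrays passed
by reference, scalars passed by value, and recursion) can be sketched as
follows.
The function names fill, bump and fact are hypothetical and exist only for
this illustration.

```shell
out=$(awk '
function fill(arr, n,    i) {        # i is a local variable
    for (i = 1; i <= n; i++)
        arr[i] = i * i               # the array in the caller is modified
}
function bump(x) { x++ }             # scalar is a copy; the caller is unaffected
function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }

BEGIN {
    fill(sq, 3)
    v = 5; bump(v)
    print sq[3], v, fact(5)
}')
printf '%s\n' "$out"
```

Each call is written with the left parenthesis directly after the function
name, as required for user-defined functions.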
|
|
.SH EXAMPLES
|
|
.nf
|
|
Print and sort the login names of all users:
|
|
|
|
.ft B
|
|
BEGIN { FS = ":" }
|
|
{ print $1 | "sort" }
|
|
|
|
.ft R
|
|
Count lines in a file:
|
|
|
|
.ft B
|
|
{ nlines++ }
|
|
END { print nlines }
|
|
|
|
.ft R
|
|
Precede each line by its number in the file:
|
|
|
|
.ft B
|
|
{ print FNR, $0 }
|
|
|
|
.ft R
|
|
Concatenate and line number (a variation on a theme):
|
|
|
|
.ft B
|
|
{ print NR, $0 }
|
|
.ft R
|
|
.fi
|
|
.SH SEE ALSO
|
|
.IR egrep (1)
|
|
.PP
|
|
.IR "The AWK Programming Language" ,
|
|
Alfred V. Aho, Brian W. Kernighan, Peter J. Weinberger,
|
|
Addison-Wesley, 1988. ISBN 0-201-07981-X.
|
|
.PP
|
|
.IR "The GAWK Manual" ,
|
|
Edition 0.15, published by the Free Software Foundation, 1993.
|
|
.SH POSIX COMPATIBILITY
|
|
A primary goal for
|
|
.I gawk
|
|
is compatibility with the \*(PX standard, as well as with the
|
|
latest version of \*(UX
|
|
.IR awk .
|
|
To this end,
|
|
.I gawk
|
|
incorporates the following user-visible
|
|
features which are not described in the AWK book,
|
|
but are part of
|
|
.I awk
|
|
in System V Release 4, and are in the \*(PX standard.
|
|
.PP
|
|
The
|
|
.B \-v
|
|
option for assigning variables before program execution starts is new.
|
|
The book indicates that command line variable assignment happens when
|
|
.I awk
|
|
would otherwise open the argument as a file, which is after the
|
|
.B BEGIN
|
|
block is executed. However, in earlier implementations, when such an
|
|
assignment appeared before any file names, the assignment would happen
|
|
.I before
|
|
the
|
|
.B BEGIN
|
|
block was run. Applications came to depend on this ``feature.''
|
|
When
|
|
.I awk
|
|
was changed to match its documentation, this option was added to
|
|
accommodate applications that depended upon the old behavior.
|
|
(This feature was agreed upon by both the AT&T and GNU developers.)
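.PP
The difference in timing can be seen with a sketch like the one below.
The variable n and the shell names early and late are arbitrary, and
/dev/null simply serves as an empty data file.

```shell
# With -v the assignment is made before the BEGIN block runs.
early=$(awk -v n=3 'BEGIN { print n * 2 }')

# As an operand, the assignment is made only when awk reaches it,
# which is after BEGIN has run and before the following file is read.
late=$(awk 'BEGIN { print "[" n "]" } END { print "[" n "]" }' n=3 /dev/null)

printf '%s\n%s\n' "$early" "$late"
```

In the second command the BEGIN block sees an unset n, while the END block
sees the value 3.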
|
|
.PP
|
|
The
|
|
.B \-W
|
|
option for implementation specific features is from the \*(PX standard.
|
|
.PP
|
|
When processing arguments,
|
|
.I gawk
|
|
uses the special option ``\fB\-\^\-\fP'' to signal the end of
|
|
arguments, and warns about, but otherwise ignores, undefined options.
|
|
.PP
|
|
The AWK book does not define the return value of
|
|
.BR srand() .
|
|
The System V Release 4 version of \*(UX
|
|
.I awk
|
|
(and the \*(PX standard)
|
|
has it return the seed it was using, to allow keeping track
|
|
of random number sequences. Therefore
|
|
.B srand()
|
|
in
|
|
.I gawk
|
|
also returns its current seed.
|
|
.PP
|
|
Other new features are:
|
|
the use of multiple
|
|
.B \-f
|
|
options (from MKS
|
|
.IR awk );
|
|
the
|
|
.B ENVIRON
|
|
array; the
|
|
.B \ea
|
|
and
|
|
.BR \ev
|
|
escape sequences (done originally in
|
|
.I gawk
|
|
and fed back into AT&T's); the
|
|
.B tolower()
|
|
and
|
|
.B toupper()
|
|
built-in functions (from AT&T); and the \*(AN C conversion specifications in
|
|
.B printf
|
|
(done first in AT&T's version).
|
|
.SH GNU EXTENSIONS
|
|
.I Gawk
|
|
has some extensions to \*(PX
|
|
.IR awk .
|
|
They are described in this section. All the extensions described here
|
|
can be disabled by
|
|
invoking
|
|
.I gawk
|
|
with the
|
|
.B "\-W compat"
|
|
option.
|
|
.PP
|
|
The following features of
|
|
.I gawk
|
|
are not available in
|
|
\*(PX
|
|
.IR awk .
|
|
.RS
|
|
.TP \w'\(bu'u+1n
|
|
\(bu
|
|
The
|
|
.B \ex
|
|
escape sequence.
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B systime()
|
|
and
|
|
.B strftime()
|
|
functions.
|
|
.TP
|
|
\(bu
|
|
The special file names available for I/O redirection.
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B ARGIND
|
|
and
|
|
.B ERRNO
|
|
variables.
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B IGNORECASE
|
|
variable and its side-effects.
|
|
.TP
|
|
\(bu
|
|
The
|
|
.B FIELDWIDTHS
|
|
variable and fixed width field splitting.
|
|
.TP
|
|
\(bu
|
|
The path search for files named via the
.B \-f
option, controlled by the
.B AWKPATH
environment variable.
|
|
.TP
|
|
\(bu
|
|
The use of
|
|
.B "next file"
|
|
to abandon processing of the current input file.
|
|
.RE
|
|
.PP
|
|
The AWK book does not define the return value of the
|
|
.B close()
|
|
function.
|
|
.IR Gawk\^ 's
|
|
.B close()
|
|
returns the value from
|
|
.IR fclose (3),
|
|
or
|
|
.IR pclose (3),
|
|
when closing a file or pipe, respectively.
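.PP
A minimal sketch of using the return value of close() when a file is written
and then read back by the same program.
The temporary file comes from mktemp, and the names tmp, f, status and line
are made up for the example; a return value of 0 is assumed to indicate
success, as it does for
.IR fclose (3).

```shell
tmp=$(mktemp)

out=$(awk -v f="$tmp" 'BEGIN {
    print "first line" > f
    status = close(f)        # flushes the output and closes the file
    getline line < f         # the same name can now be opened for reading
    print status, line
}')

rm -f "$tmp"
printf '%s\n' "$out"
```

Without the call to close(), the data could still be sitting in an output
buffer when getline tries to read the file.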
|
|
.PP
|
|
When
|
|
.I gawk
|
|
is invoked with the
|
|
.B "\-W compat"
|
|
option,
|
|
if the
|
|
.I fs
|
|
argument to the
|
|
.B \-F
|
|
option is ``t'', then
|
|
.B FS
|
|
will be set to the tab character.
|
|
Since this is a rather ugly special case, it is not the default behavior.
|
|
This behavior also does not occur if
|
|
.B \-Wposix
|
|
has been specified.
|
|
.ig
|
|
.PP
|
|
If
|
|
.I gawk
|
|
was compiled for debugging, it will
|
|
accept the following additional options:
|
|
.TP
|
|
.PD 0
|
|
.B \-Wparsedebug
|
|
.TP
|
|
.PD
|
|
.B \-\^\-parsedebug
|
|
Turn on
|
|
.IR yacc (1)
|
|
or
|
|
.IR bison (1)
|
|
debugging output during program parsing.
|
|
This option should only be of interest to the
|
|
.I gawk
|
|
maintainers, and may not even be compiled into
|
|
.IR gawk .
|
|
..
|
|
.SH HISTORICAL FEATURES
|
|
There are two features of historical AWK implementations that
|
|
.I gawk
|
|
supports.
|
|
First, it is possible to call the
|
|
.B length()
|
|
built-in function not only with no argument, but even without parentheses!
|
|
Thus,
|
|
.RS
|
|
.PP
|
|
.ft B
|
|
a = length
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
is the same as either of
|
|
.RS
|
|
.PP
|
|
.ft B
|
|
a = length()
|
|
.br
|
|
a = length($0)
|
|
.ft R
|
|
.RE
|
|
.PP
|
|
This feature is marked as ``deprecated'' in the \*(PX standard, and
|
|
.I gawk
|
|
will issue a warning about its use if
|
|
.B \-Wlint
|
|
is specified on the command line.
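.PP
The equivalence of the three spellings can be checked with a one-line sketch;
the input word is arbitrary.

```shell
# All three forms report the length of the current record.
out=$(printf 'hello\n' | awk '{ print length, length(), length($0) }')
printf '%s\n' "$out"
```

New programs are probably better off with one of the parenthesized forms,
given that the bare form is marked as deprecated.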
|
|
.PP
|
|
The other feature is the use of the
|
|
.B continue
|
|
statement outside the body of a
|
|
.BR while ,
|
|
.BR for ,
|
|
or
|
|
.B do
|
|
loop. Traditional AWK implementations have treated such usage as
|
|
equivalent to the
|
|
.B next
|
|
statement.
|
|
.I Gawk
|
|
will support this usage if
|
|
.B \-Wposix
|
|
has not been specified.
|
|
.SH BUGS
|
|
The
|
|
.B \-F
|
|
option is not necessary given the command line variable assignment feature;
|
|
it remains only for backwards compatibility.
|
|
.PP
|
|
If your system actually has support for
|
|
.B /dev/fd
|
|
and the associated
|
|
.BR /dev/stdin ,
|
|
.BR /dev/stdout ,
|
|
and
|
|
.B /dev/stderr
|
|
files, you may get different output from
|
|
.I gawk
|
|
than you would get on a system without those files. When
|
|
.I gawk
|
|
interprets these files internally, it synchronizes output to the standard
|
|
output with output to
|
|
.BR /dev/stdout ,
|
|
while on a system with those files, the output is actually to different
|
|
open files.
|
|
Caveat Emptor.
|
|
.SH VERSION INFORMATION
|
|
This man page documents
|
|
.IR gawk ,
|
|
version 2.15.
|
|
.PP
|
|
Starting with the 2.15 version of
|
|
.IR gawk ,
|
|
the
|
|
.BR \-c ,
|
|
.BR \-V ,
|
|
.BR \-C ,
|
|
.ig
|
|
.BR \-D ,
|
|
..
|
|
.BR \-a ,
|
|
and
|
|
.B \-e
|
|
options of the 2.11 version are no longer recognized.
|
|
.SH AUTHORS
|
|
The original version of \*(UX
|
|
.I awk
|
|
was designed and implemented by Alfred Aho,
|
|
Peter Weinberger, and Brian Kernighan of AT&T Bell Labs. Brian Kernighan
|
|
continues to maintain and enhance it.
|
|
.PP
|
|
Paul Rubin and Jay Fenlason,
|
|
of the Free Software Foundation, wrote
|
|
.IR gawk ,
|
|
to be compatible with the original version of
|
|
.I awk
|
|
distributed in Seventh Edition \*(UX.
|
|
John Woods contributed a number of bug fixes.
|
|
David Trueman, with contributions
|
|
from Arnold Robbins, made
|
|
.I gawk
|
|
compatible with the new version of \*(UX
|
|
.IR awk .
|
|
.PP
|
|
The initial DOS port was done by Conrad Kwok and Scott Garfinkle.
|
|
Scott Deifik is the current DOS maintainer. Pat Rankin did the
|
|
port to VMS, and Michal Jaegermann did the port to the Atari ST.
|
|
.SH ACKNOWLEDGEMENTS
|
|
Brian Kernighan of Bell Labs
|
|
provided valuable assistance during testing and debugging.
|
|
We thank him.
|