.\" -*- nroff -*-
.\"
.\"	$NetBSD: bpf.4,v 1.23 2003/02/04 22:38:15 perry Exp $
.\"
.\" Copyright (c) 1990, 1991, 1992, 1993, 1994
.\"	The Regents of the University of California. All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that: (1) source code distributions
.\" retain the above copyright notice and this paragraph in its entirety, (2)
.\" distributions including binary code include the above copyright notice and
.\" this paragraph in its entirety in the documentation or other materials
.\" provided with the distribution, and (3) all advertising materials mentioning
.\" features or use of this software display the following acknowledgement:
.\" ``This product includes software developed by the University of California,
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
.\" the University nor the names of its contributors may be used to endorse
.\" or promote products derived from this software without specific prior
.\" written permission.
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.Dd August 29, 2002
.Dt BPF 4
.Os
.Sh NAME
.Nm bpf
.Nd Berkeley Packet Filter raw network interface
.Sh SYNOPSIS
.Cd "pseudo-device bpfilter 16"
.Sh DESCRIPTION
The Berkeley Packet Filter
provides a raw interface to data link layers in a protocol
independent fashion.
All packets on the network, even those destined for other hosts,
are accessible through this mechanism.
.Pp
The packet filter appears as a character special device,
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
etc.
After opening the device, the file descriptor must be bound to a
specific network interface with the
.Dv BIOCSETIF
ioctl.
A given interface can be shared by multiple listeners, and the filter
underlying each descriptor will see an identical packet stream.
The total number of open
files is limited to the value given in the kernel configuration; the
example given in the SYNOPSIS above sets the limit to 16.
.Pp
A separate device file is required for each minor device.
If a file is in use, the open will fail and
.Va errno
will be set to
.Er EBUSY .
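.Pp
A common way to cope with
.Er EBUSY
is to try each minor device in turn until one opens.
The following is only a sketch of that idea: the helper name
.Fn open_first_free ,
its
.Fa prefix
parameter and the caller-supplied upper bound are assumptions made for
illustration and are not part of the
.Nm
interface.
.Bd -literal -offset indent
```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: try prefix0, prefix1, ... prefix(max-1) and
 * return the first descriptor that opens.  A busy minor device
 * (EBUSY) is skipped; any other error ends the search.
 * Returns -1 with errno set on failure.
 */
int
open_first_free(const char *prefix, int max)
{
        char path[64];
        int i, fd;

        for (i = 0; i < max; i++) {
                snprintf(path, sizeof(path), "%s%d", prefix, i);
                fd = open(path, O_RDWR);
                if (fd >= 0)
                        return fd;      /* found a free minor device */
                if (errno != EBUSY)
                        return -1;      /* e.g. ENOENT: no such device */
        }
        errno = EBUSY;                  /* every device tried was in use */
        return -1;
}
```
.Ed
.Pp
A caller would typically pass
.Pa /dev/bpf
as the prefix and then bind the returned descriptor to an interface.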
.Pp
Associated with each open instance of a
.Nm
file is a user-settable packet filter.
Whenever a packet is received by an interface,
all file descriptors listening on that interface apply their filter.
Each descriptor that accepts the packet receives its own copy.
.Pp
Reads from these files return the next group of packets
that have matched the filter.
To improve performance, the buffer passed to read must be
the same size as the buffers used internally by
.Nm "" .
This size is returned by the
.Dv BIOCGBLEN
ioctl (see below), and under
BSD, can be set with
.Dv BIOCSBLEN .
Note that an individual packet larger than this size is necessarily
truncated.
.Pp
The packet filter will support any link level protocol that has fixed length
headers. Currently, only Ethernet, SLIP and PPP drivers have been
modified to interact with
.Nm "" .
.Pp
Since packet data is in network byte order, applications should use the
.Xr byteorder 3
macros to extract multi-byte values.
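.Pp
As an illustration of the byte order rule, the sketch below reads the
16-bit type field found at byte offset 12 of an Ethernet frame.
The function name
.Fn ether_type
is an assumption made for this example.
.Bd -literal -offset indent
```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the Ethernet type field of a frame in host byte order.
 * memcpy() avoids an unaligned 16-bit load; ntohs() converts from
 * network (big-endian) order to the order used by the host.
 */
uint16_t
ether_type(const unsigned char *frame)
{
        uint16_t type;

        memcpy(&type, frame + 12, sizeof(type));
        return ntohs(type);
}
```
.Ed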
.Pp
A packet can be sent out on the network by writing to a
.Nm
file descriptor. The writes are unbuffered, meaning only one
packet can be processed per write.
Currently, only writes to Ethernets and SLIP links are supported.
.Sh IOCTLS
The
.Xr ioctl 2
command codes below are defined in \*[Lt]net/bpf.h\*[Gt]. All commands require
these includes:
.Bd -literal -offset indent
#include \*[Lt]sys/types.h\*[Gt]
#include \*[Lt]sys/time.h\*[Gt]
#include \*[Lt]sys/ioctl.h\*[Gt]
#include \*[Lt]net/bpf.h\*[Gt]
.Ed
.Pp
Additionally,
.Dv BIOCGETIF
and
.Dv BIOCSETIF
require
.Pa \*[Lt]net/if.h\*[Gt] .
.Pp
The (third) argument to the
.Xr ioctl 2
should be a pointer to the type indicated.
.Bl -tag -width indent -offset indent
.It Dv "BIOCGBLEN (u_int)"
Returns the required buffer length for reads on
.Nm
files.
.It Dv "BIOCSBLEN (u_int)"
Sets the buffer length for reads on
.Nm
files. The buffer length must be set before the file is attached to an
interface with
.Dv BIOCSETIF .
If the requested buffer size cannot be accommodated, the closest
allowable size will be set and returned in the argument.
A read call will result in
.Er EIO
if it is passed a buffer that is not this size.
.It Dv BIOCGDLT (u_int)
Returns the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified.
The device types, prefixed with
.Dq DLT_ ,
are defined in \*[Lt]net/bpf.h\*[Gt].
.It Dv BIOCGDLTLIST (struct bpf_dltlist)
Returns an array of the data link layer types available
for the attached interface:
.Bd -literal -offset indent
struct bpf_dltlist {
        u_int bfl_len;
        u_int *bfl_list;
};
.Ed
.Pp
The available types are returned in the array pointed to by the
.Va bfl_list
field, while the length of that array, in units of u_int, is supplied in the
.Va bfl_len
field.
.Er ENOMEM
is returned if the supplied array is too small. The
.Va bfl_len
field is modified on return to indicate the actual length in u_int
of the array returned.
If
.Va bfl_list
is
.Dv NULL ,
the
.Va bfl_len
field is returned to indicate the required length of an array in u_int.
.It Dv BIOCSDLT (u_int)
Changes the type of the data link layer underlying the attached interface.
.Er EINVAL
is returned if no interface has been specified or the specified
type is not available for the interface.
.It Dv BIOCPROMISC
Forces the interface into promiscuous mode.
All packets, not just those destined for the local host, are processed.
Since more than one file can be listening on a given interface,
a listener that opened its interface non-promiscuously may receive
packets promiscuously. This problem can be remedied with an
appropriate filter.
.Pp
The interface remains in promiscuous mode until all files listening
promiscuously are closed.
.It Dv BIOCFLUSH
Flushes the buffer of incoming packets,
and resets the statistics that are returned by
.Dv BIOCGSTATS .
.It Dv BIOCGETIF (struct ifreq)
Returns the name of the hardware interface that the file is listening on.
The name is returned in the ifr_name field of
.Fa ifr .
All other fields are undefined.
.It Dv BIOCSETIF (struct ifreq)
Sets the hardware interface associated with the file. This
command must be performed before any packets can be read.
The device is indicated by name using the
.Dv ifr_name
field of the
.Fa ifreq .
Additionally, performs the actions of
.Dv BIOCFLUSH .
.It Dv BIOCSRTIMEOUT, BIOCGRTIMEOUT (struct timeval)
Set or get the read timeout parameter.
The
.Fa timeval
specifies the length of time to wait before timing
out on a read request.
This parameter is initialized to zero by
.Xr open 2 ,
indicating no timeout.
.It Dv BIOCGSTATS (struct bpf_stat)
Returns the following structure of packet statistics:
.Bd -literal -offset indent
struct bpf_stat {
        u_int bs_recv;
        u_int bs_drop;
};
.Ed
.Pp
The fields are:
.Bl -tag -width bs_recv -offset indent
.It Va bs_recv
the number of packets received by the descriptor since opened or reset
(including any buffered since the last read call);
and
.It Va bs_drop
the number of packets which were accepted by the filter but dropped by the
kernel because of buffer overflows
(i.e., the application's reads aren't keeping up with the packet traffic).
.El
.It Dv BIOCIMMEDIATE (u_int)
Enable or disable
.Dq immediate mode ,
based on the truth value of the argument.
When immediate mode is enabled, reads return immediately upon packet
reception. Otherwise, a read will block until either the kernel buffer
becomes full or a timeout occurs.
This is useful for programs like
.Xr rarpd 8 ,
which must respond to messages in real time.
The default for a new file is off.
.It Dv BIOCSETF (struct bpf_program)
Sets the filter program used by the kernel to discard uninteresting
packets. An array of instructions and its length are passed in using
the following structure:
.Bd -literal -offset indent
struct bpf_program {
        int bf_len;
        struct bpf_insn *bf_insns;
};
.Ed
.Pp
The filter program is pointed to by the
.Va bf_insns
field while its length in units of
.Sq struct bpf_insn
is given by the
.Va bf_len
field.
Also, the actions of
.Dv BIOCFLUSH
are performed.
.Pp
See section
.Sy FILTER MACHINE
for an explanation of the filter language.
.It Dv BIOCVERSION (struct bpf_version)
Returns the major and minor version numbers of the filter language currently
recognized by the kernel. Before installing a filter, applications must check
that the current version is compatible with the running kernel. Version
numbers are compatible if the major numbers match and the application minor
is less than or equal to the kernel minor. The kernel version number is
returned in the following structure:
.Bd -literal -offset indent
struct bpf_version {
        u_short bv_major;
        u_short bv_minor;
};
.Ed
.Pp
The current version numbers are given by
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION
from \*[Lt]net/bpf.h\*[Gt].
An incompatible filter
may result in undefined behavior (most likely, an error returned by
.Xr ioctl 2
or haphazard packet matching).
.It Dv BIOCSRSIG, BIOCGRSIG (u_int signal)
Set or get the receive signal. This signal will be sent to the process or process group
specified by FIOSETOWN. It defaults to SIGIO.
.El
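.Pp
The compatibility rule given under
.Dv BIOCVERSION
can be written as a small predicate.
This is a sketch only:
.Vt struct version
is a local stand-in for
.Vt struct bpf_version ,
and the function name is hypothetical.
.Bd -literal -offset indent
```c
/*
 * Stand-in for struct bpf_version from <net/bpf.h>, declared locally
 * so that the sketch compiles on systems without that header.
 */
struct version {
        unsigned short major;
        unsigned short minor;
};

/*
 * Return 1 if a filter built against version app may be installed on
 * a kernel reporting version kern: the major numbers must match and
 * the application minor must not exceed the kernel minor.
 */
int
version_compatible(struct version app, struct version kern)
{
        return app.major == kern.major && app.minor <= kern.minor;
}
```
.Ed
.Pp
In a real program the application side would be filled in from
.Dv BPF_MAJOR_VERSION
and
.Dv BPF_MINOR_VERSION ,
and the kernel side from the structure returned by
.Dv BIOCVERSION .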
.Sh STANDARD IOCTLS
.Nm
now supports several standard
.Xr ioctl 2 's
which allow the user to do async and/or non-blocking I/O to an open
.Nm
file descriptor.
.Bl -tag -width indent -offset indent
.It Dv FIONREAD (int)
Returns the number of bytes that are immediately available for reading.
.It Dv SIOCGIFADDR (struct ifreq)
Returns the address associated with the interface.
.It Dv FIONBIO (int)
Set or clear non-blocking I/O. If arg is non-zero, then doing a
.Xr read 2
when no data is available will return -1 and
.Va errno
will be set to
.Er EAGAIN .
If arg is zero, non-blocking I/O is disabled. Note: setting this
overrides the timeout set by
.Dv BIOCSRTIMEOUT .
.It Dv FIOASYNC (int)
Enable or disable async I/O. When enabled (arg is non-zero), the process or
process group specified by FIOSETOWN will start receiving SIGIO's when packets
arrive.
Note that you must do an FIOSETOWN in order for this to take effect, as
the system will not default this for you.
The signal may be changed via
.Dv BIOCSRSIG .
.It Dv FIOSETOWN, FIOGETOWN (int)
Set or get the process or process group (if negative) that should receive SIGIO
when packets are available.
The signal may be changed using
.Dv BIOCSRSIG
(see above).
.El
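.Pp
The
.Dv FIONBIO
behavior described above can be tried on any descriptor that honors the
request.
The sketch below is descriptor-agnostic so that it can be exercised on a
pipe; treating a pipe as a stand-in for a
.Nm
descriptor is an assumption of this example, and the function name is
hypothetical.
.Bd -literal -offset indent
```c
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Put fd into non-blocking mode with FIONBIO, then attempt a read.
 * Returns 1 if the read failed with EAGAIN (no data available),
 * 0 if it returned data or end of file, and -1 on any other error.
 */
int
read_would_block(int fd)
{
        int on = 1;
        char byte;

        if (ioctl(fd, FIONBIO, &on) == -1)
                return -1;
        if (read(fd, &byte, 1) == -1)
                return (errno == EAGAIN) ? 1 : -1;
        return 0;
}
```
.Ed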
.Sh BPF HEADER
The following structure is prepended to each packet returned by
.Xr read 2 :
.Bd -literal -offset indent
struct bpf_hdr {
        struct timeval bh_tstamp;
        u_long bh_caplen;
        u_long bh_datalen;
        u_short bh_hdrlen;
};
.Ed
.Pp
The fields, whose values are stored in host order, are:
.Bl -tag -width bh_datalen -offset indent
.It Va bh_tstamp
The time at which the packet was processed by the packet filter.
.It Va bh_caplen
The length of the captured portion of the packet. This is the minimum of
the truncation amount specified by the filter and the length of the packet.
.It Va bh_datalen
The length of the packet off the wire.
This value is independent of the truncation amount specified by the filter.
.It Va bh_hdrlen
The length of the BPF header, which may not be equal to
.Em sizeof(struct bpf_hdr) .
.El
.Pp
The
.Va bh_hdrlen
field exists to account for
padding between the header and the link level protocol.
The purpose here is to guarantee proper alignment of the packet
data structures, which is required on alignment sensitive
architectures and improves performance on many other architectures.
The packet filter ensures that the
.Va bpf_hdr
and the
.Em network layer
header will be word aligned. Suitable precautions
must be taken when accessing the link layer protocol fields on alignment
restricted machines. (This isn't a problem on an Ethernet, since
the type field is a short falling on an even offset,
and the addresses are probably accessed in a bytewise fashion).
.Pp
Additionally, individual packets are padded so that each starts
on a word boundary. This requires that an application
has some knowledge of how to get from packet to packet.
The macro
.Dv BPF_WORDALIGN
is defined in
.Pa \*[Lt]net/bpf.h\*[Gt]
to facilitate this process.
It rounds up its argument
to the nearest word aligned value (where a word is BPF_ALIGNMENT bytes wide).
.Pp
For example, if
.Sq Va p
points to the start of a packet, this expression
will advance it to the next packet:
.Pp
.Dl p = (char *)p + BPF_WORDALIGN(p-\*[Gt]bh_hdrlen + p-\*[Gt]bh_caplen)
.Pp
For the alignment mechanisms to work properly, the
buffer passed to
.Xr read 2
must itself be word aligned.
.Xr malloc 3
will always return an aligned buffer.
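.Pp
Putting the header fields and the alignment rule together, a loop over one
read buffer might look like the sketch below.
To keep it compilable without
.Pa \*[Lt]net/bpf.h\*[Gt] ,
it uses local stand-ins:
.Dv EX_ALIGNMENT
and
.Dv EX_WORDALIGN
are assumed to have the same shape as the real macros, and
.Vt struct ex_hdr
is a simplified header without the timestamp.
All names prefixed with
.Dq ex_
or
.Dq EX_
are hypothetical.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/* Local stand-ins for BPF_ALIGNMENT and BPF_WORDALIGN (assumed shape). */
#define EX_ALIGNMENT    sizeof(long)
#define EX_WORDALIGN(x) (((x) + (EX_ALIGNMENT - 1)) & ~(EX_ALIGNMENT - 1))

/* Simplified stand-in for struct bpf_hdr; the timestamp is omitted. */
struct ex_hdr {
        unsigned long caplen;   /* bytes of packet data captured */
        unsigned long datalen;  /* original length on the wire */
        unsigned short hdrlen;  /* header length including padding */
};

/*
 * Count the packet records in the first len bytes of buf, as one read
 * might return them.  Each header is copied out with memcpy() so the
 * walk does not depend on how buf itself is aligned.
 */
size_t
ex_count_packets(const unsigned char *buf, size_t len)
{
        size_t off = 0, n = 0;
        struct ex_hdr h;

        while (off + sizeof(h) <= len) {
                memcpy(&h, buf + off, sizeof(h));
                if (h.hdrlen < sizeof(h) || h.hdrlen > len - off ||
                    h.caplen > len - off - h.hdrlen)
                        break;          /* malformed or truncated record */
                /* Packet data starts at buf + off + h.hdrlen. */
                n++;
                off += EX_WORDALIGN(h.hdrlen + h.caplen);
        }
        return n;
}
```
.Ed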
.Sh FILTER MACHINE
A filter program is an array of instructions, with all branches forwardly
directed, terminated by a
.Sy return
instruction.
Each instruction performs some action on the pseudo-machine state,
which consists of an accumulator, index register, scratch memory store,
and implicit program counter.
.Pp
The following structure defines the instruction format:
.Bd -literal -offset indent
struct bpf_insn {
        u_short code;
        u_char jt;
        u_char jf;
        long k;
};
.Ed
.Pp
The
.Va k
field is used in different ways by different instructions,
and the
.Va jt
and
.Va jf
fields are used as offsets
by the branch instructions.
The opcodes are encoded in a semi-hierarchical fashion.
There are eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC. Various other mode and
operator bits are or'd into the class to give the actual instructions.
The classes and modes are defined in \*[Lt]net/bpf.h\*[Gt].
.Pp
Below are the semantics for each defined BPF instruction.
We use the convention that A is the accumulator, X is the index register,
P[] packet data, and M[] scratch memory store.
P[i:n] gives the data at byte offset
.Dq i
in the packet,
interpreted as a word (n=4),
unsigned halfword (n=2), or unsigned byte (n=1).
M[i] gives the i'th word in the scratch memory store, which is only
addressed in word units. The memory store is indexed from 0 to BPF_MEMWORDS-1.
.Va k ,
.Va jt ,
and
.Va jf
are the corresponding fields in the
instruction definition.
.Dq len
refers to the length of the packet.
.Bl -tag -width indent -offset indent
.It Sy BPF_LD
These instructions copy a value into the accumulator. The type of the
source operand is specified by an
.Dq addressing mode
and can be a constant
.Sy ( BPF_IMM ) ,
packet data at a fixed offset
.Sy ( BPF_ABS ) ,
packet data at a variable offset
.Sy ( BPF_IND ) ,
the packet length
.Sy ( BPF_LEN ) ,
or a word in the scratch memory store
.Sy ( BPF_MEM ) .
For
.Sy BPF_IND
and
.Sy BPF_ABS ,
the data size must be specified as a word
.Sy ( BPF_W ) ,
halfword
.Sy ( BPF_H ) ,
or byte
.Sy ( BPF_B ) .
The semantics of all the recognized BPF_LD instructions follow.
.Bl -column "BPF_LD+BPF_W+BPF_ABS" "A \*[Lt]- P[k:4]" -width indent -offset indent
.It Sy BPF_LD+BPF_W+BPF_ABS Ta A \*[Lt]- P[k:4]
.It Li Sy BPF_LD+BPF_H+BPF_ABS Ta A \*[Lt]- P[k:2]
.It Li Sy BPF_LD+BPF_B+BPF_ABS Ta A \*[Lt]- P[k:1]
.It Li Sy BPF_LD+BPF_W+BPF_IND Ta A \*[Lt]- P[X+k:4]
.It Li Sy BPF_LD+BPF_H+BPF_IND Ta A \*[Lt]- P[X+k:2]
.It Li Sy BPF_LD+BPF_B+BPF_IND Ta A \*[Lt]- P[X+k:1]
.It Li Sy BPF_LD+BPF_W+BPF_LEN Ta A \*[Lt]- len
.It Li Sy BPF_LD+BPF_IMM Ta A \*[Lt]- k
.It Li Sy BPF_LD+BPF_MEM Ta A \*[Lt]- M[k]
.El
.It Sy BPF_LDX
These instructions load a value into the index register. Note that
the addressing modes are more restricted than those of the accumulator loads,
but they include
.Sy BPF_MSH ,
a hack for efficiently loading the IP header length.
.Bl -column "BPF_LDX+BPF_W+BPF_IMM" "X \*[Lt]- k" -width indent -offset indent
.It Sy BPF_LDX+BPF_W+BPF_IMM Ta X \*[Lt]- k
.It Li Sy BPF_LDX+BPF_W+BPF_MEM Ta X \*[Lt]- M[k]
.It Li Sy BPF_LDX+BPF_W+BPF_LEN Ta X \*[Lt]- len
.It Li Sy BPF_LDX+BPF_B+BPF_MSH Ta X \*[Lt]- 4*(P[k:1]\*[Am]0xf)
.El
.It Sy BPF_ST
This instruction stores the accumulator into the scratch memory.
We do not need an addressing mode since there is only one possibility
for the destination.
.Bl -column "BPF_ST" "M[k] \*[Lt]- A" -width indent -offset indent
.It Sy BPF_ST Ta M[k] \*[Lt]- A
.El
.It Sy BPF_STX
This instruction stores the index register in the scratch memory store.
.Bl -column "BPF_STX" "M[k] \*[Lt]- X" -width indent -offset indent
.It Sy BPF_STX Ta M[k] \*[Lt]- X
.El
.It Sy BPF_ALU
The ALU instructions perform operations between the accumulator and
index register or constant, and store the result back in the accumulator.
For binary operations, a source mode is required
.Sy ( BPF_K
or
.Sy BPF_X ) .
.Bl -column "BPF_ALU+BPF_ADD+BPF_K" "A \*[Lt]- A + k" -width indent -offset indent
.It Sy BPF_ALU+BPF_ADD+BPF_K Ta A \*[Lt]- A + k
.It Li Sy BPF_ALU+BPF_SUB+BPF_K Ta A \*[Lt]- A - k
.It Li Sy BPF_ALU+BPF_MUL+BPF_K Ta A \*[Lt]- A * k
.It Li Sy BPF_ALU+BPF_DIV+BPF_K Ta A \*[Lt]- A / k
.It Li Sy BPF_ALU+BPF_AND+BPF_K Ta A \*[Lt]- A \*[Am] k
.It Li Sy BPF_ALU+BPF_OR+BPF_K Ta A \*[Lt]- A | k
.It Li Sy BPF_ALU+BPF_LSH+BPF_K Ta A \*[Lt]- A \*[Lt]\*[Lt] k
.It Li Sy BPF_ALU+BPF_RSH+BPF_K Ta A \*[Lt]- A \*[Gt]\*[Gt] k
.It Li Sy BPF_ALU+BPF_ADD+BPF_X Ta A \*[Lt]- A + X
.It Li Sy BPF_ALU+BPF_SUB+BPF_X Ta A \*[Lt]- A - X
.It Li Sy BPF_ALU+BPF_MUL+BPF_X Ta A \*[Lt]- A * X
.It Li Sy BPF_ALU+BPF_DIV+BPF_X Ta A \*[Lt]- A / X
.It Li Sy BPF_ALU+BPF_AND+BPF_X Ta A \*[Lt]- A \*[Am] X
.It Li Sy BPF_ALU+BPF_OR+BPF_X Ta A \*[Lt]- A | X
.It Li Sy BPF_ALU+BPF_LSH+BPF_X Ta A \*[Lt]- A \*[Lt]\*[Lt] X
.It Li Sy BPF_ALU+BPF_RSH+BPF_X Ta A \*[Lt]- A \*[Gt]\*[Gt] X
.It Li Sy BPF_ALU+BPF_NEG Ta A \*[Lt]- -A
.El
.It Sy BPF_JMP
The jump instructions alter flow of control. Conditional jumps
compare the accumulator against a constant
.Sy ( BPF_K )
or the index register
.Sy ( BPF_X ) .
If the result is true (or non-zero),
the true branch is taken, otherwise the false branch is taken.
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
However, the jump always
.Sy ( BPF_JA )
opcode uses the 32 bit
.Va k
field as the offset, allowing arbitrarily distant destinations.
All conditionals use unsigned comparison conventions.
.Bl -column "BPF_JMP+BPF_JGE+BPF_K" "pc += (A \*[Ge] k) ? jt : jf" -width indent -offset indent
.It Sy BPF_JMP+BPF_JA Ta pc += k
.It Li Sy BPF_JMP+BPF_JGT+BPF_K Ta "pc += (A \*[Gt] k) ? jt : jf"
.It Li Sy BPF_JMP+BPF_JGE+BPF_K Ta "pc += (A \*[Ge] k) ? jt : jf"
.It Li Sy BPF_JMP+BPF_JEQ+BPF_K Ta "pc += (A == k) ? jt : jf"
.It Li Sy BPF_JMP+BPF_JSET+BPF_K Ta "pc += (A \*[Am] k) ? jt : jf"
.It Li Sy BPF_JMP+BPF_JGT+BPF_X Ta "pc += (A \*[Gt] X) ? jt : jf"
.It Li Sy BPF_JMP+BPF_JGE+BPF_X Ta "pc += (A \*[Ge] X) ? jt : jf"
.It Li Sy BPF_JMP+BPF_JEQ+BPF_X Ta "pc += (A == X) ? jt : jf"
.It Li Sy BPF_JMP+BPF_JSET+BPF_X Ta "pc += (A \*[Am] X) ? jt : jf"
.El
.It Sy BPF_RET
The return instructions terminate the filter program and specify the amount
of packet to accept (i.e., they return the truncation amount). A return
value of zero indicates that the packet should be ignored.
The return value is either a constant
.Sy ( BPF_K )
or the accumulator
.Sy ( BPF_A ) .
.Bl -column "BPF_RET+BPF_A" "accept A bytes" -width indent -offset indent
.It Sy BPF_RET+BPF_A Ta accept A bytes
.It Li Sy BPF_RET+BPF_K Ta accept k bytes
.El
.It Sy BPF_MISC
The miscellaneous category was created for anything that doesn't
fit into the above classes, and for any new instructions that might need to
be added. Currently, these are the register transfer instructions
that copy the index register to the accumulator or vice versa.
.Bl -column "BPF_MISC+BPF_TAX" "X \*[Lt]- A" -width indent -offset indent
.It Sy BPF_MISC+BPF_TAX Ta X \*[Lt]- A
.It Li Sy BPF_MISC+BPF_TXA Ta A \*[Lt]- X
.El
.El
.Pp
The BPF interface provides the following macros to facilitate
array initializers:
.Bd -unfilled -offset indent
.Sy BPF_STMT No (opcode, operand)
.Sy BPF_JUMP No (opcode, operand, true_offset, false_offset)
.Ed
.Sh FILES
.Pa /dev/bpf0 ,
.Pa /dev/bpf1 ,
.Pa ...
.Sh EXAMPLES
The following filter is taken from the Reverse ARP Daemon. It accepts
only Reverse ARP requests.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
        BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
            sizeof(struct ether_header)),
        BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
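.Pp
To see how the filter machine steps through the Reverse ARP filter, the
following sketch interprets just the three instruction forms that filter
uses.
It is not the kernel's implementation.
Everything prefixed with
.Dq ex_
or
.Dq EX_
is a local stand-in; the opcode values are assumed to mirror those in
.Pa \*[Lt]net/bpf.h\*[Gt] ,
and the constants 0x8035, 3 and 42 are assumed values for
.Dv ETHERTYPE_REVARP ,
.Dv REVARP_REQUEST
and the sum of the two
.Ic sizeof
terms.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/* Opcode bits, redefined locally (assumed to match <net/bpf.h>). */
#define EX_LD   0x00
#define EX_JMP  0x05
#define EX_RET  0x06
#define EX_H    0x08
#define EX_ABS  0x20
#define EX_JEQ  0x10
#define EX_K    0x00

struct ex_insn {
        uint16_t code;
        uint8_t jt, jf;
        uint32_t k;
};

#define EX_STMT(code, k)         { (uint16_t)(code), 0, 0, (k) }
#define EX_JUMP(code, k, jt, jf) { (uint16_t)(code), (jt), (jf), (k) }

/* The Reverse ARP filter, rewritten with the stand-in names. */
const struct ex_insn ex_rarp[] = {
        EX_STMT(EX_LD + EX_H + EX_ABS, 12),
        EX_JUMP(EX_JMP + EX_JEQ + EX_K, 0x8035, 0, 3),
        EX_STMT(EX_LD + EX_H + EX_ABS, 20),
        EX_JUMP(EX_JMP + EX_JEQ + EX_K, 3, 0, 1),
        EX_STMT(EX_RET + EX_K, 42),
        EX_STMT(EX_RET + EX_K, 0),
};

/*
 * Run a filter made of halfword absolute loads, jump-if-equal against
 * a constant, and constant returns.  Returns the number of bytes to
 * accept, or 0 to reject.  Unknown opcodes and loads that fall outside
 * the packet reject the packet.
 */
uint32_t
ex_filter(const struct ex_insn *pc, const uint8_t *p, size_t len)
{
        uint32_t A = 0;

        for (;; pc++) {
                switch (pc->code) {
                case EX_LD + EX_H + EX_ABS:
                        if (pc->k > len || len - pc->k < 2)
                                return 0;
                        /* P[k:2], most significant byte first */
                        A = ((uint32_t)p[pc->k] << 8) | p[pc->k + 1];
                        break;
                case EX_JMP + EX_JEQ + EX_K:
                        /* offsets count from the next instruction */
                        pc += (A == pc->k) ? pc->jt : pc->jf;
                        break;
                case EX_RET + EX_K:
                        return pc->k;
                default:
                        return 0;
                }
        }
}
```
.Ed
.Pp
Because the loop increments the program counter after every instruction,
a jump offset of 0 simply falls through to the next instruction, which is
why the
.Va jt
fields in the filter are 0.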
.Pp
This filter accepts only IP packets between host 128.3.112.15 and
128.3.112.35.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
        BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
        BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
        BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
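.Pp
The two hexadecimal constants in this filter are the host addresses packed
into 32-bit words, compared against the IP source and destination fields at
offsets 26 and 30 of the Ethernet frame.
The sketch below shows the packing; the function name
.Fn ip4_word
is an assumption made for this example.
.Bd -literal -offset indent
```c
#include <stdint.h>

/*
 * Pack the four octets of an IPv4 address into the 32-bit value that
 * a word load of that address leaves in the accumulator.  The first
 * octet ends up in the most significant byte.
 */
uint32_t
ip4_word(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
        return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
            ((uint32_t)c << 8) | d;
}
```
.Ed
.Pp
With this helper, 128.3.112.15 packs to 0x8003700f and 128.3.112.35 packs
to 0x80037023.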
.Pp
Finally, this filter returns only TCP finger packets. We must parse
the IP header to reach the TCP header. The
.Sy BPF_JSET
instruction checks that the IP fragment offset is 0 so we are sure
that we have a TCP header.
.Bd -literal -offset indent
struct bpf_insn insns[] = {
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
        BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
        BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
        BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
        BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
        BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
        BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
        BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
        BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
        BPF_STMT(BPF_RET+BPF_K, 0),
};
.Ed
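.Pp
The last seven instructions of the finger filter can be restated in plain C
to make the offset arithmetic explicit.
This is a sketch for an Ethernet frame that is already known to carry IP and
TCP; the function name
.Fn tcp_port_match
and its parameters are hypothetical.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reject fragments with a non-zero offset (the BPF_JSET test against
 * 0x1fff), compute X = 4 * (P[14] & 0xf) as BPF_MSH does, then compare
 * the TCP source and destination ports at X+14 and X+16 against port.
 * Returns 1 on a match and 0 otherwise, including for short frames.
 */
int
tcp_port_match(const uint8_t *p, size_t len, uint16_t port)
{
        uint32_t frag, x, src, dst;

        if (len < 22)
                return 0;
        frag = (((uint32_t)p[20] << 8) | p[21]) & 0x1fff;
        if (frag != 0)
                return 0;               /* not the first fragment */
        x = 4 * (p[14] & 0xf);          /* IP header length in bytes */
        if (len < (size_t)x + 18)
                return 0;
        src = ((uint32_t)p[x + 14] << 8) | p[x + 15];
        dst = ((uint32_t)p[x + 16] << 8) | p[x + 17];
        return src == port || dst == port;
}
```
.Ed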
.Sh SEE ALSO
.Xr ioctl 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr signal 3 ,
.Xr tcpdump 8
.Rs
.%T "The BSD Packet Filter: A New Architecture for User-level Packet Capture"
.%A S. McCanne
.%A V. Jacobson
.%J Proceedings of the 1993 Winter USENIX
.%C Technical Conference, San Diego, CA
.Re
.Sh HISTORY
The Enet packet filter was created in 1980 by Mike Accetta and
Rick Rashid at Carnegie-Mellon University. Jeffrey Mogul, at
Stanford, ported the code to BSD and continued its development from
1983 on. Since then, it has evolved into the Ultrix Packet Filter
at DEC, a STREAMS NIT module under SunOS 4.1, and BPF.
.Sh AUTHORS
Steven McCanne, of Lawrence Berkeley Laboratory, implemented BPF in
Summer 1990. The design was in collaboration with Van Jacobson,
also of Lawrence Berkeley Laboratory.
.Sh BUGS
The read buffer must be of a fixed size (returned by the
.Dv BIOCGBLEN
ioctl).
.Pp
A file that does not request promiscuous mode may receive promiscuously
received packets as a side effect of another file requesting this
mode on the same hardware interface. This could be fixed in the kernel
with additional processing overhead. However, we favor the model where
all files must assume that the interface is promiscuous, and if
so desired, must use a filter to reject foreign packets.
.Pp
Data link protocols with variable length headers are not currently supported.
.Pp
Under SunOS, if a BPF application reads more than 2^31 bytes of
data, read will fail with
.Er EINVAL .
You can either fix the bug in SunOS,
or lseek to 0 when read fails for this reason.
.Pp
.Dq Immediate mode
and the
.Dq read timeout
are misguided features.
This functionality can be emulated with non-blocking mode and
.Xr select 2 .
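.Pp
A sketch of that emulation follows.
It assumes the descriptor has already been switched to non-blocking mode
with
.Dv FIONBIO ,
and it only shows the waiting step; the function name
.Fn wait_readable
and its millisecond parameter are hypothetical.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Wait up to ms milliseconds for fd to become readable.
 * Returns 1 if fd is readable, 0 on timeout, and -1 on error.
 */
int
wait_readable(int fd, long ms)
{
        fd_set rfds;
        struct timeval tv;

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = ms / 1000;
        tv.tv_usec = (ms % 1000) * 1000;
        return select(fd + 1, &rfds, NULL, NULL, &tv);
}
```
.Ed
.Pp
A caller would follow a return value of 1 with a
.Xr read 2
of the full buffer size, and treat 0 as the read timeout expiring.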