800 lines
25 KiB
Groff
800 lines
25 KiB
Groff
.\" -*- nroff -*-
|
|
.\"
|
|
.\" $NetBSD: bpf.4,v 1.53 2013/08/29 20:02:35 wiz Exp $
|
|
.\"
|
|
.\" Copyright (c) 1990, 1991, 1992, 1993, 1994
|
|
.\" The Regents of the University of California. All rights reserved.
|
|
.\"
|
|
.\" Redistribution and use in source and binary forms, with or without
|
|
.\" modification, are permitted provided that: (1) source code distributions
|
|
.\" retain the above copyright notice and this paragraph in its entirety, (2)
|
|
.\" distributions including binary code include the above copyright notice and
|
|
.\" this paragraph in its entirety in the documentation or other materials
|
|
.\" provided with the distribution, and (3) all advertising materials mentioning
|
|
.\" features or use of this software display the following acknowledgement:
|
|
.\" ``This product includes software developed by the University of California,
|
|
.\" Lawrence Berkeley Laboratory and its contributors.'' Neither the name of
|
|
.\" the University nor the names of its contributors may be used to endorse
|
|
.\" or promote products derived from this software without specific prior
|
|
.\" written permission.
|
|
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED
|
|
.\" WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
|
|
.\" MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
|
|
.\"
|
|
.\" This document is derived in part from the enet man page (enet.4)
|
|
.\" distributed with 4.3BSD Unix.
|
|
.\"
|
|
.Dd August 29, 2013
|
|
.Dt BPF 4
|
|
.Os
|
|
.Sh NAME
|
|
.Nm bpf
|
|
.Nd Berkeley Packet Filter raw network interface
|
|
.Sh SYNOPSIS
|
|
.Cd "pseudo-device bpfilter"
|
|
.Sh DESCRIPTION
|
|
The Berkeley Packet Filter
|
|
provides a raw interface to data link layers in a protocol
|
|
independent fashion.
|
|
All packets on the network, even those destined for other hosts,
|
|
are accessible through this mechanism.
|
|
.Pp
|
|
The packet filter appears as a character special device,
|
|
.Pa /dev/bpf .
|
|
After opening the device, the file descriptor must be bound to a
|
|
specific network interface with the
|
|
.Dv BIOCSETIF
|
|
ioctl.
|
|
A given interface can be shared by multiple listeners, and the filter
|
|
underlying each descriptor will see an identical packet stream.
|
|
.Pp
|
|
Associated with each open instance of a
|
|
.Nm
|
|
file is a user-settable packet filter.
|
|
Whenever a packet is received by an interface,
|
|
all file descriptors listening on that interface apply their filter.
|
|
Each descriptor that accepts the packet receives its own copy.
|
|
.Pp
|
|
Reads from these files return the next group of packets
|
|
that have matched the filter.
|
|
To improve performance, the buffer passed to read must be
|
|
the same size as the buffers used internally by
|
|
.Nm .
|
|
This size is returned by the
|
|
.Dv BIOCGBLEN
|
|
ioctl (see below), and can be set with
|
|
.Dv BIOCSBLEN .
|
|
Note that an individual packet larger than this size is necessarily
|
|
truncated.
|
|
.Pp
|
|
Since packet data is in network byte order, applications should use the
|
|
.Xr byteorder 3
|
|
macros to extract multi-byte values.
|
|
.Pp
|
|
A packet can be sent out on the network by writing to a
|
|
.Nm
|
|
file descriptor.
|
|
The writes are unbuffered, meaning only one packet can be processed per write.
|
|
Currently, only writes to Ethernets and SLIP links are supported.
|
|
.Sh IOCTLS
|
|
The
|
|
.Xr ioctl 2
|
|
command codes below are defined in
|
|
.In net/bpf.h .
|
|
All commands require these includes:
|
|
.Bd -literal -offset indent
|
|
#include \*[Lt]sys/types.h\*[Gt]
|
|
#include \*[Lt]sys/time.h\*[Gt]
|
|
#include \*[Lt]sys/ioctl.h\*[Gt]
|
|
#include \*[Lt]net/bpf.h\*[Gt]
|
|
.Ed
|
|
.Pp
|
|
Additionally,
|
|
.Dv BIOCGETIF
|
|
and
|
|
.Dv BIOCSETIF
|
|
require
|
|
.Pa \*[Lt]net/if.h\*[Gt] .
|
|
.Pp
|
|
The (third) argument to the
|
|
.Xr ioctl 2
|
|
should be a pointer to the type indicated.
|
|
.Bl -tag -width indent -offset indent
|
|
.It Dv "BIOCGBLEN (u_int)"
|
|
Returns the required buffer length for reads on
|
|
.Nm
|
|
files.
|
|
.It Dv "BIOCSBLEN (u_int)"
|
|
Sets the buffer length for reads on
|
|
.Nm
|
|
files.
|
|
The buffer must be set before the file is attached to an interface with
|
|
.Dv BIOCSETIF .
|
|
If the requested buffer size cannot be accommodated, the closest
|
|
allowable size will be set and returned in the argument.
|
|
A read call will result in
|
|
.Er EINVAL
|
|
if it is passed a buffer that is not this size.
|
|
.It Dv BIOCGDLT (u_int)
|
|
Returns the type of the data link layer underlying the attached interface.
|
|
.Er EINVAL
|
|
is returned if no interface has been specified.
|
|
The device types, prefixed with
|
|
.Dq DLT_ ,
|
|
are defined in
|
|
.In net/bpf.h .
|
|
.It Dv BIOCGDLTLIST (struct bpf_dltlist)
|
|
Returns an array of the available types of the data link layer
|
|
underlying the attached interface:
|
|
.Bd -literal -offset indent
|
|
struct bpf_dltlist {
|
|
u_int bfl_len;
|
|
u_int *bfl_list;
|
|
};
|
|
.Ed
|
|
.Pp
|
|
The available types are returned in the array pointed to by the
|
|
.Va bfl_list
|
|
field while their length in u_int is supplied to the
|
|
.Va bfl_len
|
|
field.
|
|
.Er ENOMEM
|
|
is returned if there is not enough buffer space and
|
|
.Er EFAULT
|
|
is returned if a bad address is encountered.
|
|
The
|
|
.Va bfl_len
|
|
field is modified on return to indicate the actual length in u_int
|
|
of the array returned.
|
|
If
|
|
.Va bfl_list
|
|
is
|
|
.Dv NULL ,
|
|
the
|
|
.Va bfl_len
|
|
field is set to indicate the required length of an array in u_int.
|
|
.It Dv BIOCSDLT (u_int)
|
|
Changes the type of the data link layer underlying the attached interface.
|
|
.Er EINVAL
|
|
is returned if no interface has been specified or the specified
|
|
type is not available for the interface.
|
|
.It Dv BIOCPROMISC
|
|
Forces the interface into promiscuous mode.
|
|
All packets, not just those destined for the local host, are processed.
|
|
Since more than one file can be listening on a given interface,
|
|
a listener that opened its interface non-promiscuously may receive
|
|
packets promiscuously.
|
|
This problem can be remedied with an appropriate filter.
|
|
.Pp
|
|
The interface remains in promiscuous mode until all files listening
|
|
promiscuously are closed.
|
|
.It Dv BIOCFLUSH
|
|
Flushes the buffer of incoming packets,
|
|
and resets the statistics that are returned by
|
|
.Dv BIOCGSTATS .
|
|
.It Dv BIOCGETIF (struct ifreq)
|
|
Returns the name of the hardware interface that the file is listening on.
|
|
The name is returned in the ifr_name field of
|
|
.Fa ifr .
|
|
All other fields are undefined.
|
|
.It Dv BIOCSETIF (struct ifreq)
|
|
Sets the hardware interface associated with the file.
|
|
This command must be performed before any packets can be read.
|
|
The device is indicated by name using the
|
|
.Dv ifr_name
|
|
field of the
|
|
.Fa ifreq .
|
|
Additionally, performs the actions of
|
|
.Dv BIOCFLUSH .
|
|
.It Dv BIOCSRTIMEOUT, BIOCGRTIMEOUT (struct timeval)
|
|
Sets or gets the read timeout parameter.
|
|
The
|
|
.Fa timeval
|
|
specifies the length of time to wait before timing
|
|
out on a read request.
|
|
This parameter is initialized to zero by
|
|
.Xr open 2 ,
|
|
indicating no timeout.
|
|
.It Dv BIOCGSTATS (struct bpf_stat)
|
|
Returns the following structure of packet statistics:
|
|
.Bd -literal -offset indent
|
|
struct bpf_stat {
|
|
uint64_t bs_recv;
|
|
uint64_t bs_drop;
|
|
uint64_t bs_capt;
|
|
uint64_t bs_padding[13];
|
|
};
|
|
.Ed
|
|
.Pp
|
|
The fields are:
|
|
.Bl -tag -width bs_recv -offset indent
|
|
.It Va bs_recv
|
|
the number of packets received by the descriptor since opened or reset
|
|
(including any buffered since the last read call);
|
|
.It Va bs_drop
|
|
the number of packets which were accepted by the filter but dropped by the
|
|
kernel because of buffer overflows
|
|
(i.e., the application's reads aren't keeping up with the packet
|
|
traffic); and
|
|
.It Va bs_capt
|
|
the number of packets accepted by the filter.
|
|
.El
|
|
.It Dv BIOCIMMEDIATE (u_int)
|
|
Enables or disables
|
|
.Dq immediate mode ,
|
|
based on the truth value of the argument.
|
|
When immediate mode is enabled, reads return immediately upon packet
|
|
reception.
|
|
Otherwise, a read will block until either the kernel buffer
|
|
becomes full or a timeout occurs.
|
|
This is useful for programs like
|
|
.Xr rarpd 8 ,
|
|
which must respond to messages in real time.
|
|
The default for a new file is off.
|
|
.It Dv BIOCSETF (struct bpf_program)
|
|
Sets the filter program used by the kernel to discard uninteresting
|
|
packets.
|
|
An array of instructions and its length are passed in using the following structure:
|
|
.Bd -literal -offset indent
|
|
struct bpf_program {
|
|
u_int bf_len;
|
|
struct bpf_insn *bf_insns;
|
|
};
|
|
.Ed
|
|
.Pp
|
|
The filter program is pointed to by the
|
|
.Va bf_insns
|
|
field while its length in units of
|
|
.Sq struct bpf_insn
|
|
is given by the
|
|
.Va bf_len
|
|
field.
|
|
Also, the actions of
|
|
.Dv BIOCFLUSH
|
|
are performed.
|
|
.Pp
|
|
See section
|
|
.Sy FILTER MACHINE
|
|
for an explanation of the filter language.
|
|
.It Dv BIOCVERSION (struct bpf_version)
|
|
Returns the major and minor version numbers of the filter language currently
|
|
recognized by the kernel.
|
|
Before installing a filter, applications must check
|
|
that the current version is compatible with the running kernel.
|
|
Version numbers are compatible if the major numbers match and the
|
|
application minor is less than or equal to the kernel minor.
|
|
The kernel version number is returned in the following structure:
|
|
.Bd -literal -offset indent
|
|
struct bpf_version {
|
|
u_short bv_major;
|
|
u_short bv_minor;
|
|
};
|
|
.Ed
|
|
.Pp
|
|
The current version numbers are given by
|
|
.Dv BPF_MAJOR_VERSION
|
|
and
|
|
.Dv BPF_MINOR_VERSION
|
|
from
|
|
.In net/bpf.h .
|
|
An incompatible filter
|
|
may result in undefined behavior (most likely, an error returned by
|
|
.Xr ioctl 2
|
|
or haphazard packet matching).
|
|
.It Dv BIOCSRSIG BIOCGRSIG (u_int)
|
|
Sets or gets the receive signal.
|
|
This signal will be sent to the process or process group specified by
|
|
.Dv FIOSETOWN .
|
|
It defaults to
|
|
.Dv SIGIO .
|
|
.It Dv BIOCGHDRCMPLT BIOCSHDRCMPLT (u_int)
|
|
Sets or gets the status of the
|
|
.Dq header complete
|
|
flag.
|
|
Set to zero if the link level source address should be filled in
|
|
automatically by the interface output routine.
|
|
Set to one if the link level source address will be written,
|
|
as provided, to the wire.
|
|
This flag is initialized to zero by default.
|
|
.It Dv BIOCGSEESENT BIOCSSEESENT (u_int)
|
|
Enable/disable or get the
|
|
.Dq see sent
|
|
flag status.
|
|
If enabled, packets sent by the host (not from
|
|
.Nm )
|
|
will be passed to the filter.
|
|
By default, the flag is enabled (value is 1).
|
|
.It Dv BIOCFEEDBACK BIOCSFEEDBACK BIOCGFEEDBACK (u_int)
|
|
Set (or get)
|
|
.Dq packet feedback mode .
|
|
This allows injected packets to be fed back as input to the interface when
|
|
output via the interface is successful.
|
|
The first name is meant for
|
|
.Fx
|
|
compatibility, the two others follow the Get/Set convention.
|
|
.\"When
|
|
.\".Dv BPF_D_INOUT
|
|
.\"direction is set, injected
|
|
Injected
|
|
outgoing packets are not returned by BPF to avoid
|
|
duplication.
|
|
This flag is initialized to zero by default.
|
|
.El
|
|
.Sh STANDARD IOCTLS
|
|
.Nm
|
|
now supports several standard
|
|
.Xr ioctl 2 Ns 's
|
|
which allow the user to do async and/or non-blocking I/O to an open
|
|
.Nm bpf
|
|
file descriptor.
|
|
.Bl -tag -width indent -offset indent
|
|
.It Dv FIONREAD (int)
|
|
Returns the number of bytes that are immediately available for reading.
|
|
.It Dv FIONBIO (int)
|
|
Set or clear non-blocking I/O.
|
|
If arg is non-zero, then doing a
|
|
.Xr read 2
|
|
when no data is available will return -1 and
|
|
.Va errno
|
|
will be set to
|
|
.Er EAGAIN .
|
|
If arg is zero, non-blocking I/O is disabled.
|
|
Note: setting this
|
|
overrides the timeout set by
|
|
.Dv BIOCSRTIMEOUT .
|
|
.It Dv FIOASYNC (int)
|
|
Enable or disable async I/O.
|
|
When enabled (arg is non-zero), the process or process group specified by
|
|
.Dv FIOSETOWN
|
|
will start receiving SIGIO's when packets
|
|
arrive.
|
|
Note that you must do an
|
|
.Dv FIOSETOWN
|
|
in order for this to take effect, as
|
|
the system will not default this for you.
|
|
The signal may be changed via
|
|
.Dv BIOCSRSIG .
|
|
.It Dv FIOSETOWN FIOGETOWN (int)
|
|
Set or get the process or process group (if negative) that should receive SIGIO
|
|
when packets are available.
|
|
The signal may be changed using
|
|
.Dv BIOCSRSIG
|
|
(see above).
|
|
.El
|
|
.Sh BPF HEADER
|
|
The following structure is prepended to each packet returned by
|
|
.Xr read 2 :
|
|
.Bd -literal -offset indent
|
|
struct bpf_hdr {
|
|
struct bpf_timeval bh_tstamp;
|
|
uint32_t bh_caplen;
|
|
uint32_t bh_datalen;
|
|
uint16_t bh_hdrlen;
|
|
};
|
|
.Ed
|
|
.Pp
|
|
The fields, whose values are stored in host order, are:
|
|
.Bl -tag -width bh_datalen -offset indent
|
|
.It Va bh_tstamp
|
|
The time at which the packet was processed by the packet filter.
|
|
This structure differs from the standard
|
|
.Vt struct timeval
|
|
in that both members are of type
|
|
.Vt long .
|
|
.It Va bh_caplen
|
|
The length of the captured portion of the packet.
|
|
This is the minimum of
|
|
the truncation amount specified by the filter and the length of the packet.
|
|
.It Va bh_datalen
|
|
The length of the packet off the wire.
|
|
This value is independent of the truncation amount specified by the filter.
|
|
.It Va bh_hdrlen
|
|
The length of the BPF header, which may not be equal to
|
|
.Em sizeof(struct bpf_hdr) .
|
|
.El
|
|
.Pp
|
|
The
|
|
.Va bh_hdrlen
|
|
field exists to account for
|
|
padding between the header and the link level protocol.
|
|
The purpose here is to guarantee proper alignment of the packet
|
|
data structures, which is required on alignment sensitive
|
|
architectures and improves performance on many other architectures.
|
|
The packet filter ensures that the
|
|
.Va bpf_hdr
|
|
and the
|
|
.Em network layer
|
|
header will be word aligned.
|
|
Suitable precautions must be taken when accessing the link layer
|
|
protocol fields on alignment restricted machines.
|
|
(This isn't a problem on an Ethernet, since
|
|
the type field is a short falling on an even offset,
|
|
and the addresses are probably accessed in a bytewise fashion).
|
|
.Pp
|
|
Additionally, individual packets are padded so that each starts
|
|
on a word boundary.
|
|
This requires that an application
|
|
has some knowledge of how to get from packet to packet.
|
|
The macro
|
|
.Dv BPF_WORDALIGN
|
|
is defined in
|
|
.In net/bpf.h
|
|
to facilitate this process.
|
|
It rounds up its argument
|
|
to the nearest word aligned value (where a word is
|
|
.Dv BPF_ALIGNMENT
|
|
bytes wide).
|
|
.Pp
|
|
For example, if
|
|
.Sq Va p
|
|
points to the start of a packet, this expression
|
|
will advance it to the next packet:
|
|
.Pp
|
|
.Dl p = (char *)p + BPF_WORDALIGN(p-\*[Gt]bh_hdrlen + p-\*[Gt]bh_caplen)
|
|
.Pp
|
|
For the alignment mechanisms to work properly, the
|
|
buffer passed to
|
|
.Xr read 2
|
|
must itself be word aligned.
|
|
.Xr malloc 3
|
|
will always return an aligned buffer.
|
|
.Sh FILTER MACHINE
|
|
A filter program is an array of instructions, with all branches forwardly
|
|
directed, terminated by a
|
|
.Sy return
|
|
instruction.
|
|
Each instruction performs some action on the pseudo-machine state,
|
|
which consists of an accumulator, index register, scratch memory store,
|
|
and implicit program counter.
|
|
.Pp
|
|
The following structure defines the instruction format:
|
|
.Bd -literal -offset indent
|
|
struct bpf_insn {
|
|
uint16_t code;
|
|
u_char jt;
|
|
u_char jf;
|
|
uint32_t k;
|
|
};
|
|
.Ed
|
|
.Pp
|
|
The
|
|
.Va k
|
|
field is used in different ways by different instructions,
|
|
and the
|
|
.Va jt
|
|
and
|
|
.Va jf
|
|
fields are used as offsets
|
|
by the branch instructions.
|
|
The opcodes are encoded in a semi-hierarchical fashion.
|
|
There are eight classes of instructions: BPF_LD, BPF_LDX, BPF_ST, BPF_STX,
|
|
BPF_ALU, BPF_JMP, BPF_RET, and BPF_MISC.
|
|
Various other mode and
|
|
operator bits are or'd into the class to give the actual instructions.
|
|
The classes and modes are defined in
|
|
.In net/bpf.h .
|
|
.Pp
|
|
Below are the semantics for each defined BPF instruction.
|
|
We use the convention that A is the accumulator, X is the index register,
|
|
P[] packet data, and M[] scratch memory store.
|
|
P[i:n] gives the data at byte offset
|
|
.Dq i
|
|
in the packet,
|
|
interpreted as a word (n=4),
|
|
unsigned halfword (n=2), or unsigned byte (n=1).
|
|
M[i] gives the i'th word in the scratch memory store, which is only
|
|
addressed in word units.
|
|
The memory store is indexed from 0 to BPF_MEMWORDS-1.
|
|
.Va k ,
|
|
.Va jt ,
|
|
and
|
|
.Va jf
|
|
are the corresponding fields in the
|
|
instruction definition.
|
|
.Dq len
|
|
refers to the length of the packet.
|
|
.Bl -tag -width indent -offset indent
|
|
.It Sy BPF_LD
|
|
These instructions copy a value into the accumulator.
|
|
The type of the source operand is specified by an
|
|
.Dq addressing mode
|
|
and can be a constant
|
|
.Sy ( BPF_IMM ) ,
|
|
packet data at a fixed offset
|
|
.Sy ( BPF_ABS ) ,
|
|
packet data at a variable offset
|
|
.Sy ( BPF_IND ) ,
|
|
the packet length
|
|
.Sy ( BPF_LEN ) ,
|
|
or a word in the scratch memory store
|
|
.Sy ( BPF_MEM ) .
|
|
For
|
|
.Sy BPF_IND
|
|
and
|
|
.Sy BPF_ABS ,
|
|
the data size must be specified as a word
|
|
.Sy ( BPF_W ) ,
|
|
halfword
|
|
.Sy ( BPF_H ) ,
|
|
or byte
|
|
.Sy ( BPF_B ) .
|
|
Arithmetic overflow when calculating a variable offset terminates
|
|
the filter program and the packet is ignored.
|
|
The semantics of all the recognized BPF_LD instructions follow.
|
|
.Bl -column "BPF_LD_BPF_W_BPF_ABS" "A \*[Lt]- P[k:4]" -offset indent
|
|
.It Sy BPF_LD+BPF_W+BPF_ABS Ta A \*[Lt]- P[k:4]
|
|
.It Sy BPF_LD+BPF_H+BPF_ABS Ta A \*[Lt]- P[k:2]
|
|
.It Sy BPF_LD+BPF_B+BPF_ABS Ta A \*[Lt]- P[k:1]
|
|
.It Sy BPF_LD+BPF_W+BPF_IND Ta A \*[Lt]- P[X+k:4]
|
|
.It Sy BPF_LD+BPF_H+BPF_IND Ta A \*[Lt]- P[X+k:2]
|
|
.It Sy BPF_LD+BPF_B+BPF_IND Ta A \*[Lt]- P[X+k:1]
|
|
.It Sy BPF_LD+BPF_W+BPF_LEN Ta A \*[Lt]- len
|
|
.It Sy BPF_LD+BPF_IMM Ta A \*[Lt]- k
|
|
.It Sy BPF_LD+BPF_MEM Ta A \*[Lt]- M[k]
|
|
.El
|
|
.It Sy BPF_LDX
|
|
These instructions load a value into the index register.
|
|
Note that the addressing modes are more restricted than those of
|
|
the accumulator loads, but they include
|
|
.Sy BPF_MSH ,
|
|
a hack for efficiently loading the IP header length.
|
|
.Bl -column "BPF_LDX_BPF_W_BPF_IMM" "X \*[Lt]- k" -offset indent
|
|
.It Sy BPF_LDX+BPF_W+BPF_IMM Ta X \*[Lt]- k
|
|
.It Sy BPF_LDX+BPF_W+BPF_MEM Ta X \*[Lt]- M[k]
|
|
.It Sy BPF_LDX+BPF_W+BPF_LEN Ta X \*[Lt]- len
|
|
.It Sy BPF_LDX+BPF_B+BPF_MSH Ta X \*[Lt]- 4*(P[k:1]\*[Am]0xf)
|
|
.El
|
|
.It Sy BPF_ST
|
|
This instruction stores the accumulator into the scratch memory.
|
|
We do not need an addressing mode since there is only one possibility
|
|
for the destination.
|
|
.Bl -column "BPF_ST" "M[k] \*[Lt]- A" -offset indent
|
|
.It Sy BPF_ST Ta M[k] \*[Lt]- A
|
|
.El
|
|
.It Sy BPF_STX
|
|
This instruction stores the index register in the scratch memory store.
|
|
.Bl -column "BPF_STX" "M[k] \*[Lt]- X" -offset indent
|
|
.It Sy BPF_STX Ta M[k] \*[Lt]- X
|
|
.El
|
|
.It Sy BPF_ALU
|
|
The alu instructions perform operations between the accumulator and
|
|
index register or constant, and store the result back in the accumulator.
|
|
For binary operations, a source mode is required
|
|
.Sy ( BPF_K
|
|
or
|
|
.Sy BPF_X ) .
|
|
.Bl -column "BPF_ALU_BPF_ADD_BPF_K" "A \*[Lt]- A + k" -offset indent
|
|
.It Sy BPF_ALU+BPF_ADD+BPF_K Ta A \*[Lt]- A + k
|
|
.It Sy BPF_ALU+BPF_SUB+BPF_K Ta A \*[Lt]- A - k
|
|
.It Sy BPF_ALU+BPF_MUL+BPF_K Ta A \*[Lt]- A * k
|
|
.It Sy BPF_ALU+BPF_DIV+BPF_K Ta A \*[Lt]- A / k
|
|
.It Sy BPF_ALU+BPF_AND+BPF_K Ta A \*[Lt]- A \*[Am] k
|
|
.It Sy BPF_ALU+BPF_OR+BPF_K Ta A \*[Lt]- A | k
|
|
.It Sy BPF_ALU+BPF_LSH+BPF_K Ta A \*[Lt]- A \*[Lt]\*[Lt] k
|
|
.It Sy BPF_ALU+BPF_RSH+BPF_K Ta A \*[Lt]- A \*[Gt]\*[Gt] k
|
|
.It Sy BPF_ALU+BPF_ADD+BPF_X Ta A \*[Lt]- A + X
|
|
.It Sy BPF_ALU+BPF_SUB+BPF_X Ta A \*[Lt]- A - X
|
|
.It Sy BPF_ALU+BPF_MUL+BPF_X Ta A \*[Lt]- A * X
|
|
.It Sy BPF_ALU+BPF_DIV+BPF_X Ta A \*[Lt]- A / X
|
|
.It Sy BPF_ALU+BPF_AND+BPF_X Ta A \*[Lt]- A \*[Am] X
|
|
.It Sy BPF_ALU+BPF_OR+BPF_X Ta A \*[Lt]- A | X
|
|
.It Sy BPF_ALU+BPF_LSH+BPF_X Ta A \*[Lt]- A \*[Lt]\*[Lt] X
|
|
.It Sy BPF_ALU+BPF_RSH+BPF_X Ta A \*[Lt]- A \*[Gt]\*[Gt] X
|
|
.It Sy BPF_ALU+BPF_NEG Ta A \*[Lt]- -A
|
|
.El
|
|
.It Sy BPF_JMP
|
|
The jump instructions alter flow of control.
|
|
Conditional jumps compare the accumulator against a constant
|
|
.Sy ( BPF_K )
|
|
or the index register
|
|
.Sy ( BPF_X ) .
|
|
If the result is true (or non-zero),
|
|
the true branch is taken, otherwise the false branch is taken.
|
|
Jump offsets are encoded in 8 bits so the longest jump is 256 instructions.
|
|
However, the jump always
|
|
.Sy ( BPF_JA )
|
|
opcode uses the 32 bit
|
|
.Va k
|
|
field as the offset, allowing arbitrarily distant destinations.
|
|
All conditionals use unsigned comparison conventions.
|
|
.Bl -column "BPF_JMP+BPF_JGE+BPF_K" "pc += (A \*[Ge] k) ? jt : jf" -offset indent
|
|
.It Sy BPF_JMP+BPF_JA Ta pc += k
|
|
.It Sy BPF_JMP+BPF_JGT+BPF_K Ta "pc += (A \*[Gt] k) ? jt : jf"
|
|
.It Sy BPF_JMP+BPF_JGE+BPF_K Ta "pc += (A \*[Ge] k) ? jt : jf"
|
|
.It Sy BPF_JMP+BPF_JEQ+BPF_K Ta "pc += (A == k) ? jt : jf"
|
|
.It Sy BPF_JMP+BPF_JSET+BPF_K Ta "pc += (A \*[Am] k) ? jt : jf"
|
|
.It Sy BPF_JMP+BPF_JGT+BPF_X Ta "pc += (A \*[Gt] X) ? jt : jf"
|
|
.It Sy BPF_JMP+BPF_JGE+BPF_X Ta "pc += (A \*[Ge] X) ? jt : jf"
|
|
.It Sy BPF_JMP+BPF_JEQ+BPF_X Ta "pc += (A == X) ? jt : jf"
|
|
.It Sy BPF_JMP+BPF_JSET+BPF_X Ta "pc += (A \*[Am] X) ? jt : jf"
|
|
.El
|
|
.It Sy BPF_RET
|
|
The return instructions terminate the filter program and specify the amount
|
|
of packet to accept (i.e., they return the truncation amount).
|
|
A return value of zero indicates that the packet should be ignored.
|
|
The return value is either a constant
|
|
.Sy ( BPF_K )
|
|
or the accumulator
|
|
.Sy ( BPF_A ) .
|
|
.Bl -column "BPF_RET+BPF_A" "accept A bytes" -offset indent
|
|
.It Sy BPF_RET+BPF_A Ta accept A bytes
|
|
.It Sy BPF_RET+BPF_K Ta accept k bytes
|
|
.El
|
|
.It Sy BPF_MISC
|
|
The miscellaneous category was created for anything that doesn't
|
|
fit into the above classes, and for any new instructions that might need to
|
|
be added.
|
|
Currently, these are the register transfer instructions
|
|
that copy the index register to the accumulator or vice versa.
|
|
.Bl -column "BPF_MISC+BPF_TAX" "X \*[Lt]- A" -offset indent
|
|
.It Sy BPF_MISC+BPF_TAX Ta X \*[Lt]- A
|
|
.It Sy BPF_MISC+BPF_TXA Ta A \*[Lt]- X
|
|
.El
|
|
.Pp
|
|
Also, two instructions to call a "coprocessor" if initialized by the kernel
|
|
component.
|
|
There is no coprocessor by default.
|
|
.Bl -column "BPF_MISC+BPF_COP" "A \*[Lt]- funcs[X](...)" -offset indent
|
|
.It Sy BPF_MISC+BPF_COP Ta A \*[Lt]- funcs[k](..)
|
|
.It Sy BPF_MISC+BPF_COPX Ta A \*[Lt]- funcs[X](..)
|
|
.El
|
|
.Pp
|
|
If the coprocessor is not set or the function index is out of range, these
|
|
instructions will abort the program and return zero.
|
|
.El
|
|
.Pp
|
|
The BPF interface provides the following macros to facilitate
|
|
array initializers:
|
|
.Bd -unfilled -offset indent
|
|
.Sy BPF_STMT No (opcode, operand)
|
|
.Sy BPF_JUMP No (opcode, operand, true_offset, false_offset)
|
|
.Ed
|
|
.Sh SYSCTLS
|
|
The following sysctls are available when
|
|
.Nm
|
|
is enabled:
|
|
.Pp
|
|
.Bl -tag -width "XnetXbpfXmaxbufsizeXX"
|
|
.It Li net.bpf.maxbufsize
|
|
Sets the maximum buffer size available for
|
|
.Nm
|
|
peers.
|
|
.It Li net.bpf.stats
|
|
Shows
|
|
.Nm
|
|
statistics.
|
|
They can be retrieved with the
|
|
.Xr netstat 1
|
|
utility.
|
|
.It Li net.bpf.peers
|
|
Shows the current
|
|
.Nm
|
|
peers.
|
|
This is only available to the super user and can also be retrieved with the
|
|
.Xr netstat 1
|
|
utility.
|
|
.El
|
|
.Pp
|
|
On architectures with
|
|
.Xr bpfjit 4
|
|
support, the additional sysctl is available:
|
|
.Pp
|
|
.Bl -tag -width "XnetXbpfXjitXX"
|
|
.It Li net.bpf.jit
|
|
Toggle
|
|
.Nm Just-In-Time
|
|
compilation of new filter programs.
|
|
In order to enable Just-In-Time compilation,
|
|
the bpfjit kernel module must be loaded.
|
|
Changing a value of this sysctl doesn't affect
|
|
existing filter programs.
|
|
.El
|
|
.Sh FILES
|
|
.Pa /dev/bpf
|
|
.Sh EXAMPLES
|
|
The following filter is taken from the Reverse ARP Daemon.
|
|
It accepts only Reverse ARP requests.
|
|
.Bd -literal -offset indent
|
|
struct bpf_insn insns[] = {
|
|
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_REVARP, 0, 3),
|
|
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, REVARP_REQUEST, 0, 1),
|
|
BPF_STMT(BPF_RET+BPF_K, sizeof(struct ether_arp) +
|
|
sizeof(struct ether_header)),
|
|
BPF_STMT(BPF_RET+BPF_K, 0),
|
|
};
|
|
.Ed
|
|
.Pp
|
|
This filter accepts only IP packets between host 128.3.112.15 and
|
|
128.3.112.35.
|
|
.Bd -literal -offset indent
|
|
struct bpf_insn insns[] = {
|
|
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 8),
|
|
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 26),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 2),
|
|
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 3, 4),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x80037023, 0, 3),
|
|
BPF_STMT(BPF_LD+BPF_W+BPF_ABS, 30),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 0x8003700f, 0, 1),
|
|
BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
|
|
BPF_STMT(BPF_RET+BPF_K, 0),
|
|
};
|
|
.Ed
|
|
.Pp
|
|
Finally, this filter returns only TCP finger packets.
|
|
We must parse the IP header to reach the TCP header.
|
|
The
|
|
.Sy BPF_JSET
|
|
instruction checks that the IP fragment offset is 0 so we are sure
|
|
that we have a TCP header.
|
|
.Bd -literal -offset indent
|
|
struct bpf_insn insns[] = {
|
|
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 12),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, ETHERTYPE_IP, 0, 10),
|
|
BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 23),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, IPPROTO_TCP, 0, 8),
|
|
BPF_STMT(BPF_LD+BPF_H+BPF_ABS, 20),
|
|
BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, 0x1fff, 6, 0),
|
|
BPF_STMT(BPF_LDX+BPF_B+BPF_MSH, 14),
|
|
BPF_STMT(BPF_LD+BPF_H+BPF_IND, 14),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 2, 0),
|
|
BPF_STMT(BPF_LD+BPF_H+BPF_IND, 16),
|
|
BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, 79, 0, 1),
|
|
BPF_STMT(BPF_RET+BPF_K, (u_int)-1),
|
|
BPF_STMT(BPF_RET+BPF_K, 0),
|
|
};
|
|
.Ed
|
|
.Sh SEE ALSO
|
|
.Xr ioctl 2 ,
|
|
.Xr read 2 ,
|
|
.Xr select 2 ,
|
|
.Xr signal 3 ,
|
|
.Xr tcpdump 8
|
|
.Rs
|
|
.%T "The BSD Packet Filter: A New Architecture for User-level Packet Capture"
|
|
.%A S. McCanne
|
|
.%A V. Jacobson
|
|
.%J Proceedings of the 1993 Winter USENIX
|
|
.%C Technical Conference, San Diego, CA
|
|
.Re
|
|
.Sh HISTORY
|
|
The Enet packet filter was created in 1980 by Mike Accetta and
|
|
Rick Rashid at Carnegie-Mellon University.
|
|
Jeffrey Mogul, at Stanford, ported the code to BSD and continued
|
|
its development from 1983 on.
|
|
Since then, it has evolved into the ULTRIX Packet Filter
|
|
at DEC, a STREAMS NIT module under SunOS 4.1, and BPF.
|
|
.Sh AUTHORS
|
|
Steven McCanne, of Lawrence Berkeley Laboratory, implemented BPF in
|
|
Summer 1990.
|
|
The design was in collaboration with Van Jacobson,
|
|
also of Lawrence Berkeley Laboratory.
|
|
.Sh BUGS
|
|
The read buffer must be of a fixed size (returned by the
|
|
.Dv BIOCGBLEN
|
|
ioctl).
|
|
.Pp
|
|
A file that does not request promiscuous mode may receive promiscuously
|
|
received packets as a side effect of another file requesting this
|
|
mode on the same hardware interface.
|
|
This could be fixed in the kernel with additional processing overhead.
|
|
However, we favor the model where
|
|
all files must assume that the interface is promiscuous, and if
|
|
so desired, must use a filter to reject foreign packets.
|
|
.Pp
|
|
Under SunOS, if a BPF application reads more than 2^31 bytes of
|
|
data, read will fail in
|
|
.Er EINVAL .
|
|
You can either fix the bug in SunOS,
|
|
or lseek to 0 when read fails for this reason.
|
|
.Pp
|
|
.Dq Immediate mode
|
|
and the
|
|
.Dq read timeout
|
|
are misguided features.
|
|
This functionality can be emulated with non-blocking mode and
|
|
.Xr select 2 .
|