.\" $NetBSD: kqueue.2,v 1.20 2006/10/13 20:58:50 wiz Exp $
.\"
.\" Copyright (c) 2000 Jonathan Lemon
.\" All rights reserved.
.\"
.\" Copyright (c) 2001, 2002, 2003 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" Portions of this documentation is derived from text contributed by
.\" Luke Mewburn.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libc/sys/kqueue.2,v 1.22 2001/06/27 19:55:57 dd Exp $
.\"
.Dd February 4, 2003
.Dt KQUEUE 2
.Os
.Sh NAME
.Nm kqueue ,
.Nm kevent
.Nd kernel event notification mechanism
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/event.h
.In sys/time.h
.Ft int
.Fn kqueue "void"
.Ft int
.Fn kevent "int kq" "const struct kevent *changelist" "size_t nchanges" "struct kevent *eventlist" "size_t nevents" "const struct timespec *timeout"
.Fn EV_SET "\*[Am]kev" ident filter flags fflags data udata
.Sh DESCRIPTION
.Fn kqueue
provides a generic method of notifying the user when an event
happens or a condition holds, based on the results of small
pieces of kernel code termed filters.
A kevent is identified by the (ident, filter) pair; there may be only
one kevent with a given pair per kqueue.
.Pp
The filter is executed upon the initial registration of a kevent
in order to detect whether a preexisting condition is present, and is also
executed whenever an event is passed to the filter for evaluation.
If the filter determines that the condition should be reported,
then the kevent is placed on the kqueue for the user to retrieve.
.Pp
The filter is also run when the user attempts to retrieve the kevent
from the kqueue.
If the filter indicates that the condition that triggered
the event no longer holds, the kevent is removed from the kqueue and
is not returned.
.Pp
Multiple events which trigger the filter do not result in multiple
kevents being placed on the kqueue; instead, the filter will aggregate
the events into a single struct kevent.
Calling
.Fn close
on a file descriptor will remove any kevents that reference the descriptor.
.Pp
.Fn kqueue
creates a new kernel event queue and returns a descriptor.
The queue is not inherited by a child created with
.Xr fork 2 .
.ig
However, if
.Xr rfork 2
is called without the
.Dv RFFDG
flag, then the descriptor table is shared,
which will allow sharing of the kqueue between two processes.
..
.Pp
.Fn kevent
is used to register events with the queue, and to return any pending
events to the user.
.Fa changelist
is a pointer to an array of
.Va kevent
structures, as defined in
.Aq Pa sys/event.h .
All changes contained in the
.Fa changelist
are applied before any pending events are read from the queue.
.Fa nchanges
gives the number of entries in
.Fa changelist .
.Fa eventlist
is a pointer to an array of kevent structures.
.Fa nevents
gives the maximum number of entries that may be stored in
.Fa eventlist .
If
.Fa timeout
is a
.No non- Ns Dv NULL
pointer, it specifies a maximum interval to wait
for an event, which will be interpreted as a struct timespec.
If
.Fa timeout
is a
.Dv NULL
pointer,
.Fn kevent
waits indefinitely.
To effect a poll, the
.Fa timeout
argument should be
.No non- Ns Dv NULL ,
pointing to a zero-valued
.Va timespec
structure.
The same array may be used for the
.Fa changelist
and
.Fa eventlist .
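.Pp
The three
.Fa timeout
cases above can be sketched with a small helper.
This is an illustrative sketch only:
.Fn timeout_from_ms
is a hypothetical name and is not part of the kqueue interface.
It maps a negative value to
.Dv NULL
(wait indefinitely), zero to a zero-valued
.Va timespec
(poll), and a positive value to a bounded wait.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Hypothetical helper, not part of the kqueue API: translate a timeout
 * in milliseconds into the pointer expected by the timeout argument.
 *   ms <  0  returns NULL (wait indefinitely)
 *   ms == 0  returns a zero-valued timespec (poll, do not block)
 *   ms >  0  returns a timespec describing at most ms milliseconds
 */
const struct timespec *
timeout_from_ms(long ms, struct timespec *storage)
{
	assert(storage != NULL);
	if (ms < 0)
		return NULL;
	storage->tv_sec = ms / 1000;
	storage->tv_nsec = (ms % 1000) * 1000000L;
	return storage;
}
```
.Ed
.Pp
The returned pointer is meant to be passed as the last argument of
.Fn kevent ;
.Fa storage
must remain valid for the duration of that call.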
.Pp
.Fn EV_SET
is a macro which is provided for ease of initializing a
kevent structure.
.Pp
The
.Va kevent
structure is defined as:
.Bd -literal
struct kevent {
	uintptr_t ident;	/* identifier for this event */
	uint32_t  filter;	/* filter for event */
	uint32_t  flags;	/* action flags for kqueue */
	uint32_t  fflags;	/* filter flag value */
	int64_t   data;		/* filter data value */
	intptr_t  udata;	/* opaque user data identifier */
};
.Ed
.Pp
The fields of
.Fa struct kevent
are:
.Bl -tag -width XXXfilter -offset indent
.It ident
Value used to identify this event.
The exact interpretation is determined by the attached filter,
but often is a file descriptor.
.It filter
Identifies the kernel filter used to process this event.
There are predefined system filters (which are described below), and
other filters may be added by kernel subsystems as necessary.
.It flags
Actions to perform on the event.
.It fflags
Filter-specific flags.
.It data
Filter-specific data value.
.It udata
Opaque user-defined value passed through the kernel unchanged.
.El
.Pp
The
.Va flags
field can contain the following values:
.Bl -tag -width XXXEV_ONESHOT -offset indent
.It EV_ADD
Adds the event to the kqueue.
Re-adding an existing event will modify the parameters of the original
event, and will not result in a duplicate entry.
Adding an event automatically enables it,
unless overridden by the EV_DISABLE flag.
.It EV_ENABLE
Permit
.Fn kevent
to return the event if it is triggered.
.It EV_DISABLE
Disable the event so
.Fn kevent
will not return it.
The filter itself is not disabled.
.It EV_DELETE
Removes the event from the kqueue.
Events which are attached to file descriptors are automatically deleted
on the last close of the descriptor.
.It EV_ONESHOT
Causes the event to return only the first occurrence of the filter
being triggered.
After the user retrieves the event from the kqueue, it is deleted.
.It EV_CLEAR
After the event is retrieved by the user, its state is reset.
This is useful for filters which report state transitions
instead of the current state.
Note that some filters may automatically set this flag internally.
.It EV_EOF
Filters may set this flag to indicate a filter-specific EOF condition.
.It EV_ERROR
See
.Sx RETURN VALUES
below.
.El
.Ss Filters
Filters are identified by a number.
There are two types of filters: predefined filters, which
are described below, and third-party filters that may be added with
.Xr kfilter_register 9
by kernel subsystems, third-party device drivers, or loadable
kernel modules.
.Pp
As a third-party filter is referenced by a well-known name instead
of a statically assigned number, two
.Xr ioctl 2 Ns s
are supported on the file descriptor returned by
.Fn kqueue
to map a filter name to a filter number, and vice versa (passing
arguments in a structure described below):
.Bl -tag -width KFILTER_BYFILTER -offset indent
.It KFILTER_BYFILTER
Map
.Va filter
to
.Va name ,
which is of size
.Va len .
.It KFILTER_BYNAME
Map
.Va name
to
.Va filter .
.Va len
is ignored.
.El
.Pp
The following structure is used to pass arguments in and out of the
.Xr ioctl 2 :
.Bd -literal -offset indent
struct kfilter_mapping {
	char	 *name;		/* name to look up or return */
	size_t	 len;		/* length of name */
	uint32_t filter;	/* filter to look up or return */
};
.Ed
.Pp
Arguments may be passed to and from the filter via the
.Va fflags
and
.Va data
fields in the kevent structure.
.Pp
The predefined system filters are:
.Bl -tag -width EVFILT_SIGNAL
.It EVFILT_READ
Takes a descriptor as the identifier, and returns whenever
there is data available to read.
The behavior of the filter is slightly different depending
on the descriptor type.
.Pp
.Bl -tag -width 2n
.It Sockets
Sockets which have previously been passed to
.Fn listen
return when there is an incoming connection pending.
.Va data
contains the size of the listen backlog (i.e., the number of
connections ready to be accepted with
.Xr accept 2 ) .
.Pp
Other socket descriptors return when there is data to be read,
subject to the
.Dv SO_RCVLOWAT
value of the socket buffer.
This may be overridden with a per-filter low water mark at the
time the filter is added by setting the
NOTE_LOWAT
flag in
.Va fflags ,
and specifying the new low water mark in
.Va data .
On return,
.Va data
contains the number of bytes in the socket buffer.
.Pp
If the read direction of the socket has been shut down, then the filter
also sets EV_EOF in
.Va flags ,
and returns the socket error (if any) in
.Va fflags .
It is possible for EOF to be returned (indicating the connection is gone)
while there is still data pending in the socket buffer.
.It Vnodes
Returns when the file pointer is not at the end of file.
.Va data
contains the offset from current position to end of file,
and may be negative.
.It "Fifos, Pipes"
Returns when there is data to read;
.Va data
contains the number of bytes available.
.Pp
When the last writer disconnects, the filter will set EV_EOF in
.Va flags .
This may be cleared by passing in EV_CLEAR, at which point the
filter will resume waiting for data to become available before
returning.
.El
.It EVFILT_WRITE
Takes a descriptor as the identifier, and returns whenever
it is possible to write to the descriptor.
For sockets, pipes, fifos, and ttys,
.Va data
will contain the amount of space remaining in the write buffer.
The filter will set EV_EOF when the reader disconnects, and for
the fifo case, this may be cleared by use of EV_CLEAR.
Note that this filter is not supported for vnodes.
.Pp
For sockets, the low water mark and socket error handling are
identical to the EVFILT_READ case.
.It EVFILT_AIO
This is not implemented in
.Nx .
.ig
The sigevent portion of the AIO request is filled in, with
.Va sigev_notify_kqueue
containing the descriptor of the kqueue that the event should
be attached to,
.Va sigev_value
containing the udata value, and
.Va sigev_notify
set to SIGEV_EVENT.
When the aio_* function is called, the event will be registered
with the specified kqueue, and the
.Va ident
argument set to the
.Fa struct aiocb
returned by the aio_* function.
The filter returns under the same conditions as aio_error.
.Pp
Alternatively, a kevent structure may be initialized, with
.Va ident
containing the descriptor of the kqueue, and the
address of the kevent structure placed in the
.Va aio_lio_opcode
field of the AIO request.
However, this approach will not work on
architectures with 64-bit pointers, and should be considered deprecated.
..
.It EVFILT_VNODE
Takes a file descriptor as the identifier and the events to watch for in
.Va fflags ,
and returns when one or more of the requested events occurs on the descriptor.
The events to monitor are:
.Bl -tag -width XXNOTE_RENAME
.It NOTE_DELETE
.Fn unlink
was called on the file referenced by the descriptor.
.It NOTE_WRITE
A write occurred on the file referenced by the descriptor.
.It NOTE_EXTEND
The file referenced by the descriptor was extended.
.It NOTE_ATTRIB
The file referenced by the descriptor had its attributes changed.
.It NOTE_LINK
The link count on the file changed.
.It NOTE_RENAME
The file referenced by the descriptor was renamed.
.It NOTE_REVOKE
Access to the file was revoked via
.Xr revoke 2
or the underlying file system was unmounted.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_PROC
Takes the process ID to monitor as the identifier and the events to watch for
in
.Va fflags ,
and returns when the process performs one or more of the requested events.
If a process can normally see another process, it can attach an event to it.
The events to monitor are:
.Bl -tag -width XXNOTE_TRACKERR
.It NOTE_EXIT
The process has exited.
.It NOTE_FORK
The process has called
.Fn fork .
.It NOTE_EXEC
The process has executed a new program via
.Xr execve 2
or similar call.
.It NOTE_TRACK
Follow a process across
.Fn fork
calls.
The parent process will return with NOTE_TRACK set in the
.Va fflags
field, while the child process will return with NOTE_CHILD set in
.Va fflags
and the parent PID in
.Va data .
.It NOTE_TRACKERR
This flag is returned if the system was unable to attach an event to
the child process, usually due to resource limitations.
.El
.Pp
On return,
.Va fflags
contains the events which triggered the filter.
.It EVFILT_SIGNAL
Takes the signal number to monitor as the identifier and returns
when the given signal is delivered to the current process.
This coexists with the
.Fn signal
and
.Fn sigaction
facilities, and has a lower precedence.
The filter will record
all attempts to deliver a signal to a process, even if the signal has
been marked as SIG_IGN.
Event notification happens after normal signal delivery processing.
.Va data
returns the number of times the signal has occurred since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.It EVFILT_TIMER
Establishes an arbitrary timer identified by
.Va ident .
When adding a timer,
.Va data
specifies the timeout period in milliseconds.
The timer will be periodic unless EV_ONESHOT is specified.
On return,
.Va data
contains the number of times the timeout has expired since the last call to
.Fn kevent .
This filter automatically sets the EV_CLEAR flag internally.
.El
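.Pp
As a minimal sketch of the EVFILT_READ filter described above, the
following hypothetical helper,
.Fn readable_bytes ,
registers a descriptor and polls it once with a zero-valued timeout.
The helper name and the
.Dv HAVE_KQUEUE
guard are assumptions made for illustration; the fallback branch exists
only so that the sketch still compiles on systems without
.Aq Pa sys/event.h .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__)
#define HAVE_KQUEUE 1
#elif defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_KQUEUE 1
#endif

#ifdef HAVE_KQUEUE
#include <sys/event.h>
#include <sys/time.h>

/*
 * Poll fd once with EVFILT_READ.  Returns the value reported in data
 * (bytes available for a pipe), 0 if nothing is readable, or -1 with
 * errno set on failure.
 */
long
readable_bytes(int fd)
{
	struct kevent change, event;
	struct timespec poll_now = { 0, 0 };	/* zero-valued: do not block */
	int kq, n, saved;

	assert(fd >= 0);
	if ((kq = kqueue()) == -1)
		return -1;
	/* Register and retrieve in one call; changes are applied first. */
	EV_SET(&change, fd, EVFILT_READ, EV_ADD, 0, 0, 0);
	n = kevent(kq, &change, 1, &event, 1, &poll_now);
	saved = errno;
	close(kq);		/* closing the kqueue discards the event */
	errno = saved;
	if (n == -1)
		return -1;
	if (n == 0)
		return 0;
	if (event.flags & EV_ERROR) {
		errno = (int)event.data;	/* registration failed */
		return -1;
	}
	return (long)event.data;
}
#else
long
readable_bytes(int fd)
{
	assert(fd >= 0);
	errno = ENOSYS;		/* no kqueue on this platform */
	return -1;
}
#endif
```
.Ed
.Pp
Creating a kqueue for every query keeps the sketch short; a real program
would normally create one kqueue and keep its events registered.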
.Sh RETURN VALUES
.Fn kqueue
creates a new kernel event queue and returns a file descriptor.
If there was an error creating the kernel event queue, a value of \-1 is
returned and errno is set.
.Pp
.Fn kevent
returns the number of events placed in the
.Fa eventlist ,
up to the value given by
.Fa nevents .
If an error occurs while processing an element of the
.Fa changelist
and there is enough room in the
.Fa eventlist ,
then the event will be placed in the
.Fa eventlist
with
.Dv EV_ERROR
set in
.Va flags
and the system error in
.Va data .
Otherwise,
.Dv \-1
will be returned, and
.Dv errno
will be set to indicate the error condition.
If the time limit expires, then
.Fn kevent
returns 0.
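.Pp
The two error-reporting paths can be observed by deleting an event that
was never registered, which is expected to fail with
.Er ENOENT .
The following is a sketch under that assumption;
.Fn delete_unregistered
and the
.Dv HAVE_KQUEUE
guard are hypothetical names, and the fallback branch only keeps the
sketch compilable on systems without
.Aq Pa sys/event.h .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

#if defined(__NetBSD__) || defined(__FreeBSD__) || defined(__OpenBSD__)
#define HAVE_KQUEUE 1
#elif defined(__DragonFly__) || defined(__APPLE__)
#define HAVE_KQUEUE 1
#endif

#ifdef HAVE_KQUEUE
#include <sys/event.h>
#include <sys/time.h>

/*
 * Try to delete a read event that was never added for fd.
 * room is the number of eventlist slots offered (0 or 1).
 * Returns the error number (0 if the call unexpectedly succeeded) and
 * sets *via_eventlist to 1 if it arrived as an EV_ERROR entry.
 */
int
delete_unregistered(int fd, int room, int *via_eventlist)
{
	struct kevent change, result;
	struct timespec poll_now = { 0, 0 };
	int kq, n, error = 0;

	assert(via_eventlist != NULL && (room == 0 || room == 1));
	*via_eventlist = 0;
	if ((kq = kqueue()) == -1)
		return errno;
	EV_SET(&change, fd, EVFILT_READ, EV_DELETE, 0, 0, 0);
	n = kevent(kq, &change, 1, room ? &result : NULL, room, &poll_now);
	if (n == -1) {
		error = errno;		/* no room: reported through errno */
	} else if (n > 0 && (result.flags & EV_ERROR)) {
		error = (int)result.data;	/* room: reported in data */
		*via_eventlist = 1;
	}
	close(kq);
	return error;
}
#else
int
delete_unregistered(int fd, int room, int *via_eventlist)
{
	(void)fd;
	(void)room;
	assert(via_eventlist != NULL);
	*via_eventlist = 0;
	return ENOSYS;		/* no kqueue on this platform */
}
#endif
```
.Ed
.Pp
With one slot of room the error is expected in the returned entry, and
with no room it is expected through a return value of \-1.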
.Sh ERRORS
The
.Fn kqueue
function fails if:
.Bl -tag -width Er
.It Bq Er ENOMEM
The kernel failed to allocate enough memory for the kernel queue.
.It Bq Er EMFILE
The per-process descriptor table is full.
.It Bq Er ENFILE
The system file table is full.
.El
.Pp
The
.Fn kevent
function fails if:
.Bl -tag -width Er
.It Bq Er EACCES
The process does not have permission to register a filter.
.It Bq Er EFAULT
There was an error reading or writing the
.Va kevent
structure.
.It Bq Er EBADF
The specified descriptor is invalid.
.It Bq Er EINTR
A signal was delivered before the timeout expired and before any
events were placed on the kqueue for return.
.It Bq Er EINVAL
The specified time limit or filter is invalid.
.It Bq Er ENOENT
The event could not be found to be modified or deleted.
.It Bq Er ENOMEM
No memory was available to register the event.
.It Bq Er ESRCH
The specified process to attach to does not exist.
.El
.Sh SEE ALSO
.\" .Xr aio_error 2 ,
.\" .Xr aio_read 2 ,
.\" .Xr aio_return 2 ,
.Xr ioctl 2 ,
.Xr poll 2 ,
.Xr read 2 ,
.Xr select 2 ,
.Xr sigaction 2 ,
.Xr write 2 ,
.Xr signal 3 ,
.Xr kfilter_register 9 ,
.Xr knote 9
.Sh HISTORY
The
.Fn kqueue
and
.Fn kevent
functions first appeared in
.Fx 4.1 ,
and then in
.Nx 2.0 .
.Sh AUTHORS
The
.Fn kqueue
system and this manual page were written by
.An Jonathan Lemon Aq jlemon@FreeBSD.org .
.Nx
port and manpage additions were done by
.An Luke Mewburn Aq lukem@NetBSD.org ,
.An Jason Thorpe Aq thorpej@NetBSD.org ,
and
.An Jaromir Dolecek Aq jdolecek@NetBSD.org .