returned to userland by read(2) also needs to be converted.
For this, the bpf descriptor is flagged as compat32 (or not) in the
open and ioctl functions (where the user process's pid is also updated
in the descriptor). When the bpf buffer is filled in, the 32bits or native
header is used depending on the information stored in the descriptor.
This won't work if a 64bit binary does the open and ioctls, and then
exec a 32bit program which will do the read. But this is very
unlikely to happen in real life ...
Tested on i386 and loongson; with these changes my loongson can run
dhclient and tcpdump with a n32 userland.
M_NOWAIT cause dhcpd on a low-memory server with lots of interfaces to
occasionally fail to start with ENOBUFS; (M_WAITOK | M_CANFAIL) seems to
fix this.
Tested on 3 different dhcp servers.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.
Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.
Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
do drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:
- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.
- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.
- The system spends less time at IPL_SCHED, and there is less lock activity.
failure of wpa_supplicant(8) to re-key promptly, as reported in
http://mail-index.netbsd.org/tech-net/2008/04/18/msg000459.html
- Make bpf's read timeout work more correctly with select/poll.
- A fix for catchpacket() which delays calling bpf_wakeup() until
the state has been updated.
1. Please don't cast function pointers to (void *), use the full function
prototype cast; this is for archs where a function pointer is not a regular
pointer.
2. Compare pointers to NULL not 0.
- Add a lot of missing selinit() and seldestroy() calls.
- Merge selwakeup() and selnotify() calls into a single selnotify().
- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.
Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
compatibility with the older ioctls. This avoids stack smashing and
abuse of "struct sockaddr" when ioctls placed "struct sockaddr_foo's" that
were longer than "struct sockaddr".
XXX: Some of the emulations might be broken; I tried to add code for
them but I did not test them.
Rather than calling mircotime() in catchpacket(), make catchpacket()
take a timeval indicating when the packet was captured. Move
microtime() to the calling functions and grab the timestamp as soon
as we know that we're going to call catchpacket at least once.
This means that we call microtime() once per matched packet, as
opposed to once per matched packet per bpf listener. It also means
that we return the same timestamp to all bpf listeners, rather than
slightly different ones.
It would be more accurate to call microtime() even earlier for all
packets, as you have to grab (1+#listener) locks before you can
determine if the packet will be logged. You could always grab a
timestamp before the locks, but microtime() can be costly, so this
didn't seem like a good idea.
(I guess most ethernet interfaces will have a bpf listener these
days because of dhclient. That means that we could be doing two bpf
locks on most packets going through the interface.)
and net.bpf.peers sysctls respectively.
A new structure was added to describe the external (user viewable)
representation of a BPF file; a new entry was added to the bpf_d
structure to store the PID of the calling process; a simple_lock was added
to protect the insert/removal from the net.bpf.peers sysctl handler.
This idea came from FreeBSD (Christian S.J. Peron) but while it is
implemented with sysctl's it differs a bit.
Reviewed by: christos@ and atatat@ (who gave me the tip for the net.bpf.peers
sysctl helper function).
The __UNCONST macro is now used only where necessary and the RW macros
are gone. Most of the changes here are consumers of the
sysctl_createv(9) interface that now takes a pair of const pointers
which used not to be.
Add bpf_deliver prototype.
Rename bpf_measure to m_length and move it to sys/sys/mbuf.h. I
make m_length an inline function in the header file to preserve
its performance characteristics, for better or for worse.
Optimize m_length: use the length in m_pkthdr.len, if M_PKTHDR.
In bpf_deliver, zero the on-stack mbuf before we do anything else
with it.
particule device. In doing this, make a new the bpf_stat structure with
members that are u_long rather than u_int, matching the counters in the bpf_d.
the original bpf_stat is now bpf_stat_old and so to the original ioctl
is preserved as BIOCGSTATSOLD.
not a read operation should be allowed to sleep. This allows the use of
bd_rtout with a value of "-1" to be eliminated (signed comparison and
assignment to an unsigned long.)
* in 1.91, a change was introduced that had bpfpoll() returning POLLRDNORM
set when the timeout expired. This impacted poorly on performance as well
as causing select to return an fd available for reading when it wasn't.
Change the behaviour here to only allow the possibility of POLLIN being
returned as active in the event of a timeout.
Fix the behaviour of BIOCIMMEDIATE (fix from LBL BPF code via FreeBSD.)
In bpf_mtap(), optimise the calling of bpf_filter() and catchpacket()
based on whether or not the entire packet is in one mbuf (based on
similar change FreeBSD but fixes BIOC*SEESENT issue with that.)
Copy the implementation of BIOCSSEESENT, BIOCGSEESENT by FreeBSD.
Review Assistance: Guy Harris
PRs: kern/8674, kern/12170
Increase the default bpf buffer size used by naive apps that don't do
BIOCSBLEN, from 8k to 32k. The former value of 8192 is too small to
hold a normal jumbo Ethernet frame (circa 9k), 16k is a little small
for Large-jumbo (~16k) frames supported by newer gigabit
Ethernet/10Gbe, so (somewhat arbitrarily) increase the default to 32k.
Increase the upper limit to which BIOSBLEN can raise bpf buffer-size
drastically, to 1 Mbyte. State-of-the-art for packet capture circa
1999 was around 256k; savvy NetBSD developers now use 1 Mbyte.
Note that libpcap has been updated to do binary-search on BIOCSBLEN
values up to 1 Mbyte.
Work is in progress to make both values sysctl'able. Source comments
note that consensus on tech-net is that we should find some heuristic
to set the boot-time default values dynamically, based on system memory.
* introduce fsetown(), fgetown(), fownsignal() - this sets/retrieves/signals
the owner of descriptor, according to appropriate sematics
of TIOCSPGRP/FIOSETOWN/SIOCSPGRP/TIOCGPGRP/FIOGETOWN/SIOCGPGRP ioctl; use
these routines instead of custom code where appropriate
* make every place handling TIOCSPGRP/TIOCGPGRP handle also FIOSETOWN/FIOGETOWN
properly, and remove the translation of FIO[SG]OWN to TIOC[SG]PGRP
in sys_ioctl() & sys_fcntl()
* also remove the socket-specific hack in sys_ioctl()/sys_fcntl() and
pass the ioctls down to soo_ioctl() as any other ioctl
change discussed on tech-kern@
When using bpf(4) in immediate mode, and using kevent(2) to receive
notification of packet arrival, the usermode application isn't notified
until a second packet arrives.
This is because KNOTE() calls filt_bpfread() before bd_slen has been
updated with the newly arrived packet length, so it looks like there
is no data there.
Moving the bpf_wakeup() call for immediate mode to after bd_slen is set
fixes it.
From: wayne@epipe.com.au in pr 3175
be inserted into ktrace records. The general change has been to replace
"struct proc *" with "struct lwp *" in various function prototypes, pass
the lwp through and use l_proc to get the process pointer when needed.
Bump the kernel rev up to 1.6V
Do a little mbuf rework while here. Change all uses of MGET*(*, M_WAIT, *)
to m_get*(M_WAIT, *). These are not performance critical and making them
call m_get saves considerable space. Add m_clget analogue of MCLGET and
make corresponding change for M_WAIT uses.
Modify netinet, gem, fxp, tulip, nfs to support MBUFTRACE.
Begin to change netstat to use sysctl.
kqueue provides a stateful and efficient event notification framework
currently supported events include socket, file, directory, fifo,
pipe, tty and device changes, and monitoring of processes and signals
kqueue is supported by all writable filesystems in NetBSD tree
(with exception of Coda) and all device drivers supporting poll(2)
based on work done by Jonathan Lemon for FreeBSD
initial NetBSD port done by Luke Mewburn and Jason Thorpe
bpf_mtap() gets called with not-well-initialized mbuf, so we need to go through
it without touching m->m_pkthdr.len and such. it's part of our bpf_mtap() API
(at least today).
This merge changes the device switch tables from static array to
dynamically generated by config(8).
- All device switches is defined as a constant structure in device drivers.
- The new grammer ``device-major'' is introduced to ``files''.
device-major <prefix> char <num> [block <num>] [<rules>]
- All device major numbers must be listed up in port dependent majors.<arch>
by using this grammer.
- Added the new naming convention.
The name of the device switch must be <prefix>_[bc]devsw for auto-generation
of device switch tables.
- The backward compatibility of loading block/character device
switch by LKM framework is broken. This is necessary to convert
from block/character device major to device name in runtime and vice versa.
- The restriction to assign device major by LKM is completely removed.
We don't need to reserve LKM entries for dynamic loading of device switch.
- In compile time, device major numbers list is packed into the kernel and
the LKM framework will refer it to assign device major number dynamically.