kern/39940 and by Martti Kuparinen on current-users@: replace the
ioctl lock with finer-grained locking. Lock the ports list and
wait to if_clone_destroy() until all threads are out of the softc.
Thanks to Martti Kuparinen for testing these changes.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.
Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.
Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
#if NBPFILTER is no longer required in the client. This change
doesn't yet add support for loading bpf as a module, since drivers
can register before bpf is attached. However, callers of bpf can
now be modularized.
Dynamically loadable bpf could probably be done fairly easily with
coordination from the stub driver and the real driver by registering
attachments in the stub before the real driver is loaded and doing
a handoff. ... and I'm not going to ponder the depths of unload
here.
Tested with i386/MONOLITHIC, modified MONOLITHIC without bpf and rump.
read/write/accept, then the expectation is that the blocked thread will
exit and the close complete.
Since only one fd is affected, but many fd can refer to the same file,
the close code can only request the fs code unblock with ERESTART.
Fixed for pipes and sockets, ERESTART will only be generated after such
a close - so there should be no change for other programs.
Also rename fo_abort() to fo_restart() (this used to be fo_drain()).
Fixes PR/26567
do drain' in many places, whereas fo_drain() was called in order to force
blocking read()/write() etc calls to return to userspace so that a close()
call from a different thread can complete.
In the sockets code comment out the broken code in the inner function,
it was being called from compat code.
- Drop the INET6 block. The commands are never given to this function
and truncating the sockaddr is arguably not the desired result anyway.
- Clear the address before copying. This fixes SIOCGIFNETMASK and possible
other ioctls for users that don't check sa_len. This includes
COMPAT_43 and Linux emulation.
OK dyoung@
Pfsync interface exposes change in the pf(4) over a pseudo-interface, and can
be used to synchronise different pf.
This work was part of my 2009 GSoC
No objection on tech-net@
addresses. Make the kernel support SIOC[SG]IFADDRPREF for IPv6
interface addresses.
In in6ifa_ifpforlinklocal(), consult preference numbers before
making an otherwise arbitrary choice of in6_ifaddr. Otherwise,
preference numbers are *not* consulted by the kernel, but that will
be rather easy for somebody with a little bit of free time to fix.
Please note that setting the preference number for a link-local
IPv6 address does not work right, yet, but that ought to be fixed
soon.
In support of the changes above,
1 Add a method to struct domain for "externalizing" a sockaddr, and
provide an implementation for IPv6. Expect more work in this area: it
may be more proper to say that the IPv6 implementation "internalizes"
a sockaddr. Add sockaddr_externalize().
2 Add a subroutine, sofamily(), that returns a struct socket's address
family or AF_UNSPEC.
3 Make a lot of IPv4-specific code generic, and move it from
sys/netinet/ to sys/net/ for re-use by IPv6 parts of the kernel and
ifconfig(8).
queue's maximum length, current length, and number of drops. E.g.,
% sysctl net.interfaces.bnx0
net.interfaces.bnx0.sndq.len = 0
net.interfaces.bnx0.sndq.maxlen = 509
net.interfaces.bnx0.sndq.drops = 0
Let userland adjust the maximum queue length.
While I'm here, add a 64-bit generation number, if_index_gen, to
ifnet; the pair [ifp->if_index, ifp->if_index_gen] can serve to
identify an ifnet for the lifetime of the system. I will use this
in an upcoming change.
Ok matt@.
These changes allow vlans to be layered above agr, with the attach
and detach propogated to the member ports in the aggregation.
Note the agr interface must be up before the vlan is attached.
Adds SIOCINITIFADDR support to the wm driver for setting the AF_LINK
address, necessary for agr to be able to set the mac addresses of each
port to the agr address (i.e. so it can receive all intended traffic
at the hardware level).
Adds support for disabling the LACP protocol by setting LINK1 on the agr
interface (e.g. ifconfig agr0 link1).
In consultation with tls@.
will be processed when the radix "subsystem" is initialized -- all
users must be attached before any inits to know the max keylength.
Use of link sets is no longer required, and only attached domains
need to be considered.
BRDGGFLT and BRDGSFILT bridge controls are only available with BRIDGE_IPF and PFIL_HOOKS defined.
In amd64 GENERIC and XEN kernel configs PFIL_HOOKS is defined but BRIDGE_IPF is not.
When a BRDGGFLT or BRDGSFILT command comes in, then ifd->ifd_cmd is not in range
of bridge_control_table_size. Then bc is not set and is dereferenced
later => BOOM.
than one active reference to a file descriptor. It should dislodge threads
sleeping while holding a reference to the descriptor. Implemented only for
sockets but should be extended to pipes, fifos, etc.
Fixes the case of a multithreaded process doing something like the
following, which would have hung until the process got a signal.
thr0 accept(fd, ...)
thr1 close(fd)
context (reported on various mailing-lists, and part of PR kern/41114,
causing panic in pf(4) and possibly ipf(4) when BRIDGE_IPF is used).
Defer bridge_forward() to a software interrupt; bridge_input() enqueues
mbufs to ifp->if_snd which is handled in bridge_forward().
RT_ prefix and use them appropriately, instead of making copies. Make
pppd use the RT_ROUNDUP macro; fixes proxyarp setting on 64 bit hosts.
XXX: All this should be pulled up to 5.0
Note the vlan interface does not see updates to the parents capabilities
so if, for example, TSO is on in both, then turned off in the parent it
will remain on in the vlan interface.
Extends the Opencrypto API to allow the destination buffer size to be
specified when its not the same size as the input buffer (i.e. for
operations like compress and decompress).
The crypto_op and crypt_n_op structures gain a u_int dst_len field.
The session_op structure gains a comp_alg field to specify a compression
algorithm.
Moved four ioctls to new ids; CIOCGSESSION, CIOCNGSESSION, CIOCCRYPT,
and CIOCNCRYPTM.
Added four backward compatible ioctls; OCIOCGSESSION, OCIOCNGSESSION,
OCIOCCRYPT, and OCIOCNCRYPTM.
Backward compatibility is maintained in ocryptodev.h and ocryptodev.c which
implement the original ioctls and set dst_len and comp_alg to 0.
Adds user-space access to compression features.
Adds software gzip support (CRYPTO_GZIP_COMP).
Adds the fast version of crc32 from zlib to libkern. This should be generally
useful and provide a place to start normalizing the various crc32 routines
in the kernel. The crc32 routine is used in this patch to support GZIP.
With input and support from tls@NetBSD.org.
There are still about 1600 left, but they have ',' or /* ... */
in the actual variable definitions - which my awk script doesn't handle.
There are also many that need () -> (void).
(The script does handle misordered arguments.)
* A sign extension error creating the bridge ID corrupted the
priority (always making it the maximum).
* Do not catch STP packets on an interface for which STP is not
enabled -- it's a violation of the spec, and causes STP to fail on
neighboring bridges.
* An optimization to bstp_input() -- some information is already
known when we call it.
contributed anonymously.
- ref count each compressor
- allow {un,}registration of several modules at once
- une RUN_ONCE to make sure the mutex is initialised, because
unfortunately built-in (and bootloader-loaded) modules init functions
are run before pseudo-devices attach (reported by Nick Hudson).
structures. This is far from optimal, but gets rid of iffy
#ifdef INET in radix.c. The radix bonsai still needs lots of love
before loading domains dynamically is possible...
ifioctl_common(). Since the first tun_ioctl() call already holds
the simplelock, the second tun_ioctl() call will wait forever to
acquire it: deadlock.
To fix this, wait to acquire the lock until tuninit().
When a link-layer address changes (e.g., ifconfig ex0 link
02🇩🇪ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.
Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)
Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.
Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.
Return consistent, appropriate error codes from network drivers.
Improve readability. KNF.
*** Details ***
In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.
In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.
Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.
In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.
Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:
switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}
Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,
switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).
In ipw(4), remove an if_set_sadl() call that is out of place.
In nfe(4), reuse the jumbo MTU logic in ether_ioctl().
Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.
Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.
Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.
In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.
In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.
Let ifioctl_common() handle SIOCGIFADDR.
Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.
In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.
bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
that set it, ended up causing the kernel to reference random garbage. Ignore
it for compatibility, but add a DIAGNOSTIC message so that userland programs
that set it can be fixed. The only one so far is pppd. Hi dyoung!
initialize it. Use sc_dev in etherip_clone_destroy() instead of
casting the softc to struct device *.
Remove gratuitous casts. Use device_t and cfdata_t throughout.
unique. Some brands of ADSL modems pick a hard-coded session ID which
would otherwise make it impossible to use two of them in the same system
simultaneously.
The later changes were only cosmetic, cause problems in IPv6-only-
connections (reported by Wolfgang Solfrank in private mail), as well
as reintroducing the original bug again.
synchronously, but insert a (varying) delay. Before we have only been
decoupled from the peer via network latency - now we introduce some
explicit delay. This, at least, creates batter serialized debug output.
However, if we have to reconnect because of an authentication failure,
the peer may have just been unable to access it's radius server. (I have
a setup where this seems to happen every now and then, depending on time
of day.) Backoff reconnect in this cases seriously longer - this is better
than hitting the max-auth-failure limit within a few seconds.
(Maybe it shouldn't even be inline - but I'd have to work out where to put it).
VLAN_INPUT_TAG() now calls vlan_input_tag() and does '_errcase' when it fails.
In reality the callers should all be changed, _errcase is ALWAYS continue,
which used to 'continue' (ie break) the do .. while (0) loop - not the
intended action!
Found by ramming all the kernel sources through a modified lint and grepping
for a specific error.
While here enclose the body of VLAN_OUTPUT_TAG() in ().