ifioctl_common(). Since the first tun_ioctl() call already holds
the simplelock, the second tun_ioctl() call will wait forever to
acquire it: deadlock.
To fix this, wait to acquire the lock until tuninit().
When a link-layer address changes (e.g., ifconfig ex0 link
02🇩🇪ad:be:ef:02 active), send a gratuitous ARP and/or a Neighbor
Advertisement to update the network-/link-layer address bindings
on our LAN peers.
Refuse a change of ethernet address to the address 00:00:00:00:00:00
or to any multicast/broadcast address. (Thanks matt@.)
Reorder ifnet ioctl operations so that driver ioctls may inherit
the functions of their "class"---ether_ioctl(), fddi_ioctl(), et
cetera---and the class ioctls may inherit from the generic ioctl,
ifioctl_common(), but both driver- and class-ioctls may override
the generic behavior. Make network drivers share more code.
Distinguish a "factory" link-layer address from others for the
purposes of both protecting that address from deletion and computing
EUI64.
Return consistent, appropriate error codes from network drivers.
Improve readability. KNF.
*** Details ***
In if_attach(), always initialize the interface ioctl routine,
ifnet->if_ioctl, if the driver has not already initialized it.
Delete if_ioctl == NULL tests everywhere else, because it cannot
happen.
In the ioctl routines of network interfaces, inherit common ioctl
behaviors by calling either ifioctl_common() or whichever ioctl
routine is appropriate for the class of interface---e.g., ether_ioctl()
for ethernets.
Stop (ab)using SIOCSIFADDR and start to use SIOCINITIFADDR. In
the user->kernel interface, SIOCSIFADDR's argument was an ifreq,
but on the protocol->ifnet interface, SIOCSIFADDR's argument was
an ifaddr. That was confusing, and it would work against me as I
make it possible for a network interface to overload most ioctls.
On the protocol->ifnet interface, replace SIOCSIFADDR with
SIOCINITIFADDR. In ifioctl(), return EPERM if userland tries to
invoke SIOCINITIFADDR.
In ifioctl(), give the interface the first shot at handling most
interface ioctls, and give the protocol the second shot, instead
of the other way around. Finally, let compatibility code (COMPAT_OSOCK)
take a shot.
Pull device initialization out of switch statements under
SIOCINITIFADDR. For example, pull ..._init() out of any switch
statement that looks like this:
switch (...->sa_family) {
case ...:
..._init();
...
break;
...
default:
..._init();
...
break;
}
Rewrite many if-else clauses that handle all permutations of IFF_UP
and IFF_RUNNING to use a switch statement,
switch (x & (IFF_UP|IFF_RUNNING)) {
case 0:
...
break;
case IFF_RUNNING:
...
break;
case IFF_UP:
...
break;
case IFF_UP|IFF_RUNNING:
...
break;
}
unifdef lots of code containing #ifdef FreeBSD, #ifdef NetBSD, and
#ifdef SIOCSIFMTU, especially in fwip(4) and in ndis(4).
In ipw(4), remove an if_set_sadl() call that is out of place.
In nfe(4), reuse the jumbo MTU logic in ether_ioctl().
Let ethernets register a callback for setting h/w state such as
promiscuous mode and the multicast filter in accord with a change
in the if_flags: ether_set_ifflags_cb() registers a callback that
returns ENETRESET if the caller should reset the ethernet by calling
if_init(), 0 on success, != 0 on failure. Pull common code from
ex(4), gem(4), nfe(4), sip(4), tlp(4), vge(4) into ether_ioctl(),
and register if_flags callbacks for those drivers.
Return ENOTTY instead of EINVAL for inappropriate ioctls. In
zyd(4), use ENXIO instead of ENOTTY to indicate that the device is
not any longer attached.
Add to if_set_sadl() a boolean 'factory' argument that indicates
whether a link-layer address was assigned by the factory or some
other source. In a comment, recommend using the factory address
for generating an EUI64, and update in6_get_hw_ifid() to prefer a
factory address to any other link-layer address.
Add a routing message, RTM_LLINFO_UPD, that tells protocols to
update the binding of network-layer addresses to link-layer addresses.
Implement this message in IPv4 and IPv6 by sending a gratuitous
ARP or a neighbor advertisement, respectively. Generate RTM_LLINFO_UPD
messages on a change of an interface's link-layer address.
In ether_ioctl(), do not let SIOCALIFADDR set a link-layer address
that is broadcast/multicast or equal to 00:00:00:00:00:00.
Make ether_ioctl() call ifioctl_common() to handle ioctls that it
does not understand.
In gif(4), initialize if_softc and use it, instead of assuming that
the gif_softc and ifp overlap.
Let ifioctl_common() handle SIOCGIFADDR.
Sprinkle rtcache_invariants(), which checks on DIAGNOSTIC kernels
that certain invariants on a struct route are satisfied.
In agr(4), rewrite agr_ioctl_filter() to be a bit more explicit
about the ioctls that we do not allow on an agr(4) member interface.
bzero -> memset. Delete unnecessary casts to void *. Use
sockaddr_in_init() and sockaddr_in6_init(). Compare pointers with
NULL instead of "testing truth". Replace some instances of (type
*)0 with NULL. Change some K&R prototypes to ANSI C, and join
lines.
that set it, ended up causing the kernel to reference random garbage. Ignore
it for compatibility, but add a DIAGNOSTIC message so that userland programs
that set it can be fixed. The only one so far is pppd. Hi dyoung!
initialize it. Use sc_dev in etherip_clone_destroy() instead of
casting the softc to struct device *.
Remove gratuitous casts. Use device_t and cfdata_t throughout.
unique. Some brands of ADSL modems pick a hard-coded session ID which
would otherwise make it impossible to use two of them in the same system
simultaneously.
The later changes were only cosmetic, cause problems in IPv6-only-
connections (reported by Wolfgang Solfrank in private mail), as well
as reintroducing the original bug again.
synchronously, but insert a (varying) delay. Before we have only been
decoupled from the peer via network latency - now we introduce some
explicit delay. This, at least, creates batter serialized debug output.
However, if we have to reconnect because of an authentication failure,
the peer may have just been unable to access it's radius server. (I have
a setup where this seems to happen every now and then, depending on time
of day.) Backoff reconnect in this cases seriously longer - this is better
than hitting the max-auth-failure limit within a few seconds.
(Maybe it shouldn't even be inline - but I'd have to work out where to put it).
VLAN_INPUT_TAG() now calls vlan_input_tag() and does '_errcase' when it fails.
In reality the callers should all be changed, _errcase is ALWAYS continue,
which used to 'continue' (ie break) the do .. while (0) loop - not the
intended action!
Found by ramming all the kernel sources through a modified lint and grepping
for a specific error.
While here enclose the body of VLAN_OUTPUT_TAG() in ().
like it is a device_t.
In tap_clone_creator(), set cf_fstate to FSTATE_FOUND instead of
_NOTFOUND to avoid a panic in config_detach() on a DIAGNOSTIC
kernel. XXX I'm not sure that that is the right fix.
These changes should put a stop to the crash described in kern/38759.
an AF_LINK socket, only, to be consistent with SIOC[ADG]LIFADDR
behavior on AF_INET and AF_INET6 sockets. Let us create AF_LINK
sockets for this purpose. Note that most operations on AF_LINK
sockets are not implemented.
Move the function+line printing into GRE_DPRINTF().
Retire gre_closef(). Retire gre_join(). Constify gre_reconf(),
and don't pass it an LWP any longer.
Make this work in the new file descriptor regime. Add a kernel
thread per gre(4) instance whose purpose is to install the socket
into proc0's file descriptor table. Add gre_fp_send() and
gre_fp_recv() for passing file_t pointers to proc0.
Fix locking: don't solock() in the socket upcall, where it is
already held. Do solock() before calling soconnect().
Simplify reconfiguration.
Update a comment that mentions finding a less specific route, since
we don't do that any more.
proclist_mutex and proclist_lock into a single adaptive mutex (proc_lock).
Implications:
- Inspecting process state requires thread context, so signals can no longer
be sent from a hardware interrupt handler. Signal activity must be
deferred to a soft interrupt or kthread.
- As the proc state locking is simplified, it's now safe to take exit()
and wait() out from under kernel_lock.
- The system spends less time at IPL_SCHED, and there is less lock activity.
- Socket layer becomes MP safe.
- Unix protocols become MP safe.
- Allows protocol processing interrupts to safely block on locks.
- Fixes a number of race conditions.
With much feedback from matt@ and plunky@.
failure of wpa_supplicant(8) to re-key promptly, as reported in
http://mail-index.netbsd.org/tech-net/2008/04/18/msg000459.html
- Make bpf's read timeout work more correctly with select/poll.
- A fix for catchpacket() which delays calling bpf_wakeup() until
the state has been updated.
(rev 1.125): correct the check for fd_getsock() failure in
gre_socreate().
The second bug is more complicated to fix. Since rev 1.125,
gre_reconf() is using the file descriptor table of the current
process instead of the process 0's (the kernel's).
1. Please don't cast function pointers to (void *), use the full function
prototype cast; this is for archs where a function pointer is not a regular
pointer.
2. Compare pointers to NULL not 0.
Use fewer 'error = ...; break;' statements and more 'return
...;'
Make the SIOCSIFFLAGS case more clear by using a switch
statement instead of an if-else if-else chain.
Shorten a staircase, and remove two unnecessary curly
braces.
- Add a lot of missing selinit() and seldestroy() calls.
- Merge selwakeup() and selnotify() calls into a single selnotify().
- Add an additional 'events' argument to selnotify() call. It will
indicate which event (POLL_IN, POLL_OUT, etc) happen. If unknown,
zero may be used.
Note: please pass appropriate value of 'events' where possible.
Proposed on: <tech-kern>
the opportunity to handle an ioctl before generic ifioctl handling
occurs. This will ease extending the kernel and sharing of code
between drivers.
First steps: Make the signature of ifioctl_common() match struct
ifinet->if_ioctl. Convert SIOCSIFCAP and SIOCSIFMTU to the new
ifioctl() regime, throughout the kernel.
1 Extract subroutine if_dl_create() from if_alloc_sadl().
if_dl_create() allocates a link-layer ifaddr.
2 Extract subroutine ifioctl_common() from ifioctl(). ifioctl_common()
will be the basis for an ifnet "superclass" whose functions
drivers may inherit. Very simple drivers may set ifnet->if_ioctl
= ifioctl_common. More sophisticated drivers will set ifnet->if_ioctl
= driver_ioctl. driver_ioctl() will call ifioctl_common() to
re-use the common code.
reference, but mark the cache 'invalid'. Let the next user of the
route cache check to whether or not the cache is valid, and update
the rtentry reference if necessary. In this way, avoid hairy
splnet()/splx() protection of route caches, which I never did trust.
In rtcache_lookup2(), use the return values of rtcache_validate()
and _rtcache_init() instead of looking at _ro_rt. Also, check the
return code of rtcache_setdst() for an error.
ether_mediachange() to their own module that we compile only if
the kernel configuration demands support for both MII buses and
ethernet. Thanks to Tom Spindler for suggesting that these routines
move to dev/mii/.
Add if_set_sadl() that both sets the link-layer address length and
replaces the current link-layer address with a new one, and use it
throughout the tree.
to _ro_rt. Use rtcache_getrt() to access a route cache's struct
rtentry *.
Introduce struct ifnet->if_dl that always points at the interface
identifier/link-layer address. Make code that treated the first
ifaddr on struct ifnet->if_addrlist as the interface address use
if_dl, instead.
Remove stale debugging code from net/route.c. Move the rtflush()
code into rtcache_clear() and delete rtflush(). Delete rtalloc(),
because nothing uses it any more.
Make ND6_HINT an inline, lowercase subroutine, nd6_hint.
I've done my best to convert IP Filter, the ISO stack, and the
AppleTalk stack to rtcache_getrt(). They compile, but I have not
tested them. I have given the changes to PF, GRE, IPv4 and IPv6
stacks a lot of exercise.
- Reduce available SPL levels for hardware devices to none, vm, sched, high.
- Acquire kernel_lock only for interrupts at IPL_VM.
- Implement threaded soft interrupts.
instead of adding/subtracting our own IPv4 header.
There are many benefits: gre(4) needn't grok the outer encapsulation
header any longer, so this simplifies the gre(4) code. The IP
stack needn't grok GRE, so it is simplified, too. gre(4) will
benefit from optimizations in the socket code. Eventually, gre(4)
will gain an IPv6 encapsulation with very few new lines of code.
There is a small performance loss. A 133 MHz, 486-class AMD Elan
sinks/sources a TCP stream over GRE with about 93% the throughput
of the old code. TCP throughput on a 266 MHz, 586-class AMD Geode
is about 96% the throughput of the old code. A 175-MHz ADM5120
(MIPS) only sinks a TCP stream over GRE at about 90% of the old
code; I am still investigating that.
I produced stripped-down versions of sosend() and soreceive() for
gre(4) to use. They are guaranteed not to block, so they can be
called from a software interrupt and from a socket upcall,
respectively.
A kernel thread is no longer necessary for socket transmit/receive,
but I didn't get around to removing it, yet.
Thanks to Matt Thomas for suggesting the use of stripped-down socket
code and software interrupts, and to Andrew Doran for advice and
answers concerning software interrupts, threads, and performance.
ifreq * arguments to ether_addmulti() and ether_delmulti() to const
struct sockaddr *, since ether_{add,del}multi() only ever read the
sockaddr ifreq member, ifr_addr. Update uses in carp(4) and in
vlan(4).
Somehow having it above interfered with ctags(1) producing a tag
for etherip_softc.
Remove the sole member of the union etherip_softc.sc_scr; call it
sc_ro. Delete the union. Delete the #define for sc_ro. The union
was a holdover from days before the route caches were unified.
Copy the entire sockaddr to the buffer to be written to user space,
according to its length, not just the part that fits in struct
sockaddr.
This fixes the 'bad MAC address' problem in dhclient.
sockaddrs bigger than struct sockaddr. Tightly bind decrementing
available space and using it, avoiding incorrect accounting in an
error case. Document invariants. Document calling convention for
SIOCGIFCONF. Simplify by removing code to handle sockaddrs that don't
fit in struct ifreq; with sockaddr_storage this can no longer occur.
Add several KASSERTs.
This commit resolves the problem with racoon failing to list
interfaces.
Proposed on tech-net@ with no objections.
acquisition and final release out into gre_thread(). This will
fix a locking bug that LOCKDEBUG exposed: holding a spinlock over
an sosend() call is a no-no.
Cosmetic: join some lines, remove some unnecessary curly braces.
and dom_sa_len members from struct domain. Pools of fixed-size
objects are too rigid for sockaddr_dls, whose size can vary over
a wide range.
Return sockaddr_dl to its "historical" size. Now that I'm using
malloc(9) instead of pool(9) to allocate sockaddr_dl, I can create
a sockaddr_dl of any size in the kernel, so expanding sockaddr_dl
is useless.
Avoid using sizeof(struct sockaddr_dl) in the kernel.
Introduce sockaddr_dl_alloc() for allocating & initializing an
arbitrary sockaddr_dl on the heap.
Add an argument, the sockaddr length, to sockaddr_alloc(),
sockaddr_copy(), and sockaddr_dl_setaddr().
Constify: LLADDR() -> CLLADDR().
Where the kernel overwrites LLADDR(), use sockaddr_dl_setaddr(),
instead. Used properly, sockaddr_dl_setaddr() will not overrun
the end of the sockaddr.
identify sockaddr_dl abuse that remains in the kernel, especially
the potential for overwriting memory past the end of a sockaddr_dl
with, e.g., memcpy(LLADDR(), ...).
Use sockaddr_dl_setaddr() in a few places.
* Create the kernel thread in gre_clone_create() instead of trying
to create it in gre_ioctl(). (Thanks ad@ for suggesting it, and
pointing out that I can't kthread_create while I hold a spin
lock.) Run the thread always, but put it to sleep while the
gre(4) is not in UDP mode.
* Use sockaddr_in_init().
* Move some thread state off of the stack and into the softc.
* Extract subroutines gre_do_recv(), gre_do_send(), and gre_reconf()
from gre_thread1(), making the code more readable.
into sdl_data[].
Move the macro satocsdl() to net/if_dl.h, and introduce satosdl().
Add some helpers for initializing sockaddr_dl (sockaddr_dl_init),
for finding out the length to put in a sockaddr_dl's sdl_len member
(sockaddr_dl_measure), and for setting the link-layer address in
a sockaddr_dl to a new value (sockaddr_dl_setaddr).
Make sockaddr_copy() panic if the caller tries to copy a sockaddr
to a destination where it will not fit.
from the forwarding table's users:
Introduce rt_walktree() for walking the routing table and
applying a function to each rtentry. Replace most
rn_walktree() calls with it.
Use rt_getkey()/rt_setkey() to get/set a route's destination.
Keep a pointer to the sockaddr key in the rtentry, so that
rtentry users do not have to grovel in the radix_node for
the key.
Add a RTM_GET method to rtrequest. Use that instead of
radix_node lookups in, e.g., carp(4).
Add sys/net/link_proto.c, which supplies sockaddr routines for
link-layer socket addresses (sockaddr_dl).
Cosmetic:
Constify. KNF. Stop open-coding LIST_FOREACH, TAILQ_FOREACH,
et cetera. Use NULL instead of 0 for null pointers. Use
__arraycount(). Reduce gratuitous parenthesization.
Stop using variadic arguments for rip6_output(), it is
unnecessary.
Remove the unnecessary rtentry member rt_genmask and the
code to maintain it, since nothing actually used it.
Make rt_maskedcopy() easier to read by using meaningful variable
names.
Extract a subroutine intern_netmask() for looking up a netmask in
the masks table.
Start converting backslash-ridden IPv6 macros in
sys/netinet6/in6_var.h into inline subroutines that one
can read without special eyeglasses.
One functional change: when the kernel serves an RTM_GET, RTM_LOCK,
or RTM_CHANGE request, it applies the netmask (if supplied) to a
destination before searching for it in the forwarding table.
I have changed sys/netinet/ip_carp.c, carp_setroute(), to remove
the unlawful radix_node knowledge.
Apart from the changes to carp(4), netiso, ATM, and strip(4), I
have run the changes on three nodes in my wireless routing testbed,
which involves IPv4 + IPv6 dynamic routing acrobatics, and it's
working beautifully so far.
MRU to the link's MTU and initiate an MRU negotiation with the peer.
This is useful when the PPP session is bridged from Ethernet to ATM
by an ADSL modem (such as the Linksys AM200). Unless we negotiate the
lower MRU, the peer is unaware that 1500-byte packets will not make
it umolested across the link (the Linksys AM200 silently truncates them
to 1498 bytes, creating a nice PMTU blackhole).
Note that the PPP RFC says peers MUST accept 1500 byte packets,
regardless of the negotiated MRU, so most ISPs which use PPPoA will
probably still send 1500-byte packets. However, I persuaded my ISP
(Andrews and Arnold) to modify their software to generate an ICMP error
"fragment needed" for packets with IP.DF set which are larger than the
negotiated MRU. They will still forward non-IP.DF packets, with the
associated truncation, but at least my PMTU troubles have gone.
set to rn_walktree.
Introduce rt_walktree(), which applies a subroutine to every route
in a particular address family. Use it instead of rn_walktree()
virtually everywhere. This helps to hide the routing table
implementation.