Using softnet_lock for mutual exclusion between lltable_free and
arptimer was wrong and had an issue causing a deadlock between
them; lltable_free waits arptimer completion by calling
callout_halt with softnet_lock that is held in arptimer, however
lltable_free also holds llentry's lock that is also held in
arptimer so arptimer never obtain the lock and both never go
forward eventually. We have to pass llentry's lock to
callout_halt instead.
modular/filesystem. In the non-modular case we initialize through attach.
In the modular/builtin case we define the module to be class misc so it
attaches late (after percpu is initialized) since driver modules attach
too early. In the modular/filesystem case we define it to be a driver
module since we autoload it via /dev/npf open.
are initialized. MODULE_CLASS_DRIVER modules are now initialized before
autoconfiguration starts, but npf_init has a dependency on percpu(9) which
doesn't work until CPUs have attached (at least on ARM).
Callers of arpresolve() now pass the error code back to their caller,
masking out EWOULDBLOCK.
This allows applications such as ping(8) to display a suitable error
condition.
In it's place, use rtrequest1() inside rt_ifa_addlocal() and
rtdeletemsg() inside rt_ifa_remlocal().
This removes the need for INET/INET6 specific code and allows
greater control over the creation of the local address route.
Currently RX can run on a CPU other than CPU#0, so always enqueuing
to a pktqueue of CPU#0 makes no sense. Let's use a curcpu's pktqueue,
although bridge_foward softint doesn't run in parallel without
NET_MPSAFE.
This is a temporal solution. We need a fundamental solution.
With GATEWAY (fastforward), the whole forwarding processing runs in
hardware interrupt context. So we cannot use rwlock for lltable and
llentry in that case.
This change replaces rwlock with mutex(IPL_NET) for lltable and llentry
when GATEWAY is enabled. We need to tweak locking only around rtree
in lltable_free. Other than that, what we need to do is to change macros
for locks.
I hope fastforward runs in softint some day in the future...
The previous code took locks the following order:
- LLE_WLOCKs
- mutex_enter(softnet_lock)
- LLE_WUNLOCKs
- mutex_exit(softnet_lock)
This fix moves mutex_enter(softnet_lock) before LLE_WLOCKs.
We have to touch la_rt always with holding softnet_lock. And we have to
use callout_halt with softnet_lock instead of callout_stop for
la_timer (arptimer) because arptimer holds softnet_lock inside it.
This fix may solve a kernel panic christos@ encountered.
Highlights of the change are:
- Use llentry instead of llinfo to manage ARP caches
- ARP specific data are stored in the hashed list
of an interface instead of the global list (llinfo_arp)
- Fine-grain locking on llentry
- arptimer (callout) per ARP cache
- the global timer callout with the big locks can be
removed (though softnet_lock is still required for now)
- net.inet.arp.prune is now obsoleted
- it was the interval of the global timer callout
- net.inet.arp.refresh is now obsoleted
- it was a parameter that prevents expiration of active caches
- Removed to simplify the timer logic, but we may be able to
restore the feature if really needed
Proposed on tech-kern and tech-net.
lltable/llentry is new L2 nexthop cache data structures that
store caches in each interface (struct ifnet). It is imported
to replace the current ARP cache implementation that uses the
global list with the big kernel lock, and provide fine-grain
locking for cache operations. It is also planned to replace
NDP caches.
The code is based on FreeBSD's lltable/llentry as of r286629
and tweaked for NetBSD.
rtrequest has already done it. So we don't need to do it once more.
This fixes regressed behavior of ARP cache expiration which an expired
cache doesn't disappear.
Some codes in sys/net* use time_second to manage time periods such as
cache expirations. However, time_second doesn't increase monotonically
and can leap by say settimeofday(2) according to time_second(9). We
should use time_uptime instead of it to avoid such time leaps.
This change replaces time_second with time_uptime. Additionally it
converts a time based on time_uptime to a time based on time_second
when the kernel passes the time to userland programs that expect
the latter, and vice versa.
Note that we shouldn't leak time_uptime to other hosts over the
netowrk. My investigation shows there is no such leak:
http://mail-index.netbsd.org/tech-net/2015/08/06/msg005332.html
Discussed on tech-kern and tech-net.
So far bridge cannot receive frames via a member interface when the frames
come from another member interface. So when we assign an IP address to
a member interface, hosts connected to another member interface cannot
ping to the IP address. That behavior isn't expected. See PR 48104 for
more realistic examples of this issue.
The change does:
- drop M_PROMISC before ether_input, which allows a bridge member interface
to receive a frame coming from another bridge member interface
- receive broadcast/multicast frames via all bridge member interfaces,
which is required to receive IPv6 multicast packets destined to a
multicast group belonging to a bridge member interface that is different
from a packet arrival interface
roy@ helped testing of the fix, thanks!
rt_refcnt of rtentry was used in bad manners, for example, direct rt_refcnt++
and rt_refcnt-- outside route.c, "rt->rt_refcnt++; rtfree(rt);" idiom, and
touching rt after rt->rt_refcnt--.
These abuses seem to be needed because rt_refcnt manages only references
between rtentry and doesn't take care of references during packet processing
(IOW references from local variables). In order to reduce the above abuses,
the latter cases should be counted by rt_refcnt as well as the former cases.
This change improves consistency of use of rt_refcnt:
- rtentry is always accessed with rt_refcnt incremented
- rtentry's rt_refcnt is decremented after use (rtfree is always used instead
of rt_refcnt--)
- functions returning rtentry increment its rt_refcnt (and caller rtfree it)
Note that rt_refcnt prevents rtentry from being freed but doesn't prevent
rtentry from being updated. Toward MP-safe, we need to provide another
protection for rtentry, e.g., locks. (Or introduce a better data structure
allowing concurrent readers during updates.)