when the packet was received on a physical interface, it may not be if
the packet was received over L2TP/EtherIP.
In particular, if the inner ethernet header ends up on two separate IP
fragments. Here the KASSERT is triggered, and on !DIAGNOSTIC we corrupt
memory.
Note that this is a widespread problem: a lot of L2 code was written with
the assumption that "most" headers are present in the first mbuf.
Obviously, that's not true if L2 encapsulation is being used.
m_copym(m, 0, M_COPYALL, M_DONTWAIT)
by
m_copypacket(m, M_DONTWAIT)
when it is clear that we are copying a packet (that has M_PKTHDR) and not
a raw mbuf chain.
to make sure that m_next is not NULL, otherwise NULL deref. After that,
we must not touch m->m_pkthdr, given that 'm' may not be the first mbuf
of the chain anymore.
Declare mtop, and add a KASSERT to make sure it has M_PKTHDR set.
- Style and typos
- Use kmem_zalloc, in case there is a padding between the fields of
the structures
- Use ETHER_ADDR_LEN instead of a hard-coded '6'
- kmem_alloc(KM_SLEEP) can't fail
- Simplify ether_aton_r
- Use mutex_obj_free, not to leak memory
There are already two flags for if_output and if_start, however, it seems such
MPSAFE flags are eventually needed for all if_XXX operations. Having discrete
flags for each operation is wasteful of if_extflags bits. So let's unify
the flags into one: IFEF_MPSAFE.
Fortunately IFEF_*_MPSAFE flags have never been included in any releases, so
we can change them without breaking backward compatibility of the releases
(though the kernel version of -current should be bumped).
Note that if an interface have both MP-safe and non-MP-safe operations at a
time, we have to set the IFEF_MPSAFE flag and let callees of non-MP-safe
opeartions take the kernel lock.
Proposed on tech-kern@ and tech-net@
Unfortunately callers of ifp->if_ioctl (if_addr_init, if_flags_set
and if_mcast_op) may or may not hold if_ioctl_lock, so we have to
hold the lock only if it's not held.
The benefits of this change are:
- The flow is consistent with IPv4 (and FreeBSD and OpenBSD)
- old: ip6_output => nd6_output (do ND if needed) => L2_output (lookup a stored cache)
- new: ip6_output => L2_output (lookup a cache. Do ND if cache not found)
- We can remove some workarounds in nd6_output
- We can move L2 specific operations to their own place
- The performance slightly improves because one cache lookup is reduced
interface is configured. A callback function uses VLAN_ATTACHED() function
which check ec->ec_nvlans, the value should be incremented before calling the
callback. This bug was added in if_vlan.c rev. 1.83 (2015/11/19).
The data can be accessed from sysctl, ioctl, interface watchdog
(if_slowtimo) and interrupt handlers. We need to protect the data against
parallel accesses from them.
Currently the mutex is applied to some drivers, we need to apply it to all
drivers in the future.
Note that the mutex is adaptive one for ease of implementation but some
drivers access the data in interrupt context so we cannot apply the mutex
to every drivers as is. We have two options: one is to replace the mutex
with a spin one, which requires some additional works (see
ether_multicast_sysctl), and the other is to modify the drivers to access
the data not in interrupt context somehow.
carp_clone_destroy calls ether_ifdetach so not calling ether_ifattach is
inconsistent. If we add something pair of initialization and destruction
to ether_ifattach and ether_ifdetach (e.g., mutex_init/mutex_destroy),
ether_ifdetach of carp_clone_destroy won't work. So use ether_ifattach.
In order to do so, make ether_ifattach accept the 2nd argument (lla) as
NULL to allow carp to initialize its link level address by itself.
If NET_MPSAFE is enabled, don't hold KERNEL_LOCK and softnet_lock in
part of the network stack such as IP forwarding paths. The aim of the
change is to make it easy to test the network stack without the locks
and reduce our local diffs.
By default (i.e., if NET_MPSAFE isn't enabled), the locks are held
as they used to be.
Reviewed by knakahara@
If a underlying network device driver supports MSI/MSI-X, RX interrupts
can be delivered to arbitrary CPUs. This means that Layer 2 subroutines
such as ether_input (softint) and subsequent Layer 3 subroutines (softint)
which are called via traditional netisr can be dispatched on an arbitrary
CPU. Layer 2 subroutines now run without any locks (expected) and so a
Layer 2 subroutine and a Layer 3 subroutine can run in parallel.
There is a shared data between a Layer 2 routine and a Layer 3 routine,
that is ifqueue and IF_ENQUEUE (from L2) and IF_DEQUEUE (from L3) on it
are racy now.
To fix the race condition, use ifqueue#ifq_lock to protect ifqueue
instead of splnet that is meaningless now.
The same race condition exists in route_intr. Fix it as well.
Reviewed by knakahara@
This change makes struct ifaddr and its variants (in_ifaddr and in6_ifaddr)
MP-safe by using pserialize and psref. At this moment, pserialize_perform
and psref_target_destroy are disabled because (1) we don't need them
because of softnet_lock (2) they cause a deadlock because of softnet_lock.
So we'll enable them when we remove softnet_lock in the future.