We currently use use it up to 30. We should extend the limit to be able to use
more than 10Gbps speeds. Our ifmedia(4) is inconvenience and have some problem
so we should redesign the interface, but it's too late for netbsd-9 to do it.
So, we keep the data structure size and modify the structure a bit. The
strategy is almost the same as FreeBSD. Many bits of IFM_OMASK for Ethernet
have not used, so use some of them for Ethernet's subtype.
The differences against FreeBSD are:
- We use NetBSD style compat code (i.e. no SIOCGIFXMEDIA).
- FreeBSD's IFM_ETH_XTYPE's bit location is from 11 to "14" even though
IFM_OMASK is from 8 to "15". We use _IFM_ETH_XTMASK from bit 13 to "15".
- FreeBSD changed the meaning of IFM_TYPE_MATCH(). I think we should
not do it. We keep it not changing and added new IFM_TYPE_SUBTYPE_MATCH()
macro for matching both TYPE and SUBTYPE.
- Added up to 400GBASE-SR16.
New layout of the media word is as follows (from ifmedia_h):
* if_media Options word:
* Bits Use
* ---- -------
* 0-4 Media subtype MAX SUBTYPE == 255 for ETH and 31 for others
* 5-7 Media type
* 8-15 Type specific options
* 16-18 Mode (for multi-mode devices)
* 19 (Reserved for Future Use)
* 20-27 Shared (global) options
* 28-31 Instance
*
* 3 2 1
* 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
* +-------+---------------+-+-----+---------------+-----+---------+
* | | |R| | | | |
* | IMASK | GMASK |F|MMASK+-----+ OMASK |NMASK| TMASK |
* | | |U| |XTMSK| | | |
* +-------+---------------+-+-----+-----+---------+-----+---------+
* <-----> <---> <--->
* IFM_INST() IFM_MODE() IFM_TYPE()
*
* IFM_SUBTYPE(other than ETH)<------->
*
* <---> IFM_SUBTYPE(ETH)<------->
*
*
* <-------------> <------------->
* IFM_OPTIONS()
It is yet another psref leak detector that enables to tell where a leak occurs
while a simpler version that is already committed just tells an occurrence of a
leak.
Investigating of psref leaks is hard because once a leak occurs a percpu list of
psref that tracks references can be corrupted. A reference to a tracking object
is memorized in the list via an intermediate object (struct psref) that is
normally allocated on a stack of a thread. Thus, the intermediate object can be
overwritten on a leak resulting in corruption of the list.
The tracker makes a shadow entry to an intermediate object and stores some hints
into it (currently it's a caller address of psref_acquire). We can detect a
leak by checking the entries on certain points where any references should be
released such as the return point of syscalls and the end of each softint
handler.
The feature is expensive and enabled only if the kernel is built with
PSREF_DEBUG.
Proposed on tech-kern
IFF_ALLMULTI is set/unset to if_flags via if_mcast_op. To avoid data races on
if_flags, IFNET_LOCK was added for if_mcast_op. Unfortunately it produces
a deadlock so we want to remove added IFNET_LOCK by avoiding the data races by
another approach.
This fix introduces ec_flags to struct ethercom and stores IFF_ALLMULTI to it.
ec_flags is protected by ETHER_LOCK and thus IFNET_LOCK is no longer necessary
for if_mcast_op. Note that the fix is applied only to MP-safe drivers that
the data races matter.
In the kernel, IFF_ALLMULTI is set by a driver and used by the driver itself.
So changing the storing place doesn't break anything. One exception is
ioctl(SIOCGIFFLAGS); we have to include IFF_ALLMULTI in a result if needed to
export the flag as well as before.
A upcoming commit will remove IFNET_LOCK.
PR kern/54189
<subsystem>_<function>_<version>_hook
NFCI
XXX Note that although this introduces a change in the kernel-to-
XXX module interface, we are NOT bumping the kernel version number.
XXX We will bump the version number once the interface stabilizes.
Handle TX offload in software when a packet is sent via
bridge_output(). We can send it as is in the following
exceptional cases:
For unicast:
(1) When the destination interface is the same as source.
(2) When the destination supports all TX offload options
specified in a packet.
For multicast/broadcast:
(3) When all the members of the bridge support the specified
TX offload options.
For (3), add sc_csum_flags_tx flag to bridge softc, which is
logical AND b/w capabilities of TX offload options in member
interface (ifp->if_csum_flags_tx). The flag is updated when a
member is (i) added to or (ii) removed from a bridge, or (iii)
if_csum_flags_tx flag of a member interface is manipulated via
ifconfig(8).
Turn on M_CSUM_TSOv[46] bit in ifp->if_csum_flags_tx flag when
TSO[46] is enabled for that interface.
OK msaitoh thorpej
Some callers of rtrequest1(RTM_ADD) adjust rt_ifa of an rtentry created by
rtrequest1 that may change rt_ifa (in ifa_rtrequest) with another ifa that is
different from requested one. It's wasteful and even worse introduces a race
condition. rtrequest1 should just use a passed ifa as is if a caller hopes so.
e.g. do the following commands.
====================
# ifconfig vlan0 create
# ifconfig vlan0 vlan 100 vlanif wm0
# ifconfig vlan0 -vlanif wm0
# ifconfig vlan0 vlan 100 vlanif wm0
====================
ATF net/if_vlan do this type of test, however it cannot detect this bug.
Because the shmif(4)'s ifp->if_hwdl is always NULL as shmif(4)'s ethernet
address is set U/L bit.
See: https://nxr.netbsd.org/xref/src/sys/net/if_ethersubr.c#997
The node (and child nodes) is initialized in sysctl_net_pktq_setup, but the call
of sysctl_net_pktq_setup is skipped unexpectedly.
sysctl_net_pktq_setup is skipped if in6_present is false that indicates the
netinet6 component isn't loaded on rump kernels. However the flag is
accidentally always false because the flag is turned on in in6_dom_init that is
called after if_sysctl_setup on both normal and rump kernels.
Fix the issue by moving if_sysctl_setup after in6_dom_init (domaininit on normal
kernels). This fix is ad-hoc but good enough for netbsd-8. We should refine
the initialization order of network components in the future.
Pointed out by hikaru@
compatibility with BIOC[GS]SEESENT ioctl. The userland interface is the same
as FreeBSD.
This change also fixes a bug that the direction is misunderstand on some
environment by passing the direction to bpf_mtap*() instead of checking
m->m_pkthdr.rcvif.
It seems that there remain some paths that don't satisfy the constraint that is
required only if NET_MPSAFE. So don't check it by default.
One known path is nd6_rtrequest => in6_addmulti => if_mcast_op, which is not
easy to address.
if_input, i.e, ether_input and friends, now runs in softint without any
protections. It's ok for ether_input itself because it's already MP-safe,
however, subsequent routines called from it such as carp_input and agr_input
aren't safe because they're not MP-safe. Protect if_input with KERNEL_LOCK.
if_input can be called from a normal LWP context. In that case we need to
prevent interrupts (softint) from running by splsoftnet to protect non-MP-safe
codes (e.g., carp_input and agr_input).
Pointed out by mlelstv@
- Destroy if_ioctl_lock at the end of the if_detach because it's used in various
destruction routines
- Move psref_target_destroy after pr_purgeif because we want to use psref in
pr_purgeif (otherwise destruction procedures can be tricky)
Because it has decreased the performance of wm. And also I found that
wm_watchdog doesn't work well with if_watchdog framework at all. Sharing one
counter (if_timer) with multiple instances (hardware multi-queues) can't detect
a single (or some) stall of them because other instances reset the counter even
if the stalled one want the watchdog to fire.
Interfaces without IFEF_MPSAFE works safely with the original if_watchdog thanks
to KENREL_LOCK. OTOH, interfaces with IFEF_MPSAFE shouldn't use if_watchdog and
should implement their own watchdog timer that works with multiple instances.
At that point no other one modifies the list so IFADDR_READER_FOREACH
is unnecessary. Use of IFADDR_READER_FOREACH is harmless in general though,
if we try to detect contract violations of pserialize, using it violates
the contract. So avoid using it makes life easy.
if_link_queue is a queue to store events of link state changes, which is
used to pass events from (typically) an interrupt handler to
if_link_state_change softint. The queue was protected by KERNEL_LOCK so far,
but if IFEF_MPSAFE is enabled, it becomes unsafe because (perhaps) an interrupt
handler of an interface with IFEF_MPSAFE doesn't take KERNEL_LOCK. Protect it
by a spin mutex.
Additionally with this change KERNEL_LOCK of if_link_state_change softint is
omitted if NET_MPSAFE is enabled.
Note that the spin mutex is now ifp->if_snd.ifq_lock as well as the case of
if_timer (see the comment).