NetBSD/doc/TODO.smpnet

194 lines
6.0 KiB
Plaintext

$NetBSD: TODO.smpnet,v 1.25 2018/09/23 13:48:16 maxv Exp $
MP-safe components
==================
They work without the big kernel lock (KERNEL_LOCK), i.e., with NET_MPSAFE
kernel option. Some components scale up and some don't.
- Device drivers
- vioif(4)
- vmx(4)
- wm(4)
- ixg(4)
- ixv(4)
- Layer 2
- Ethernet (if_ethersubr.c)
- bridge(4)
- STP
- Fast forward (ipflow)
- Layer 3
- All except for items in the below section
- Interfaces
- gif(4)
- ipsecif(4)
- l2tp(4)
- pppoe(4)
- if_spppsubr.c
- tun(4)
- vlan(4)
- Packet filters
- npf(7)
- Others
- bpf(4)
- ipsec(4)
- opencrypto(9)
- pfil(9)
Non MP-safe components and kernel options
=========================================
The components and options aren't MP-safe, i.e., requires the big kernel lock,
yet. Some of them can be used safely even if NET_MPSAFE is enabled because
they're still protected by the big kernel lock. The others aren't protected and
so unsafe, e.g, they may crash the kernel.
Protected ones
--------------
- Device drivers
- Most drivers other than ones listed in the above section
- Layer 4
- DCCP
- SCTP
- TCP
- UDP
Unprotected ones
----------------
- Layer 2
- ARCNET (if_arcsubr.c)
- BRIDGE_IPF
- FDDI (if_fddisubr.c)
- HIPPI (if_hippisubr.c)
- IEEE 1394 (if_ieee1394subr.c)
- IEEE 802.11 (ieee80211(4))
- Token ring (if_tokensubr.c)
- Layer 3
- IPSELSRC
- MROUTING
- PIM
- MPLS (mpls(4))
- IPv6 address selection policy
- Interfaces
- agr(4)
- carp(4)
- faith(4)
- gre(4)
- ppp(4)
- sl(4)
- stf(4)
- strip(4)
- if_srt
- tap(4)
- Packet filters
- ipf(4)
- pf(4)
- Others
- AppleTalk (sys/netatalk/)
- Bluetooth (sys/netbt/)
- altq(4)
- CIFS (sys/netsmb/)
- kttcp(4)
- NFS
Know issues
===========
NOMPSAFE
--------
We use "NOMPSAFE" as a mark that indicates that the code around it isn't MP-safe
yet. We use it in comments and also use as part of function names, for example
m_get_rcvif_NOMPSAFE. Let's use "NOMPSAFE" to make it easy to find non-MP-safe
codes by grep.
bpf
---
MP-ification of bpf requires all of bpf_mtap* are called in normal LWP context
or softint context, i.e., not in hardware interrupt context. For Tx, all
bpf_mtap satisfy the requrement. For Rx, most of bpf_mtap are called in softint.
Unfortunately some bpf_mtap on Rx are still called in hardware interrupt context.
This is the list of the functions that have such bpf_mtap:
- sca_frame_process() @ sys/dev/ic/hd64570.c
- rxintr_cleanup() @ sys/dev/pci/if_lmc.c
Ideally we should make the functions run in softint somehow, but we don't have
actual devices, no time (or interest/love) to work on the task, so instead we
provide a deferred bpf_mtap mechanism that forcibly runs bpf_mtap in softint
context. It's a workaround and once the functions run in softint, we should use
the original bpf_mtap again.
Lingering obsolete variables
-----------------------------
Some obsolete global variables and member variables of structures remain to
avoid breaking old userland programs which directly access such variables via
kvm(3).
The following programs still use kvm(3) to get some information related to
the network stack.
- netstat(1)
- vmstat(1)
- fstat(1)
netstat(1) accesses ifnet_list, the head of a list of interface objects
(struct ifnet), and traverses each object through ifnet#if_list member variable.
ifnet_list and ifnet#if_list is obsoleted by ifnet_pslist and
ifnet#if_pslist_entry respectively. netstat also accesses the IP address list
of an interface throught ifnet#if_addrlist. struct ifaddr, struct in_ifaddr
and struct in6_ifaddr are accessed and the following obsolete member variables
are stuck: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs. Note that netstat already
implements alternative methods to fetch the above information via sysctl(3).
vmstat(1) shows statistics of hash tables created by hashinit(9) in the kernel.
The statistic information is retrieved via kvm(3). The global variables
in_ifaddrhash and in_ifaddrhashtbl, which are for a hash table of IPv4
addresses and obsoleted by in_ifaddrhash_pslist and in_ifaddrhashtbl_pslist,
are kept for this purpose. We should provide a means to fetch statistics of
hash tables via sysctl(3).
fstat(1) shows information of bpf instances. Each bpf instance (struct bpf) is
obtained via kvm(3). bpf_d#_bd_next, bpf_d#_bd_filter and bpf_d#_bd_list
member variables are obsolete but remain. ifnet#if_xname is also accessed
via struct bpf_if and obsolete ifnet#if_list is required to remain to not change
the offset of ifnet#if_xname. The statistic counters (bpf#bd_rcount,
bpf#bd_dcount and bpf#bd_ccount) are also victims of this restriction; for
scalability the statistic counters should be per-CPU and we should stop using
atomic operations for them however we have to remain the counters and atomic
operations.
Scalability
-----------
- Per-CPU rtcaches (used in say IP forwarding) aren't scalable on multiple
flows per CPU
- ipsec(4) isn't scalable on the number of SA/SP; the cost of a look-up
is O(n)
- opencrypto(9)'s crypto_newsession()/crypto_freesession() aren't scalable
as they are serialized by one mutex
ec_multi* of ethercom
---------------------
ec_multiaddrs and ec_multicnt of struct ethercom and items listed in
ec_multiaddrs must be protected by ec_lock. The core of ethernet subsystem is
already MP-safe, however, device drivers that use the data should also be fixed.
A typical change should be to protect manipulations of the data via ETHER_*
macros such as ETHER_FIRST_MULTI by ETHER_LOCK and ETHER_UNLOCK.
ALTQ
----
If ALTQ is enabled in the kernel, it enforces to use just one Tx queue (if_snd)
for packet transmissions, resulting in serializing all Tx packet processing on
the queue. We should probably design and implement an alternative queuing
mechanism that deals with multi-core systems at the first place, not making the
existing ALTQ MP-safe because it's just annoying.