$NetBSD: TODO.smpnet,v 1.27 2019/01/21 02:28:25 pgoyette Exp $ MP-safe components ================== They work without the big kernel lock (KERNEL_LOCK), i.e., with NET_MPSAFE kernel option. Some components scale up and some don't. - Device drivers - vioif(4) - vmx(4) - wm(4) - ixg(4) - ixv(4) - Layer 2 - Ethernet (if_ethersubr.c) - bridge(4) - STP - Fast forward (ipflow) - Layer 3 - All except for items in the below section - Interfaces - gif(4) - ipsecif(4) - l2tp(4) - pppoe(4) - if_spppsubr.c - tun(4) - vlan(4) - Packet filters - npf(7) - Others - bpf(4) - ipsec(4) - opencrypto(9) - pfil(9) Non MP-safe components and kernel options ========================================= The components and options aren't MP-safe, i.e., requires the big kernel lock, yet. Some of them can be used safely even if NET_MPSAFE is enabled because they're still protected by the big kernel lock. The others aren't protected and so unsafe, e.g, they may crash the kernel. Protected ones -------------- - Device drivers - Most drivers other than ones listed in the above section - Layer 4 - DCCP - SCTP - TCP - UDP Unprotected ones ---------------- - Layer 2 - ARCNET (if_arcsubr.c) - BRIDGE_IPF - FDDI (if_fddisubr.c) - HIPPI (if_hippisubr.c) - IEEE 1394 (if_ieee1394subr.c) - IEEE 802.11 (ieee80211(4)) - Token ring (if_tokensubr.c) - Layer 3 - IPSELSRC - MROUTING - PIM - MPLS (mpls(4)) - IPv6 address selection policy - Interfaces - agr(4) - carp(4) - faith(4) - gre(4) - ppp(4) - sl(4) - stf(4) - strip(4) - if_srt - tap(4) - Packet filters - ipf(4) - pf(4) - Others - AppleTalk (sys/netatalk/) - Bluetooth (sys/netbt/) - altq(4) - CIFS (sys/netsmb/) - kttcp(4) - NFS Know issues =========== NOMPSAFE -------- We use "NOMPSAFE" as a mark that indicates that the code around it isn't MP-safe yet. We use it in comments and also use as part of function names, for example m_get_rcvif_NOMPSAFE. Let's use "NOMPSAFE" to make it easy to find non-MP-safe codes by grep. bpf --- MP-ification of bpf requires all of bpf_mtap* are called in normal LWP context or softint context, i.e., not in hardware interrupt context. For Tx, all bpf_mtap satisfy the requrement. For Rx, most of bpf_mtap are called in softint. Unfortunately some bpf_mtap on Rx are still called in hardware interrupt context. This is the list of the functions that have such bpf_mtap: - sca_frame_process() @ sys/dev/ic/hd64570.c Ideally we should make the functions run in softint somehow, but we don't have actual devices, no time (or interest/love) to work on the task, so instead we provide a deferred bpf_mtap mechanism that forcibly runs bpf_mtap in softint context. It's a workaround and once the functions run in softint, we should use the original bpf_mtap again. Lingering obsolete variables ----------------------------- Some obsolete global variables and member variables of structures remain to avoid breaking old userland programs which directly access such variables via kvm(3). The following programs still use kvm(3) to get some information related to the network stack. - netstat(1) - vmstat(1) - fstat(1) netstat(1) accesses ifnet_list, the head of a list of interface objects (struct ifnet), and traverses each object through ifnet#if_list member variable. ifnet_list and ifnet#if_list is obsoleted by ifnet_pslist and ifnet#if_pslist_entry respectively. netstat also accesses the IP address list of an interface throught ifnet#if_addrlist. struct ifaddr, struct in_ifaddr and struct in6_ifaddr are accessed and the following obsolete member variables are stuck: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list, in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs. Note that netstat already implements alternative methods to fetch the above information via sysctl(3). vmstat(1) shows statistics of hash tables created by hashinit(9) in the kernel. The statistic information is retrieved via kvm(3). The global variables in_ifaddrhash and in_ifaddrhashtbl, which are for a hash table of IPv4 addresses and obsoleted by in_ifaddrhash_pslist and in_ifaddrhashtbl_pslist, are kept for this purpose. We should provide a means to fetch statistics of hash tables via sysctl(3). fstat(1) shows information of bpf instances. Each bpf instance (struct bpf) is obtained via kvm(3). bpf_d#_bd_next, bpf_d#_bd_filter and bpf_d#_bd_list member variables are obsolete but remain. ifnet#if_xname is also accessed via struct bpf_if and obsolete ifnet#if_list is required to remain to not change the offset of ifnet#if_xname. The statistic counters (bpf#bd_rcount, bpf#bd_dcount and bpf#bd_ccount) are also victims of this restriction; for scalability the statistic counters should be per-CPU and we should stop using atomic operations for them however we have to remain the counters and atomic operations. Scalability ----------- - Per-CPU rtcaches (used in say IP forwarding) aren't scalable on multiple flows per CPU - ipsec(4) isn't scalable on the number of SA/SP; the cost of a look-up is O(n) - opencrypto(9)'s crypto_newsession()/crypto_freesession() aren't scalable as they are serialized by one mutex ec_multi* of ethercom --------------------- ec_multiaddrs and ec_multicnt of struct ethercom and items listed in ec_multiaddrs must be protected by ec_lock. The core of ethernet subsystem is already MP-safe, however, device drivers that use the data should also be fixed. A typical change should be to protect manipulations of the data via ETHER_* macros such as ETHER_FIRST_MULTI by ETHER_LOCK and ETHER_UNLOCK. ALTQ ---- If ALTQ is enabled in the kernel, it enforces to use just one Tx queue (if_snd) for packet transmissions, resulting in serializing all Tx packet processing on the queue. We should probably design and implement an alternative queuing mechanism that deals with multi-core systems at the first place, not making the existing ALTQ MP-safe because it's just annoying. Using kernel modules -------------------- Please note that if you enable NET_MPSAFE in your kernel, and you use and loadable kernel modules (including compat_xx modules or individual network interface if_xxx device driver modules), you will need to build custom modules. For each module you will need to add the following line to its Makefile: CPPFLAGS+= NET_MPSAFE Failure to do this may result in unpredictable behavior.