199 lines
6.1 KiB
Plaintext
199 lines
6.1 KiB
Plaintext
$NetBSD: TODO.smpnet,v 1.23 2018/08/14 14:49:13 maxv Exp $
|
|
|
|
MP-safe components
|
|
==================
|
|
|
|
They work without the big kernel lock (KERNEL_LOCK), i.e., with NET_MPSAFE
|
|
kernel option. Some components scale up and some don't.
|
|
|
|
- Device drivers
|
|
- vioif(4)
|
|
- vmx(4)
|
|
- wm(4)
|
|
- ixg(4)
|
|
- ixv(4)
|
|
- Layer 2
|
|
- Ethernet (if_ethersubr.c)
|
|
- bridge(4)
|
|
- STP
|
|
- Fast forward (ipflow)
|
|
- Layer 3
|
|
- All except for items in the below section
|
|
- Interfaces
|
|
- gif(4)
|
|
- ipsecif(4)
|
|
- l2tp(4)
|
|
- pppoe(4)
|
|
- if_spppsubr.c
|
|
- tun(4)
|
|
- vlan(4)
|
|
- Packet filters
|
|
- npf(7)
|
|
- Others
|
|
- bpf(4)
|
|
- ipsec(4)
|
|
- opencrypto(9)
|
|
- pfil(9)
|
|
|
|
Non MP-safe components and kernel options
|
|
=========================================
|
|
|
|
The components and options aren't MP-safe, i.e., requires the big kernel lock,
|
|
yet. Some of them can be used safely even if NET_MPSAFE is enabled because
|
|
they're still protected by the big kernel lock. The others aren't protected and
|
|
so unsafe, e.g, they may crash the kernel.
|
|
|
|
Protected ones
|
|
--------------
|
|
|
|
- Device drivers
|
|
- Most drivers other than ones listed in the above section
|
|
- Layer 4
|
|
- DCCP
|
|
- SCTP
|
|
- TCP
|
|
- UDP
|
|
|
|
Unprotected ones
|
|
----------------
|
|
|
|
- Layer 2
|
|
- ARCNET (if_arcsubr.c)
|
|
- ATM (if_atmsubr.c)
|
|
- BRIDGE_IPF
|
|
- FDDI (if_fddisubr.c)
|
|
- HIPPI (if_hippisubr.c)
|
|
- IEEE 1394 (if_ieee1394subr.c)
|
|
- IEEE 802.11 (ieee80211(4))
|
|
- Token ring (if_tokensubr.c)
|
|
- Layer 3
|
|
- IPSELSRC
|
|
- MROUTING
|
|
- PIM
|
|
- MPLS (mpls(4))
|
|
- IPv6 address selection policy
|
|
- Interfaces
|
|
- agr(4)
|
|
- carp(4)
|
|
- faith(4)
|
|
- gre(4)
|
|
- ppp(4)
|
|
- sl(4)
|
|
- stf(4)
|
|
- strip(4)
|
|
- if_srt
|
|
- tap(4)
|
|
- Packet filters
|
|
- ipf(4)
|
|
- pf(4)
|
|
- Others
|
|
- AppleTalk (sys/netatalk/)
|
|
- ATM (sys/netnatm/)
|
|
- Bluetooth (sys/netbt/)
|
|
- altq(4)
|
|
- CIFS (sys/netsmb/)
|
|
- ISDN (sys/netisbn/)
|
|
- kttcp(4)
|
|
- NFS
|
|
|
|
Know issues
|
|
===========
|
|
|
|
NOMPSAFE
|
|
--------
|
|
|
|
We use "NOMPSAFE" as a mark that indicates that the code around it isn't MP-safe
|
|
yet. We use it in comments and also use as part of function names, for example
|
|
m_get_rcvif_NOMPSAFE. Let's use "NOMPSAFE" to make it easy to find non-MP-safe
|
|
codes by grep.
|
|
|
|
bpf
|
|
---
|
|
|
|
MP-ification of bpf requires all of bpf_mtap* are called in normal LWP context
|
|
or softint context, i.e., not in hardware interrupt context. For Tx, all
|
|
bpf_mtap satisfy the requrement. For Rx, most of bpf_mtap are called in softint.
|
|
Unfortunately some bpf_mtap on Rx are still called in hardware interrupt context.
|
|
|
|
This is the list of the functions that have such bpf_mtap:
|
|
|
|
- sca_frame_process() @ sys/dev/ic/hd64570.c
|
|
- en_intr() @ sys/dev/ic/midway.c
|
|
- rxintr_cleanup() @ sys/dev/pci/if_lmc.c
|
|
- ipr_rx_data_rdy() @ sys/netisdn/i4b_ipr.c
|
|
|
|
Ideally we should make the functions run in softint somehow, but we don't have
|
|
actual devices, no time (or interest/love) to work on the task, so instead we
|
|
provide a deferred bpf_mtap mechanism that forcibly runs bpf_mtap in softint
|
|
context. It's a workaround and once the functions run in softint, we should use
|
|
the original bpf_mtap again.
|
|
|
|
Lingering obsolete variables
|
|
-----------------------------
|
|
|
|
Some obsolete global variables and member variables of structures remain to
|
|
avoid breaking old userland programs which directly access such variables via
|
|
kvm(3).
|
|
|
|
The following programs still use kvm(3) to get some information related to
|
|
the network stack.
|
|
|
|
- netstat(1)
|
|
- vmstat(1)
|
|
- fstat(1)
|
|
|
|
netstat(1) accesses ifnet_list, the head of a list of interface objects
|
|
(struct ifnet), and traverses each object through ifnet#if_list member variable.
|
|
ifnet_list and ifnet#if_list is obsoleted by ifnet_pslist and
|
|
ifnet#if_pslist_entry respectively. netstat also accesses the IP address list
|
|
of an interface throught ifnet#if_addrlist. struct ifaddr, struct in_ifaddr
|
|
and struct in6_ifaddr are accessed and the following obsolete member variables
|
|
are stuck: ifaddr#ifa_list, in_ifaddr#ia_hash, in_ifaddr#ia_list,
|
|
in6_ifaddr#ia_next and in6_ifaddr#_ia6_multiaddrs. Note that netstat already
|
|
implements alternative methods to fetch the above information via sysctl(3).
|
|
|
|
vmstat(1) shows statistics of hash tables created by hashinit(9) in the kernel.
|
|
The statistic information is retrieved via kvm(3). The global variables
|
|
in_ifaddrhash and in_ifaddrhashtbl, which are for a hash table of IPv4
|
|
addresses and obsoleted by in_ifaddrhash_pslist and in_ifaddrhashtbl_pslist,
|
|
are kept for this purpose. We should provide a means to fetch statistics of
|
|
hash tables via sysctl(3).
|
|
|
|
fstat(1) shows information of bpf instances. Each bpf instance (struct bpf) is
|
|
obtained via kvm(3). bpf_d#_bd_next, bpf_d#_bd_filter and bpf_d#_bd_list
|
|
member variables are obsolete but remain. ifnet#if_xname is also accessed
|
|
via struct bpf_if and obsolete ifnet#if_list is required to remain to not change
|
|
the offset of ifnet#if_xname. The statistic counters (bpf#bd_rcount,
|
|
bpf#bd_dcount and bpf#bd_ccount) are also victims of this restriction; for
|
|
scalability the statistic counters should be per-CPU and we should stop using
|
|
atomic operations for them however we have to remain the counters and atomic
|
|
operations.
|
|
|
|
Scalability
|
|
-----------
|
|
|
|
- Per-CPU rtcaches (used in say IP forwarding) aren't scalable on multiple
|
|
flows per CPU
|
|
- ipsec(4) isn't scalable on the number of SA/SP; the cost of a look-up
|
|
is O(n)
|
|
- opencrypto(9)'s crypto_newsession()/crypto_freesession() aren't scalable
|
|
as they are serialized by one mutex
|
|
|
|
ec_multi* of ethercom
|
|
---------------------
|
|
|
|
ec_multiaddrs and ec_multicnt of struct ethercom and items listed in
|
|
ec_multiaddrs must be protected by ec_lock. The core of ethernet subsystem is
|
|
already MP-safe, however, device drivers that use the data should also be fixed.
|
|
A typical change should be to protect manipulations of the data via ETHER_*
|
|
macros such as ETHER_FIRST_MULTI by ETHER_LOCK and ETHER_UNLOCK.
|
|
|
|
ALTQ
|
|
----
|
|
|
|
If ALTQ is enabled in the kernel, it enforces to use just one Tx queue (if_snd)
|
|
for packet transmissions, resulting in serializing all Tx packet processing on
|
|
the queue. We should probably design and implement an alternative queuing
|
|
mechanism that deals with multi-core systems at the first place, not making the
|
|
existing ALTQ MP-safe because it's just annoying.
|