By the change, bpf_mtap can run without any locks as long as its bpf filter
doesn't match a target packet. Pushing data to a bpf buffer still needs
a lock. Removing the lock requires big changes and it's a future work.
Another known issue is that we need to remain some obsolete variables to
avoid breaking kvm(3) users such as netstat and fstat. One problem for
MP-ification is that in order to keep statistic counters of bpf_d we need
to use atomic operations for them. Once we retire the kvm(3) users, we
should make the counters per-CPU and remove the atomic operations.
bpf_mtap of some drivers is still called in hardware interrupt context.
We want to run them in softint as well as bpf_mtap of most drivers
(see if_percpuq_softint and if_input).
To this end, bpf_mtap_softint mechanism is implemented; it defers
bpf_mtap processing to a dedicated softint for a target driver.
By using the machanism, we can move bpf_mtap processing to softint
without changing target drivers much while it adds some overhead
on CPU and memory. Once target drivers are changed to softint-based,
we should return to normal bpf_mtap.
Proposed on tech-kern and tech-net
The rump code needs to call devsw_attach() in order to assign a dev_major
for bpf; it then uses this to create rumps /dev/bpf node. Unfortunately,
this leaves the devsw attached, so when the bpf module tries to initialize
itself, it gets an EEXIST error and fails.
So, once rump has figured what the dev_major should be, call devsw_detach()
to remove the devsw. Then, when the module initialization code calls
devsw_attach() it will succeed.
Having a pointer of an interface in a mbuf isn't safe if we remove big
kernel locks; an interface object (ifnet) can be destroyed anytime in any
packet processing and accessing such object via a pointer is racy. Instead
we have to get an object from the interface collection (ifindex2ifnet) via
an interface index (if_index) that is stored to a mbuf instead of an
pointer.
The change provides two APIs: m_{get,put}_rcvif_psref that use psref(9)
for sleep-able critical sections and m_{get,put}_rcvif that use
pserialize(9) for other critical sections. The change also adds another
API called m_get_rcvif_NOMPSAFE, that is NOT MP-safe and for transition
moratorium, i.e., it is intended to be used for places where are not
planned to be MP-ified soon.
The change adds some overhead due to psref to performance sensitive paths,
however the overhead is not serious, 2% down at worst.
Proposed on tech-kern and tech-net.
The API is used to set (or reset) a received interface of a mbuf.
They are counterpart of m_get_rcvif, which will come in another
commit, hide internal of rcvif operation, and reduce the diff of
the upcoming change.
No functional change.
can be included in kernels which need them without also duplicating
them in other modules. Removes the duplicate symbols I found which
prevented loading i2c and bpf modules after having fixed PR 45125.
This change intends to run the whole network stack in softint context
(or normal LWP), not hardware interrupt context. Note that the work is
still incomplete by this change; to that end, we also have to softint-ify
if_link_state_change (and bpf) which can still run in hardware interrupt.
This change softint-ifies at ifp->if_input that is called from
each device driver (and ieee80211_input) to ensure Layer 2 runs
in softint (e.g., ether_input and bridge_input). To this end,
we provide a framework (called percpuq) that utlizes softint(9)
and percpu ifqueues. With this patch, rxintr of most drivers just
queues received packets and schedules a softint, and the softint
dequeues packets and does rest packet processing.
To minimize changes to each driver, percpuq is allocated in struct
ifnet for now and that is initialized by default (in if_attach).
We probably have to move percpuq to softc of each driver, but it's
future work. At this point, only wm(4) has percpuq in its softc
as a reference implementation.
Additional information including performance numbers can be found
in the thread at tech-kern@ and tech-net@:
http://mail-index.netbsd.org/tech-kern/2016/01/14/msg019997.html
Acknowledgment: riastradh@ greatly helped this work.
Thank you very much!
designated initializers.
I have not built every extant kernel so I have probably broken at
least one build; however I've also found and fixed some wrong
cdevsw/bdevsw entries so even if so I think we come out ahead.
mode. Some return EINVAL when they are dying, but others like USB return EIO.
Downgrade to a DIAGNOSTIC printf. Same should be done for the malloc/NOWAIT,
but this is rarely hit.
which add a capability to call external functions in a predetermined way.
It can be thought as a BPF "coprocessor" -- a generic mechanism to offload
more complex packet inspection operations. There is no default coprocessor
and this functionality is not targeted to the /dev/bpf. This is primarily
targeted to the kernel subsystems, therefore there is no way to set a custom
coprocessor at the userlevel.
Discussed on: tech-net@
OK: core@
- When handling contiguous buffer in _bpf_tap(), pass its real size
rather than 0 to avoid reading packet data as mbuf struct on
out-of-bounds loads.
- Correctly pass pktlen and buflen arguments from bpf_deliver() to
bpf_filter() to avoid reading mbuf struct as packet data.
JIT case is still broken.
Also, test pointers againts NULL.