As suggested by dh, carefully disable interrupts before frobbing the
interrupt mask; frobbing it with interrupts enabled might trigger more
interrupts.
Don't bother with keeping BEV and such.
Note that we currently zero out STATUS later on in the (NOFPU || emips)
case.
This change is risky for emips, which wasn't tested and didn't reach
userland before.
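A minimal sketch of the ordering, assuming the NetBSD MIPS COP0
accessors (mips_cp0_status_read/mips_cp0_status_write) and the
cpuregs.h mask bits; the helper itself is hypothetical:

    #include <sys/types.h>
    #include <mips/cpuregs.h>   /* MIPS_SR_INT_IE, MIPS_INT_MASK */
    #include <mips/locore.h>    /* mips_cp0_status_read/_write */

    /* Hypothetical helper: install a new interrupt mask in STATUS. */
    static void
    set_status_intr_mask(uint32_t newmask)
    {
        uint32_t status = mips_cp0_status_read();

        /* Disable interrupts globally first... */
        mips_cp0_status_write(status & ~MIPS_SR_INT_IE);

        /* ...then it is safe to frob the per-line mask bits. */
        status = (status & ~MIPS_INT_MASK) | (newmask & MIPS_INT_MASK);

        /* Write the new mask, restoring the original IE state. */
        mips_cp0_status_write(status);
    }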
ixgbe_rearm_queues() writes EICS register(s). 82599, X540 and X550
specifications say "Following a write of 1b to any bit in the EICS register
(interrupt cause set), its corresponding bit in the EIMS register is auto
set as well enabling its interrupt." in "Extended Interrupt Auto Mask Enable
(EIAM) Register" section. That is, ixgbe_rearm_queues() causes interrupts
regardless of the state managed by ixgbe_enable_queue()/ixgbe_disable_queue().
That can break the poll mode assumption.
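A hedged sketch of the write in question, simplified to the 32-queue
case (the driver's real code also handles the per-MAC extended register
variants):

    /*
     * Simplified sketch of ixgbe_rearm_queues().  Writing a 1 to a
     * queue's EICS bit raises its interrupt, and on 82599/X540/X550
     * the hardware auto-sets the same bit in EIMS, silently undoing a
     * prior ixgbe_disable_queue().
     */
    static void
    sketch_rearm_queues(struct adapter *adapter, u64 queues)
    {
        u32 mask = (u32)(queues & 0xffffffff);

        IXGBE_WRITE_REG(&adapter->hw, IXGBE_EICS, mask);
        /* HW side effect: EIMS |= mask -> queue unmasked again */
    }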
In fact, the problem occurs in the following situation:
- CPU#A has high-load traffic; in contrast, CPU#B has not-so-high-load traffic
- CPU#A gets an interrupt from its NIC queue
- CPU#A calls ixgbe_disable_queue() in the interrupt handler (ixgbe_msix_que())
- CPU#A kicks the softint handler (ixgbe_handle_que())
- CPU#A begins the softint
- the que->txr->busy flag is set for CPU#A's NIC queue
- for some reason, CPU#A can run the ixg interrupt handler again
  E.g. when one of CPU#A's softnet handlers sleeps, the ipl is lowered
- CPU#B starts the callout
- CPU#B calls ixgbe_local_timer1()
- CPU#B writes the EICS bit corresponding to CPU#A's NIC queue
- CPU#A's NIC queue raises an interrupt while CPU#A is running in poll mode
- CPU#A calls ixgbe_disable_queue() in the interrupt handler *again*
- CPU#A finishes polling, and then calls ixgbe_enable_queue() *once*
- CPU#A's NIC queue interrupt stays disabled until ixg is detached, because
  ixgbe_disable_queue() was called twice while ixgbe_enable_queue() was
  called only once (see the sketch below)
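The last step relies on the disable/enable pair being nesting-counted;
a minimal sketch of that behavior, with hypothetical field and helper
names:

    /*
     * Illustrative only: with a nesting count, two disables followed
     * by one enable leave disabled_count at 1, so the queue's EIMS
     * bit is never set again and its interrupt stays masked forever.
     */
    static void
    sketch_disable_queue(struct ix_queue *que)
    {
        if (que->disabled_count++ == 0)
            mask_queue_intr(que);   /* clear the queue's EIMS bit */
    }

    static void
    sketch_enable_queue(struct ix_queue *que)
    {
        if (--que->disabled_count == 0)
            unmask_queue_intr(que); /* set the queue's EIMS bit */
    }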
NOTE:
The 82598 datasheet does not say so, but the 82598 is treated the same
way since doing so is harmless.
By the way, we will refactor ixgbe_local_timer() (watchdog processing)
later.
XXX pullup-8
At the end of the test we resume the tracer and expect to observe it
collect the debuggee. From the parent's point of view, we cannot wait
for the debuggee to be collected with WNOHANG without a race.
Remove the WNOHANG option from the wait*(2) call. This corrects one type
of race.
This test is still racy for some other, as yet unknown, reason; this is
being investigated.
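A minimal sketch of the fixed wait, in the ATF test idiom (the helper
is illustrative):

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <atf-c.h>

    /*
     * With WNOHANG, waitpid() can return 0 before the debuggee is
     * collectable, racing with its exit; a blocking wait does not
     * return until the child has really been collected.
     */
    static void
    reap_debuggee(pid_t tracee)
    {
        int status;

        ATF_REQUIRE(waitpid(tracee, &status, 0) == tracee);
        ATF_REQUIRE(WIFEXITED(status));
    }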
Sponsored by <The NetBSD Foundation>
After refactoring, this code stopped calling the functions that were
designed to trigger the expected behavior, and thus the tests were
breaking.
Sponsored by <The NetBSD Foundation>
llentry objects are easily leaked, and pool(9) suits them because a pool
can be used to detect leaks.
Also sweep the unnecessary wrappers for llentry: in_llentry and
in6_llentry.
We have to delete entries in in_lltable_delete and in6_lltable_delete
unconditionally. Note that we don't need to worry about LLE_IFADDR
because there are no such entries now.
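A minimal sketch of the pool(9) usage, with an illustrative pool name,
IPL, and helper names:

    #include <sys/pool.h>
    #include <net/if_llatbl.h>  /* struct llentry */

    static struct pool llentry_pool;

    /*
     * One-time setup: with a dedicated pool, leaked llentries show up
     * in the pool statistics (e.g. vmstat -m).
     */
    void
    llentry_pool_init(void)
    {
        pool_init(&llentry_pool, sizeof(struct llentry), 0, 0, 0,
            "llentrypl", NULL, IPL_SOFTNET);
    }

    struct llentry *
    llentry_pool_get(void)
    {
        return pool_get(&llentry_pool, PR_NOWAIT);
    }

    void
    llentry_pool_put(struct llentry *lle)
    {
        pool_put(&llentry_pool, lle);
    }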
callout_reset and callout_halt can cancel a pending callout without
telling us. Detect such a cancel and remove the reference by using
callout_pending and callout_stop (it's a bit tricky, but we can detect
it).
While here, we can remove remaining abuses of mutex_owned for softnet_lock.
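A minimal sketch of the cancel detection described above, assuming the
usual reference-per-scheduled-callout convention (the settimer helper
and callback name are illustrative):

    /*
     * Before (re)arming the timer, check whether a previous callout
     * is still pending.  If so, our callout_stop() cancels it
     * silently, so we must also drop the reference that the cancelled
     * callout would have released itself when it fired.
     */
    static void
    sketch_lle_settimer(struct llentry *lle, int ticks)
    {
        LLE_WLOCK_ASSERT(lle);

        if (callout_pending(&lle->lle_timer)) {
            callout_stop(&lle->lle_timer);
            LLE_REMREF(lle);    /* reference of the cancelled callout */
        }
        LLE_ADDREF(lle);        /* reference for the new callout */
        callout_reset(&lle->lle_timer, ticks, lle_timer_cb, lle);
    }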