The primary race specific to this test was fixed in the previous commit
(wrong WNOHANG).
This test is still racy and fails about once every 30,000 executions.
This is down from about once every 100 executions in the past.
The remaining race is not specific to attach2; I can reproduce it with
at least attach1. It still appears to be specific to NetBSD and is
not reproducible on Linux or FreeBSD. Perhaps it is a bug in pipe(2)/write(2)/
read(2) or in closely related features.
Sponsored by <The NetBSD Foundation>
macro conditions that is, _LP64.
The previous code uses NOFPU as the condition for it.
This adds duplicated code (removed again later) for easy bisecting.
As suggested by dh, carefully disable interrupts before frobbing the
interrupt mask, which might trigger more interrupts.
Don't bother with keeping BEV and such.
Note that we are zeroing out STATUS later on in the (NOFPU || emips)
case right now.
This change is risky for emips, which wasn't tested and didn't reach
userland before.
ixgbe_rearm_queues() writes EICS register(s). The 82599, X540 and X550
specifications say "Following a write of 1b to any bit in the EICS register
(interrupt cause set), its corresponding bit in the EIMS register is auto
set as well enabling its interrupt." in the "Extended Interrupt Auto Mask
Enable (EIAM) Register" section. That is, ixgbe_rearm_queues() causes interrupts
regardless of the state managed by ixgbe_enable_queue()/ixgbe_disable_queue().
That can break the poll mode assumption.
In fact, the problem occurs in the following situation:
- CPU#A has high load traffic; in contrast, CPU#B has not so high load traffic
- CPU#A's NIC queue raises an interrupt on CPU#A
- CPU#A calls ixgbe_disable_queue() in the interrupt handler (ixgbe_msix_que())
- CPU#A kicks the softint handler (ixgbe_handle_que())
- CPU#A begins the softint
- CPU#A's NIC queue has its que->txr->busy flag set
- For some reason, CPU#A becomes able to run the ixg interrupt handler
  E.g. when one of CPU#A's softnet handlers sleeps, the IPL is lowered
- CPU#B starts the callout
- CPU#B calls ixgbe_local_timer1()
- CPU#B writes the EICS bit corresponding to CPU#A's NIC queue
- CPU#A's NIC queue causes an interrupt while CPU#A is running in poll mode
- CPU#A calls ixgbe_disable_queue() in the interrupt handler *again*
- CPU#A finishes polling, and then CPU#A calls ixgbe_enable_queue() *once*
- CPU#A's NIC queue interrupt stays disabled until ixg is detached, because
  ixgbe_disable_queue() was called twice but ixgbe_enable_queue() was
  called only once
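
The sequence above can be replayed with a small user-space model. This is
only a sketch under stated assumptions: struct queue_model and the model_*
functions are hypothetical names invented for illustration, the nesting
counter is assumed from the "called twice ... called once" description, and
none of this is the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one queue's interrupt state; not the driver's code. */
struct queue_model {
	int disabled_count;	/* assumed nesting depth of disable calls */
	bool masked;		/* true while the queue interrupt is masked */
};

/* Counted disable: every call deepens the nesting. */
static void
model_disable_queue(struct queue_model *q)
{
	q->disabled_count++;
	q->masked = true;
}

/* Counted enable: unmask only when the nesting depth returns to zero. */
static void
model_enable_queue(struct queue_model *q)
{
	assert(q->disabled_count > 0);
	if (--q->disabled_count == 0)
		q->masked = false;
}

/* Writing EICS also sets the EIMS bit, bypassing the software count. */
static void
model_write_eics(struct queue_model *q)
{
	q->masked = false;
}

/* Replays the scenario; returns true if the interrupt ends up masked. */
static bool
replay_scenario(void)
{
	struct queue_model q = { 0, false };

	model_disable_queue(&q);	/* CPU#A: interrupt handler */
	model_write_eics(&q);		/* CPU#B: timer writes EICS */
	model_disable_queue(&q);	/* CPU#A: interrupt in poll mode */
	model_enable_queue(&q);		/* CPU#A: polling done, enable once */
	return q.masked;
}
```

In this model replay_scenario() returns true: the count is left at 1 after
the single enable, so the queue interrupt is never unmasked again.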
NOTE:
The 82598 specification does not say so, but it is treated in the same way
because doing so is harmless.
By the way, we will refactor ixgbe_local_timer (watchdog processing) later.
XXX pullup-8
At the end of the test we resume a tracer and expect to observe it
collecting the debuggee. From the parent's point of view, we cannot wait
for that with WNOHANG without a race.
Remove the WNOHANG option from the wait*(2) call. This corrects one type of
race.
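
The difference between the two wait modes can be sketched in isolation. This
is not the test's code: wait_both_ways() is a hypothetical helper, and the
pipe exists only to make the ordering deterministic, so that the child is
guaranteed to still be running when the WNOHANG call is made.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits only after the parent closes
 * the write end of a pipe. Stores the result of a WNOHANG wait made while
 * the child is still running, then the result of a blocking wait.
 * Returns the child pid, or -1 on error.
 */
static pid_t
wait_both_ways(pid_t *nohang_result, pid_t *blocking_result)
{
	int fds[2], status;
	char c;
	pid_t child;

	if (pipe(fds) == -1)
		return -1;
	child = fork();
	if (child == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (child == 0) {
		close(fds[1]);
		/* Block until the parent closes its end (EOF). */
		while (read(fds[0], &c, 1) > 0)
			continue;
		_exit(0);
	}
	close(fds[0]);

	/* Child not yet exited: WNOHANG returns 0 without collecting it. */
	*nohang_result = waitpid(child, &status, WNOHANG);

	close(fds[1]);	/* now let the child exit */

	/* Without WNOHANG the call blocks until the child is collected. */
	*blocking_result = waitpid(child, &status, 0);
	return child;
}
```

A caller that treats the WNOHANG result of 0 as a failure races against the
child's exit; the blocking call does not.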
This test is still racy for some other, unknown reason, which is being
investigated.
Sponsored by <The NetBSD Foundation>
After refactoring, this code stopped calling the functions that were designed
to trigger the expected behavior, and thus the tests were breaking.
Sponsored by <The NetBSD Foundation>