The workqueue worker can take the mutex before the tester tries to take it
after calling workqueue_enqueue. If that happens, the worker calls
cv_broadcast before the tester calls cv_timedwait, and the tester waits
until cv_timedwait times out.
Take the mutex before calling workqueue_enqueue so that the tester is
guaranteed to call cv_timedwait before the worker calls cv_broadcast.
The fix stabilizes the test, t_workqueue/workqueue1.
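A userland sketch of the same ordering, assuming POSIX threads as stand-ins
for the kernel primitives: pthread_mutex_lock, pthread_cond_timedwait and
pthread_cond_broadcast play the roles of the mutex, cv_timedwait and
cv_broadcast, and pthread_create plays the role of workqueue_enqueue. The
names worker and run_tester are hypothetical. Because the tester already
holds the mutex when the worker starts, the worker blocks until
pthread_cond_timedwait releases it, so the broadcast cannot be missed.
(Link with -lpthread.)

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-ins for the test's mutex, cv and completion flag. */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static bool done = false;

/* Plays the workqueue worker: it cannot get the mutex until the tester
 * has atomically released it inside pthread_cond_timedwait(). */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mtx);
    done = true;
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/* Returns 0 if the worker's broadcast was observed, ETIMEDOUT (or another
 * error number) otherwise, and -1 if the worker could not be started. */
static int run_tester(int timeout_sec)
{
    pthread_t t;
    struct timespec deadline;
    int error = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&mtx);          /* take the mutex BEFORE "enqueueing" */
    done = false;
    if (pthread_create(&t, NULL, worker, NULL) != 0) {
        pthread_mutex_unlock(&mtx);
        return -1;
    }
    /* The done flag also protects against spurious wakeups. */
    while (!done && error == 0)
        error = pthread_cond_timedwait(&cv, &mtx, &deadline);
    pthread_mutex_unlock(&mtx);
    pthread_join(t, NULL);
    return done ? 0 : error;
}
```

Taking the mutex first removes the window in which the worker could
broadcast to nobody; the wait then only times out if the worker truly
never runs.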
will overlap with the requested scheduler range, so get the new scheduler
range, and then try to find a different priority. If that fails (to find
a different scheduling range), give up here.
to my message on tech-userlevel ...
Subject: tests/lib/libpthread/t_mutex:mutex6
Date: Thu, 23 Nov 2017 17:34:54 +0700
Message-ID: <28385.1511433294@andromeda.noi.kre.to>
which can be found at:
http://mail-index.netbsd.org/tech-userlevel/2017/11/23/msg011010.html
which analysed the mutex6 test case of this test, and concluded
that it was useless, nonsense, and broken (the whole test is just a
race - not even really using or testing mutexes), let it be henceforth
forever gone.
more precision bits than the data type expects, but (kind of obviously)
does not allow such values to be stored in memory, expecting the value
returned from strtod() (an intermediate result) to be identical (that is,
equal) to a stored value is incorrect.
So instead go back to checking that the two numbers are very very close.
See comments added to the test for more explanation.
of what are left are "race for the bus" type - if we lose, we just
wait for the next one ... slower but still reliable.
There are two exceptions. First, when starting more than one rtadvd
(on different routers) we expect to receive an RA from each, but
all that we can check is that we received (at least) the right number
of RAs. It is possible (though unlikely) that one router sent two
before another sent any, in which case we will not have the data we
expect, and a sub-test will fail.
Second, there is no way to know for sure that we have waited long
enough when we're waiting for data to expire. On systems with
correctly working clocks that actually measure time, this should not
be an issue: if data is due to expire in < 5 seconds, and we wait
5 seconds, and the data is still there, then that indicates a
failure, which should be detected. Unfortunately, with QEMU testing,
time just isn't that reliable. But fortunately, it is generally the
sleep which takes longer, while other timers run correctly, which is
the way that makes us happy...
While here, lots of cleanups: everything from white space and
line wrapping, to removing superfluous quotes and adding some
that are not superfluous (but probably not enough; though given the
data is all known here, lack of quotes will rarely hurt.)
Also take note of the fact that current rtadvd *cannot* delete its
pidfile, so waiting for that file to be removed is doomed to failure.
Do things in a way that works, rather than simply resorting to assassination.
Because we do a lot less "sleep and hope it is long enough" and more
"wait until it is observed to happen" the tests generally run in less
elapsed time than before (20% less has been observed.) But because we
"wait until it is observed to happen" rather than just "sleep and hope
it is long enough" sometimes things take longer (and when that happens,
we no longer fail). Up to 7% slower (overall) has been observed.
(Observations on an amd64 DomU, no idea yet as to what QEMU might observe.)
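The tests themselves are shell scripts; what follows is only a C sketch of
the "wait until it is observed to happen" pattern, with a hypothetical
helper wait_for_file that polls for a file to appear. It returns as soon
as the condition is seen (hence the faster runs) and gives up only after a
generous limit (hence the occasional slower run instead of a failure).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: poll every `step_ms` until `path` exists, or give
 * up once roughly `limit_ms` of sleeping has been requested. If sleeps
 * run long (as under QEMU) the real wait is longer, never shorter. */
static bool wait_for_file(const char *path, long limit_ms, long step_ms)
{
    struct stat sb;
    struct timespec step = { step_ms / 1000, (step_ms % 1000) * 1000000L };

    for (long waited = 0; ; waited += step_ms) {
        if (stat(path, &sb) == 0)
            return true;        /* observed: stop waiting immediately */
        if (waited >= limit_ms)
            return false;       /* only now is it treated as a failure */
        nanosleep(&step, NULL);
    }
}
```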
computed using different methods, don't expect to achieve identical
results (here, one constant is perhaps converted to binary from a string by
a cross compiler, the other is converted at run time). Allow them to
have a small difference (for now, small is < 1e-7 - the constant is ~ 1e5,
so this is 12 orders of magnitude less) before failing (and include the
actual difference in the error message if it does fail.)
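A minimal sketch of that comparison, assuming an absolute tolerance as
described (< 1e-7 for a constant of about 1e5). The helper name
check_close is hypothetical; on failure it prints the actual difference,
as the message requires. (Link with -lm if fabs is not inlined.)

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: accept two independently computed values if they
 * differ by less than `tol`; otherwise report both values and the actual
 * difference, and return 0. */
static int check_close(double expected, double got, double tol)
{
    double diff = fabs(expected - got);

    if (diff < tol)
        return 1;
    fprintf(stderr, "expected %.17g, got %.17g (difference %g)\n",
        expected, got, diff);
    return 0;
}
```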
1. get pid of bg process with $! not $?
2. expect a single message from route monitor, not two, after ndp -d
3. run atf_check just once to verify correct output, not once for each string
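For comparison, a C sketch of item 1, with a hypothetical helper
run_and_wait: fork returns the child's pid to the parent (the analogue of
the shell's $!), and the parent waits for exactly that pid. Note that the
C semantics of pid 0 differ from the shell's: waitpid(0, ...) means "any
child in the caller's process group", whereas sh's "wait 0" names a pid
that is not a child of the shell.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child that exits with `exit_code` (a
 * stand-in for a background command), keep its pid, and wait for that
 * pid specifically. Returns the child's exit status, or -1 on error. */
static int run_and_wait(int exit_code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(exit_code);       /* child */
    /* Parent: pid is the child's pid, not a success/failure status. */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```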
1. get rid of the "$*" fetish.
2. more consistency (not complete .. yet) with RUMP_SERVER setting
3. white space (esp around pipe ('|') symbols.)
4. drop unnecessary \ line joining.
1. Be assertive when claiming the pid of the background route monitor command,
not polite... (i.e. $! will give you the pid; $? is just 0 there).
2. Since "wait 0" simply (and always) exits immediately with status 127
(we know without thinking that we have no child with pid 0), the waits
were ineffective. Now (after fix #1) they work, which requires the
route monitor that watches the arp -d to exit after 1 message, not 2,
as 1 is all it gets. (If there really should be 2, someone needs to
find out why the kernel is sending only 1; I am not that someone.)
3. The file contents need to be read only once, no matter how many patterns
we need to look for, so save some work and do it that way (this is not
really a bug, but saving time in the ATF tests is always a good thing.)
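Item 3 is done in the shell with atf_check; as a rough C analogue, the
hypothetical helper count_missing below reads the output file into memory
once and then checks every pattern against that single copy, instead of
rereading the file per pattern. It assumes plain text output (strstr stops
at an embedded NUL).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read `path` once, then return how many of the
 * `npats` patterns do NOT occur in it, or -1 on error. */
static int count_missing(const char *path, const char *const *pats,
    size_t npats)
{
    FILE *fp = fopen(path, "r");
    char chunk[4096], *buf = NULL;
    size_t len = 0, cap = 0, n;
    int missing = 0;

    if (fp == NULL)
        return -1;
    while ((n = fread(chunk, 1, sizeof chunk, fp)) > 0) {
        if (len + n + 1 > cap) {            /* grow, leaving room for NUL */
            char *nb = realloc(buf, cap = (len + n + 1) * 2);
            if (nb == NULL) {
                free(buf);
                fclose(fp);
                return -1;
            }
            buf = nb;
        }
        memcpy(buf + len, chunk, n);
        len += n;
    }
    fclose(fp);
    if (buf == NULL && (buf = malloc(1)) == NULL)   /* empty file */
        return -1;
    buf[len] = '\0';
    for (size_t i = 0; i < npats; i++)      /* every search uses one copy */
        if (strstr(buf, pats[i]) == NULL)
            missing++;
    free(buf);
    return missing;
}
```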
Not sure if this will stop it randomly failing on babylon5, but it might.
(The likely cause is that the "route.monitor" has not flushed its stdout
buffers at the time the "grep -A 3" [aside: why that way to read the file??]
is performed, so fails to find its expected output ... the route monitor would
get an extra message once interfaces start being destroyed, I assume, and
would exit then, flushing its buffer, but by then it is too late.
If that is/was the cause, then it should be fixed now.)
detected as invalid, become the "someone" referred to in the
previous commit log, and add tests for 0 and 4095 as well, and
while here, throw in a few more that might elicit bugs.
And if the shell running the tests is able, add tests of a few
random vlan tags between 2 and 4093 (1 and 4094 are always tested)
to check that anything in range works (well, partially check...)
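The range being exercised follows from IEEE 802.1Q: the VLAN ID is a
12-bit field in which 0 and 4095 are reserved, leaving 1 through 4094
usable. A small C sketch of that rule and of picking a random inner tag
(2 through 4093, since 1 and 4094 are always tested); both helper names
are hypothetical, and the real tests do this in the shell.

```c
#include <stdbool.h>
#include <stdlib.h>

/* 802.1Q: VID 0 and 4095 are reserved, so only 1..4094 are usable.
 * Values outside 0..4095 (e.g. -1, 4096) are also rejected. */
static bool vlan_tag_valid(long tag)
{
    return tag >= 1 && tag <= 4094;
}

/* Hypothetical helper: a random tag strictly between the always-tested
 * end points. rand() % 4092 is in 0..4091, so the result is 2..4093. */
static long random_inner_tag(void)
{
    return 2 + (long)(rand() % 4092);
}
```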
Move libevent from being a test playing sub-directory, to a groupy,
just hanging around, hoping someone will notice it, and throw it
a bone... (mixed metaphors?)
Currently (or when testing any shell that does not support -X) the
test will be skipped. The new test is also skipped for [m]ksh (but
not ksh93 etc.), which has an absurdly badly named -X option.
When -X appears in /bin/sh, this will verify that it is probably working
(the test is MUCH more gruelling than any rational use of -X would be.)