qemu_aio_coroutine_enter() is (indirectly) called recursively when
processing co_queue_wakeup. This can lead to stack exhaustion.
This patch rewrites co_queue_wakeup in an iterative fashion (instead of
recursive) with bounded memory usage to prevent stack exhaustion.
qemu_co_queue_run_restart() is inlined into qemu_aio_coroutine_enter()
and the qemu_coroutine_enter() call is turned into a loop to avoid
recursion.
There is one change that is worth mentioning: Previously, when
coroutine A queued coroutine B, qemu_co_queue_run_restart() entered
coroutine B from coroutine A. If A was terminating then it would still
stay alive until B yielded. After this patch B is entered by A's parent
so that a A can be deleted immediately if it is terminating.
It is safe to make this change since B could never interact with A if it
was terminating anyway.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20180322152834.12656-3-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
qemu_co_queue_next does not need to release and re-acquire the mutex,
because the queued coroutine does not run immediately. However, this
does not hold for qemu_co_enter_next. Now that qemu_co_queue_wait
can synchronize (via QemuLockable) with code that is not running in
coroutine context, it's important that code using qemu_co_enter_next
can easily use a standardized locking idiom.
First of all, qemu_co_enter_next must use aio_co_wake to restart the
coroutine. Second, the function gains a second argument, a QemuLockable*,
and the comments of qemu_co_queue_next and qemu_co_queue_restart_all
are adjusted to clarify the difference.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180203153935.8056-5-pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
There are cases in which a queued coroutine must be restarted from
non-coroutine context (with qemu_co_enter_next). In this cases,
qemu_co_enter_next also needs to be thread-safe, but it cannot use
a CoMutex and so cannot qemu_co_queue_wait. Use QemuLockable so
that the CoQueue can interchangeably use CoMutex or QemuMutex.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180203153935.8056-4-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
These functions are more efficient in the presence of contention.
qemu_co_rwlock_downgrade also guarantees not to block, which may
be useful in some algorithms too.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20170629132749.997-3-pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Submission of requests on linux aio is a bit tricky and can lead to
requests completions on submission path:
44713c9e85 ("linux-aio: Handle io_submit() failure gracefully")
0ed93d84ed ("linux-aio: process completions from ioq_submit()")
That means that any coroutine which has been yielded in order to wait
for completion can be resumed from submission path and be eventually
terminated (freed).
The following use-after-free crash was observed when IO throttling
was enabled:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f5813dff700 (LWP 56417)]
virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
(gdb) bt
#0 virtqueue_unmap_sg (elem=0x7f5804009a30, len=1, vq=<optimized out>) at virtio.c:252
^^^^^^^^^^^^^^
remember the address
#1 virtqueue_fill (vq=0x5598b20d21b0, elem=0x7f5804009a30, len=1, idx=0) at virtio.c:282
#2 virtqueue_push (vq=0x5598b20d21b0, elem=elem@entry=0x7f5804009a30, len=<optimized out>) at virtio.c:308
#3 virtio_blk_req_complete (req=req@entry=0x7f5804009a30, status=status@entry=0 '\000') at virtio-blk.c:61
#4 virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at virtio-blk.c:126
#5 blk_aio_complete (acb=0x7f58040068d0) at block-backend.c:923
#6 coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at coroutine-ucontext.c:78
(gdb) p * elem
$8 = {index = 77, out_num = 2, in_num = 1,
in_addr = 0x7f5804009ad8, out_addr = 0x7f5804009ae0,
in_sg = 0x0, out_sg = 0x7f5804009a50}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
'in_sg' and 'out_sg' are invalid.
e.g. it is impossible that 'in_sg' is zero,
instead its value must be equal to:
(gdb) p/x 0x7f5804009ad8 + sizeof(elem->in_addr[0]) + 2 * sizeof(elem->out_addr[0])
$26 = 0x7f5804009af0
Seems 'elem' was corrupted. Meanwhile another thread raised an abort:
Thread 12 (Thread 0x7f57f2ffd700 (LWP 56426)):
#0 raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 qemu_coroutine_enter (co=0x7f5804009af0) at qemu-coroutine.c:113
#3 qemu_co_queue_run_restart (co=0x7f5804009a30) at qemu-coroutine-lock.c:60
#4 qemu_coroutine_enter (co=0x7f5804009a30) at qemu-coroutine.c:119
^^^^^^^^^^^^^^^^^^
WTF?? this is equal to elem from crashed thread
#5 qemu_co_queue_run_restart (co=0x7f57e7f16ae0) at qemu-coroutine-lock.c:60
#6 qemu_coroutine_enter (co=0x7f57e7f16ae0) at qemu-coroutine.c:119
#7 qemu_co_queue_run_restart (co=0x7f5807e112a0) at qemu-coroutine-lock.c:60
#8 qemu_coroutine_enter (co=0x7f5807e112a0) at qemu-coroutine.c:119
#9 qemu_co_queue_run_restart (co=0x7f5807f17820) at qemu-coroutine-lock.c:60
#10 qemu_coroutine_enter (co=0x7f5807f17820) at qemu-coroutine.c:119
#11 qemu_co_queue_run_restart (co=0x7f57e7f18e10) at qemu-coroutine-lock.c:60
#12 qemu_coroutine_enter (co=0x7f57e7f18e10) at qemu-coroutine.c:119
#13 qemu_co_enter_next (queue=queue@entry=0x5598b1e742d0) at qemu-coroutine-lock.c:106
#14 timer_cb (blk=0x5598b1e74280, is_write=<optimized out>) at throttle-groups.c:419
Crash can be explained by access of 'co' object from the loop inside
qemu_co_queue_run_restart():
while ((next = QSIMPLEQ_FIRST(&co->co_queue_wakeup))) {
QSIMPLEQ_REMOVE_HEAD(&co->co_queue_wakeup, co_queue_next);
^^^^^^^^^^^^^^^^^^^^
on each iteration 'co' is accessed,
but 'co' can be already freed
qemu_coroutine_enter(next);
}
When 'next' coroutine is resumed (entered) it can in its turn resume
'co', and eventually free it. That's why we see 'co' (which was freed)
has the same address as 'elem' from the first backtrace.
The fix is obvious: use temporary queue and do not touch coroutine after
first qemu_coroutine_enter() is invoked.
The issue is quite rare and happens every ~12 hours on very high IO
and CPU load (building linux kernel with -j512 inside guest) when IO
throttling is enabled. With the fix applied guest is running ~35 hours
and is still alive so far.
Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170601160847.23720-1-roman.penyaev@profitbricks.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This adds a CoMutex around the existing CoQueue. Because the write-side
can just take CoMutex, the old "writer" field is not necessary anymore.
Instead of removing it altogether, count the number of pending writers
during a read-side critical section and forbid further readers from
entering.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-7-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
All that CoQueue needs in order to become thread-safe is help
from an external mutex. Add this to the API.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Running a very small critical section on pthread_mutex_t and CoMutex
shows that pthread_mutex_t is much faster because it doesn't actually
go to sleep. What happens is that the critical section is shorter
than the latency of entering the kernel and thus FUTEX_WAIT always
fails. With CoMutex there is no such latency but you still want to
avoid wait and wakeup. So introduce it artificially.
This only works with one waiters; because CoMutex is fair, it will
always have more waits and wakeups than a pthread_mutex_t.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-3-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This uses the lock-free mutex described in the paper '"Blocking without
Locking", or LFTHREADS: A lock-free thread library' by Gidenstam and
Papatriantafilou. The same technique is used in OSv, and in fact
the code is essentially a conversion to C of OSv's code.
[Added missing coroutine_fn in tests/test-aio-multithread.c.
--Stefan]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20170213181244.16297-2-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
As a small step towards the introduction of multiqueue, we want
coroutines to remain on the same AioContext that started them,
unless they are moved explicitly with e.g. aio_co_schedule. This patch
avoids that coroutines switch AioContext when they use a CoMutex.
For now it does not make much of a difference, because the CoMutex
is not thread-safe and the AioContext itself is used to protect the
CoMutex from concurrent access. However, this is going to change.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170213135235.12274-9-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
A coroutine that takes a lock must also release it again. If the
coroutine terminates without having released all its locks, it's buggy
and we'll probably run into a deadlock sooner or later. Make sure that
we don't get such cases.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
In cases of deadlocks, knowing who holds a given CoMutex is really
helpful for debugging. Keeping the information around doesn't cost much
and allows us to add another assertion to keep the code correct, so
let's just add it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.
Mostly done with the following semantic patch:
@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);
@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));
@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);
except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
CoQueue do not need to remove any element but the head of the list;
processing is always strictly FIFO. Therefore, the simpler singly-linked
QSIMPLEQ can be used instead.
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Clean up includes so that osdep.h is included first and headers
which it implies are not included manually.
This commit was created with scripts/clean-includes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1454089805-5470-6-git-send-email-peter.maydell@linaro.org
The coroutine files are currently referenced by the block-obj-y
variable. The coroutine functionality though is already used by
more than just the block code. eg migration code uses coroutine
yield. In the future the I/O channel code will also use the
coroutine yield functionality. Since the coroutine code is nicely
self-contained it can be easily built as part of the libqemuutil.a
library, making it widely available.
The headers are also moved into include/qemu, instead of the
include/block directory, since they are now part of the util
codebase, and the impl was never in the block/ directory
either.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>