qemu/block
Fam Zheng ab27c3b5e7 mirror: Workaround for unexpected iohandler events during completion
Commit 5a7e7a0ba moved mirror_exit to a BH handler but didn't add any
protection against new requests that could sneak in just before the
BH is dispatched. For example (assuming a code base at that commit):

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch
                aio_dispatch
                  ...
                    mirror_run
                      bdrv_drain
    (a)               block_job_defer_to_main_loop
          qemu_iohandler_poll
            virtio_queue_host_notifier_read
              ...
                virtio_submit_multiwrite
    (b)           blk_aio_multiwrite

        main_loop_wait # 2
          <snip>
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

At (a) we know the BDS has no pending requests. However, the same
main_loop_wait call goes on to dispatch iohandlers (EventNotifier
events), which may lead to new I/O from the guest. So the invariant is
already broken by the time we reach (c). The result is data loss.

Commit f3926945c8 made iohandler use the aio API. The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait call became unpredictable. Even worse, if the host
notifier event arrives at the next main_loop_wait call, the order
between mirror_exit and virtio_queue_host_notifier_read is
unpredictable as well, which is also a problem. As shown below, this
commit made the bug easier to trigger:

    - Bug case 1:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop
              aio_ctx_dispatch (iohandler_ctx)
                virtio_queue_host_notifier_read
                  ...
                    virtio_submit_multiwrite
    (b)               blk_aio_multiwrite

        main_loop_wait # 2
          ...
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

    - Bug case 2:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop

        main_loop_wait # 2
          ...
            aio_ctx_dispatch (iohandler_ctx)
              virtio_queue_host_notifier_read
                ...
                  virtio_submit_multiwrite
    (b)             blk_aio_multiwrite
              aio_dispatch
                aio_bh_poll
    (c)           mirror_exit

In both cases, (b) breaks the invariant wanted by (a) and (c).
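
The two orderings above can be replayed in a toy model. This is only a
sketch for illustration, not QEMU code: struct toy_loop and every toy_*
function are hypothetical stand-ins for the steps labelled (a), (b) and
(c), and the "invariant" is reduced to a request counter being zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
struct toy_loop {
    int in_flight;          /* requests submitted since the drain at (a) */
    bool bh_scheduled;      /* stands in for the deferred mirror_exit BH */
    bool notifier_pending;  /* the guest has kicked the virtqueue */
    bool invariant_held;    /* what (c) observes when it finally runs */
};

/* (a): the job has drained the device and defers completion to a BH. */
static void toy_defer_completion(struct toy_loop *l)
{
    l->in_flight = 0;
    l->bh_scheduled = true;
}

/* (b): the host notifier handler submits a new guest write, if any. */
static void toy_notifier_read(struct toy_loop *l)
{
    if (l->notifier_pending) {
        l->notifier_pending = false;
        l->in_flight++;
    }
}

/* (c): the BH runs and relies on "no request since (a)". */
static void toy_bh_poll(struct toy_loop *l)
{
    assert(l->bh_scheduled);    /* the model only polls after (a) */
    l->bh_scheduled = false;
    l->invariant_held = (l->in_flight == 0);
}

/* Bug case 1: (a) and (b) in iteration 1, (c) in iteration 2. */
static bool toy_bug_case_1(void)
{
    struct toy_loop l = { .notifier_pending = true };

    toy_defer_completion(&l);   /* main_loop_wait # 1 */
    toy_notifier_read(&l);
    toy_bh_poll(&l);            /* main_loop_wait # 2 */
    return l.invariant_held;
}

/* Bug case 2: (a) in iteration 1; the kick arrives in iteration 2 and
 * happens to be dispatched before the BH. */
static bool toy_bug_case_2(void)
{
    struct toy_loop l = { 0 };

    toy_defer_completion(&l);   /* main_loop_wait # 1 */
    l.notifier_pending = true;  /* main_loop_wait # 2 */
    toy_notifier_read(&l);
    toy_bh_poll(&l);
    return l.invariant_held;
}

/* Baseline: no guest kick between (a) and (c). */
static bool toy_no_race(void)
{
    struct toy_loop l = { 0 };

    toy_defer_completion(&l);
    toy_notifier_read(&l);
    toy_bh_poll(&l);
    return l.invariant_held;
}
```

In this model toy_bug_case_1() and toy_bug_case_2() both return false,
while toy_no_race() returns true: the only difference is whether a
notifier fires between (a) and (c).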

Until then, the request loss had been silent. Later, 3f09bfbc7b added
asserts at (c) to check the invariant (in
bdrv_replace_in_backing_chain), and Max reported an assertion failure
that first became visible there, triggered by running an active commit
while the guest is running bonnie++.

QEMU 2.5 added bdrv_drained_begin at (a) to protect the dataplane case
from similar problems, but we never noticed the main loop bug until now.

As a stopgap, this patch temporarily disables the iohandler's external
events together with those of bs->ctx.
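
The idea of the workaround can be sketched with a toy model. This is an
assumption-laden illustration, not the patch itself: struct toy_ctx,
struct toy_job and all toy_* functions are hypothetical, and "disabling
external events" is modelled as a counter that makes the dispatcher
leave guest-facing handlers pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
struct toy_ctx {
    int external_disable_cnt;   /* > 0 means external handlers are skipped */
    bool notifier_pending;      /* the guest has kicked the virtqueue */
};

struct toy_job {
    int in_flight;              /* requests submitted since the drain */
};

static void toy_disable_external(struct toy_ctx *c)
{
    c->external_disable_cnt++;
}

static void toy_enable_external(struct toy_ctx *c)
{
    assert(c->external_disable_cnt > 0);
    c->external_disable_cnt--;
}

/* Dispatch guest-facing handlers unless they are currently disabled.
 * A skipped event stays pending and is handled on a later call. */
static void toy_dispatch_external(struct toy_ctx *c, struct toy_job *j)
{
    if (c->external_disable_cnt > 0) {
        return;
    }
    if (c->notifier_pending) {
        c->notifier_pending = false;
        j->in_flight++;
    }
}

/* Replays (a) -> (b) -> (c) with or without the guard. Returns whether
 * the invariant held at (c); *submitted receives the number of guest
 * requests that were eventually submitted. */
static bool toy_complete(bool guard, int *submitted)
{
    struct toy_ctx iohandler = { .notifier_pending = true };
    struct toy_job job = { 0 };
    bool invariant_held;

    if (guard) {
        toy_disable_external(&iohandler);   /* at (a) */
    }
    toy_dispatch_external(&iohandler, &job); /* (b) would happen here */
    invariant_held = (job.in_flight == 0);   /* checked at (c) */
    if (guard) {
        toy_enable_external(&iohandler);    /* after (c) */
    }
    toy_dispatch_external(&iohandler, &job); /* deferred kick runs now */

    *submitted = job.in_flight;
    return invariant_held;
}
```

With guard set, the model's invariant holds at (c) and the guest request
is still submitted once external events are enabled again, so it is
delayed rather than dropped. Without the guard, the request is submitted
before (c) and the invariant fails.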

Launchpad Bug: 1570134

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-04-22 16:44:09 +02:00
accounting.c block: Clean up includes 2016-01-20 13:36:23 +01:00
archipelago.c util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
backup.c block: Remove bdrv_(set_)enable_write_cache() 2016-03-30 12:16:03 +02:00
blkdebug.c util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
blkreplay.c replay: introduce block devices record/replay 2016-03-30 12:15:57 +02:00
blkverify.c util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
block-backend.c block: Don't ignore flags in blk_{,co,aio}_write_zeroes() 2016-04-15 17:22:12 +02:00
bochs.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
cloop.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
commit.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
crypto.c crypto: Avoid memory leak on failure 2016-04-05 17:23:21 +02:00
curl.c util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
dirty-bitmap.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
dmg.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
gluster.c block/gluster: prevent data loss after i/o error 2016-04-19 12:24:59 -04:00
io.c block: Fix bdrv_drain in coroutine 2016-04-11 16:59:09 +01:00
iscsi.c iscsi: Support BDRV_REQ_FUA 2016-03-30 12:16:02 +02:00
linux-aio.c block: Clean up includes 2016-01-20 13:36:23 +01:00
Makefile.objs replay: introduce block devices record/replay 2016-03-30 12:15:57 +02:00
mirror.c mirror: Workaround for unexpected iohandler events during completion 2016-04-22 16:44:09 +02:00
nbd-client.c nbd: don't request FUA on FLUSH 2016-04-05 11:46:52 +02:00
nbd-client.h nbd: Support BDRV_REQ_FUA 2016-03-30 12:16:02 +02:00
nbd.c nbd: Support BDRV_REQ_FUA 2016-03-30 12:16:02 +02:00
nfs.c block/nfs: add missing #include "qemu/cutils.h" 2016-03-30 16:50:39 -04:00
null.c block/null-{co,aio}: Implement get_block_status() 2016-03-30 12:16:04 +02:00
parallels.c block: Always set writeback mode in blk_new_open() 2016-03-30 12:16:01 +02:00
qapi.c block/qapi: Use blk_enable_write_cache() 2016-03-30 12:16:02 +02:00
qcow2-cache.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qcow2-cluster.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
qcow2-refcount.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
qcow2-snapshot.c util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
qcow2.c qcow2: Prevent backing file names longer than 1023 2016-04-12 18:06:51 +02:00
qcow2.h qcow2: Add function for refcount order amendment 2015-12-18 14:34:43 +01:00
qcow.c block: Always set writeback mode in blk_new_open() 2016-03-30 12:16:01 +02:00
qed-check.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-cluster.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-gencb.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-l2-cache.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed-table.c block: Clean up includes 2016-01-20 13:36:23 +01:00
qed.c block: Always set writeback mode in blk_new_open() 2016-03-30 12:16:01 +02:00
qed.h util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
quorum.c quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode 2016-03-17 16:43:30 +01:00
raw_bsd.c raw: Support BDRV_REQ_FUA 2016-03-30 12:16:02 +02:00
raw-aio.h include/qemu/iov.h: Don't include qemu-common.h 2016-03-22 22:20:16 +01:00
raw-posix.c block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host 2016-03-30 11:59:32 +02:00
raw-win32.c util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
rbd.c util: move declarations out of qemu-common.h 2016-03-22 22:20:17 +01:00
sheepdog.c block: Always set writeback mode in blk_new_open() 2016-03-30 12:16:01 +02:00
snapshot.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
ssh.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
stream.c -----BEGIN PGP SIGNATURE----- 2016-03-29 19:54:49 +01:00
throttle-groups.c block: Clean up includes 2016-01-20 13:36:23 +01:00
vdi.c block: Always set writeback mode in blk_new_open() 2016-03-30 12:16:01 +02:00
vhdx-endian.c block: Clean up includes 2016-01-20 13:36:23 +01:00
vhdx-log.c include/qemu/osdep.h: Don't include qapi/error.h 2016-03-22 22:20:15 +01:00
vhdx.c block: Always set writeback mode in blk_new_open() 2016-03-30 12:16:01 +02:00
vhdx.h block: vhdx - update PAYLOAD_BLOCK_UNMAPPED value to match 1.00 spec 2014-12-12 15:42:22 +00:00
vmdk.c block: Always set writeback mode in blk_new_open() 2016-03-30 12:16:01 +02:00
vpc.c block/vpc: update comments to be compliant w/coding guidelines 2016-04-15 17:22:12 +02:00
vvfat.c block: Remove BDRV_O_CACHE_WB 2016-03-30 12:16:03 +02:00
win32-aio.c block: Clean up includes 2016-01-20 13:36:23 +01:00
write-threshold.c block: Clean up includes 2016-01-20 13:36:23 +01:00