qemu/block
Vladimir Sementsov-Ogievskiy 8c517de24a block/nbd: fix drain dead-lock because of nbd reconnect-delay
We pause reconnect process during drained section. So, if we have some
requests, waiting for reconnect we should cancel them, otherwise they
deadlock the drained section.

How to reproduce:

1. Create an image:
   qemu-img create -f qcow2 xx 100M

2. Start NBD server:
   qemu-nbd xx

3. Start vm with second nbd disk on node2, like this:

  ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
     file=/work/images/cent7.qcow2 -drive \
     driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
     -vnc :0 -m 2G -enable-kvm -vga std

4. Access the vm through vnc (or some other way?), and check that NBD
   drive works:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

   - the command should succeed.

5. Now, kill the nbd server, and run dd in the guest again:

   dd if=/dev/sdb of=/dev/null bs=1M count=10

Now Qemu is trying to reconnect, and dd-generated requests are waiting
for the connection (they will wait up to 60 seconds (see
reconnect-delay option above) and than fail). But suddenly, vm may
totally hang in the deadlock. You may need to increase reconnect-delay
period to catch the dead-lock.

VM doesn't respond because drain dead-lock happens in cpu thread with
global mutex taken. That's not good thing by itself and is not fixed
by this commit (true way is using iothreads). Still this commit fixes
drain dead-lock itself.

Note: probably, we can instead continue to reconnect during drained
section. To achieve this, we may move negotiation to the connect thread
to make it independent of bs aio context. But expanding drained section
doesn't seem good anyway. So, let's now fix the bug the simplest way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09 15:04:32 -05:00
..
export block/export: Move writable to BlockExportOptions 2020-10-02 15:46:40 +02:00
monitor block/export: Add block-export-del 2020-10-02 15:46:40 +02:00
accounting.c block: add empty account cookie type 2019-10-10 10:56:18 +02:00
aio_task.c block: introduce aio task pool 2019-10-10 10:56:17 +02:00
amend.c block/amend: Check whether the node exists 2020-07-27 12:37:25 +02:00
backup-top.c block: Inline bdrv_co_block_status_from_*() 2020-09-07 12:31:31 +02:00
backup-top.h block: introduce backup-top filter driver 2019-10-10 10:56:18 +02:00
backup.c backup: Deal with filters 2020-09-07 12:31:31 +02:00
blkdebug.c block: Inline bdrv_co_block_status_from_*() 2020-09-07 12:31:31 +02:00
blklogwrites.c block: Inline bdrv_co_block_status_from_*() 2020-09-07 12:31:31 +02:00
blkreplay.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
blkverify.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
block-backend.c qemu/atomic.h: rename atomic_ to qatomic_ 2020-09-23 16:07:44 +01:00
block-copy.c block-copy: Use CAF to find sync=top base 2020-09-07 12:31:31 +02:00
block-gen.h scripts: add block-coroutine-wrapper.py 2020-10-05 10:59:06 +01:00
bochs.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
cloop.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
commit.c block: Inline bdrv_co_block_status_from_*() 2020-09-07 12:31:31 +02:00
copy-on-read.c block: Inline bdrv_co_block_status_from_*() 2020-09-07 12:31:31 +02:00
coroutines.h block/io: refactor save/load vmstate 2020-10-05 10:59:42 +01:00
create.c block/create: Do not abort if a block driver is not available 2019-09-13 12:18:37 +02:00
crypto.c block/crypto: disallow write sharing by default 2020-07-21 10:49:02 +02:00
crypto.h block/crypto: implement the encryption key management 2020-07-06 08:49:28 +02:00
curl.c error: Eliminate error_propagate() with Coccinelle, part 1 2020-07-10 15:18:08 +02:00
dirty-bitmap.c block/dirty-bitmap: add bdrv_has_named_bitmaps helper 2020-05-28 13:15:22 -05:00
dmg-bz2.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
dmg-lzfse.c block: adding lzfse decompressing support as a module. 2018-12-14 11:52:40 +01:00
dmg.c block: Use bdrv_default_perms() 2020-05-18 19:05:25 +02:00
dmg.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
file-posix.c block/file: switch to use qemu_open/qemu_create for improved errors 2020-09-16 10:33:48 +01:00
file-win32.c block/file: switch to use qemu_open/qemu_create for improved errors 2020-09-16 10:33:48 +01:00
filter-compress.c block: Inline bdrv_co_block_status_from_*() 2020-09-07 12:31:31 +02:00
gluster.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
io_uring.c io_uring: use io_uring_cq_ready() to check for ready cqes 2020-06-05 09:54:48 +01:00
io.c block/io: refactor save/load vmstate 2020-10-05 10:59:42 +01:00
iscsi-opts.c Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
iscsi.c qapi: Restrict query-uuid command to machine code 2020-09-29 15:41:35 +02:00
linux-aio.c misc: Replace zero-length arrays with flexible array member (automatic) 2020-03-16 22:07:42 +01:00
meson.build scripts: add block-coroutine-wrapper.py 2020-10-05 10:59:06 +01:00
mirror.c block: Inline bdrv_co_block_status_from_*() 2020-09-07 12:31:31 +02:00
nbd.c block/nbd: fix drain dead-lock because of nbd reconnect-delay 2020-10-09 15:04:32 -05:00
nfs.c qemu/atomic.h: rename atomic_ to qatomic_ 2020-09-23 16:07:44 +01:00
null.c block/null: Implement bdrv_get_allocated_file_size 2020-09-07 12:31:31 +02:00
nvme.c block/nvme: Replace magic value by SCALE_MS definition 2020-10-05 09:35:52 +01:00
parallels.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
parallels.h Clean up includes 2018-02-09 05:05:11 +01:00
qapi-sysemu.c block: Move system emulator QMP commands to block/qapi-sysemu.c 2020-03-06 17:15:38 +01:00
qapi.c migration: introduce icount field for snapshots 2020-10-06 08:34:49 +02:00
qcow2-bitmap.c qcow2: Use macros for the L1, refcount and bitmap table entry sizes 2020-09-15 11:05:12 +02:00
qcow2-cache.c core: replace getpagesize() with qemu_real_host_page_size 2019-10-26 15:38:06 +02:00
qcow2-cluster.c qcow2: Use L1E_SIZE in qcow2_write_l1_entry() 2020-10-02 15:46:40 +02:00
qcow2-refcount.c qcow2: Make qcow2_free_any_clusters() free only one cluster 2020-09-15 11:05:13 +02:00
qcow2-snapshot.c migration: introduce icount field for snapshots 2020-10-06 08:34:49 +02:00
qcow2-threads.c qcow2: add zstd cluster compression 2020-05-13 14:20:31 +02:00
qcow2.c qcow2: Convert qcow2_alloc_cluster_offset() into qcow2_alloc_host_offset() 2020-09-15 11:31:10 +02:00
qcow2.h qcow2: introduce icount field for snapshots 2020-10-06 08:34:49 +02:00
qcow.c block/qcow: remove runtime opts 2020-09-15 11:05:13 +02:00
qed-check.c block/qed: add missed coroutine_fn markers 2019-04-30 15:29:00 +02:00
qed-cluster.c qed: protect table cache with CoMutex 2017-07-17 11:34:11 +08:00
qed-l2-cache.c qed: protect table cache with CoMutex 2017-07-17 11:34:11 +08:00
qed-table.c block/qed: add missed coroutine_fn markers 2019-04-30 15:29:00 +02:00
qed.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
qed.h qed: Simplify backing reads 2020-07-06 10:34:14 +02:00
quorum.c block/quorum.c: stable children names 2020-09-15 11:05:12 +02:00
raw-format.c error: Eliminate error_propagate() with Coccinelle, part 2 2020-07-10 15:18:08 +02:00
rbd.c block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[] 2020-09-15 11:31:10 +02:00
replication.c error: Reduce unnecessary error propagation 2020-07-10 15:18:08 +02:00
sheepdog.c block/sheepdog: Replace magic val by NANOSECONDS_PER_SECOND definition 2020-10-02 15:46:40 +02:00
snapshot.c block/snapshot: Fix fallback 2020-09-07 12:31:31 +02:00
ssh.c qapi: Smooth another visitor error checking pattern 2020-07-10 15:18:08 +02:00
stream.c stream: Deal with filters 2020-09-07 12:31:31 +02:00
throttle-groups.c qemu/atomic.h: rename atomic_ to qatomic_ 2020-09-23 16:07:44 +01:00
throttle.c qemu/atomic.h: rename atomic_ to qatomic_ 2020-09-23 16:07:44 +01:00
trace-events trace-events: Fix attribution of trace points to source 2020-09-09 17:17:58 +01:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vdi.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vhdx-endian.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
vhdx-log.c block: Add flags to bdrv(_co)_truncate() 2020-04-30 17:51:07 +02:00
vhdx.c block/vhdx: Support vhdx image only with 512 bytes logical sector size 2020-09-15 11:05:13 +02:00
vhdx.h block/vhdx: Use IEC binary prefixes for size constants 2019-04-30 15:29:00 +02:00
vmdk.c vmdk: Drop vmdk_co_flush() 2020-09-07 12:31:31 +02:00
vpc.c error: Avoid error_propagate() after migrate_add_blocker() 2020-07-10 15:18:08 +02:00
vvfat.c util: rename qemu_open() to qemu_open_old() 2020-09-16 10:33:48 +01:00
win32-aio.c Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
write-threshold.c qapi: Drop qapi_event_send_FOO()'s Error ** argument 2018-08-28 18:21:38 +02:00