qemu/block
Vladimir Sementsov-Ogievskiy 4ddb5d2fde block/nbd: drop connection_co
OK, that's a big rewrite of the logic.

Pre-patch we have an always running coroutine - connection_co. It does
reply receiving and reconnecting. And it leads to a lot of difficult
and unobvious code around drained sections and context switch. We also
abuse bs->in_flight counter which is increased for connection_co and
temporary decreased in points where we want to allow drained section to
begin. One of these place is in another file: in nbd_read_eof() in
nbd/client.c.

We also cancel reconnect and requests waiting for reconnect on drained
begin which is not correct. And this patch fixes that.

Let's finally drop this always running coroutine and go another way:
do both reconnect and receiving in request coroutines.

The detailed list of changes below (in the sequence of diff hunks).

1. receiving coroutines are woken directly from nbd_channel_error, when
   we change s->state

2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
   and in nbd_teardown_connection() all requests should already be
   finished (and reconnect is done from request). So
   nbd_co_establish_connection_cancel() is called from
   nbd_cancel_in_flight() (to cancel the request that is doing
   nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
   (previously we didn't need it, as reconnect delay only should cancel
   active requests not the reconnection itself). But now reconnection
   itself is done in the separate thread (we now call
   nbd_client_connection_enable_retry() in nbd_open()), and we need to
   cancel the requests that wait in nbd_co_establish_connection()
   now).

2A. We do receive headers in request coroutine. But we also should
   dispatch replies for other pending requests. So,
   nbd_connection_entry() is turned into nbd_receive_replies(), which
   does reply dispatching while it receives other request headers, and
   returns when it receives the requested header.

3. All old staff around drained sections and context switch is dropped.
   In details:
   - we don't need to move connection_co to new aio context, as we
     don't have connection_co anymore
   - we don't have a fake "request" of connection_co (extra increasing
     in_flight), so don't care with it in drain_begin/end
   - we don't stop reconnection during drained section anymore. This
     means that drain_begin may wait for a long time (up to
     reconnect_delay). But that's an improvement and more correct
     behavior see below[*]

4. In nbd_teardown_connection() we don't have to wait for
   connection_co, as it is dropped. And cleanup for s->ioc and nbd_yank
   is moved here from removed connection_co.

5. In nbd_co_do_establish_connection() we now should handle
   NBD_CLIENT_CONNECTING_NOWAIT: if new request comes when we are in
   NBD_CLIENT_CONNECTING_NOWAIT, it still should call
   nbd_co_establish_connection() (who knows, maybe the connection was
   already established by another thread in the background). But we
   shouldn't wait: if nbd_co_establish_connection() can't return new
   channel immediately the request should fail (we are in
   NBD_CLIENT_CONNECTING_NOWAIT state).

6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
   other requests in the caller, so here we just assert that fact.
   Also delay time is now initialized here: we can easily detect first
   attempt and start a timer.

7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
   retries are fully handle by thread (nbd/client-connection.c), delay
   timer we initialize in nbd_reconnect_attempt(), we don't have to
   bother with s->drained and friends. nbd_reconnect_attempt() now
   called from nbd_co_send_request().

8. nbd_connection_entry is dropped: reconnect is now handled by
   nbd_co_send_request(), receiving reply is now handled by
   nbd_receive_replies(): all handled from request coroutines.

9. So, welcome new nbd_receive_replies() called from request coroutine,
   that receives reply header instead of nbd_connection_entry().
   Like with sending requests, only one coroutine may receive in a
   moment. So we introduce receive_mutex, which is locked around
   nbd_receive_reply(). It also protects some related fields. Still,
   full audit of thread-safety in nbd driver is a separate task.
   New function waits for a reply with specified handle being received
   and works rather simple:

   Under mutex:
     - if current handle is 0, do receive by hand. If another handle
       received - switch to other request coroutine, release mutex and
       yield. Otherwise return success
     - if current handle == requested handle, we are done
     - otherwise, release mutex and yield

10: in nbd_co_send_request() we now do nbd_reconnect_attempt() if
    needed. Also waiting in free_sema queue we now wait for one of two
    conditions:
    - connectED, in_flight < MAX_NBD_REQUESTS (so we can start new one)
    - connectING, in_flight == 0, so we can call
      nbd_reconnect_attempt()
    And this logic is protected by s->send_mutex

    Also, on failure we don't have to care of removed s->connection_co

11. nbd_co_do_receive_one_chunk(): now instead of yield() and wait for
    s->connection_co we just call new nbd_receive_replies().

12. nbd_co_receive_one_chunk(): place where s->reply.handle becomes 0,
    which means that handling of the whole reply is finished. Here we
    need to wake one of coroutines sleeping in nbd_receive_replies().
    If none are sleeping - do nothing. That's another behavior change: we
    don't have endless recv() in the idle time. It may be considered as
    a drawback. If so, it may be fixed later.

13. nbd_reply_chunk_iter_receive(): don't care about removed
    connection_co, just ping in_flight waiters.

14. Don't create connection_co, enable retry in the connection thread
    (we don't have own reconnect loop anymore)

15. We now need to add a nbd_co_establish_connection_cancel() call in
    nbd_cancel_in_flight(), to cancel the request that is doing a
    connection attempt.

[*], ok, now we don't cancel reconnect on drain begin. That's correct:
    reconnect feature leads to possibility of long-running requests (up
    to reconnect delay). Still, drain begin is not a reason to kill
    long requests. We should wait for them.

    This also means, that we can again reproduce a dead-lock, described
    in 8c517de24a.
    Why we are OK with it:
    1. Now this is not absolutely-dead dead-lock: the vm is unfrozen
       after reconnect delay. Actually 8c517de24a fixed a bug in
       NBD logic, that was not described in 8c517de24a and led to
       forever dead-lock. The problem was that nobody woke the free_sema
       queue, but drain_begin can't finish until there is a request in
       free_sema queue. Now we have a reconnect delay timer that works
       well.
    2. It's not a problem of the NBD driver, but of the ide code,
       because it does drain_begin under the global mutex; the problem
       doesn't reproduce when using scsi instead of ide.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210902103805.25686-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar and comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29 13:46:33 -05:00
..
export block/export/fuse.c: fix fuse-lseek on uclibc or musl 2021-09-01 14:38:08 +02:00
monitor block/monitor: Consolidate hmp_handle_error calls to reduce redundant code 2021-09-01 12:57:31 +02:00
accounting.c block/accounting: Use lock guard macros 2020-12-11 17:52:39 +01:00
aio_task.c block: introduce aio task pool 2019-10-10 10:56:17 +02:00
amend.c block/amend: Check whether the node exists 2020-07-27 12:37:25 +02:00
backup.c block/copy-before-write: initialize block-copy bitmap 2021-09-01 14:03:47 +02:00
blkdebug.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
blklogwrites.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
blkreplay.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
blkverify.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
block-backend.c block: introduce blk_replace_bs 2021-09-01 12:57:31 +02:00
block-copy.c block/block-copy: block_copy_state_new(): drop extra arguments 2021-09-01 14:38:08 +02:00
block-gen.h scripts: add block-coroutine-wrapper.py 2020-10-05 10:59:06 +01:00
bochs.c block: use int64_t instead of uint64_t in driver read handlers 2021-09-29 13:46:31 -05:00
cloop.c block: use int64_t instead of uint64_t in driver read handlers 2021-09-29 13:46:31 -05:00
commit.c block: use int64_t instead of uint64_t in driver read handlers 2021-09-29 13:46:31 -05:00
copy-before-write.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
copy-before-write.h block/copy-before-write: bdrv_cbw_append(): drop unused compress arg 2021-09-01 14:03:47 +02:00
copy-on-read.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
copy-on-read.h copy-on-read: add filter drop function 2021-01-26 11:26:54 +01:00
coroutines.h block/nbd: reuse nbd_co_do_establish_connection() in nbd_open() 2021-06-18 12:21:22 -05:00
create.c block/create: Do not abort if a block driver is not available 2019-09-13 12:18:37 +02:00
crypto.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
crypto.h nomaintainer: Fix Lesser GPL version number 2020-11-15 17:04:40 +01:00
curl.c block: use int64_t instead of uint64_t in driver read handlers 2021-09-29 13:46:31 -05:00
dirty-bitmap.c iotests: Improve and rename test 291 to qemu-img-bitmap 2021-07-21 14:14:41 -05:00
dmg-bz2.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
dmg-lzfse.c block: Remove unused include 2020-11-09 15:44:21 +01:00
dmg.c block: use int64_t instead of uint64_t in driver read handlers 2021-09-29 13:46:31 -05:00
dmg.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
file-posix.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
file-win32.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
filter-compress.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
gluster.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
io_uring.c block/io_uring: resubmit when result is -EAGAIN 2021-07-29 17:14:55 +01:00
io.c block/io: allow 64bit discard requests 2021-09-29 13:46:32 -05:00
iscsi-opts.c modules: add block module annotations 2021-07-09 18:20:27 +02:00
iscsi.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
linux-aio.c linux-aio: limit the batch size using aio-max-batch parameter 2021-07-21 13:47:50 +01:00
meson.build block: rename backup-top to copy-before-write 2021-09-01 12:57:31 +02:00
mirror.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
nbd.c block/nbd: drop connection_co 2021-09-29 13:46:33 -05:00
nfs.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
null.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
nvme.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
parallels-ext.c parallels: support bitmap extension for read-only mode 2021-03-08 14:56:55 +01:00
parallels.c parallels: support bitmap extension for read-only mode 2021-03-08 14:56:55 +01:00
parallels.h parallels: support bitmap extension for read-only mode 2021-03-08 14:56:55 +01:00
preallocate.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
progress_meter.c progressmeter: protect with a mutex 2021-06-25 14:24:24 +03:00
qapi-sysemu.c block: Move system emulator QMP commands to block/qapi-sysemu.c 2020-03-06 17:15:38 +01:00
qapi.c block: use GDateTime for formatting timestamp when dumping snapshot info 2021-06-14 13:28:50 +01:00
qcow2-bitmap.c nbd patches for 2021-03-09 2021-03-11 13:57:08 +00:00
qcow2-cache.c core: replace getpagesize() with qemu_real_host_page_size 2019-10-26 15:38:06 +02:00
qcow2-cluster.c block: use int64_t instead of uint64_t in driver read handlers 2021-09-29 13:46:31 -05:00
qcow2-refcount.c qcow2-refcount: check_refblocks(): add separate message for reserved 2021-09-15 18:42:38 +02:00
qcow2-snapshot.c block: consistently use bdrv_is_read_only() 2021-06-02 14:23:20 +02:00
qcow2-threads.c qcow2: add zstd cluster compression 2020-05-13 14:20:31 +02:00
qcow2.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
qcow2.h qcow2-refcount: check_refblocks(): add separate message for reserved 2021-09-15 18:42:38 +02:00
qcow.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
qed-check.c block/qed: add missed coroutine_fn markers 2019-04-30 15:29:00 +02:00
qed-cluster.c qed: protect table cache with CoMutex 2017-07-17 11:34:11 +08:00
qed-l2-cache.c qed: protect table cache with CoMutex 2017-07-17 11:34:11 +08:00
qed-table.c block/qed: add missed coroutine_fn markers 2019-04-30 15:29:00 +02:00
qed.c block: use int64_t instead of int in driver write_zeroes handlers 2021-09-29 13:46:32 -05:00
qed.h qed: Simplify backing reads 2020-07-06 10:34:14 +02:00
quorum.c block: use int64_t instead of int in driver write_zeroes handlers 2021-09-29 13:46:32 -05:00
raw-format.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
rbd.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
replication.c replication: Remove workaround 2021-07-20 16:11:53 +02:00
snapshot.c block/snapshot: Clarify goto fallback behavior 2021-06-24 09:49:04 +02:00
ssh.c util/uri: do not check argument of uri_free() 2021-07-09 12:26:05 +02:00
stream.c stream: Don't crash when node permission is denied 2021-03-19 10:15:06 +01:00
throttle-groups.c block/throttle-groups: throttle_group_co_io_limits_intercept(): 64bit bytes 2021-02-03 08:14:00 -06:00
throttle.c block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
trace-events block: use int64_t instead of int in driver discard handlers 2021-09-29 13:46:32 -05:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vdi.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
vhdx-endian.c Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
vhdx-log.c block: consistently use bdrv_is_read_only() 2021-06-02 14:23:20 +02:00
vhdx.c block/vhdx: Support vhdx image only with 512 bytes logical sector size 2020-09-15 11:05:13 +02:00
vhdx.h block/vhdx: Use IEC binary prefixes for size constants 2019-04-30 15:29:00 +02:00
vmdk.c block: use int64_t instead of int in driver write_zeroes handlers 2021-09-29 13:46:32 -05:00
vpc.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
vvfat.c block: use int64_t instead of uint64_t in driver write handlers 2021-09-29 13:46:31 -05:00
win32-aio.c Include qemu/module.h where needed, drop it from qemu-common.h 2019-06-12 13:18:33 +02:00
write-threshold.c write-threshold: deal with includes 2021-05-14 16:14:10 +02:00