We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
The main motivation is to realize a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed) and
to allow for a signed return type (where a negative value means an
error).
So, convert the bytes parameter of the driver write_zeroes handlers to
int64_t.
The only caller of all the updated functions is bdrv_co_do_pwrite_zeroes().
bdrv_co_do_pwrite_zeroes() itself is of course OK with the widening of
the callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s
max_write_zeroes is limited to INT_MAX. So, the updated functions are
all safe: they will not get a "bytes" value larger than before.
Still, let's look through all the updated functions and add assertions
to the ones which are actually unprepared for values larger than INT_MAX.
For these drivers, also set an explicit max_pwrite_zeroes limit.
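For illustration, the per-driver pattern referred to below looks
roughly like this (a hedged sketch against QEMU-internal types, with a
made-up driver name, not an exact hunk from this patch):

    static int coroutine_fn drv_co_pwrite_zeroes(BlockDriverState *bs,
                                                 int64_t offset, int64_t bytes,
                                                 BdrvRequestFlags flags)
    {
        /*
         * The generic layer clamps requests to bs->bl.max_pwrite_zeroes,
         * which this driver sets to a value that fits in 32 bits, so
         * asserting that guarantee is enough; no need to widen the
         * driver's internal calculations.
         */
        assert(bytes <= UINT32_MAX);

        return drv_do_zero(bs, offset, (uint32_t)bytes, flags);
    }

Here drv_do_zero() stands in for whatever 32-bit internal helper the
driver already has.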
Let's go:
blkdebug: calculations can't overflow, thanks to
bdrv_check_qiov_request() in the generic layer. rule_check() and
bdrv_co_pwrite_zeroes() both have a 64-bit argument.
blklogwrites: passes to blk_log_writes_co_log(), which has a 64-bit
argument.
blkreplay, copy-on-read, filter-compress: pass to
bdrv_co_pwrite_zeroes(), which is OK.
copy-before-write: calls cbw_do_copy_before_write() and
bdrv_co_pwrite_zeroes(); both have a 64-bit argument.
file-posix: both handlers call raw_do_pwrite_zeroes(), which is updated.
In raw_do_pwrite_zeroes() the calculations are OK due to
bdrv_check_qiov_request(); bytes goes to RawPosixAIOData::aio_nbytes,
which is uint64_t.
Check also where that uint64_t gets handed off:
handle_aiocb_write_zeroes_block() passes a uint64_t[2] to
ioctl(BLKZEROOUT), and handle_aiocb_write_zeroes() calls do_fallocate(),
which takes off_t (and we compile to always have 64-bit off_t), as
does handle_aiocb_write_zeroes_unmap(). All look safe.
gluster: bytes goes to GlusterAIOCB::size, which is int64_t, and to
glfs_zerofill_async(), which works with off_t.
iscsi: Aha, here we deal with iscsi_writesame16_task(), which has a
uint32_t num_blocks argument, while iscsi_writesame10_task() has a
uint16_t one. Add comments and assertions, and clarify the
max_pwrite_zeroes calculation.
The iscsi_allocmap_*() functions already have an int64_t argument.
is_byte_request_lun_aligned() is simple to update, so do it.
mirror_top: passes to bdrv_mirror_top_do_write(), which has a uint64_t
argument.
nbd: Aha, here we have a protocol limitation, and NBDRequest::len is
uint32_t. max_pwrite_zeroes is cleanly set to a 32-bit value, so we are
OK for now.
nvme: Again, a protocol limitation. And no inherent limit for
write-zeroes at all. But from the code that calculates cdw12 it's
obvious that we do have a limit and an alignment. Let's clarify it.
Also, the code is obviously not prepared to handle bytes=0. Let's
handle this case too.
Trace events are already 64-bit.
preallocate: passes to handle_write() and bdrv_co_pwrite_zeroes(), both
64-bit.
rbd: passes to qemu_rbd_start_co(), which is 64-bit.
qcow2: offset + bytes and the alignment still work well (thanks to
bdrv_check_qiov_request()), so the tail calculation is OK.
qcow2_subcluster_zeroize() has a 64-bit argument, so it should be OK.
Trace events are updated.
qed: qed_co_request() wants an int nb_sectors. Also, the code uses
size_t for the request length, which may be 32-bit. So, let's just keep
INT_MAX as the limit (aligning it down to pwrite_zeroes_alignment) and
not worry further.
raw-format: is OK. raw_adjust_offset() and bdrv_co_pwrite_zeroes() are
both 64-bit.
throttle: both throttle_group_co_io_limits_intercept() and
bdrv_co_pwrite_zeroes() are 64-bit.
vmdk: passes to vmdk_pwritev(), which is 64-bit.
quorum: passes to quorum_co_pwritev(), which is 64-bit.
Hooray!
At this point all block drivers are prepared to support 64-bit
write-zeroes requests, or have an explicitly set max_pwrite_zeroes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: use <= rather than < in assertions relying on max_pwrite_zeroes]
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
The main motivation is to realize a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed) and
to allow for a signed return type (where a negative value means an
error).
So, convert the driver write handler parameters which are already 64-bit
to the signed type.
While we are here, also convert the flags parameter to BdrvRequestFlags.
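Schematically, the handler prototype changes roughly as follows (a
sketch of the direction of the change, not the exact block_int.h hunk):

    /* before */
    int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
        uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags);

    /* after */
    int coroutine_fn (*bdrv_co_pwritev)(BlockDriverState *bs,
        int64_t offset, int64_t bytes, QEMUIOVector *qiov,
        BdrvRequestFlags flags);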
Now let's consider all callers. A simple
git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?'
shows that there are three callers of the driver functions:
bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in
block/io.c both pass an int64_t, checked by bdrv_check_qiov_request()
to be non-negative.
qcow2_save_vmstate() does bdrv_check_qiov_request().
Still, the functions may be called directly, not only by drv->...
Let's check:
git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \
awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
while read func; do git grep "$func(" | \
grep -v "$func(BlockDriverState"; done
shows several callers:
qcow2:
qcow2_co_truncate() writes at most up to @offset, which is checked in
the generic bdrv_co_truncate() by bdrv_check_request().
qcow2_co_pwritev_compressed_task() passes the request (or part of the
request) that already went through the normal write path, so it should
be OK.
qcow:
qcow_co_pwritev_compressed() passes an int64_t; it's updated by this
patch.
quorum:
quorum_co_pwrite_zeroes() passes an int64_t and an int - OK.
throttle:
throttle_co_pwritev_compressed() passes an int64_t; it's updated by this
patch.
vmdk:
vmdk_co_pwritev_compressed() passes an int64_t; it's updated by this
patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
The main motivation is to realize a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed) and
to allow for a signed return type (where a negative value means an
error).
So, convert the driver read handler parameters which are already 64-bit
to the signed type.
While we are here, also convert the flags parameter to BdrvRequestFlags.
Now let's consider all callers. A simple
git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'
shows that there are three callers of the driver functions:
bdrv_driver_preadv() in block/io.c passes an int64_t, checked by
bdrv_check_qiov_request() to be non-negative.
qcow2_load_vmstate() does bdrv_check_qiov_request().
do_perform_cow_read() has a uint64_t argument. And a lot of things in
the qcow2 driver are uint64_t, so converting it is a big job. But we
must not work with requests that don't satisfy
bdrv_check_qiov_request(), so let's just assert that here.
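The kind of assertion meant here is roughly the following (a sketch of
the intent with generic offset/bytes names, not the exact qcow2 hunk):

    /*
     * Refuse requests that bdrv_check_qiov_request() in the generic
     * layer would have rejected, instead of widening all of qcow2.
     */
    assert(offset >= 0 && bytes >= 0 && bytes <= INT64_MAX - offset);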
Still, the functions may be called directly, not only by drv->...
Let's check:
git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
while read func; do git grep "$func(" | \
grep -v "$func(BlockDriverState"; done
The only such caller is:
QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
...
ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
in tests/unit/test-bdrv-drain.c, and it's obviously OK.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: fix typos]
Signed-off-by: Eric Blake <eblake@redhat.com>
Although unlikely, qemu might hang in nbd_send_request().
Allow recovery in this case by registering the yank function before
calling it.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <20210704000730.1befb596@gecko.fritz.box>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Avoid accessing QCryptoTLSCreds internals by using
the qcrypto_tls_creds_check_endpoint() helper.
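The resulting check is roughly of this shape (a sketch; see
crypto/tlscreds.h for the helper's exact signature):

    if (!qcrypto_tls_creds_check_endpoint(creds,
                                          QCRYPTO_TLS_CREDS_ENDPOINT_CLIENT,
                                          errp)) {
        return NULL;
    }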
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
req->receiving is a flag marking that the request sits at one specific
yield point in nbd_co_do_receive_one_chunk().
It is always better to unset such a boolean flag before scheduling the
coroutine, to avoid double scheduling. So, let's be more careful.
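A sketch of the safe ordering (illustrative, not the exact hunk):

    if (req->receiving) {
        /* Clear the flag first, so a second caller cannot wake the
         * same coroutine again. */
        req->receiving = false;
        aio_co_wake(req->coroutine);
    }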
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-33-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We already have two similar helpers for other state. Let's add another
one for convenience.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-32-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
The last remaining step needed to reuse the function is a coroutine
wrapper: nbd_open() may be called from non-coroutine context. So,
generate the wrapper and use it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-31-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We'll need a non-blocking mode of nbd_co_establish_connection(), so that
it returns immediately, and returns success only if a connection was
previously established in the background.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-30-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Split out the part that we want to reuse for nbd_open().
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-29-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
block/nbd doesn't need the underlying sioc channel anymore. So, we can
update the nbd/client-connection interface to return only the single
top-most I/O channel, which is more straightforward.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-27-vsementsov@virtuozzo.com>
[eblake: squash in Vladimir's fixes for uninit usage caught by clang]
Signed-off-by: Eric Blake <eblake@redhat.com>
Currently the sioc pointer is used just to pass from the socket
connection to the NBD negotiation. Drop the field, and use local
variables instead. With the next commit we'll update
nbd/client-connection.c to behave appropriately (return only the
top-most ioc, not two channels).
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-26-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Negotiation during reconnect is now done in a thread, and s->sioc is
not available during negotiation. Negotiation in the thread will be
cancelled by nbd_client_connection_release(), called from
nbd_clear_bdrvstate(). So, we don't need this code chunk anymore.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-25-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Now that we can opt in to negotiation as part of the client connection
thread, use that to simplify connection_co. This is another step on
the way to moving all reconnect code into NBDClientConnection.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-24-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
To be reused in the following patch.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-23-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Add arguments and logic to support NBD negotiation in the same thread
after a successful connection.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-20-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We now have a bs-independent connection API, which consists of four
functions:
nbd_client_connection_new()
nbd_client_connection_release()
nbd_co_establish_connection()
nbd_co_establish_connection_cancel()
Move them to a separate file together with the NBDClientConnection
structure, which becomes private to the new API.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-18-vsementsov@virtuozzo.com>
[eblake: comment tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
This is the last step of creating a bs-independent NBD connection
interface. With the next commit we can finally move it to a separate
file.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-17-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This is a step towards creating a bs-independent NBD connection
interface.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-16-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are going to move the connection code to its own file and want
clear names and APIs first.
The structure is shared between the user and (possibly) several runs of
the connect thread. So it's wrong to call it "thread". Let's rename it
to something more generic.
Appropriately, rename the connect_thread and thr variables to conn.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-15-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
nbd_co_establish_connection_cancel() actually needs only a pointer to
NBDConnectThread. So, make it clean.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-14-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are going to split the connection code into a separate file. Now we
are ready to give nbd_co_establish_connection() a clean and
bs-independent interface.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-13-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We don't need all these states. The code, refactored to use two boolean
variables, looks simpler.
While moving the comment in nbd_co_establish_connection(), rework it to
give better information. Also, we are going to move the connection code
to a separate file, and mentioning the drained section would be
confusing.
Also improve the comment in NBDConnectThread, dropping the removed
state names from it.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-12-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: comment tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>
Instead of managing connect_bh, bh_ctx, and wait_connect fields, we
can use a single link to the waiting coroutine with proper mutex
protection.
So the new logic is:
nbd_co_establish_connection() sets wait_co under the mutex, releases
the mutex, then yields. Note that wait_co may be scheduled by the
thread immediately after unlocking the mutex. Still, the main thread
(or iothread) will not reach the code that enters the coroutine until
the yield, so we are safe.
connect_thread_func() and nbd_co_establish_connection_cancel() do
the following to handle wait_co:
Under the mutex, if thr->wait_co is not NULL, make it NULL and
schedule it. This way, we avoid scheduling the coroutine twice.
Still, the scheduling is a bit different:
In connect_thread_func() we can just call aio_co_wake() under the
mutex: after commit
[async: the main AioContext is only "current" if under the BQL]
we are sure that aio_co_wake() will not try to acquire the AioContext
and do qemu_aio_coroutine_enter(), but will simply schedule the
coroutine via aio_co_schedule().
nbd_co_establish_connection_cancel() will be called from non-coroutine
context in a further patch and will be able to go through the
qemu_aio_coroutine_enter() path of aio_co_wake(). So keep the current
behavior of waking the coroutine after the critical section.
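A sketch of the connect_thread_func() side described above
(illustrative, not the exact hunk):

    qemu_mutex_lock(&thr->mutex);
    if (thr->wait_co) {
        Coroutine *co = thr->wait_co;

        thr->wait_co = NULL;  /* claim the waiter exactly once */
        aio_co_wake(co);      /* from the thread this only schedules */
    }
    qemu_mutex_unlock(&thr->mutex);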
Also, this commit reduces the dependence of
nbd_co_establish_connection() on the internals of bs (we now use a
generic pointer to the coroutine, instead of direct use of
s->connection_co). This is a step towards splitting the connection
API out of nbd.c.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-11-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
These fields are write-only. Drop them.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-10-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Message-Id: <20210610100802.5888-9-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Simplify lifetime management of BDRVNBDState->connect_thread by
delaying the possible cleanup of it until the BDRVNBDState itself goes
away.
This also reverts
0267101af6 "block/nbd: fix possible use after free of s->connect_thread"
as now s->connect_thread can't be cleared until the very end.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
[vsementsov: rebase, revert 0267101af6 changes]
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: tweak comment]
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-8-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Detecting the monitor by the current coroutine works badly when we are
not in coroutine context. And that's exactly the case in the NBD
reconnect code, where qio_channel_socket_connect_sync() is called from
a thread.
The monitor is needed only to parse a named file descriptor. So, let's
just parse it during nbd_open(), so that all further users of s->saddr
don't need to access the monitor.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-7-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
nbd_open() does it (through nbd_establish_connection()).
Actually, we lost that call on the reconnect path in 1dc4718d84
"block/nbd: use non-blocking connect: fix vm hang on connect()"
when we introduced the reconnect thread.
Fixes: 1dc4718d84
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210610100802.5888-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We have two "return error" paths in nbd_open() after
nbd_process_options(). Actually, we should call nbd_clear_bdrvstate()
on these paths. Interestingly, nbd_process_options() calls
nbd_clear_bdrvstate() by itself.
Let's fix the leaks and refactor things to be more obvious:
- initialize yank at the top of nbd_open()
- move yank cleanup to nbd_clear_bdrvstate()
- refactor nbd_open() so that all failure paths except for
yank-register go through nbd_clear_bdrvstate()
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
nbd_free_connect_thread leaks the channel object if it hasn't been
stolen.
Unref it and fix the leak.
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210610100802.5888-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Right now, users of qemu_co_sleep_ns_wakeable are simply passing
a pointer to QemuCoSleepState by reference to the function. But
QemuCoSleepState really is just a Coroutine*; making the
content of the struct public is just as efficient and lets us
skip the user_state_pointer indirection.
Since the usage is changed, take the occasion to rename the
struct to QemuCoSleep.
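So the struct becomes, roughly (a sketch; the field name is an
assumption here):

    typedef struct QemuCoSleep {
        Coroutine *to_wake;
    } QemuCoSleep;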
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210517100548.28806-6-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
All callers of qemu_co_sleep_wake are checking whether they are passing
a NULL argument inside the pointer-to-pointer: do the check in
qemu_co_sleep_wake itself.
As a side effect, qemu_co_sleep_wake can be called more than once and
it will only wake the coroutine once; after the first time, the argument
will be set to NULL via *sleep_state->user_state_pointer. However, this
would not be safe if co_sleep_cb kept using the QemuCoSleepState*
directly, so make it go through the pointer-to-pointer instead.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20210517100548.28806-4-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as the running thread may
free it at any time.
Still, nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to a local variable (just for better code style) and
uses it even after the yield point, when the thread may already be
detached.
Fix that. Also, for safety, check that thr is non-NULL at the start of
nbd_co_establish_connection().
After this patch, "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add an extra abort() just before the release: if it is
somehow possible to reach this "case:", it won't hurt. Anyway, a good
refactoring of all this reconnect mess will come soon.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210406155114.1057355-1-vsementsov@virtuozzo.com>
Reviewed-by: Roman Kagan <rvkagan@yandex-team.ru>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Just stop waiting for connection in existing requests.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20210205163720.887197-3-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When an NBD block driver state is moved from one aio_context to another
(e.g. when doing a drain in a migration thread),
nbd_client_attach_aio_context_bh is executed, which enters the
connection coroutine.
However, the assumption that ->connection_co is always present here
appears incorrect: the connection may have encountered an error other
than -EIO in the underlying transport, and thus may have decided to quit
rather than keep trying to reconnect, and therefore it may have
terminated the connection coroutine. As a result an attempt to reassign
the client in this state (NBD_CLIENT_QUIT) to a different aio_context
leads to a null pointer dereference:
#0 qio_channel_detach_aio_context (ioc=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/io/channel.c:452
#1 0x0000562a242824b3 in bdrv_detach_aio_context (bs=0x562a268d6a00)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6151
#2 bdrv_set_aio_context_ignore (bs=bs@entry=0x562a268d6a00,
new_context=new_context@entry=0x562a260c9580,
ignore=ignore@entry=0x7feeadc9b780)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6230
#3 0x0000562a24282969 in bdrv_child_try_set_aio_context
(bs=bs@entry=0x562a268d6a00, ctx=0x562a260c9580,
ignore_child=<optimized out>, errp=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/block.c:6332
#4 0x0000562a242bb7db in blk_do_set_aio_context (blk=0x562a2735d0d0,
new_context=0x562a260c9580,
update_root_node=update_root_node@entry=true, errp=errp@entry=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:1989
#5 0x0000562a242be0bd in blk_set_aio_context (blk=<optimized out>,
new_context=<optimized out>, errp=errp@entry=0x0)
at /build/qemu-gYtjVn/qemu-5.0.1/block/block-backend.c:2010
#6 0x0000562a23fbd953 in virtio_blk_data_plane_stop (vdev=<optimized
out>)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
#7 0x0000562a241fc7bf in virtio_bus_stop_ioeventfd (bus=0x562a260dbf08)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio-bus.c:245
#8 0x0000562a23fefb2e in virtio_vmstate_change (opaque=0x562a260dbf90,
running=0, state=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/hw/virtio/virtio.c:3220
#9 0x0000562a2402ebfd in vm_state_notify (running=running@entry=0,
state=state@entry=RUN_STATE_FINISH_MIGRATE)
at /build/qemu-gYtjVn/qemu-5.0.1/softmmu/vl.c:1275
#10 0x0000562a23f7bc02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
send_stop=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/cpus.c:1032
#11 0x0000562a24209765 in migration_completion (s=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:2914
#12 migration_iteration_run (s=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3275
#13 migration_thread (opaque=opaque@entry=0x562a260e83a0)
at /build/qemu-gYtjVn/qemu-5.0.1/migration/migration.c:3439
#14 0x0000562a2435ca96 in qemu_thread_start (args=<optimized out>)
at /build/qemu-gYtjVn/qemu-5.0.1/util/qemu-thread-posix.c:519
#15 0x00007feed31466ba in start_thread (arg=0x7feeadc9c700)
at pthread_create.c:333
#16 0x00007feed2e7c41d in __GI___sysctl (name=0x0, nlen=608471908,
oldval=0x562a2452b138, oldlenp=0x0, newval=0x562a2452c5e0
<__func__.28102>, newlen=0)
at ../sysdeps/unix/sysv/linux/sysctl.c:30
#17 0x0000000000000000 in ?? ()
Fix it by checking that the connection coroutine is non-null before
trying to enter it. If it is null, no entering is needed, as the
connection is probably going down anyway.
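The shape of the fix in nbd_client_attach_aio_context_bh() is roughly
(a sketch, not the exact hunk):

    if (s->connection_co) {
        /*
         * Enter the coroutine in its new AioContext only if it still
         * exists; in NBD_CLIENT_QUIT there is nothing to enter.
         */
        qemu_aio_coroutine_enter(bs->aio_context, s->connection_co);
    }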
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-3-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
When a reconnect is in progress in the NBD client, the iochannel used
for the NBD connection doesn't exist. Therefore an attempt to detach it
from the aio_context of the parent BlockDriverState results in a NULL
pointer dereference.
The problem is triggerable, in particular, when an outgoing migration is
about to finish, and stopping the dataplane tries to move the
BlockDriverState from the iothread aio_context to the main loop. If the
NBD connection is lost before this point, and the NBD client has entered
the reconnect procedure, QEMU crashes:
#0 qemu_aio_coroutine_enter (ctx=0x5618056c7580, co=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-coroutine.c:109
#1 0x00005618034b1b68 in nbd_client_attach_aio_context_bh (
opaque=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block/nbd.c:164
#2 0x000056180353116b in aio_wait_bh (opaque=0x7f60e1e63700)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:55
#3 0x0000561803530633 in aio_bh_call (bh=0x7f60d40a7e80)
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:136
#4 aio_bh_poll (ctx=ctx@entry=0x5618056c7580)
at /build/qemu-6MF7tq/qemu-5.0.1/util/async.c:164
#5 0x0000561803533e5a in aio_poll (ctx=ctx@entry=0x5618056c7580,
blocking=blocking@entry=true)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-posix.c:650
#6 0x000056180353128d in aio_wait_bh_oneshot (ctx=0x5618056c7580,
cb=<optimized out>, opaque=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/util/aio-wait.c:71
#7 0x000056180345c50a in bdrv_attach_aio_context (new_context=0x5618056c7580,
bs=0x561805ed4c00) at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6172
#8 bdrv_set_aio_context_ignore (bs=bs@entry=0x561805ed4c00,
new_context=new_context@entry=0x5618056c7580,
ignore=ignore@entry=0x7f60e1e63780)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6237
#9 0x000056180345c969 in bdrv_child_try_set_aio_context (
bs=bs@entry=0x561805ed4c00, ctx=0x5618056c7580,
ignore_child=<optimized out>, errp=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/block.c:6332
#10 0x00005618034957db in blk_do_set_aio_context (blk=0x56180695b3f0,
new_context=0x5618056c7580, update_root_node=update_root_node@entry=true,
errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:1989
#11 0x00005618034980bd in blk_set_aio_context (blk=<optimized out>,
new_context=<optimized out>, errp=errp@entry=0x0)
at /build/qemu-6MF7tq/qemu-5.0.1/block/block-backend.c:2010
#12 0x0000561803197953 in virtio_blk_data_plane_stop (vdev=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/block/dataplane/virtio-blk.c:292
#13 0x00005618033d67bf in virtio_bus_stop_ioeventfd (bus=0x5618056d9f08)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio-bus.c:245
#14 0x00005618031c9b2e in virtio_vmstate_change (opaque=0x5618056d9f90,
running=0, state=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/hw/virtio/virtio.c:3220
#15 0x0000561803208bfd in vm_state_notify (running=running@entry=0,
state=state@entry=RUN_STATE_FINISH_MIGRATE)
at /build/qemu-6MF7tq/qemu-5.0.1/softmmu/vl.c:1275
#16 0x0000561803155c02 in do_vm_stop (state=RUN_STATE_FINISH_MIGRATE,
send_stop=<optimized out>) at /build/qemu-6MF7tq/qemu-5.0.1/cpus.c:1032
#17 0x00005618033e3765 in migration_completion (s=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:2914
#18 migration_iteration_run (s=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3275
#19 migration_thread (opaque=opaque@entry=0x5618056e6960)
at /build/qemu-6MF7tq/qemu-5.0.1/migration/migration.c:3439
#20 0x0000561803536ad6 in qemu_thread_start (args=<optimized out>)
at /build/qemu-6MF7tq/qemu-5.0.1/util/qemu-thread-posix.c:519
#21 0x00007f61085d06ba in start_thread ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#22 0x00007f610830641d in sysctl () from /lib/x86_64-linux-gnu/libc.so.6
#23 0x0000000000000000 in ?? ()
Fix it by checking that the iochannel is non-null before trying to
detach it from the aio_context. If it is null, no detaching is needed,
and it will get reattached in the proper aio_context once the connection
is reestablished.
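The shape of the fix in nbd_client_detach_aio_context() is roughly
(a sketch, not the exact hunk):

    if (s->ioc) {
        /* No channel during reconnect: nothing to detach. */
        qio_channel_detach_aio_context(s->ioc);
    }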
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20210129073859.683063-2-rvkagan@yandex-team.ru>
Signed-off-by: Eric Blake <eblake@redhat.com>
Register a yank function which shuts down the socket and sets
s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an
error occurred.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <b73eb07db6d1fcd00667beb13ae6117260f002c3.1609167865.git.lukasstraub2@web.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
This commit is the result of running the timer-del-timer-free.cocci
script on the whole source tree.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
Allow the server to expose an additional metacontext to be requested
by savvy clients. qemu-nbd adds a new option -A to expose the
qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this
can also be set via QMP when using block-export-add.
qemu as client is hacked into viewing the key aspects of this new
context by abusing the already-experimental x-dirty-bitmap option to
collapse all depths greater than 2, which results in a tri-state value
visible in the output of 'qemu-img map --output=json' (yes, that means
x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like
renaming it as it would introduce a needless break of back-compat,
even though we make no compat guarantees with x- members):
unallocated (depth 0) => "zero":false, "data":true
local (depth 1) => "zero":false, "data":false
backing (depth 2+) => "zero":true, "data":true
libnbd as client is probably a nicer way to get at the information
without having to decipher such hacks in qemu as client. ;)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20201027050556.269064-11-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
In a recent commit, 12c75e20a2, we improved nbd_co_reconnect_loop() so
that drain does not have to wait for an additional sleep. Similarly, we
shouldn't try to connect if the previous sleep was interrupted by a
drain begin; otherwise drain_begin will have to wait for the whole
connection attempt.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
reconnect-delay has a design flaw: we handle it in the same loop where
we do the connection attempt. So, reconnect-delay may be exceeded by the
unpredictable duration of a connection attempt.
Let's instead use a separate timer.
How to reproduce the bug:
1. Create an image on node1:
qemu-img create -f qcow2 xx 100M
2. Start NBD server on node1:
qemu-nbd xx
3. On node2 start qemu-io:
./build/qemu-io --image-opts \
driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15
4. Type 'read 0 512' in qemu-io interface to check that connection
works
Be careful: you should make steps 5-7 in a short time, less than 15
seconds.
5. Kill nbd server on node1
6. Run 'read 0 512' in qemu-io interface again, to be sure that nbd
client goes to reconnect loop.
7. On node1 run the following command
sudo iptables -A INPUT -p tcp --dport 10809 -j DROP
This will make the connect() call of qemu-io at node2 take a long time.
You'll see that the read command in qemu-io hangs for a long time,
longer than the 15 seconds specified by the reconnect-delay parameter.
That's the bug.
8. Don't forget to drop iptables rule on node1:
sudo iptables -D INPUT -p tcp --dport 10809 -j DROP
Important note: Step [5] is necessary to reproduce _this_ bug. If we
skip step [5], the read command (step 6) will still hang for a long time
and this commit doesn't help, because then there is no long connect() to
an unreachable host, but a long sendmsg() to an unreachable host, which
should be fixed by enabling and adjusting keep-alive on the socket - a
topic for a further patch set.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Don't use the nbd_client_detach_aio_context() driver handler where we
want to finalize the connection. We should directly use
qio_channel_detach_aio_context() in such cases. The driver handler may
(and will) contain other things, unrelated to the qio channel.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We pause the reconnect process during a drained section. So, if we have
some requests waiting for reconnect, we should cancel them; otherwise
they deadlock the drained section.
How to reproduce:
1. Create an image:
qemu-img create -f qcow2 xx 100M
2. Start NBD server:
qemu-nbd xx
3. Start vm with second nbd disk on node2, like this:
./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
file=/work/images/cent7.qcow2 -drive \
driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \
-vnc :0 -m 2G -enable-kvm -vga std
4. Access the vm through vnc (or some other way?), and check that NBD
drive works:
dd if=/dev/sdb of=/dev/null bs=1M count=10
- the command should succeed.
5. Now, kill the nbd server, and run dd in the guest again:
dd if=/dev/sdb of=/dev/null bs=1M count=10
Now QEMU is trying to reconnect, and dd-generated requests are waiting
for the connection (they will wait up to 60 seconds (see the
reconnect-delay option above) and then fail). But suddenly, the VM may
totally hang in the deadlock. You may need to increase the
reconnect-delay period to catch the deadlock.
The VM doesn't respond because the drain deadlock happens in the CPU
thread with the global mutex taken. That's not a good thing by itself
and is not fixed by this commit (the proper way is to use iothreads).
Still, this commit fixes the drain deadlock itself.
Note: we could probably instead continue to reconnect during the drained
section. To achieve this, we may move negotiation to the connect thread
to make it independent of the bs aio context. But expanding the drained
section doesn't seem good anyway. So, for now, let's fix the bug in the
simplest way.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
This makes nbd's connection_co yield during reconnects, so that
reconnect doesn't block the main thread. This is very important in
case of an unavailable nbd server host: connect() call may take a long
time, blocking the main thread (and due to reconnect, it will hang
again and again with small gaps of working time during pauses between
connection attempts).
Implementation notes:
- We don't want to implement a non-blocking connect() over a
non-blocking socket, because getaddrinfo() doesn't have a portable
non-blocking implementation anyway, so let's just use a thread for both
getaddrinfo() and connect().
- We can't use qio_channel_socket_connect_async() (which behaves
similarly and starts a thread to execute the connect() call), as it
relies on someone iterating the main loop (g_main_loop_run() or
something like this), which is not always the case.
- We can't use the thread_pool_submit_co API, as the thread pool waits
for all threads to finish (but we don't want to wait for a blocking
reconnect attempt on shutdown).
So, we just create the thread by hand. Some additional difficulties
are:
- We want our connect to avoid blocking drained sections and aio
context switches. To achieve this, we make it possible to "cancel" the
synchronous wait for the connect (which is actually a coroutine yield);
still, the thread continues in the background, and if successful, its
result may be reused on the next reconnect attempt.
- We don't want to wait for the reconnect on shutdown, so there is a
CONNECT_THREAD_RUNNING_DETACHED thread state, which means that the
block layer is no longer interested in the result, and the thread
should close the newly connected socket on finish and free the state.
How to reproduce the bug, fixed with this commit:
1. Create an image on node1:
qemu-img create -f qcow2 xx 100M
2. Start NBD server on node1:
qemu-nbd xx
3. Start vm with second nbd disk on node2, like this:
./x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \
file=/work/images/cent7.qcow2 -drive file=nbd+tcp://192.168.100.2 \
-vnc :0 -qmp stdio -m 2G -enable-kvm -vga std
4. Access the vm through vnc (or some other way?), and check that NBD
drive works:
dd if=/dev/sdb of=/dev/null bs=1M count=10
- the command should succeed.
5. Now, let's trigger nbd-reconnect loop in Qemu process. For this:
5.1 Kill NBD server on node1
5.2 run "dd if=/dev/sdb of=/dev/null bs=1M count=10" in the guest
again. The command should fail and a lot of error messages about
failing disk may appear as well.
Now the NBD client driver in QEMU tries to reconnect.
Still, the VM works well.
6. Make node1 unavailable on NBD port, so connect() from node2 will
last for a long time:
On node1 (Note, that 10809 is just a default NBD port):
sudo iptables -A INPUT -p tcp --dport 10809 -j DROP
After some time the guest hangs, and you may check in gdb that Qemu
hangs in connect() call, issued from the main thread. This is the
BUG.
7. Don't forget to drop iptables rule from your node1:
sudo iptables -D INPUT -p tcp --dport 10809 -j DROP
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200812145237.4396-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: minor wording and formatting tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>
We try to go to a wakeable sleep, so that if a drain begins, it will
break the sleep. But what if nbd_client_co_drain_begin() has already
been called and s->drained is already true? We'll go to sleep, and the
drain will have to wait for the whole timeout. Let's improve that.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727184751.15704-5-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
On shutdown the NBD driver may be in a connecting state. We should shut
it down as well; otherwise we may hang in nbd_teardown_connection(),
waiting for connection_co to finish in the
BDRV_POLL_WHILE(bs, s->connection_co) loop if the remote server is down.
How to reproduce the dead lock:
1. Create nbd-fault-injector.conf with the following contents:
[inject-error "mega1"]
event=data
io=readwrite
when=before
2. In one terminal run nbd-fault-injector in a loop, like this:
n=1; while true; do
echo $n; ((n++));
./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf;
done
3. In another terminal run qemu-io in a loop, like this:
n=1; while true; do
echo $n; ((n++));
./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000;
done
After some time, qemu-io will hang. Note that this hang may be
triggered by another bug, so the whole case is fixed only together with
the commit "block/nbd: allow drain during reconnect attempt".
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20200727184751.15704-4-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
It should be safe to reenter qio_channel_yield() on the io/channel
read/write path, so it's safe to reduce in_flight and allow attaching a
new aio context. And there is no problem with allowing drain itself: a
connection attempt is not a guest request. Moreover, if the remote
server is down, we can hang in negotiation, blocking the drain section
and provoking a deadlock.
How to reproduce the dead lock:
1. Create nbd-fault-injector.conf with the following contents:
[inject-error "mega1"]
event=data
io=readwrite
when=before
2. In one terminal run nbd-fault-injector in a loop, like this:
n=1; while true; do
echo $n; ((n++));
./nbd-fault-injector.py 127.0.0.1:10000 nbd-fault-injector.conf;
done
3. In another terminal run qemu-io in a loop, like this:
n=1; while true; do
echo $n; ((n++));
./qemu-io -c 'read 0 512' nbd://127.0.0.1:10000;
done
After some time, qemu-io will hang trying to drain, for example, like
this:
#3 aio_poll (ctx=0x55f006bdd890, blocking=true) at
util/aio-posix.c:600
#4 bdrv_do_drained_begin (bs=0x55f006bea710, recursive=false,
parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:427
#5 bdrv_drained_begin (bs=0x55f006bea710) at block/io.c:433
#6 blk_drain (blk=0x55f006befc80) at block/block-backend.c:1710
#7 blk_unref (blk=0x55f006befc80) at block/block-backend.c:498
#8 bdrv_open_inherit (filename=0x7fffba1563bc
"nbd+tcp://127.0.0.1:10000", reference=0x0, options=0x55f006be86d0,
flags=24578, parent=0x0, child_class=0x0, child_role=0,
errp=0x7fffba154620) at block.c:3491
#9 bdrv_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
block.c:3513
#10 blk_new_open (filename=0x7fffba1563bc "nbd+tcp://127.0.0.1:10000",
reference=0x0, options=0x0, flags=16386, errp=0x7fffba154620) at
block/block-backend.c:421
And connection_co stack like this:
#0 qemu_coroutine_switch (from_=0x55f006bf2650, to_=0x7fe96e07d918,
action=COROUTINE_YIELD) at util/coroutine-ucontext.c:302
#1 qemu_coroutine_yield () at util/qemu-coroutine.c:193
#2 qio_channel_yield (ioc=0x55f006bb3c20, condition=G_IO_IN) at
io/channel.c:472
#3 qio_channel_readv_all_eof (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
niov=1, errp=0x7fe96d729eb0) at io/channel.c:110
#4 qio_channel_readv_all (ioc=0x55f006bb3c20, iov=0x7fe96d729bf0,
niov=1, errp=0x7fe96d729eb0) at io/channel.c:143
#5 qio_channel_read_all (ioc=0x55f006bb3c20, buf=0x7fe96d729d28
"\300.\366\004\360U", buflen=8, errp=0x7fe96d729eb0) at
io/channel.c:247
#6 nbd_read (ioc=0x55f006bb3c20, buffer=0x7fe96d729d28, size=8,
desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
/work/src/qemu/master/include/block/nbd.h:365
#7 nbd_read64 (ioc=0x55f006bb3c20, val=0x7fe96d729d28,
desc=0x55f004f69644 "initial magic", errp=0x7fe96d729eb0) at
/work/src/qemu/master/include/block/nbd.h:391
#8 nbd_start_negotiate (aio_context=0x55f006bdd890,
ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
outioc=0x55f006bf19f8, structured_reply=true,
zeroes=0x7fe96d729dca, errp=0x7fe96d729eb0) at nbd/client.c:904
#9 nbd_receive_negotiate (aio_context=0x55f006bdd890,
ioc=0x55f006bb3c20, tlscreds=0x0, hostname=0x0,
outioc=0x55f006bf19f8, info=0x55f006bf1a00, errp=0x7fe96d729eb0) at
nbd/client.c:1032
#10 nbd_client_connect (bs=0x55f006bea710, errp=0x7fe96d729eb0) at
block/nbd.c:1460
#11 nbd_reconnect_attempt (s=0x55f006bf19f0) at block/nbd.c:287
#12 nbd_co_reconnect_loop (s=0x55f006bf19f0) at block/nbd.c:309
#13 nbd_connection_entry (opaque=0x55f006bf19f0) at block/nbd.c:360
#14 coroutine_trampoline (i0=113190480, i1=22000) at
util/coroutine-ucontext.c:173
Note that the hang may be
triggered by another bug, so the whole case is fixed only together with
the commit "block/nbd: on shutdown terminate connection attempt".
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200727184751.15704-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We are going to implement a non-blocking version of
nbd_establish_connection(), which for a while will be used only for
nbd_reconnect_attempt(), not for nbd_open(), so we need to call it
separately.
Refactor nbd_reconnect_attempt() in a way which makes the next commit
simpler.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200727184751.15704-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
When converting to qcow2 compressed format, the last step is a special
zero length compressed write, ending in a call to bdrv_co_truncate(). This
call always fails for the nbd driver since it does not implement
bdrv_co_truncate().
For block devices, which have the same limits, the call succeeds since
the file driver implements bdrv_co_truncate(). If the caller asked to
truncate to the same or smaller size with exact=false, the truncate
succeeds. Implement the same logic for nbd.
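The added handler is roughly along these lines (a sketch of the logic,
not necessarily the final code):

    static int coroutine_fn nbd_co_truncate(BlockDriverState *bs, int64_t offset,
                                            bool exact, PreallocMode prealloc,
                                            BdrvRequestFlags flags, Error **errp)
    {
        BDRVNBDState *s = bs->opaque;

        /*
         * NBD cannot resize an export; tolerate a truncate to the same
         * or a smaller size when the caller does not require an exact
         * result, mirroring what the file driver allows for block devices.
         */
        if (offset != s->info.size && exact) {
            error_setg(errp, "Cannot resize NBD nodes");
            return -ENOTSUP;
        }

        if (offset > s->info.size) {
            error_setg(errp, "Cannot grow NBD nodes");
            return -EINVAL;
        }

        return 0;
    }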
Example failing without this change:
In one shell start qemu-nbd:
$ truncate -s 1g test.tar
$ qemu-nbd --socket=/tmp/nbd.sock --persistent --format=raw --offset 1536 test.tar
In another shell convert an image to qcow2 compressed via NBD:
$ echo "disk data" > disk.raw
$ truncate -s 1g disk.raw
$ qemu-img convert -f raw -O qcow2 -c disk.raw nbd+unix:///?socket=/tmp/nbd.sock; echo $?
1
qemu-img failed, but the conversion was successful:
$ qemu-img info nbd+unix:///?socket=/tmp/nbd.sock
image: nbd+unix://?socket=/tmp/nbd.sock
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
...
$ qemu-img check nbd+unix:///?socket=/tmp/nbd.sock
No errors were found on the image.
1/16384 = 0.01% allocated, 100.00% fragmented, 100.00% compressed clusters
Image end offset: 393216
$ qemu-img compare disk.raw nbd+unix:///?socket=/tmp/nbd.sock
Images are identical.
Fixes: https://bugzilla.redhat.com/1860627
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Message-Id: <20200727215846.395443-2-nsoffer@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: typo fixes]
Signed-off-by: Eric Blake <eblake@redhat.com>