vhost-user-blk: perform immediate cleanup if disconnect on initialization

Commit 4bcad76f4c ("vhost-user-blk: delay vhost_user_blk_disconnect")
postponed vhost_dev cleanup in order to eliminate QEMU aborts caused
by connection problems with the vhost-blk daemon.

However, it introduces a new problem. Now, any communication error
during execution of vhost_dev_init(), called by
vhost_user_blk_device_realize(), leads to a QEMU abort on an assert in
vhost_dev_get_config().

This happens because vhost_user_blk_disconnect() is postponed, although
it should have dropped the s->connected flag by the time
vhost_user_blk_device_realize() opens a new connection.
When the connection is opened, vhost_dev initialization in
vhost_user_blk_connect() relies on the s->connected flag: if the flag
has not been dropped, it skips vhost_dev initialization and returns
success. Then vhost_user_blk_device_realize() proceeds to
vhost_dev_get_config(), where it aborts on the assert.
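
The failure mode described above can be illustrated with a minimal,
self-contained sketch. It is a toy model, not QEMU code: the toy_*
names, the dev_ready field and the exact point at which the flag is set
inside toy_connect() are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not the real QEMU struct. */
struct toy_blk {
    bool connected;   /* plays the role of s->connected */
    bool dev_ready;   /* true once the toy "vhost_dev" is initialized */
};

/*
 * Toy connect: a set flag means "already connected, nothing to do".
 * init_ok simulates whether talking to the daemon succeeds.
 * Assumption: the flag is set before initialization is attempted.
 */
static int toy_connect(struct toy_blk *s, bool init_ok)
{
    if (s->connected) {
        return 0;                 /* skips initialization, reports success */
    }
    s->connected = true;
    if (!init_ok) {
        s->dev_ready = false;     /* initialization failed */
        return -1;
    }
    s->dev_ready = true;
    return 0;
}

/* Toy cleanup: drops the flag so the next connect starts from scratch. */
static void toy_disconnect(struct toy_blk *s)
{
    s->connected = false;
    s->dev_ready = false;
}

/*
 * Toy realize: the first attempt fails, the second one would succeed.
 * Returns true if the device is safe to query afterwards.
 */
static bool toy_realize(struct toy_blk *s, bool immediate_cleanup)
{
    if (toy_connect(s, false) < 0 && immediate_cleanup) {
        toy_disconnect(s);
    }
    /* With postponed cleanup the stale flag is still set at this point. */
    if (toy_connect(s, true) < 0) {
        return false;
    }
    return s->dev_ready;
}
```

With immediate_cleanup set to false, the retry returns success while the
toy device is still uninitialized, which corresponds to the state in
which the real code reaches the assert. With immediate_cleanup set to
true, the retry re-initializes the device.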

To fix the problem, this patch adds immediate cleanup on device
initialization (in vhost_user_blk_device_realize()), using the separate
event handlers for initialization and operation introduced in the
previous patch.
On initialization (in vhost_user_blk_device_realize()) we fully
control the initialization process. At that point nobody can use the
device, since it isn't initialized, so we don't need to postpone any
cleanup and can clean up right away when there is a communication
problem with the vhost-blk daemon.
On operation we leave the behavior as is: the disconnect may happen while
the device is in use, so the device users may want to use vhost_dev's data
to do a rollback before vhost_dev is re-initialized (e.g. in vhost_dev_set_log()).
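
The split between the two stages can be sketched as follows. This is a
simplified illustration under stated assumptions, not the patch itself:
the one-slot queue stands in for a QEMU bottom half, and all toy_* names
and the log_state field are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*toy_bh_fn)(void *opaque);

/* Hypothetical device state; log_state stands for data needed for rollback. */
struct toy_dev {
    bool connected;
    bool started;
    int log_state;
};

/* One-slot stand-in for a scheduled bottom half. */
static toy_bh_fn pending_fn;
static void *pending_opaque;

static void toy_schedule_bh(toy_bh_fn fn, void *opaque)
{
    pending_fn = fn;
    pending_opaque = opaque;
}

/* Runs the deferred callback, if any, as the main loop would later. */
static void toy_run_pending_bh(void)
{
    if (pending_fn != NULL) {
        toy_bh_fn fn = pending_fn;
        pending_fn = NULL;
        fn(pending_opaque);
    }
}

/* Full cleanup: drops the flag and discards the device data. */
static void toy_cleanup(void *opaque)
{
    struct toy_dev *d = opaque;
    d->connected = false;
    d->log_state = 0;
}

/* Close handling: deferred on operation, immediate on initialization. */
static void toy_on_closed(struct toy_dev *d, bool realized)
{
    if (realized) {
        toy_schedule_bh(toy_cleanup, d);  /* data stays valid for now */
        d->started = false;               /* only mark the device stopped */
    } else {
        toy_cleanup(d);                   /* nobody uses the device yet */
    }
}
```

On the operation path the device data remains available until
toy_run_pending_bh() is called, which is what lets other code roll back
first. On the initialization path the flag is dropped before the handler
returns.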

Signed-off-by: Denis Plotnikov <den-plotnikov@yandex-team.ru>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20210325151217.262793-3-den-plotnikov@yandex-team.ru>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Denis Plotnikov 2021-03-25 18:12:16 +03:00 committed by Michael S. Tsirkin
parent 0c99d722e7
commit bc79c87bcd


@@ -402,38 +402,38 @@ static void vhost_user_blk_event(void *opaque, QEMUChrEvent event,
         break;
     case CHR_EVENT_CLOSED:
         /*
-         * A close event may happen during a read/write, but vhost
-         * code assumes the vhost_dev remains setup, so delay the
-         * stop & clear. There are two possible paths to hit this
-         * disconnect event:
-         * 1. When VM is in the RUN_STATE_PRELAUNCH state. The
-         * vhost_user_blk_device_realize() is a caller.
-         * 2. In tha main loop phase after VM start.
-         *
-         * For p2 the disconnect event will be delayed. We can't
-         * do the same for p1, because we are not running the loop
-         * at this moment. So just skip this step and perform
-         * disconnect in the caller function.
-         *
-         * TODO: maybe it is a good idea to make the same fix
-         * for other vhost-user devices.
+         * Closing the connection should happen differently on device
+         * initialization and operation stages.
+         * On initalization, we want to re-start vhost_dev initialization
+         * from the very beginning right away when the connection is closed,
+         * so we clean up vhost_dev on each connection closing.
+         * On operation, we want to postpone vhost_dev cleanup to let the
+         * other code perform its own cleanup sequence using vhost_dev data
+         * (e.g. vhost_dev_set_log).
          */
         if (realized) {
+            /*
+             * A close event may happen during a read/write, but vhost
+             * code assumes the vhost_dev remains setup, so delay the
+             * stop & clear.
+             */
             AioContext *ctx = qemu_get_current_aio_context();

             qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL, NULL,
                                      NULL, NULL, false);
             aio_bh_schedule_oneshot(ctx, vhost_user_blk_chr_closed_bh, opaque);
-        }

-        /*
-         * Move vhost device to the stopped state. The vhost-user device
-         * will be clean up and disconnected in BH. This can be useful in
-         * the vhost migration code. If disconnect was caught there is an
-         * option for the general vhost code to get the dev state without
-         * knowing its type (in this case vhost-user).
-         */
-        s->dev.started = false;
+            /*
+             * Move vhost device to the stopped state. The vhost-user device
+             * will be clean up and disconnected in BH. This can be useful in
+             * the vhost migration code. If disconnect was caught there is an
+             * option for the general vhost code to get the dev state without
+             * knowing its type (in this case vhost-user).
+             */
+            s->dev.started = false;
+        } else {
+            vhost_user_blk_disconnect(dev);
+        }
         break;
     case CHR_EVENT_BREAK:
     case CHR_EVENT_MUX_IN: