qemu/migration
Eric Blake 403d18ae38 migration: Handle block device inactivation failures better
Consider what happens when performing a migration between two host
machines connected to an NFS server serving multiple block devices to
the guest, when the NFS server becomes unavailable.  The migration
attempts to inactivate all block devices on the source (a necessary
step before the destination can take over); but if the NFS server is
non-responsive, the attempt to inactivate can itself fail.  When that
happens, the destination fails to get the migrated guest (good,
because the source wasn't able to flush everything properly):

  (qemu) qemu-kvm: load of migration failed: Input/output error

at which point, our only hope for the guest is for the source to take
back control.  With the current code base, the host outputs a message, but then appears to resume:

  (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

  (src qemu)info status
   VM status: running

but a second migration attempt now asserts:

  (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.

Whether the guest is recoverable on the source after the first failure
is debatable, but what we do not want is to have qemu itself fail due
to an assertion.  It looks like the problem is as follows:

In migration.c:migration_completion(), the source sets 'inactivate' to
true (since COLO is not enabled), then tries
savevm.c:qemu_savevm_state_complete_precopy() with a request to
inactivate block devices.  In turn, this calls
block.c:bdrv_inactivate_all(), which fails when flushing runs up
against the non-responsive NFS server.  With savevm failing, we are
now left in a state where some, but not all, of the block devices have
been inactivated; but migration_completion() then jumps to 'fail'
rather than 'fail_invalidate' and skips an attempt to reclaim those
those disks by calling bdrv_activate_all().  Even if we do attempt to
reclaim disks, we aren't taking note of failure there, either.

Thus, we have reached a state where the migration engine has forgotten
all state about whether a block device is inactive, because we did not
set s->block_inactive in enough places; so migration allows the source
to reach vm_start() and resume execution, violating the block layer
invariant that the guest CPUs should not be restarted while a device
is inactive.  Note that the code in migration.c:migrate_fd_cancel()
will also try to reactivate all block devices if s->block_inactive was
set, but because we failed to set that flag after the first failure,
the source assumes it has reclaimed all devices, even though it still
has remaining inactivated devices and does not try again.  Normally,
qmp_cont() will also try to reactivate all disks (or correctly fail if
the disks are not reclaimable because NFS is not yet back up), but the
auto-resumption of the source after a migration failure does not go
through qmp_cont().  And because we have left the block layer in an
inconsistent state with devices still inactivated, the later migration
attempt is hitting the assertion failure.

Since it is important to not resume the source with inactive disks,
this patch marks s->block_inactive before attempting inactivation,
rather than after succeeding, in order to prevent any vm_start() until
it has successfully reactivated all devices.

See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 11:29:00 +02:00
..
block-dirty-bitmap.c migration: Rename res_{postcopy,precopy}_only 2023-02-15 20:04:30 +01:00
block.c migration/block: replace uses of blk_nb_sectors that do not check result 2023-04-11 16:40:53 +02:00
block.h migration: disable auto-converge during bulk block migration 2017-09-27 11:27:14 +01:00
channel-block.c io: Add support for MSG_PEEK for socket channel 2023-02-06 19:22:56 +01:00
channel-block.h migration: introduce a QIOChannel impl for BlockDriverState VMState 2022-06-22 19:33:43 +01:00
channel.c migration: check magic value for deciding the mapping of channels 2023-02-06 19:22:57 +01:00
channel.h migration: check magic value for deciding the mapping of channels 2023-02-06 19:22:57 +01:00
colo-failover.c migration/colo: Improve an x-colo-lost-heartbeat error message 2023-02-23 14:10:17 +01:00
colo.c error: Drop superfluous #include "qapi/qmp/qerror.h" 2023-02-23 13:56:14 +01:00
dirtyrate.c *: Add missing includes of qemu/error-report.h 2023-03-22 15:06:57 +00:00
dirtyrate.h migration/dirtyrate: Refactor dirty page rate calculation 2022-07-20 12:15:08 +01:00
exec.c *: Add missing includes of qemu/error-report.h 2023-03-22 15:06:57 +00:00
exec.h migration: Export exec.c functions in its own file 2017-06-01 18:49:22 +02:00
fd.c monitor: Use getter/setter functions for cur_mon 2020-10-09 07:08:19 +02:00
fd.h migration: Fix fd protocol for incoming defer 2019-06-05 12:43:55 +02:00
global_state.c migration: Silence compiler warning in global_state_store_running() 2020-10-02 12:28:48 +01:00
meson.build migration: Introduce interface query-migrationthreads 2023-02-06 19:22:57 +01:00
migration-hmp-cmds.c error: Drop superfluous #include "qapi/qmp/qerror.h" 2023-02-23 13:56:14 +01:00
migration.c migration: Handle block device inactivation failures better 2023-04-24 11:29:00 +02:00
migration.h migration: Recover behavior of preempt channel creation for pre-7.2 2023-04-12 21:44:56 +02:00
multifd-zlib.c multifd: Create page_size fields into both MultiFD{Recv,Send}Params 2022-12-15 10:30:37 +01:00
multifd-zstd.c multifd: Create page_size fields into both MultiFD{Recv,Send}Params 2022-12-15 10:30:37 +01:00
multifd.c migration: Make dirty_sync_missed_zero_copy atomic 2023-04-24 11:28:57 +02:00
multifd.h migration/multifd: Move load_cleanup inside incoming_state_destroy 2023-02-13 03:45:40 +01:00
page_cache.c migration: Fix cache_init()'s "Failed to allocate" error messages 2021-02-08 11:19:51 +00:00
page_cache.h migration: Clean up signed vs. unsigned XBZRLE cache-size 2021-02-08 11:19:51 +00:00
postcopy-ram.c postcopy-ram: do not use qatomic_mb_read 2023-04-20 11:17:35 +02:00
postcopy-ram.h migration: Postpone postcopy preempt channel to be after main 2023-02-11 16:51:09 +01:00
qemu-file.c migration: mark mixed functions that can suspend 2023-04-20 11:17:35 +02:00
qemu-file.h migration: mark mixed functions that can suspend 2023-04-20 11:17:35 +02:00
ram.c migration: Rename normal to normal_pages 2023-04-24 11:29:00 +02:00
ram.h migration: Rename normal to normal_pages 2023-04-24 11:29:00 +02:00
rdma.c migration/rdma: Remove deprecated variable rdma_return_path 2023-03-16 16:07:07 +01:00
rdma.h migration: Export rdma.c functions in its own file 2017-06-01 18:49:23 +02:00
savevm.c migration: Rename res_{postcopy,precopy}_only 2023-02-15 20:04:30 +01:00
savevm.h migration: Rename res_{postcopy,precopy}_only 2023-02-15 20:04:30 +01:00
socket.c migration: Postcopy preemption preparation on channel creation 2022-07-20 12:15:08 +01:00
socket.h migration: Postcopy preemption preparation on channel creation 2022-07-20 12:15:08 +01:00
target.c migration: fix populate_vfio_info 2023-03-16 16:07:07 +01:00
threadinfo.c migration: Introduce interface query-migrationthreads 2023-02-06 19:22:57 +01:00
threadinfo.h migration: Introduce interface query-migrationthreads 2023-02-06 19:22:57 +01:00
tls.c cleanup: Tweak and re-run return_directly.cocci 2022-12-14 16:19:35 +01:00
tls.h migration: Add helpers to detect TLS capability 2022-07-20 12:15:08 +01:00
trace-events migration: Remove unused res_compatible 2023-02-15 20:04:30 +01:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vmstate-types.c Move CPU softfloat unions to cpu-float.h 2022-04-06 14:31:43 +02:00
vmstate.c migration: Add canary to VMSTATE_END_OF_LIST 2023-02-06 19:22:56 +01:00
xbzrle.c migration/xbzrle: fix out-of-bounds write with axv512 2023-03-16 16:07:07 +01:00
xbzrle.h AVX512 support for xbzrle_encode_buffer 2023-02-11 16:51:09 +01:00
yank_functions.c migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00
yank_functions.h migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00