Commit Graph

1749 Commits

Author SHA1 Message Date
Juan Quintela
1dfc4b9e19 migration: Move parameters functions to option.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
f80196b772 migration: Move migrate_cap_set() to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
45c1de13f0 migration: Move qmp_migrate_set_capabilities() to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
4d0c6b695b migration: Move qmp_query_migrate_capabilities() to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
7760870645 migration: Move migrate_caps_check() to options.c
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
17cba690cd migration: Create migrate_rdma_pin_all() function
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

---

Fixed missing space after comma (fabiano)
2023-04-24 15:01:46 +02:00
Juan Quintela
38ad1110e3 migration: Move migrate_use_return() to options.c
Once that we are there, we rename the function to migrate_return_path()
to be consistent with all other capabilities.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
9d4b1e5f22 migration: Move migrate_use_block() to options.c
Once that we are there, we rename the function to migrate_block()
to be consistent with all other capabilities.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
87dca0c9bb migration: Move migrate_use_xbzrle() to options.c
Once that we are there, we rename the function to migrate_xbzrle()
to be consistent with all other capabilities.
We change the type to return bool also for consistency.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
b4bc342c76 migration: Move migrate_use_zero_copy_send() to options.c
Once that we are there, we rename the function to
migrate_zero_copy_send() to be consistent with all other capabilities.

We can remove the CONFIG_LINUX guard.  We already check that we can't
setup this capability in migrate_caps_check().

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
51b07548f7 migration: Move migrate_use_multifd() to options.c
Once that we are there, we rename the function to migrate_multifd()
to be consistent with all other capabilities.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
b890902c9c migration: Move migrate_use_events() to options.c
Once that we are there, we rename the function to migrate_events()
to be consistent with all other capabilities.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
a7a94d1435 migration: Move migrate_use_compression() to options.c
Once that we are there, we rename the function to migrate_compress()
to be consistent with all other capabilities.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
5e80464455 migration: Move migrate_colo_enabled() to options.c
Once that we are there, we rename the function to migrate_colo() to be
consistent with all other capabilities.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
1f0776f1c0 migration: Create options.c
We move there all capabilities helpers from migration.c.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---

Following David advise:
- looked through the history, capabilities are newer than 2012, so we
  can remove that bit of the header.
- This part is posterior to Anthony.
  Original Author is Orit. Once there,
  I put myself.  Peter Xu also did quite a bit of work here.
  Anyone else wants/needs to be there?  I didn't search too hard
  because nobody asked before to be added.

What do you think?
2023-04-24 15:01:46 +02:00
Juan Quintela
9eb1109cfb migration: Create migrate_cap_set()
And remove the convoluted use of qmp_migrate_set_capabilities() to
enable disable MIGRATION_CAPABILITY_BLOCK.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
2023-04-24 15:01:46 +02:00
Juan Quintela
f9e1ef7482 spice: move client_migrate_info command to ui/
It has nothing to do with migration, except for the "migrate" in the
name of the command.  Move it with the rest of the ui commands.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-04-24 15:01:46 +02:00
Juan Quintela
c938157713 migration: move migration_global_dump() to migration-hmp-cmds.c
It is only used there, so we can make it static.
Once there, remove spice.h that it is not used.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>

---

fix David Edmonson ui/qemu-spice.h unintended removal
2023-04-24 15:01:46 +02:00
Eric Blake
5d39f44d7a migration: Minor control flow simplification
No need to declare a temporary variable.

Suggested-by: Juan Quintela <quintela@redhat.com>
Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better")
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 15:01:46 +02:00
Juan Quintela
b02c7fc9ef migration: Pass migrate_caps_check() the old and new caps
We used to pass the old capabilities array and the new
capabilities as a list.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 11:29:02 +02:00
Juan Quintela
0cec2056ff migration: rename enabled_capabilities to capabilities
It is clear from the context what that means, and such a long name
with the extra long names of the capabilities make very difficilut to
stay inside the 80 columns limit.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 11:29:01 +02:00
Peter Xu
ae30b9b289 migration/postcopy: Detect file system on dest host
Postcopy requires the memory support userfaultfd to work.  Right now we
check it but it's a bit too late (when switching to postcopy migration).

Do that early right at enabling of postcopy.

Note that this is still only a best effort because ramblocks can be
dynamically created.  We can add check in hostmem creations and fail if
postcopy enabled, but maybe that's too aggressive.

Still, we have chance to fail the most obvious where we know there's an
existing unsupported ramblock.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 11:29:01 +02:00
Eric Blake
403d18ae38 migration: Handle block device inactivation failures better
Consider what happens when performing a migration between two host
machines connected to an NFS server serving multiple block devices to
the guest, when the NFS server becomes unavailable.  The migration
attempts to inactivate all block devices on the source (a necessary
step before the destination can take over); but if the NFS server is
non-responsive, the attempt to inactivate can itself fail.  When that
happens, the destination fails to get the migrated guest (good,
because the source wasn't able to flush everything properly):

  (qemu) qemu-kvm: load of migration failed: Input/output error

at which point, our only hope for the guest is for the source to take
back control.  With the current code base, the host outputs a message, but then appears to resume:

  (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

  (src qemu)info status
   VM status: running

but a second migration attempt now asserts:

  (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.

Whether the guest is recoverable on the source after the first failure
is debatable, but what we do not want is to have qemu itself fail due
to an assertion.  It looks like the problem is as follows:

In migration.c:migration_completion(), the source sets 'inactivate' to
true (since COLO is not enabled), then tries
savevm.c:qemu_savevm_state_complete_precopy() with a request to
inactivate block devices.  In turn, this calls
block.c:bdrv_inactivate_all(), which fails when flushing runs up
against the non-responsive NFS server.  With savevm failing, we are
now left in a state where some, but not all, of the block devices have
been inactivated; but migration_completion() then jumps to 'fail'
rather than 'fail_invalidate' and skips an attempt to reclaim those
those disks by calling bdrv_activate_all().  Even if we do attempt to
reclaim disks, we aren't taking note of failure there, either.

Thus, we have reached a state where the migration engine has forgotten
all state about whether a block device is inactive, because we did not
set s->block_inactive in enough places; so migration allows the source
to reach vm_start() and resume execution, violating the block layer
invariant that the guest CPUs should not be restarted while a device
is inactive.  Note that the code in migration.c:migrate_fd_cancel()
will also try to reactivate all block devices if s->block_inactive was
set, but because we failed to set that flag after the first failure,
the source assumes it has reclaimed all devices, even though it still
has remaining inactivated devices and does not try again.  Normally,
qmp_cont() will also try to reactivate all disks (or correctly fail if
the disks are not reclaimable because NFS is not yet back up), but the
auto-resumption of the source after a migration failure does not go
through qmp_cont().  And because we have left the block layer in an
inconsistent state with devices still inactivated, the later migration
attempt is hitting the assertion failure.

Since it is important to not resume the source with inactive disks,
this patch marks s->block_inactive before attempting inactivation,
rather than after succeeding, in order to prevent any vm_start() until
it has successfully reactivated all devices.

See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Acked-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 11:29:00 +02:00
Juan Quintela
8c0cda8fa0 migration: Rename normal to normal_pages
Rest of counters that refer to pages has a _pages suffix.
And historically, this showed the number of full pages transferred.
The name "normal" refered to the fact that they were sent without any
optimization (compression, xbzrle, zero_page, ...).

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2023-04-24 11:29:00 +02:00
Juan Quintela
1a386e8de5 migration: Rename duplicate to zero_pages
Rest of counters that refer to pages has a _pages suffix.
And historically, this showed the number of pages composed of the same
character, here comes the name "duplicated".  But since years ago, it
refers to the number of zero_pages.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2023-04-24 11:28:59 +02:00
Juan Quintela
3c764f9b2b migration: Make postcopy_requests atomic
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2023-04-24 11:28:59 +02:00
Juan Quintela
536b5a4e56 migration: Make dirty_sync_count atomic
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2023-04-24 11:28:58 +02:00
Juan Quintela
296a4ac2aa migration: Make downtime_bytes atomic
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2023-04-24 11:28:58 +02:00
Juan Quintela
b013b5d1f3 migration: Make precopy_bytes atomic
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2023-04-24 11:28:58 +02:00
Juan Quintela
4291823694 migration: Make dirty_sync_missed_zero_copy atomic
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2023-04-24 11:28:57 +02:00
Juan Quintela
cf671116fa migration: Make multifd_bytes atomic
In the spirit of:

commit 394d323bc3451e4d07f13341cb8817fac8dfbadd
Author: Peter Xu <peterx@redhat.com>
Date:   Tue Oct 11 17:55:51 2022 -0400

    migration: Use atomic ops properly for page accountings

Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 11:28:57 +02:00
Juan Quintela
30fb22cda4 migration: Update atomic stats out of the mutex
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 11:28:56 +02:00
Juan Quintela
abce5fa16d migration: Merge ram_counters and ram_atomic_counters
Using MgrationStats as type for ram_counters mean that we didn't have
to re-declare each value in another struct. The need of atomic
counters have make us to create MigrationAtomicStats for this atomic
counters.

Create RAMStats type which is a merge of MigrationStats and
MigrationAtomicStats removing unused members.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

---

Fix typos found by David Edmondson
2023-04-24 11:28:56 +02:00
李皆俊
8ebb6ecc37 migration: remove extra whitespace character for code style
Fix code style.

Signed-off-by: 李皆俊 <a_lijiejun@163.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 11:28:55 +02:00
Paolo Bonzini
4592eaf387 postcopy-ram: do not use qatomic_mb_read
It does not even pair with a qatomic_mb_set(), so it is clearer to use
load-acquire in this case; they are synonyms.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-20 11:17:35 +02:00
Paolo Bonzini
394b9407e4 migration: mark mixed functions that can suspend
There should be no paths from a coroutine_fn to aio_poll, however in
practice coroutine_mixed_fn will call aio_poll in the !qemu_in_coroutine()
path.  By marking mixed functions, we can track accurately the call paths
that execute entirely in coroutine context, and find more missing
coroutine_fn markers.  This results in more accurate checks that
coroutine code does not end up blocking.

If the marking were extended transitively to all functions that call
these ones, static analysis could be done much more efficiently.
However, this is a start and makes it possible to use vrc's path-based
searches to find potential bugs where coroutine_fns call blocking functions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-20 11:17:35 +02:00
Juan Quintela
28ef5339c3 migration: fix ram_state_pending_exact()
I removed that bit on commit:

commit c8df4a7aef
Author: Juan Quintela <quintela@redhat.com>
Date:   Mon Oct 3 02:00:03 2022 +0200

    migration: Split save_live_pending() into state_pending_*

Fixes: c8df4a7aef
Suggested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-12 22:47:50 +02:00
Lukas Straub
37502df32c migration/ram.c: Fix migration with compress enabled
Since ec6f3ab9, migration with compress enabled was broken, because
the compress threads use a dummy QEMUFile which just acts as a
buffer and that commit accidentally changed it to use the outgoing
migration channel instead.

Fix this by using the dummy file again in the compress threads.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-12 21:51:34 +02:00
Peter Xu
06064a6715 migration: Recover behavior of preempt channel creation for pre-7.2
In 8.0 devel window we reworked preempt channel creation, so that there'll
be no race condition when the migration channel and preempt channel got
established in the wrong order in commit 5655aab079.

However no one noticed that the change will also be not compatible with
older qemus, majorly 7.1/7.2 versions where preempt mode started to be
supported.

Leverage the same pre-7.2 flag introduced in the previous patch to recover
the behavior hopefully before 8.0 releases, so we don't break migration
when we migrate from 8.0 to older qemu binaries.

Fixes: 5655aab079 ("migration: Postpone postcopy preempt channel to be after main")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-12 21:44:56 +02:00
Peter Xu
6621883f93 migration: Fix potential race on postcopy_qemufile_src
postcopy_qemufile_src object should be owned by one thread, either the main
thread (e.g. when at the beginning, or at the end of migration), or by the
return path thread (when during a preempt enabled postcopy migration).  If
that's not the case the access to the object might be racy.

postcopy_preempt_shutdown_file() can be potentially racy, because it's
called at the end phase of migration on the main thread, however during
which the return path thread hasn't yet been recycled; the recycle happens
in await_return_path_close_on_source() which is after this point.

It means, logically it's posslbe the main thread and the return path thread
are both operating on the same qemufile.  While I don't think qemufile is
thread safe at all.

postcopy_preempt_shutdown_file() used to be needed because that's where we
send EOS to dest so that dest can safely shutdown the preempt thread.

To avoid the possible race, remove this only place that a race can happen.
Instead we figure out another way to safely close the preempt thread on
dest.

The core idea during postcopy on deciding "when to stop" is that dest will
send a postcopy SHUT message to src, telling src that all data is there.
Hence to shut the dest preempt thread maybe better to do it directly on
dest node.

This patch proposed such a way that we change postcopy_prio_thread_created
into PreemptThreadStatus, so that we kick the preempt thread on dest qemu
by a sequence of:

  mis->preempt_thread_status = PREEMPT_THREAD_QUIT;
  qemu_file_shutdown(mis->postcopy_qemufile_dst);

While here shutdown() is probably so far the easiest way to kick preempt
thread from a blocked qemu_get_be64().  Then it reads preempt_thread_status
to make sure it's not a network failure but a willingness to quit the
thread.

We could have avoided that extra status but just rely on migration status.
The problem is postcopy_ram_incoming_cleanup() is just called early enough
so we're still during POSTCOPY_ACTIVE no matter what.. So just make it
simple to have the status introduced.

One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of
postcopy preempt.

Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-12 21:44:38 +02:00
Paolo Bonzini
2c5451ca52 migration/block: replace uses of blk_nb_sectors that do not check result
Uses of blk_nb_sectors must check whether the result is negative.
Otherwise, underflow can happen.  Fortunately, alloc_aio_bitmap()
and bmds_aio_inflight() both have an alternative way to retrieve the
number of sectors in the file.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20230407153303.391121-6-pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-11 16:40:53 +02:00
Richard Henderson
cc37d98bfb *: Add missing includes of qemu/error-report.h
This had been pulled in via qemu/plugin.h from hw/core/cpu.h,
but that will be removed.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230310195252.210956-5-richard.henderson@linaro.org>
[AJB: add various additional cases shown by CI]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230315174331.2959-15-alex.bennee@linaro.org>
Reviewed-by: Emilio Cota <cota@braap.org>
2023-03-22 15:06:57 +00:00
Steve Sistare
fa76c854ae migration: fix populate_vfio_info
Include CONFIG_DEVICES so that populate_vfio_info is instantiated for
CONFIG_VFIO.  Without it, the 'info migrate' command never returns
info about vfio.

Fixes: 43bd0bf30f ("migration: Move populate_vfio_info() into a separate file")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16 16:07:07 +01:00
Wei Wang
ff1585d1d8 migration/multifd: correct multifd_send_thread to trace the flags
The p->flags could be updated via the send_prepare callback, e.g. OR-ed
with MULTIFD_FLAG_ZLIB via zlib_send_prepare. Assign p->flags to the
local "flags" before the send_prepare callback could only get partial of
p->flags. Fix it by moving the assignment of p->flags to the local flags
after the callback, so that the correct flags can be traced.

Fixes: ab7cbb0b9a ("multifd: Make no compression operations into its own structure")
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16 16:07:07 +01:00
Li Zhijian
bf0274192a migration/rdma: Remove deprecated variable rdma_return_path
It's no longer needed since commit
44bcfd45e9 ("migration/rdma: destination: create the return patch after the first accept")

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16 16:07:07 +01:00
Matheus Tavares Bernardino
1776b70f55 migration/xbzrle: fix out-of-bounds write with axv512
xbzrle_encode_buffer_avx512() checks for overflows too scarcely in its
outer loop, causing out-of-bounds writes:

$ ../configure --target-list=aarch64-softmmu --enable-sanitizers --enable-avx512bw
$ make tests/unit/test-xbzrle && ./tests/unit/test-xbzrle

==5518==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000b100 at pc 0x561109a7714d bp 0x7ffed712a440 sp 0x7ffed712a430
WRITE of size 1 at 0x62100000b100 thread T0
    #0 0x561109a7714c in uleb128_encode_small ../util/cutils.c:831
    #1 0x561109b67f6a in xbzrle_encode_buffer_avx512 ../migration/xbzrle.c:275
    #2 0x5611099a7428 in test_encode_decode_overflow ../tests/unit/test-xbzrle.c:153
    #3 0x7fb2fb65a58d  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d)
    #4 0x7fb2fb65a333  (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a333)
    #5 0x7fb2fb65aa79 in g_test_run_suite (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa79)
    #6 0x7fb2fb65aa94 in g_test_run (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa94)
    #7 0x5611099a3a23 in main ../tests/unit/test-xbzrle.c:218
    #8 0x7fb2fa78c082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082)
    #9 0x5611099a608d in _start (/qemu/build/tests/unit/test-xbzrle+0x28408d)

0x62100000b100 is located 0 bytes to the right of 4096-byte region [0x62100000a100,0x62100000b100)
allocated by thread T0 here:
    #0 0x7fb2fb823a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153
    #1 0x7fb2fb637ef0 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57ef0)

Fix that by performing the overflow check in the inner loop, instead.

Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16 16:07:07 +01:00
Matheus Tavares Bernardino
d84a78d15d migration/xbzrle: use ctz64 to avoid undefined result
__builtin_ctzll() produces undefined results when the argument is 0.
This can be seen through test-xbzrle, which produces the following
warning:

../migration/xbzrle.c:265: runtime error: passing zero to ctz(), which is not a valid argument

Replace __builtin_ctzll() with our ctz64() wrapper which properly
handles 0.

Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16 16:07:07 +01:00
Dr. David Alan Gilbert
a5382214d8 migration/rdma: Fix return-path case
The RDMA code has return-path handling code, but it's only enabled
if postcopy is enabled; if the 'return-path' migration capability
is enabled, the return path is NOT setup but the core migration
code still tries to use it and breaks.

Enable the RDMA return path if either postcopy or the return-path
capability is enabled.

bz: https://bugzilla.redhat.com/show_bug.cgi?id=2063615

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16 16:07:07 +01:00
Peter Xu
a5d35dc7e0 migration: Wait on preempt channel in preempt thread
QEMU main thread will wait until dest preempt channel established during
processing the LISTEN command (within the whole postcopy PACKAGED data), by
waiting on the semaphore postcopy_qemufile_dst_done.

That's racy, because it's possible that the dest QEMU main thread hasn't
yet accept()ed the new connection when processing the LISTEN event.  The
sem_wait() will yield the main thread without being able to run anything
else including the accept() of the new socket, which can cause deadlock
within the main thread.

To avoid the race, move the "wait channel" from main thread to the preempt
thread right at the start.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 5655aab079 ("migration: Postpone postcopy preempt channel to be after main")
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16 16:07:07 +01:00
John Berberian, Jr
c31772ad68 Fix exec migration on Windows (w32+w64).
* Use cmd instead of /bin/sh on Windows.

* Try to auto-detect cmd.exe's path, but default to a hard-coded path.

Note that this will require that gspawn-win[32|64]-helper.exe and
gspawn-win[32|64]-helper-console.exe are included in the Windows binary
distributions (cc: Stefan Weil).

Signed-off-by: "John Berberian, Jr" <jeb.study@gmail.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-02 17:06:27 +01:00