Commit Graph

2347 Commits

Author SHA1 Message Date
Fabiano Rosas
8fa1a21c6e migration/multifd: Fix clearing of mapped-ram zero pages
When the zero page detection is done in the multifd threads, we need
to iterate the second part of the pages->offset array and clear the
file bitmap for each zero page. The piece of code we merged to do that
is wrong.

The reason this has passed all the tests is because the bitmap is
initialized with zeroes already, so clearing the bits only really has
an effect during live migration and when a data page goes from having
data to no data.

Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240321201242.6009-1-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-22 12:12:08 -04:00
Peter Xu
910c164736 migration/postcopy: Fix high frequency sync
With current code base I can observe extremely high sync count during
precopy, as long as one enables postcopy-ram=on before switchover to
postcopy.

To provide some context of when QEMU decides to do a full sync: it checks
must_precopy (which implies "data must be sent during precopy phase"), and
as long as it is lower than the threshold size we calculated (out of
bandwidth and expected downtime) QEMU will kick off the slow/exact sync.

However, when postcopy is enabled (even if still during precopy phase), RAM
only reports all pages as can_postcopy, and report must_precopy==0.  Then
"must_precopy <= threshold_size" mostly always triggers and enforces a slow
sync for every call to migration_iteration_run() when postcopy is enabled
even if not used.  That is insane.

It turns out it was a regress bug introduced in the previous refactoring in
8.0 as reported by Nina [1]:

  (a) c8df4a7aef ("migration: Split save_live_pending() into state_pending_*")

Then a workaround patch is applied at the end of release (8.0-rc4) to fix it:

  (b) 28ef5339c3 ("migration: fix ram_state_pending_exact()")

However that "workaround" was overlooked when during the cleanup in this
9.0 release in this commit..

  (c) b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")

Then the issue was re-exposed as reported by Nina [1].

The problem with (b) is that it only fixed the case for RAM, rather than
all the rest of iterators.  Here a slow sync should only be required if all
dirty data (precopy+postcopy) is less than the threshold_size that QEMU
calculated.  It is even debatable whether a sync is needed when switched to
postcopy.  Currently ram_state_pending_exact() will be mostly noop if
switched to postcopy, and that logic seems to apply too for all the rest of
iterators, as sync dirty bitmap during a postcopy doesn't make much sense.
However let's leave such change for later, as we're in rc phase.

So rather than reusing commit (b), this patch provides the complete fix for
all iterators.  When at it, cleanup a little bit on the lines around.

[1] https://gitlab.com/qemu-project/qemu/-/issues/1565

Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Fixes: b0504edd40 ("migration: Drop unnecessary check in ram's pending_exact()")
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240320214453.584374-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-22 12:12:08 -04:00
Fabiano Rosas
bd4480b0d0 migration: Revert mapped-ram multifd support to fd: URI
This reverts commit decdc76772 in full
and also the relevant migration-tests from
7a09f09283.

After the addition of the new QAPI-based migration address API in 8.2
we've been converting an "fd:" URI into a SocketAddress, missing the
fact that the "fd:" syntax could also be used for a plain file instead
of a socket. This is a problem because the SocketAddress is part of
the API, so we're effectively asking users to create a "socket"
channel to pass in a plain file.

The easiest way to fix this situation is to deprecate the usage of
both SocketAddress and "fd:" when used with a plain file for
migration. Since this has been possible since 8.2, we can wait until
9.1 to deprecate it.

For 9.0, however, we should avoid adding further support to migration
to a plain file using the old "fd:" syntax or the new SocketAddress
API, and instead require the usage of either the old-style "file:" URI
or the FileMigrationArgs::filename field of the new API with the
"/dev/fdset/NN" syntax, both of which are already supported.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240319210941.1907-1-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-22 12:12:08 -04:00
Peter Maydell
c6ea92aab8 Migration pull for 9.0-rc0
- Nicholas/Phil's fix on migration corruption / inconsistent for tcg
 - Cedric's fix on block migration over n_sectors==0
 - Steve's CPR reboot documentation page
 - Fabiano's misc fixes on mapped-ram (IOC leak, dup() errors, fd checks, fd
   use race, etc.)
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZfdZEhIccGV0ZXJ4QHJl
 ZGhhdC5jb20ACgkQO1/MzfOr1wa+1AEA0+f7nCssvsILvCY9KifYO+OUJsLodUuQ
 JW0JBz+1iPMA+wSiyIVl2Xg78Q97nJxv71UJf+1cDJENA5EMmXMnxmYK
 =SLnA
 -----END PGP SIGNATURE-----

Merge tag 'migration-20240317-pull-request' of https://gitlab.com/peterx/qemu into staging

Migration pull for 9.0-rc0

- Nicholas/Phil's fix on migration corruption / inconsistent for tcg
- Cedric's fix on block migration over n_sectors==0
- Steve's CPR reboot documentation page
- Fabiano's misc fixes on mapped-ram (IOC leak, dup() errors, fd checks, fd
  use race, etc.)

# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZfdZEhIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wa+1AEA0+f7nCssvsILvCY9KifYO+OUJsLodUuQ
# JW0JBz+1iPMA+wSiyIVl2Xg78Q97nJxv71UJf+1cDJENA5EMmXMnxmYK
# =SLnA
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 17 Mar 2024 20:56:50 GMT
# gpg:                using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg:                issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal]
# gpg:                 aka "Peter Xu <peterx@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D  D1A9 3B5F CCCD F3AB D706

* tag 'migration-20240317-pull-request' of https://gitlab.com/peterx/qemu:
  migration/multifd: Duplicate the fd for the outgoing_args
  migration/multifd: Ensure we're not given a socket for file migration
  migration: Fix iocs leaks during file and fd migration
  migration: cpr-reboot documentation
  migration: Skip only empty block devices
  physmem: Fix migration dirty bitmap coherency with TCG memory access
  physmem: Factor cpu_physical_memory_dirty_bits_cleared() out
  physmem: Expose tlb_reset_dirty_range_all()
  migration: Fix error handling after dup in file migration
  io: Introduce qio_channel_file_new_dupfd

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-18 17:16:00 +00:00
Fabiano Rosas
9adfb308c1 migration/multifd: Duplicate the fd for the outgoing_args
We currently store the file descriptor used during the main outgoing
channel creation to use it again when creating the multifd
channels.

Since this fd is used for the first iochannel, there's risk that the
QIOChannel gets freed and the fd closed while outgoing_args.fd still
has it available. This could lead to an fd-reuse bug.

Duplicate the outgoing_args fd to avoid this issue.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240315032040.7974-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-15 11:26:33 -04:00
Fabiano Rosas
73f6f9a12f migration/multifd: Ensure we're not given a socket for file migration
When doing migration using the fd: URI, QEMU will fetch the file
descriptor passed in via the monitor at
fd_start_outgoing|incoming_migration(), which means the checks at
migration_channels_and_transport_compatible() happen too soon and we
don't know at that point whether the FD refers to a plain file or a
socket.

For this reason, we've been allowing a migration channel of type
SOCKET_ADDRESS_TYPE_FD to pass the initial verifications in scenarios
where the socket migration is not supported, such as with fd + multifd.

The commit decdc76772 ("migration/multifd: Add mapped-ram support to
fd: URI") was supposed to add a second check prior to starting
migration to make sure a socket fd is not passed instead of a file fd,
but failed to do so.

Add the missing verification and update the comment explaining this
situation which is currently incorrect.

Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240315032040.7974-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-15 11:26:33 -04:00
Fabiano Rosas
74228c598f migration: Fix iocs leaks during file and fd migration
The memory for the io channels is being leaked in three different ways
during file migration:

1) if the offset check fails we never drop the ioc reference;

2) we allocate an extra channel for no reason;

3) if multifd is enabled but channel creation fails when calling
   dup(), we leave the previous channels around along with the glib
   polling;

Fix all issues by restructuring the code to first allocate the
channels and only register the watches when all channels have been
created.

For multifd, the file and fd migrations can share code because both
are backed by a QIOChannelFile. For the non-multifd case, the fd needs
to be separate because it is backed by a QIOChannelSocket.

Fixes: 2dd7ee7a51 ("migration/multifd: Add incoming QIOChannelFile support")
Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Reported-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240313212824.16974-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-14 11:39:08 -04:00
Cédric Le Goater
2e128776dc migration: Skip only empty block devices
The block .save_setup() handler calls a helper routine
init_blk_migration() which builds a list of block devices to take into
account for migration. When one device is found to be empty (sectors
== 0), the loop exits and all the remaining devices are ignored. This
is a regression introduced when bdrv_iterate() was removed.

Change that by skipping only empty devices.

Cc: Markus Armbruster <armbru@redhat.com>
Cc: qemu-stable <qemu-stable@nongnu.org>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Fixes: fea68bb6e9 ("block: Eliminate bdrv_iterate(), use bdrv_next()")
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Link: https://lore.kernel.org/r/20240312120431.550054-1-clg@redhat.com
[peterx: fix "Suggested-by:"]
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-13 07:33:41 -04:00
Fabiano Rosas
c827fafcaa migration: Fix error handling after dup in file migration
The file migration code was allowing a possible -1 from a failed call
to dup() to propagate into the new QIOFileChannel::fd before checking
for validity. Coverity doesn't like that, possibly due to the the
lseek(-1, ...) call that would ensue before returning from the channel
creation routine.

Use the newly introduced qio_channel_file_dupfd() to properly check
the return of dup() before proceeding.

Fixes: CID 1539961
Fixes: CID 1539965
Fixes: CID 1539960
Fixes: 2dd7ee7a51 ("migration/multifd: Add incoming QIOChannelFile support")
Fixes: decdc76772 ("migration/multifd: Add mapped-ram support to fd: URI")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240311233335.17299-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-12 15:22:23 -04:00
Peter Maydell
e692f9c6a6 * Add missing ERRP_GUARD() statements in functions that need it
* Prefer fast cpu_env() over slower CPU QOM cast macro
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmXwPhYRHHRodXRoQHJl
 ZGhhdC5jb20ACgkQLtnXdP5wLbWHvBAAgKx5LHFjz3xREVA+LkDTQ49mz0lK3s32
 SGvNlIHjiaDGVttVYhVC4sinBWUruG4Lyv/2QN72OJBzn6WUsEUQE3KPH1d7Y3/s
 wS9X7mj70n4kugWJqeIJP5AXSRasHmWoQ4QJLVQRJd6+Eb9jqwep0x7bYkI1de6D
 bL1Q7bIfkFeNQBXaiPWAm2i+hqmT4C1r8HEAGZIjAsMFrjy/hzBEjNV+pnh6ZSq9
 Vp8BsPWRfLU2XHm4WX0o8d89WUMAfUGbVkddEl/XjIHDrUD+Zbd1HAhLyfhsmrnE
 jXIwSzm+ML1KX4MoF5ilGtg8Oo0gQDEBy9/xck6G0HCm9lIoLKlgTxK9glr2vdT8
 yxZmrM9Hder7F9hKKxmb127xgU6AmL7rYmVqsoQMNAq22D6Xr4UDpgFRXNk2/wO6
 zZZBkfZ4H4MpZXbd/KJpXvYH5mQA4IpkOy8LJdE+dbcHX7Szy9ksZdPA+Z10hqqf
 zqS13qTs3abxymy2Q/tO3hPKSJCk1+vCGUkN60Wm+9VoLWGoU43qMc7gnY/pCS7m
 0rFKtvfwFHhokX1orK0lP/ppVzPv/5oFIeK8YDY9if+N+dU2LCwVZHIuf2/VJPRq
 wmgH2vAn3JDoRKPxTGX9ly6AMxuZaeP92qBTOPap0gDhihYzIpaCq9ecEBoTakI7
 tdFhV0iRr08=
 =NiP4
 -----END PGP SIGNATURE-----

Merge tag 'pull-request-2024-03-12' of https://gitlab.com/thuth/qemu into staging

* Add missing ERRP_GUARD() statements in functions that need it
* Prefer fast cpu_env() over slower CPU QOM cast macro

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmXwPhYRHHRodXRoQHJl
# ZGhhdC5jb20ACgkQLtnXdP5wLbWHvBAAgKx5LHFjz3xREVA+LkDTQ49mz0lK3s32
# SGvNlIHjiaDGVttVYhVC4sinBWUruG4Lyv/2QN72OJBzn6WUsEUQE3KPH1d7Y3/s
# wS9X7mj70n4kugWJqeIJP5AXSRasHmWoQ4QJLVQRJd6+Eb9jqwep0x7bYkI1de6D
# bL1Q7bIfkFeNQBXaiPWAm2i+hqmT4C1r8HEAGZIjAsMFrjy/hzBEjNV+pnh6ZSq9
# Vp8BsPWRfLU2XHm4WX0o8d89WUMAfUGbVkddEl/XjIHDrUD+Zbd1HAhLyfhsmrnE
# jXIwSzm+ML1KX4MoF5ilGtg8Oo0gQDEBy9/xck6G0HCm9lIoLKlgTxK9glr2vdT8
# yxZmrM9Hder7F9hKKxmb127xgU6AmL7rYmVqsoQMNAq22D6Xr4UDpgFRXNk2/wO6
# zZZBkfZ4H4MpZXbd/KJpXvYH5mQA4IpkOy8LJdE+dbcHX7Szy9ksZdPA+Z10hqqf
# zqS13qTs3abxymy2Q/tO3hPKSJCk1+vCGUkN60Wm+9VoLWGoU43qMc7gnY/pCS7m
# 0rFKtvfwFHhokX1orK0lP/ppVzPv/5oFIeK8YDY9if+N+dU2LCwVZHIuf2/VJPRq
# wmgH2vAn3JDoRKPxTGX9ly6AMxuZaeP92qBTOPap0gDhihYzIpaCq9ecEBoTakI7
# tdFhV0iRr08=
# =NiP4
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 12 Mar 2024 11:35:50 GMT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2024-03-12' of https://gitlab.com/thuth/qemu: (55 commits)
  user: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/xtensa: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/tricore: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/sparc: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/sh4: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/rx: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/ppc: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/openrisc: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/nios2: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/mips: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/microblaze: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/m68k: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/loongarch: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/i386/hvf: Use CPUState typedef
  target/hexagon: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/cris: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/avr: Prefer fast cpu_env() over slower CPU QOM cast macro
  target/alpha: Prefer fast cpu_env() over slower CPU QOM cast macro
  target: Replace CPU_GET_CLASS(cpu -> obj) in cpu_reset_hold() handler
  bulk: Call in place single use cpu_env()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-12 16:55:42 +00:00
Philippe Mathieu-Daudé
ee1004bba6 bulk: Access existing variables initialized to &S->F when available
When a variable is initialized to &struct->field, use it
in place. Rationale: while this makes the code more concise,
this also helps static analyzers.

Mechanical change using the following Coccinelle spatch script:

 @@
 type S, F;
 identifier s, m, v;
 @@
      S *s;
      ...
      F *v = &s->m;
      <+...
 -    &s->m
 +    v
      ...+>

Inspired-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20240129164514.73104-2-philmd@linaro.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
[thuth: Dropped hunks that need a rebase, and fixed sizeof() in pmu_realize()]
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-12 11:46:16 +01:00
Zhao Liu
46ff64a826 error: Move ERRP_GUARD() to the beginning of the function
Since the commit 05e385d2a9 ("error: Move ERRP_GUARD() to the beginning
of the function"), there are new codes that don't put ERRP_GUARD() at
the beginning of the functions.

As stated in the commit 05e385d2a9: "include/qapi/error.h advises to put
ERRP_GUARD() right at the beginning of the function, because only then
can it guard the whole function.", so clean up the few spots
disregarding the advice.

Inspired-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20240312060337.3240965-1-zhao1.liu@linux.intel.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-12 11:45:45 +01:00
Zhao Liu
35e83a9f61 migration/option: Fix missing ERRP_GUARD() for error_prepend()
As the comment in qapi/error, passing @errp to error_prepend() requires
ERRP_GUARD():

* = Why, when and how to use ERRP_GUARD() =
*
* Without ERRP_GUARD(), use of the @errp parameter is restricted:
...
* - It should not be passed to error_prepend(), error_vprepend() or
*   error_append_hint(), because that doesn't work with &error_fatal.
* ERRP_GUARD() lifts these restrictions.
*
* To use ERRP_GUARD(), add it right at the beginning of the function.
* @errp can then be used without worrying about the argument being
* NULL or &error_fatal.

ERRP_GUARD() could avoid the case when @errp is &error_fatal, the user
can't see this additional information, because exit() happens in
error_setg earlier than information is added [1].

The migrate_params_check() passes @errp to error_prepend() without
ERRP_GUARD(), and it could be called from migration_object_init(),
where the passed @errp points to @error_fatal.

Therefore, the error message echoed in error_prepend() will be lost
because of the above issue.

To fix this, add missing ERRP_GUARD() at the beginning of this function.

[1]: Issue description in the commit message of commit ae7c80a7bd
     ("error: New macro ERRP_GUARD()").

Cc: Peter Xu <peterx@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Acked-by: Peter Xu <peterx@redhat.com>
Message-ID: <20240311033822.3142585-28-zhao1.liu@linux.intel.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-12 11:45:45 +01:00
Hao Xiang
70c25c92e6 migration/multifd: Enable multifd zero page checking by default.
1. Set default "zero-page-detection" option to "multifd". Now
zero page checking can be done in the multifd threads and this
becomes the default configuration.
2. Handle migration QEMU9.0 -> QEMU8.2 compatibility. We provide
backward compatibility where zero page checking is done from the
migration main thread.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240311180015.3359271-7-hao.xiang@linux.dev
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:57:09 -04:00
Hao Xiang
9ae90f73e6 migration/multifd: Implement ram_save_target_page_multifd to handle multifd version of MigrationOps::ram_save_target_page.
1. Add a dedicated handler for MigrationOps::ram_save_target_page in
multifd live migration.
2. Refactor ram_save_target_page_legacy so that the legacy and multifd
handlers don't have internal functions calling into each other.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-Id: <20240226195654.934709-4-hao.xiang@bytedance.com>
Link: https://lore.kernel.org/r/20240311180015.3359271-6-hao.xiang@linux.dev
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:57:09 -04:00
Hao Xiang
303e6f54f9 migration/multifd: Implement zero page transmission on the multifd thread.
1. Add zero_pages field in MultiFDPacket_t.
2. Implements the zero page detection and handling on the multifd
threads for non-compression, zlib and zstd compression backends.
3. Added a new value 'multifd' in ZeroPageDetection enumeration.
4. Adds zero page counters and updates multifd send/receive tracing
format to track the newly added counters.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240311180015.3359271-5-hao.xiang@linux.dev
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:57:09 -04:00
Hao Xiang
5fdbb1dfcc migration/multifd: Add new migration option zero-page-detection.
This new parameter controls where the zero page checking is running.
1. If this parameter is set to 'legacy', zero page checking is
done in the migration main thread.
2. If this parameter is set to 'none', zero page checking is disabled.

Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20240311180015.3359271-4-hao.xiang@linux.dev
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:57:05 -04:00
Fabiano Rosas
c3cdf3fb18 migration/multifd: Allow clearing of the file_bmap from multifd
We currently only need to clear the mapped-ram file bitmap from the
migration thread during save_zero_page.

We're about to add support for zero page detection on the multifd
thread, so allow ramblock_set_file_bmap_atomic() to also clear the
bits.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240311180015.3359271-3-hao.xiang@linux.dev
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:56:52 -04:00
Fabiano Rosas
44fe138edc migration/multifd: Allow zero pages in file migration
Currently, it's an error to have no data pages in the multifd file
migration because zero page detection is done in the migration thread
and zero pages don't reach multifd. This is enforced with the
pages->num assert.

We're about to add zero page detection on the multifd thread. Fix the
file_write_ramblock_iov() to stop considering p->iovs_num=0 an error.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240311180015.3359271-2-hao.xiang@linux.dev
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:34:51 -04:00
Steve Sistare
c9539d9b14 migration: purge MigrationState from public interface
Move remaining MigrationState references from the public file
misc.h to the private file migration.h.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-12-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
a3ed489336 migration: delete unused accessors
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
7395127f23 migration: privatize colo interfaces
Remove private migration interfaces from net/colo-compare.c and push them
to migration/colo.c.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-10-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
20c64c8a51 migration: migration_file_set_error
Define and export migration_file_set_error to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
9bb630c6ee migration: migration_is_device
Define and export migration_is_device to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
6e78563976 migration: migration_thread_is_self
Define and export migration_thread_is_self to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
714f33123b migration: export vcpu_dirty_limit_period
Define and export vcpu_dirty_limit_period to eliminate a dependency
on MigrationState.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
aeaafb1e59 migration: export migration_is_running
Delete the MigrationState parameter from migration_is_running and move
it to the public API in misc.h.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
3a6813b68c migration: export migration_is_active
Delete the MigrationState parameter from migration_is_active so it
can be exported and used without including migration.h.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179338-294359-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
7dcb3c87d8 migration: export migration_is_setup_or_active
Delete the MigrationState parameter from migration_is_setup_or_active
and move it to the public API in misc.h.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/1710179338-294359-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Steve Sistare
f3bff6c443 migration: export fewer options
A small number of migration options are accessed by migration clients,
but to see them clients must include all of options.h, which is mostly
for migration core code.  migrate_mode() in particular will be needed by
multiple clients.

Refactor the option declarations so clients can see the necessary few via
misc.h, which already exports a portion of the client API.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Link: https://lore.kernel.org/r/1710179319-294320-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 16:28:59 -04:00
Anthony PERARD
a1bb5dd169 migration: Fix format in error message
In file_write_ramblock_iov(), "offset" is "uintptr_t" and not
"ram_addr_t". While usually they are both equivalent, this is not the
case with CONFIG_XEN_BACKEND.

Use the right format. This will fix build on 32-bit.

Fixes: f427d90b98 ("migration/multifd: Support outgoing mapped-ram stream format")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Link: https://lore.kernel.org/r/20240311123439.16844-1-anthony.perard@citrix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:41 -04:00
Yu Zhang
69f7b00d05 migration/rdma: Fix a memory issue for migration
In commit 3fa9642ff7 change was made to convert the RDMA backend to
accept MigrateAddress struct. However, the assignment of "host" leads
to data corruption on the target host and the failure of migration.

    isock->host = rdma->host;

By allocating the memory explicitly for it with g_strdup_printf(), the
issue is fixed and the migration doesn't fail any more.

Fixes: 3fa9642ff7 ("migration: convert rdma backend to accept MigrateAddress")
Cc: qemu-stable <qemu-stable@nongnu.org>
Cc: Li Zhijian <lizhijian@fujitsu.com>
Link: https://lore.kernel.org/r/CAHEcVy4L_D6tuhJ8h=xLR4WaPaprJE3nnxZAEyUnoTrxQ6CF5w@mail.gmail.com
Signed-off-by: Yu Zhang <yu.zhang@ionos.com>
[peterx: use g_strdup() instead of g_strdup_printf(), per Zhijian]
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Fabiano Rosas
61dec06082 migration/multifd: Don't fsync when closing QIOChannelFile
Commit bc38feddeb ("io: fsync before closing a file channel") added a
fsync/fdatasync at the closing point of the QIOChannelFile to ensure
integrity of the migration stream in case of QEMU crash.

The decision to do the sync at qio_channel_close() was not the best
since that function runs in the main thread and the fsync can cause
QEMU to hang for several minutes, depending on the migration size and
disk speed.

To fix the hang, remove the fsync from qio_channel_file_close().

At this moment, the migration code is the only user of the fsync and
we're taking the tradeoff of not having a sync at all, leaving the
responsibility to the upper layers.

Fixes: bc38feddeb ("io: fsync before closing a file channel")
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240305195629.9922-1-farosas@suse.de
Link: https://lore.kernel.org/r/20240305174332.2553-1-farosas@suse.de
[peterx: add more comment to the qio_channel_close()]
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Cédric Le Goater
e6e08e8323 migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error
When commit bd2270608f ("migration/ram.c: add a notifier chain for
precopy") added PRECOPY_NOTIFY_SETUP notifiers at the end of
qemu_savevm_state_setup(), it didn't take into account a possible
error in the loop calling vmstate_save() or .save_setup() handlers.

Check ret value before calling the notifiers.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20240304122844.1888308-10-clg@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Cédric Le Goater
e8c44363fb migration: Report error when shutdown fails
This will help detect issues regarding I/O channels usage.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Link: https://lore.kernel.org/r/20240304122844.1888308-7-clg@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Maksim Davydov
12ab1e4fe8 migration/ram: add additional check
If a migration stream is broken, the address and flag reading can return
zero. Thus, an irrelevant flag error will be returned instead of EIO.
It can be fixed by additional check after the reading.

Signed-off-by: Maksim Davydov <davydov-max@yandex-team.ru>
Link: https://lore.kernel.org/r/20240304144203.158477-1-davydov-max@yandex-team.ru
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Avihai Horon
4e1871c450 migration: Don't serialize devices in qemu_savevm_state_iterate()
Commit 90697be889 ("live migration: Serialize vmstate saving in stage
2") introduced device serialization in qemu_savevm_state_iterate(). The
rationale behind it was to first complete migration of slower changing
block devices and only then migrate the RAM, to avoid sending fast
changing RAM pages over and over.

This commit was added a long time ago, and while it was useful back
then, it is not the case anymore:
1. Block migration is deprecated, see commit 66db46ca83 ("migration:
   Deprecate block migration").
2. Today there are other iterative devices besides RAM and block, such
   as VFIO, which are registered for migration after RAM. With current
   serialization behavior, a fast changing device can block other
   devices from sending their data, which may prevent migration from
   converging in some cases.

The issue described in item 2 was observed in several VFIO migration
scenarios with switchover-ack capability enabled, where some workload on
the VM prevented RAM from ever reaching a hard zero, thus blocking VFIO
initial pre-copy data from being sent. Hence, destination could not ack
switchover and migration could not converge.

Fix that by not serializing iterative devices in
qemu_savevm_state_iterate().

Note that this still doesn't fully prevent device starvation. As
correctly pointed out by Peter [1], a fast changing device might
constantly consume all allocated bandwidth and block the following
devices. However, this scenario is more likely to happen only if
max-bandwidth is low.

[1] https://lore.kernel.org/qemu-devel/Zd6iw9dBhW6wKNxx@x1n/

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240304105339.20713-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-11 14:41:40 -04:00
Peter Maydell
7d4e29ef80 QAPI patches patches for 2024-03-04
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmXlaSISHGFybWJydUBy
 ZWRoYXQuY29tAAoJEDhwtADrkYZTdZ8P/iMgqLoAFkCCjwfkUc/rqZUezK52Ynr7
 LYwOPI/xcYD7EnVogdRgFgjWFNoivQLP5yKsU/eRTk29pwdDzTscFm/0ztTQX/Gb
 ypWV+GBcu5J8mKbp1KF5w68aDD8Bat4WRfEgDQ1DV7v6CoMiUzTiF3CGXkYzqK5Y
 kYNq97vdEkBFvFdOl/7scs/XXN2jG27egDhMp68RTxnPHlXZiAO9/2Bul3uVe3x0
 fzQ2ViYv0qLnjE/PwENDqqE3Thv3Sxp5iEeQQ6GWi07EVh07UtHpOM3RYyrTU0Sb
 VrTApSrg0oxlkOuR0CBd9Fi+timtbokBL0DWyUpXNTfIEZfLtA9H+8riUg3EOcDp
 r7a4SI/27VdPxX6Kc6zA3bi+/j1o7CLTW2LGEwuZs52nmixoo1HTWPIFdyh13g/V
 QjNbun0fViHb0FVLiyDlXF/7Y+EWUWIyqwwGqbvve1DyUHQmo3CUQAKGOpkeKSBe
 4eGciVDgpBoKhtw9Kv6LCDj2cwZKC8DxBMibf7GHkOnAsX2mnyuHcey7HvYNCoF+
 yYz7oIEXdlL2eWqg7CfBZK7lniCDln50RI4Ll1v+J4r1v1kRZGMLesTYXCdNc4ku
 yb4kpU4t22/RODffLE7K+fc3Onwze3fcfxlZMN66F+wFtk4KdPR2aQBE66bB8J99
 vuSKlTbT4cGL
 =s9AR
 -----END PGP SIGNATURE-----

Merge tag 'pull-qapi-2024-03-04' of https://repo.or.cz/qemu/armbru into staging

QAPI patches patches for 2024-03-04

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmXlaSISHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZTdZ8P/iMgqLoAFkCCjwfkUc/rqZUezK52Ynr7
# LYwOPI/xcYD7EnVogdRgFgjWFNoivQLP5yKsU/eRTk29pwdDzTscFm/0ztTQX/Gb
# ypWV+GBcu5J8mKbp1KF5w68aDD8Bat4WRfEgDQ1DV7v6CoMiUzTiF3CGXkYzqK5Y
# kYNq97vdEkBFvFdOl/7scs/XXN2jG27egDhMp68RTxnPHlXZiAO9/2Bul3uVe3x0
# fzQ2ViYv0qLnjE/PwENDqqE3Thv3Sxp5iEeQQ6GWi07EVh07UtHpOM3RYyrTU0Sb
# VrTApSrg0oxlkOuR0CBd9Fi+timtbokBL0DWyUpXNTfIEZfLtA9H+8riUg3EOcDp
# r7a4SI/27VdPxX6Kc6zA3bi+/j1o7CLTW2LGEwuZs52nmixoo1HTWPIFdyh13g/V
# QjNbun0fViHb0FVLiyDlXF/7Y+EWUWIyqwwGqbvve1DyUHQmo3CUQAKGOpkeKSBe
# 4eGciVDgpBoKhtw9Kv6LCDj2cwZKC8DxBMibf7GHkOnAsX2mnyuHcey7HvYNCoF+
# yYz7oIEXdlL2eWqg7CfBZK7lniCDln50RI4Ll1v+J4r1v1kRZGMLesTYXCdNc4ku
# yb4kpU4t22/RODffLE7K+fc3Onwze3fcfxlZMN66F+wFtk4KdPR2aQBE66bB8J99
# vuSKlTbT4cGL
# =s9AR
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 04 Mar 2024 06:24:34 GMT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* tag 'pull-qapi-2024-03-04' of https://repo.or.cz/qemu/armbru:
  migration: simplify exec migration functions
  qapi: New strv_from_str_list()
  qapi: New QAPI_LIST_LENGTH()
  docs/devel/writing-monitor-commands: Minor improvements
  docs/devel/writing-monitor-commands: Repair a decade of rot
  qapi: Reject "Returns" section when command doesn't return anything
  qga/qapi-schema: Fix guest-set-memory-blocks documentation
  qga/qapi-schema: Tweak documentation of fsfreeze commands
  qga/qapi-schema: Clean up "Returns" sections
  qga/qapi-schema: Delete useless "Returns" sections
  qga/qapi-schema: Move error documentation to new "Errors" sections
  qapi/yank: Tweak @yank's error description for consistency
  qapi: Clean up "Returns" sections
  qapi: Delete useless "Returns" sections
  qapi: Move error documentation to new "Errors" sections
  qapi: New documentation section tag "Errors"
  qapi: Slightly clearer error message for invalid "Returns" section
  qapi: Memorize since & returns sections

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-05 11:20:15 +00:00
Peter Maydell
c90cfb5294 Migartion pull request for 20240304
- Bryan's fix on multifd compression level API
 - Fabiano's mapped-ram series (base + multifd only)
 - Steve's amend on cpr document in qapi/
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZeUjKhIccGV0ZXJ4QHJl
 ZGhhdC5jb20ACgkQO1/MzfOr1wbv5QD/ZexBUsmZA5qyxgGvZ2yvlUBEGNOvtmKY
 kRdiYPU7khMA/0N43rn4LcqKCoq4+T+EAnYizGjIyhH/7BRUyn4DUxgO
 =AeEn
 -----END PGP SIGNATURE-----

Merge tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu into staging

Migartion pull request for 20240304

- Bryan's fix on multifd compression level API
- Fabiano's mapped-ram series (base + multifd only)
- Steve's amend on cpr document in qapi/

# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZeUjKhIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wbv5QD/ZexBUsmZA5qyxgGvZ2yvlUBEGNOvtmKY
# kRdiYPU7khMA/0N43rn4LcqKCoq4+T+EAnYizGjIyhH/7BRUyn4DUxgO
# =AeEn
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 04 Mar 2024 01:26:02 GMT
# gpg:                using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg:                issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal]
# gpg:                 aka "Peter Xu <peterx@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D  D1A9 3B5F CCCD F3AB D706

* tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu: (27 commits)
  migration/multifd: Document two places for mapped-ram
  tests/qtest/migration: Add a multifd + mapped-ram migration test
  migration/multifd: Add mapped-ram support to fd: URI
  migration/multifd: Support incoming mapped-ram stream format
  migration/multifd: Support outgoing mapped-ram stream format
  migration/multifd: Prepare multifd sync for mapped-ram migration
  migration/multifd: Add incoming QIOChannelFile support
  migration/multifd: Add outgoing QIOChannelFile support
  migration/multifd: Add a wrapper for channels_created
  migration/multifd: Allow receiving pages without packets
  migration/multifd: Allow multifd without packets
  migration/multifd: Decouple recv method from pages
  migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data
  tests/qtest/migration: Add tests for mapped-ram file-based migration
  migration/ram: Add incoming 'mapped-ram' migration
  migration/ram: Add outgoing 'mapped-ram' migration
  migration: Add mapped-ram URI compatibility check
  migration/ram: Introduce 'mapped-ram' migration capability
  migration/qemu-file: add utility methods for working with seekable channels
  io: fsync before closing a file channel
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	migration/ram.c
2024-03-05 11:19:58 +00:00
Steve Sistare
018d5fb1f9 migration: simplify exec migration functions
Simplify the exec migration code by using list utility functions.

As a side effect, this also fixes a minor memory leak.  On function return,
"g_auto(GStrv) argv" frees argv and each element, which is wrong, because
the function does not own the individual elements.  To compensate, the code
uses g_steal_pointer which NULLs argv and prevents the destructor from
running, but argv is leaked.

Fixes: cbab4face5 ("migration: convert exec backend ...")
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20240227153321.467343-4-armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2024-03-04 07:12:40 +01:00
Peter Xu
1a6e217c35 migration/multifd: Document two places for mapped-ram
Add two documentations for mapped-ram migration on two spots that may not
be extremely clear.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240301091524.39900-1-peterx@redhat.com
Cc: Prasad Pandit <ppandit@redhat.com>
[peterx: fix two English errors per Prasad]
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-04 08:31:11 +08:00
Fabiano Rosas
decdc76772 migration/multifd: Add mapped-ram support to fd: URI
If we receive a file descriptor that points to a regular file, there's
nothing stopping us from doing multifd migration with mapped-ram to
that file.

Enable the fd: URI to work with multifd + mapped-ram.

Note that the fds passed into multifd are duplicated because we want
to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The
original fd doesn't need to be duplicated because monitor_get_fd()
transfers ownership to the caller.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240229153017.2221-23-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
a49d15a38d migration/multifd: Support incoming mapped-ram stream format
For the incoming mapped-ram migration we need to read the ramblock
headers, get the pages bitmap and send the host address of each
non-zero page to the multifd channel thread for writing.

Usage on HMP is:

(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability mapped-ram on
(qemu) migrate_incoming file:migfile

(the ram.h include needs to move because we've been previously relying
on it being included from migration.c. Now file.h will start including
multifd.h before migration.o is processed)

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-22-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
f427d90b98 migration/multifd: Support outgoing mapped-ram stream format
The new mapped-ram stream format uses a file transport and puts ram
pages in the migration file at their respective offsets and can be
done in parallel by using the pwritev system call which takes iovecs
and an offset.

Add support to enabling the new format along with multifd to make use
of the threading and page handling already in place.

This requires multifd to stop sending headers and leaving the stream
format to the mapped-ram code. When it comes time to write the data, we
need to call a version of qio_channel_write that can take an offset.

Usage on HMP is:

(qemu) stop
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability mapped-ram on
(qemu) migrate_set_parameter max-bandwidth 0
(qemu) migrate_set_parameter multifd-channels 8
(qemu) migrate file:migfile

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-21-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
9d01778af8 migration/multifd: Prepare multifd sync for mapped-ram migration
The mapped-ram migration can be performed live or non-live, but it is
always asynchronous, i.e. the source machine and the destination
machine are not migrating at the same time. We only need some pieces
of the multifd sync operations.

multifd_send_sync_main()
------------------------
  Issued by the ram migration code on the migration thread, causes the
  multifd send channels to synchronize with the migration thread and
  makes the sending side emit a packet with the MULTIFD_FLUSH flag.

  With mapped-ram we want to maintain the sync on the sending side
  because that provides ordering between the rounds of dirty pages when
  migrating live.

MULTIFD_FLUSH
-------------
  On the receiving side, the presence of the MULTIFD_FLUSH flag on a
  packet causes the receiving channels to start synchronizing with the
  main thread.

  We're not using packets with mapped-ram, so there's no MULTIFD_FLUSH
  flag and therefore no channel sync on the receiving side.

multifd_recv_sync_main()
------------------------
  Issued by the migration thread when the ram migration flag
  RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread
  on the receiving side to start synchronizing with the recv
  channels. Due to compatibility, this is also issued when
  RAM_SAVE_FLAG_EOS is received.

  For mapped-ram we only need to synchronize the channels at the end of
  migration to avoid doing cleanup before the channels have finished
  their IO.

Make sure the multifd syncs are only issued at the appropriate times.

Note that due to pre-existing backward compatibility issues, we have
the multifd_flush_after_each_section property that can cause a sync to
happen at EOS. Since the EOS flag is needed on the stream, allow
mapped-ram to just ignore it.

Also emit an error if any other unexpected flags are found on the
stream.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240229153017.2221-20-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
2dd7ee7a51 migration/multifd: Add incoming QIOChannelFile support
On the receiving side we don't need to differentiate between main
channel and threads, so whichever channel is defined first gets to be
the main one. And since there are no packets, use the atomic channel
count to index into the params array.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-19-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
b7b03eb614 migration/multifd: Add outgoing QIOChannelFile support
Allow multifd to open file-backed channels. This will be used when
enabling the mapped-ram migration stream format which expects a
seekable transport.

The QIOChannel read and write methods will use the preadv/pwritev
versions which don't update the file offset at each call so we can
reuse the fd without re-opening for every channel.

Contrary to the socket migration, the file migration doesn't need an
asynchronous channel creation process, so expose
multifd_channel_connect() and call it directly.

Note that this is just setup code and multifd cannot yet make use of
the file channels.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240229153017.2221-18-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
a8a3e7102c migration/multifd: Add a wrapper for channels_created
We'll need to access multifd_send_state->channels_created from outside
multifd.c, so introduce a helper for that.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-17-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
d117ed0699 migration/multifd: Allow receiving pages without packets
Currently multifd does not need to have knowledge of pages on the
receiving side because all the information needed is within the
packets that come in the stream.

We're about to add support to mapped-ram migration, which cannot use
packets because it expects the ramblock section in the migration file
to contain only the guest pages data.

Add a data structure to transfer pages between the ram migration code
and the multifd receiving threads.

We don't want to reuse MultiFDPages_t for two reasons:

a) multifd threads don't really need to know about the data they're
   receiving.

b) the receiving side has to be stopped to load the pages, which means
   we can experiment with larger granularities than page size when
   transferring data.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-16-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
06833d83f8 migration/multifd: Allow multifd without packets
For the upcoming support to the new 'mapped-ram' migration stream
format, we cannot use multifd packets because each write into the
ramblock section in the migration file is expected to contain only the
guest pages. They are written at their respective offsets relative to
the ramblock section header.

There is no space for the packet information and the expected gains
from the new approach come partly from being able to write the pages
sequentially without extraneous data in between.

The new format also simply doesn't need the packets and all necessary
information can be taken from the standard migration headers with some
(future) changes to multifd code.

Use the presence of the mapped-ram capability to decide whether to
send packets.

This only moves code under multifd_use_packets(), it has no effect for
now as mapped-ram cannot yet be enabled with multifd.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-15-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
9db1912513 migration/multifd: Decouple recv method from pages
Next patches will abstract the type of data being received by the
channels, so do some cleanup now to remove references to pages and
dependency on 'normal_num'.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-14-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
402dd7ac1c migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data
Use a more specific name for the compression data so we can use the
generic for the multifd core code.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-13-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
2f6b8826a5 migration/ram: Add incoming 'mapped-ram' migration
Add the necessary code to parse the format changes for the
'mapped-ram' capability.

One of the more notable changes in behavior is that in the
'mapped-ram' case ram pages are restored in one go rather than
constantly looping through the migration stream.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-11-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
c2d5c4a7cb migration/ram: Add outgoing 'mapped-ram' migration
Implement the outgoing migration side for the 'mapped-ram' capability.

A bitmap is introduced to track which pages have been written in the
migration file. Pages are written at a fixed location for every
ramblock. Zero pages are ignored as they'd be zero in the destination
migration as well.

The migration stream is altered to put the dirty pages for a ramblock
after its header instead of having a sequential stream of pages that
follow the ramblock headers.

Without mapped-ram (current):        With mapped-ram (new):

 ---------------------               --------------------------------
 | ramblock 1 header |               | ramblock 1 header            |
 ---------------------               --------------------------------
 | ramblock 2 header |               | ramblock 1 mapped-ram header |
 ---------------------               --------------------------------
 | ...               |               | padding to next 1MB boundary |
 ---------------------               | ...                          |
 | ramblock n header |               --------------------------------
 ---------------------               | ramblock 1 pages             |
 | RAM_SAVE_FLAG_EOS |               | ...                          |
 ---------------------               --------------------------------
 | stream of pages   |               | ramblock 2 header            |
 | (iter 1)          |               --------------------------------
 | ...               |               | ramblock 2 mapped-ram header |
 ---------------------               --------------------------------
 | RAM_SAVE_FLAG_EOS |               | padding to next 1MB boundary |
 ---------------------               | ...                          |
 | stream of pages   |               --------------------------------
 | (iter 2)          |               | ramblock 2 pages             |
 | ...               |               | ...                          |
 ---------------------               --------------------------------
 | ...               |               | ...                          |
 ---------------------               --------------------------------
                                     | RAM_SAVE_FLAG_EOS            |
                                     --------------------------------
                                     | ...                          |
                                     --------------------------------

where:
 - ramblock header: the generic information for a ramblock, such as
   idstr, used_len, etc.

 - ramblock mapped-ram header: the new information added by this
   feature: bitmap of pages written, bitmap size and offset of pages
   in the migration file.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-10-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
8d9e0d4100 migration: Add mapped-ram URI compatibility check
The mapped-ram migration format needs a channel that supports seeking
to be able to write each page to an arbitrary offset in the migration
stream.

Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
4ed49feb44 migration/ram: Introduce 'mapped-ram' migration capability
Add a new migration capability 'mapped-ram'.

The core of the feature is to ensure that RAM pages are mapped
directly to offsets in the resulting migration file instead of being
streamed at arbitrary points.

The reasons why we'd want such behavior are:

 - The resulting file will have a bounded size, since pages which are
   dirtied multiple times will always go to a fixed location in the
   file, rather than constantly being added to a sequential
   stream. This eliminates cases where a VM with, say, 1G of RAM can
   result in a migration file that's 10s of GBs, provided that the
   workload constantly redirties memory.

 - It paves the way to implement O_DIRECT-enabled save/restore of the
   migration stream as the pages are ensured to be written at aligned
   offsets.

 - It allows the usage of multifd so we can write RAM pages to the
   migration file in parallel.

For now, enabling the capability has no effect. The next couple of
patches implement the core functionality.

Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-8-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
7f5b50a401 migration/qemu-file: add utility methods for working with seekable channels
Add utility methods that will be needed when implementing 'mapped-ram'
migration capability.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Link: https://lore.kernel.org/r/20240229153017.2221-7-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
4aac6b1e9b migration/multifd: Cleanup multifd_recv_sync_main
Some minor cleanups and documentation for multifd_recv_sync_main.

Use thread_count as done in other parts of the code. Remove p->id from
the multifd_recv_state sync, since that is global and not tied to a
channel. Add documentation for the sync steps.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Bryan Zhang
b4014a2bf5 migration: Properly apply migration compression level parameters
Some glue code was missing, so that using `qmp_migrate_set_parameters`
to set `multifd-zstd-level` or `multifd-zlib-level` did not work. This
commit adds the glue code to fix that.

Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com>
Link: https://lore.kernel.org/r/20240301035901.4006936-2-bryan.zhang@bytedance.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 14:14:55 +08:00
Richard Henderson
5d2203691e migration: Remove qemu_host_page_size
Replace with the maximum of the real host page size
and the target page size.  This is an exact replacement.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Acked-by: Helge Deller <deller@gmx.de>
Message-Id: <20240102015808.132373-12-richard.henderson@linaro.org>
2024-02-29 11:35:36 -10:00
Cédric Le Goater
9425ef3f99 migration: Use migrate_has_error() in close_return_path_on_source()
close_return_path_on_source() retrieves the migration error from the
the QEMUFile '->to_dst_file' to know if a shutdown is required. This
shutdown is required to exit the return-path thread.

Avoid relying on '->to_dst_file' and use migrate_has_error() instead.

(using to_dst_file is a heuristic to infer whether
rp_state.from_dst_file might be stuck on a recvmsg(). Using a generic
method for detecting errors is more reliable. We also want to reduce
dependency on QEMUFile::last_error)

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
[added some words about the motivation for this patch]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240226203122.22894-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Fabiano Rosas
22b04245f0 migration: Join the return path thread before releasing to_dst_file
The return path thread might hang at a blocking system call. Before
joining the thread we might need to issue a shutdown() on the socket
file descriptor to release it. To determine whether the shutdown() is
necessary we look at the QEMUFile error.

Make sure we only clean up the QEMUFile after the return path has been
waited for.

This fixes a hang when qemu_savevm_state_setup() produced an error
that was detected by migration_detect_error(). That skips
migration_completion() so close_return_path_on_source() would get
stuck waiting for the RP thread to terminate.

Reported-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240226203122.22894-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Fabiano Rosas
63f64d77f0 migration: Fix qmp_query_migrate mbps value
The QMP command query_migrate might see incorrect throughput numbers
if it runs after we've set the migration completion status but before
migration_calculate_complete() has updated s->total_time and s->mbps.

The migration status would show COMPLETED, but the throughput value
would be the one from the last iteration and not the one from the
whole migration. This will usually be a larger value due to the time
period being smaller (one iteration).

Move migration_calculate_complete() earlier so that the status
MIGRATION_STATUS_COMPLETED is only emitted after the final counters
update. Keep everything under the BQL so the QMP thread sees the
updates as atomic.

Rename migration_calculate_complete to migration_completion_end to
reflect its new purpose of also updating s->state.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240226143335.14282-1-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
cbdafc1b34 migration: options incompatible with cpr
Fail the migration request if options are set that are incompatible
with cpr.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-15-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
9867d4ddd0 migration: stop vm for cpr
When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
4af667f87c migration: notifier error checking
Check the status returned by migration notifiers for event type
MIG_EVENT_PRECOPY_SETUP, and report errors.  None of the notifiers
return an error status at this time.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-10-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
bf78a046b9 migration: refactor migrate_fd_connect failures
Move common code for the error path in migrate_fd_connect to a shared
fail label.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
6835f5a1bc migration: per-mode notifiers
Keep a separate list of migration notifiers for each migration mode.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
5663dd3f1a migration: MigrationNotifyFunc
Define MigrationNotifyFunc to improve type safety and simplify migration
notifiers.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
c763a23e41 migration: remove postcopy_after_devices
postcopy_after_devices and migration_in_postcopy_after_devices are no
longer used, so delete them.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
9d9babf78d migration: MigrationEvent for notifiers
Passing MigrationState to notifiers is unsound because they could access
unstable migration state internals or even modify the state.  Instead, pass
the minimal info needed in a new MigrationEvent struct, which could be
extended in the future if needed.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
3e7757301c migration: convert to NotifierWithReturn
Change all migration notifiers to type NotifierWithReturn, so notifiers
can return an error status in a future patch.  For now, pass NULL for the
notifier error parameter, and do not check the return value.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-4-git-send-email-steven.sistare@oracle.com
[peterx: dropped unexpected update to roms/seabios-hppa]
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
d91f33c72e migration: remove error from notifier data
Remove the error object from opaque data passed to notifiers.
Use the new error parameter passed to the notifier instead.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-3-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
be19d836cd notify: pass error to notifier with return
Pass an error object as the third parameter to "notifier with return"
notifiers, so clients no longer need to bundle an error object in the
opaque data.  The new parameter is used in a later patch.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-2-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Peter Xu
c9a7e83c9d migration/multifd: Drop unnecessary helper to destroy IOC
Both socket_send_channel_destroy() and multifd_send_channel_destroy() are
unnecessary wrappers to destroy an IOC, as the only thing to do is to
release the final IOC reference.  We have plenty of code that destroys an
IOC using direct unref() already; keep that style.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240222095301.171137-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Peter Xu
72b90b9687 migration/multifd: Cleanup outgoing_args in state destroy
outgoing_args is a global cache of socket address to be reused in multifd.
Freeing the cache in per-channel destructor is more or less a hack.  Move
it to multifd_send_cleanup_state() so it only get checked once.  Use a
small helper to do so because it's internal of socket.c.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240222095301.171137-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Peter Xu
770de49c00 migration/multifd: Make multifd_channel_connect() return void
It never fails, drop the retval and also the Error**.

Suggested-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240222095301.171137-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Peter Xu
0518b5d8d3 migration/multifd: Drop registered_yank
With a clear definition of p->c protocol, where we only set it up if the
channel is fully established (TLS or non-TLS), registered_yank boolean will
have equal meaning of "p->c != NULL".

Drop registered_yank by checking p->c instead.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240222095301.171137-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Peter Xu
9221e3c6a2 migration/multifd: Cleanup TLS iochannel referencing
Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to
blocking handshake") introduced a thread for TLS channels, which will
resolve the issue on blocking the main thread.  However in the same commit
p->c is slightly abused just to be able to pass over the pointer "p" into
the thread.

That's the major reason we'll need to conditionally free the io channel in
the fault paths.

To clean it up, using a separate structure to pass over both "p" and "tioc"
in the tls handshake thread.  Then we can make it a rule that p->c will
never be set until the channel is completely setup.  With that, we can drop
the tricky conditional unref of the io channel in the error path.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240222095301.171137-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Fabiano Rosas
d13f0026c7 migration/multifd: Release recv sem_sync earlier
Now that multifd_recv_terminate_threads() is called only once, release
the recv side sem_sync earlier like we do for the send side.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240220224138.24759-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Fabiano Rosas
11dd7be575 migration/multifd: Remove p->quit from recv side
Like we did on the sending side, replace the p->quit per-channel flag
with a global atomic 'exiting' flag.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240220224138.24759-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Fabiano Rosas
93fa9dc2e0 migration/multifd: Add a synchronization point for channel creation
It is possible that one of the multifd channels fails to be created at
multifd_new_send_channel_async() while the rest of the channel
creation tasks are still in flight.

This could lead to multifd_save_cleanup() executing the
qemu_thread_join() loop too early and not waiting for the threads
which haven't been created yet, leading to the freeing of resources
that the newly created threads will try to access and crash.

Add a synchronization point after which there will be no attempts at
thread creation and therefore calling multifd_save_cleanup() past that
point will ensure it properly waits for the threads.

A note about performance: Prior to this patch, if a channel took too
long to be established, other channels could finish connecting first
and already start taking load. Now we're bounded by the
slowest-connecting channel.

Reported-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas
2576ae488e migration/multifd: Unify multifd and TLS connection paths
During multifd channel creation (multifd_send_new_channel_async) when
TLS is enabled, the multifd_channel_connect function is called twice,
once to create the TLS handshake thread and another time after the
asynchrounous TLS handshake has finished.

This creates a slightly confusing call stack where
multifd_channel_connect() is called more times than the number of
channels. It also splits error handling between the two callers of
multifd_channel_connect() causing some code duplication. Lastly, it
gets in the way of having a single point to determine whether all
channel creation tasks have been initiated.

Refactor the code to move the reentrancy one level up at the
multifd_new_send_channel_async() level, de-duplicating the error
handling and allowing for the next patch to introduce a
synchronization point common to all the multifd channel creation,
regardless of TLS.

Note that the previous code would never fail once p->c had been set.
This patch changes this assumption, which affects refcounting, so add
comments around object_unref to explain the situation.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas
dd904bc13f migration/multifd: Move multifd_send_setup into migration thread
We currently have an unfavorable situation around multifd channels
creation and the migration thread execution.

We create the multifd channels with qio_channel_socket_connect_async
-> qio_task_run_in_thread, but only connect them at the
multifd_new_send_channel_async callback, called from
qio_task_complete, which is registered as a glib event.

So at multifd_send_setup() we create the channels, but they will only
be actually usable after the whole multifd_send_setup() calling stack
returns back to the main loop. Which means that the migration thread
is already up and running without any possibility for the multifd
channels to be ready on time.

We currently rely on the channels-ready semaphore blocking
multifd_send_sync_main() until channels start to come up and release
it. However there have been bugs recently found when a channel's
creation fails and multifd_send_cleanup() is allowed to run while
other channels are still being created.

Let's start to organize this situation by moving the
multifd_send_setup() call into the migration thread. That way we
unblock the main-loop to dispatch the completion callbacks and
actually have a chance of getting the multifd channels ready for when
the migration thread needs them.

The next patches will deal with the synchronization aspects.

Note that this takes multifd_send_setup() out of the BQL.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas
bd8b0a8f82 migration/multifd: Move multifd_send_setup error handling in to the function
Hide the error handling inside multifd_send_setup to make it cleaner
for the next patch to move the function around.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas
a2a63c4abd migration/multifd: Remove p->running
We currently only need p->running to avoid calling qemu_thread_join()
on a non existent thread if the thread has never been created.

However, there are at least two bugs in this logic:

1) On the sending side, p->running is set too early and
qemu_thread_create() can be skipped due to an error during TLS
handshake, leaving the flag set and leading to a crash when
multifd_send_cleanup() calls qemu_thread_join().

2) During exit, the multifd thread clears the flag while holding the
channel lock. The counterpart at multifd_send_cleanup() reads the flag
outside of the lock and might free the mutex while the multifd thread
still has it locked.

Fix the first issue by setting the flag right before creating the
thread. Rename it from p->running to p->thread_created to clarify its
usage.

Fix the second issue by not clearing the flag at the multifd thread
exit. We don't have any use for that.

Note that these bugs are straight-forward logic issues and not race
conditions. There is still a gap for races to affect this code due to
multifd_send_cleanup() being allowed to run concurrently with the
thread creation loop. This issue is solved in the next patches.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: 2964714015 ("migration/tls: add support for multifd tls-handshake")
Reported-by: Avihai Horon <avihaih@nvidia.com>
Reported-by: chenyuhui5@huawei.com
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas
e1921f10d9 migration/multifd: Join the TLS thread
We're currently leaking the resources of the TLS thread by not joining
it and also overwriting the p->thread pointer altogether.

Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake")
Cc: qemu-stable <qemu-stable@nongnu.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Avihai Horon
3205bebd4f migration: Fix logic of channels and transport compatibility check
The commit in the fixes line mistakenly modified the channels and
transport compatibility check logic so it now checks multi-channel
support only for socket transport type.

Thus, running multifd migration using a transport other than socket that
is incompatible with multi-channels (such as "exec") would lead to a
segmentation fault instead of an error message.
For example:
  (qemu) migrate_set_capability multifd on
  (qemu) migrate -d "exec:cat > /tmp/vm_state"
  Segmentation fault (core dumped)

Fix it by checking multi-channel compatibility for all transport types.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: d95533e1cd ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240125162528.7552-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:00 +08:00
Peter Xu
488c84acb4 migration/multifd: Optimize sender side to be lockless
When reviewing my attempt to refactor send_prepare(), Fabiano suggested we
try out with dropping the mutex in multifd code [1].

I thought about that before but I never tried to change the code.  Now
maybe it's time to give it a stab.  This only optimizes the sender side.

The trick here is multifd has a clear provider/consumer model, that the
migration main thread publishes requests (either pending_job/pending_sync),
while the multifd sender threads are consumers.  Here we don't have a lot
of complicated data sharing, and the jobs can logically be submitted
lockless.

Arm the code with atomic weapons.  Two things worth mentioning:

  - For multifd_send_pages(): we can use qatomic_load_acquire() when trying
  to find a free channel, but that's expensive if we attach one ACQUIRE per
  channel.  Instead, keep the qatomic_read() on reading the pending_job
  flag as we do already, meanwhile use one smp_mb_acquire() after the loop
  to guarantee the memory ordering.

  - For pending_sync: it doesn't have any extra data required since now
  p->flags are never touched, it should be safe to not use memory barrier.
  That's different from pending_job.

Provide rich comments for all the lockless operations to state how they are
paired.  With that, we can remove the mutex.

[1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-06 11:05:49 +08:00
Peter Xu
98ea497d8b migration/multifd: Fix MultiFDSendParams.packet_num race
As reported correctly by Fabiano [1] (while per Fabiano, it sourced back to
Elena's initial report in Oct 2023), MultiFDSendParams.packet_num is buggy
to be assigned and stored.  Consider two consequent operations of: (1)
queue a job into multifd send thread X, then (2) queue another sync request
to the same send thread X.  Then the MultiFDSendParams.packet_num will be
assigned twice, and the first assignment can get lost already.

To avoid that, we move the packet_num assignment from p->packet_num into
where the thread will fill in the packet.  Use atomic operations to protect
the field, making sure there's no race.

Note that atomic fetch_add() may not be good for scaling purposes, however
multifd should be fine as number of threads should normally not go beyond
16 threads.  Let's leave that concern for later but fix the issue first.

There's also a trick on how to make it always work even on 32 bit hosts for
uint64_t packet number.  Switching to uintptr_t as of now to simply the
case.  It will cause packet number to overflow easier on 32 bit, but that
shouldn't be a major concern for now as 32 bit systems is not the major
audience for any performance concerns like what multifd wants to address.

We also need to move multifd_send_state definition upper, so that
multifd_send_fill_packet() can reference it.

[1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:11 +08:00
Peter Xu
cde85c37ca migration/multifd: Stick with send/recv on function names
Most of the multifd code uses send/recv to represent the two sides, but
some rare cases use save/load.

Since send/recv is the majority, replacing the save/load use cases to use
send/recv globally.  Now we reach a consensus on the naming.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:11 +08:00
Peter Xu
5e6ea8a1d6 migration/multifd: Cleanup multifd_load_cleanup()
Use similar logic to cleanup the recv side.

Note that multifd_recv_terminate_threads() may need some similar rework
like the sender side, but let's leave that for later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
12808db3b8 migration/multifd: Cleanup multifd_save_cleanup()
Shrink the function by moving relevant works into helpers: move the thread
join()s into multifd_send_terminate_threads(), then create two more helpers
to cover channel/state cleanups.

Add a TODO entry for the thread terminate process because p->running is
still buggy.  We need to fix it at some point but not yet covered.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-20-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
f88f86c4ee migration/multifd: Rewrite multifd_queue_page()
The current multifd_queue_page() is not easy to read and follow.  It is not
good with a few reasons:

  - No helper at all to show what exactly does a condition mean; in short,
  readability is low.

  - Rely on pages->ramblock being cleared to detect an empty queue.  It's
  slightly an overload of the ramblock pointer, per Fabiano [1], which I
  also agree.

  - Contains a self recursion, even if not necessary..

Rewrite this function.  We add some comments to make it even clearer on
what it does.

[1] https://lore.kernel.org/r/87wmrpjzew.fsf@suse.de

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-19-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
3b40964a86 migration/multifd: Change retval of multifd_send_pages()
Using int is an overkill when there're only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-18-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
d6556d174a migration/multifd: Change retval of multifd_queue_page()
Using int is an overkill when there're only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
3ab4441d97 migration/multifd: Split multifd_send_terminate_threads()
Split multifd_send_terminate_threads() into two functions:

  - multifd_send_set_error(): used when an error happened on the sender
    side, set error and quit state only

  - multifd_send_terminate_threads(): used only by the main thread to kick
    all multifd send threads out of sleep, for the last recycling.

Use multifd_send_set_error() in the three old call sites where only the
error will be set.

Use multifd_send_terminate_threads() in the last one where the main thread
will kick the multifd threads at last in multifd_save_cleanup().

Both helpers will need to set quitting=1.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
859ebaf346 migration/multifd: Forbid spurious wakeups
Now multifd's logic is designed to have no spurious wakeup.  I still
remember a talk to Juan and he seems to agree we should drop it now, and if
my memory was right it was there because multifd used to hit that when
still debugging.

Let's drop it and see what can explode; as long as it's not reaching
soft-freeze.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
25a1f87875 migration/multifd: Move header prepare/fill into send_prepare()
This patch redefines the interfacing of ->send_prepare().  It further
simplifies multifd_send_thread() especially on zero copy.

Now with the new interface, we require the hook to do all the work for
preparing the IOVs to send.  After it's completed, the IOVs should be ready
to be dumped into the specific multifd QIOChannel later.

So now the API looks like:

  p->pages ----------->  send_prepare() -------------> IOVs

This also prepares for the case where the input can be extended to even not
any p->pages.  But that's for later.

This patch will achieve similar goal of what Fabiano used to propose here:

https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de

However the send() interface may not be necessary.  I'm boldly attaching a
"Co-developed-by" for Fabiano.

Co-developed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
452b205702 migration/multifd: multifd_send_prepare_header()
Introduce a helper multifd_send_prepare_header() to setup the header packet
for multifd sender.

It's fine to setup the IOV[0] _before_ send_prepare() because the packet
buffer is already ready, even if the content is to be filled in.

With this helper, we can already slightly clean up the zero copy path.

Note that I explicitly put it into multifd.h, because I want it inlined
directly into multifd*.c where necessary later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
8a9ef17380 migration/multifd: Move trace_multifd_send|recv()
Move them into fill/unfill of packets.  With that, we can further cleanup
the send/recv thread procedure, and remove one more temp var.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
db7e1cc510 migration/multifd: Move total_normal_pages accounting
Just like the previous patch, move the accounting for total_normal_pages on
both src/dst sides into the packet fill/unfill procedures.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
05b7ec1890 migration/multifd: Rename p->num_packets and clean it up
This field, no matter whether on src or dest, is only used for debugging
purpose.

They can even be removed already, unless it still more or less provide some
accounting on "how many packets are sent/recved for this thread".  The
other more important one is called packet_num, which is embeded in the
multifd packet headers (MultiFDPacket_t).

So let's keep them for now, but make them much easier to understand, by
doing below:

  - Rename both of them to packets_sent / packets_recved, the old
  name (num_packets) are waaay too confusing when we already have
  MultiFDPacket_t.packets_num.

  - Avoid worrying on the "initial packet": we know we will send it, that's
  good enough.  The accounting won't matter a great deal to start with 0 or
  with 1.

  - Move them to where we send/recv the packets.  They're:

    - multifd_send_fill_packet() for senders.
    - multifd_recv_unfill_packet() for receivers.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
83c560fb42 migration/multifd: Drop pages->num check in sender thread
Now with a split SYNC handler, we always have pages->num set for
pending_job==true.  Assert it instead.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
e3cce9af10 migration/multifd: Simplify locking in sender thread
The sender thread will yield the p->mutex before IO starts, trying to not
block the requester thread.  This may be unnecessary lock optimizations,
because the requester can already read pending_job safely even without the
lock, because the requester is currently the only one who can assign a
task.

Drop that lock complication on both sides:

  (1) in the sender thread, always take the mutex until job done
  (2) in the requester thread, check pending_job clear lockless

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
f5f48a7891 migration/multifd: Separate SYNC request with normal jobs
Multifd provide a threaded model for processing jobs.  On sender side,
there can be two kinds of job: (1) a list of pages to send, or (2) a sync
request.

The sync request is a very special kind of job.  It never contains a page
array, but only a multifd packet telling the dest side to synchronize with
sent pages.

Before this patch, both requests use the pending_job field, no matter what
the request is, it will boost pending_job, while multifd sender thread will
decrement it after it finishes one job.

However this should be racy, because SYNC is special in that it needs to
set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request.
Consider a sequence of operations where:

  - migration thread enqueue a job to send some pages, pending_job++ (0->1)

  - [...before the selected multifd sender thread wakes up...]

  - migration thread enqueue another job to sync, pending_job++ (1->2),
    setup p->flags=MULTIFD_FLAG_SYNC

  - multifd sender thread wakes up, found pending_job==2
    - send the 1st packet with MULTIFD_FLAG_SYNC and list of pages
    - send the 2nd packet with flags==0 and no pages

This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done
after all the pages are received.  Meanwhile, the 2nd packet will be
completely useless, which contains zero information.

I didn't verify above, but I think this issue is still benign in that at
least on the recv side we always receive pages before handling
MULTIFD_FLAG_SYNC.  However that's not always guaranteed and just tricky.

One other reason I want to separate it is using p->flags to communicate
between the two threads is also not clearly defined, it's very hard to read
and understand why accessing p->flags is always safe; see the current impl
of multifd_send_thread() where we tried to cache only p->flags.  It doesn't
need to be that complicated.

This patch introduces pending_sync, a separate flag just to show that the
requester needs a sync.  Alongside, we remove the tricky caching of
p->flags now because after this patch p->flags should only be used by
multifd sender thread now, which will be crystal clear.  So it is always
thread safe to access p->flags.

With that, we can also safely convert the pending_job into a boolean,
because we don't support >1 pending jobs anyway.

Always use atomic ops to access both flags to make sure no cache effect.
When at it, drop the initial setting of "pending_job = 0" because it's
always allocated using g_new0().

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
efd8c5439d migration/multifd: Drop MultiFDSendParams.normal[] array
This array is redundant when p->pages exists.  Now we extended the life of
p->pages to the whole period where pending_job is set, it should be safe to
always use p->pages->offset[] rather than p->normal[].  Drop the array.

Alongside, the normal_num is also redundant, which is the same to
p->pages->num.

This doesn't apply to recv side, because there's no extra buffering on recv
side, so p->normal[] array is still needed.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
836eca47f6 migration/multifd: Postpone reset of MultiFDPages_t
Now we reset MultiFDPages_t object in the multifd sender thread in the
middle of the sending job.  That's not necessary, because the "*pages"
struct will not be reused anyway until pending_job is cleared.

Move that to the end after the job is completed, provide a helper to reset
a "*pages" object.  Use that same helper when free the object too.

This prepares us to keep using p->pages in the follow up patches, where we
may drop p->normal[].

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
15f3f21d59 migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
Multifd send side has two fields to indicate error quits:

  - MultiFDSendParams.quit
  - &multifd_send_state->exiting

Merge them into the global one.  The replacement is done by changing all
p->quit checks into the global var check.  The global check doesn't need
any lock.

A few more things done on top of this altogether:

  - multifd_send_terminate_threads()

    Moving the xchg() of &multifd_send_state->exiting upper, so as to cover
    the tracepoint, migrate_set_error() and migrate_set_state().

  - multifd_send_sync_main()

    In the 2nd loop, add one more check over the global var to make sure we
    don't keep the looping if QEMU already decided to quit.

  - multifd_tls_outgoing_handshake()

    Use multifd_send_terminate_threads() to set the error state.  That has
    a benefit of updating MigrationState.error to that error too, so we can
    persist that 1st error we hit in that specific channel.

  - multifd_new_send_channel_async()

    Take similar approach like above, drop the migrate_set_error() because
    multifd_send_terminate_threads() already covers that.  Unwrap the helper
    multifd_new_send_channel_cleanup() along the way; not really needed.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
48c0f5d56f migration/multifd: multifd_send_kick_main()
When a multifd sender thread hit errors, it always needs to kick the main
thread by kicking all the semaphores that it can be waiting upon.

Provide a helper for it and deduplicate the code.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
8888a552bf migration/multifd: Drop stale comment for multifd zero copy
We've already done that with multifd_flush_after_each_section, for multifd
in general.  Drop the stale "TODO-like" comment.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
William Roche
06152b89db migration: prevent migration when VM has poisoned memory
A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start

To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.

Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:41:58 +08:00
Fabiano Rosas
44d0d456d7 migration: Centralize BH creation and dispatch
Now that the migration state reference counting is correct, further
wrap the bottom half dispatch process to avoid future issues.

Move BH creation and scheduling together and wrap the dispatch with an
intermediary function that will ensure we always keep the ref/unref
balanced.

Also move the responsibility of deleting the BH into the wrapper and
remove the now unnecessary pointers.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
699d9476a0 migration: Add a wrapper to qemu_bh_schedule
Wrap qemu_bh_schedule() to ensure we always hold a reference to the
current_migration object.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
9cf268965d migration: Reference migration state around loadvm_postcopy_handle_run_bh
We need to hold a reference to the current_migration object around
async calls to avoid it been freed while still in use. Even on this
load-side function, we might still use the MigrationState, e.g to
check for capabilities.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
59094cfa7a migration: Take reference to migration state around bg_migration_vm_start_bh
We need to hold a reference to the current_migration object around
async calls to avoid it been freed while still in use.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
27eb8499ed migration: Fix use-after-free of migration state object
We're currently allowing the process_incoming_migration_bh bottom-half
to run without holding a reference to the 'current_migration' object,
which leads to a segmentation fault if the BH is still live after
migration_shutdown() has dropped the last reference to
current_migration.

In my system the bug manifests as migrate_multifd() returning true
when it shouldn't and multifd_load_shutdown() calling
multifd_recv_terminate_threads() which crashes due to an uninitialized
multifd_recv_state.

Fix the issue by holding a reference to the object when scheduling the
BH and dropping it before returning from the BH. The same is already
done for the cleanup_bh at migrate_fd_cleanup_schedule().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1969
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
0a5d1108ab migration/yank: Use channel features
Stop using outside knowledge about the io channels when registering
yank functions. Query for features instead.

The yank method for all channels used with migration code currently is
to call the qio_channel_shutdown() function, so query for
QIO_CHANNEL_FEATURE_SHUTDOWN. We could add a separate feature in the
future for indicating whether a channel supports yanking, but that
seems overkill at the moment.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20230911171320.24372-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Peter Xu
b0504edd40 migration: Drop unnecessary check in ram's pending_exact()
When the migration frameworks fetches the exact pending sizes, it means
this check:

  remaining_size < s->threshold_size

Must have been done already, actually at migration_iteration_run():

    if (must_precopy <= s->threshold_size) {
        qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);

That should be after one round of ram_state_pending_estimate().  It makes
the 2nd check meaningless and can be dropped.

To say it in another way, when reaching ->state_pending_exact(), we
unconditionally sync dirty bits for precopy.

Then we can drop migrate_get_current() there too.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Peter Xu
a8629e0c2f migration: Make threshold_size an uint64_t
It's always used to compare against another uint64_t.  Make it always clear
that it's never a negative.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Markus Armbruster
918f620d30 migration: Plug memory leak on HMP migrate error path
hmp_migrate() leaks @caps when qmp_migrate() fails.  Plug the leak
with g_autoptr().

Fixes: 967f2de5c9 (migration: Implement MigrateChannelList to hmp migration flow.) v8.2.0-rc0
Fixes: CID 1533125
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20240117140722.3979657-1-armbru@redhat.com
[peterx: fix CID number as reported by Peter Maydell]
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Paolo Bonzini
73b4987858 userfaultfd: use 1ULL to build ioctl masks
There is no need to use the Linux-internal __u64 type, 1ULL is
guaranteed to be wide enough.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20240117160313.175609-1-pbonzini@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Nick Briggs
44ce1b5d2f migration/rdma: define htonll/ntohll only if not predefined
Solaris has #defines for htonll and ntohll which cause syntax errors
when compiling code that attempts to (re)define these functions..

Signed-off-by: Nick Briggs <nicholas.h.briggs@gmail.com>
Link: https://lore.kernel.org/r/65a04a7d.497ab3.3e7bef1f@gateway.sonic.net
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:10 +08:00
Fabiano Rosas
e3b8ad5c13 migration: Report error in incoming migration
We're not currently reporting the errors set with migrate_set_error()
when incoming migration fails.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Fabiano Rosas
6074f81625 migration/multifd: Change multifd_pages_init argument
The 'size' argument is actually the number of pages that fit in a
multifd packet. Change it to uint32_t and rename.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Fabiano Rosas
9346fa1870 migration/multifd: Remove QEMUFile from where it is not needed
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Fabiano Rosas
dca1bc7f24 migration/multifd: Remove MultiFDPages_t::packet_num
This was introduced by commit 34c55a94b1 ("migration: Create multipage
support") and never used.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Het Gala
0770ad438d migration: Simplify initial conditionals in migration for better readability
The inital conditional statements in qmp migration functions is harder
to understand than necessary. It is better to get all errors out of
the way in the beginning itself to have better readability and error
handling.

Signed-off-by: Het Gala <het.gala@nutanix.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231205080039.197615-1-het.gala@nutanix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Stefan Hajnoczi
a4a411fbaf Replace "iothread lock" with "BQL" in comments
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
in their names. Update the code comments to use "BQL" instead of
"iothread lock".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-id: 20240102153529.486531-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Stefan Hajnoczi
195801d700 system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Peter Maydell
c8193acc07 migration 1st pull for 9.0
- We lost Juan and Leo in the maintainers file
 - Steven's suspend state fix
 - Steven's fix for coverity on migrate_mode
 - Avihai's migration cleanup series
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl
 ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+
 Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D
 =sfuM
 -----END PGP SIGNATURE-----

Merge tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu into staging

migration 1st pull for 9.0

- We lost Juan and Leo in the maintainers file
- Steven's suspend state fix
- Steven's fix for coverity on migrate_mode
- Avihai's migration cleanup series

# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+
# Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D
# =sfuM
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 04 Jan 2024 04:30:07 GMT
# gpg:                using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg:                issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [unknown]
# gpg:                 aka "Peter Xu <peterx@redhat.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D  D1A9 3B5F CCCD F3AB D706

* tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu: (26 commits)
  migration: fix coverity migrate_mode finding
  migration/multifd: Remove unnecessary usage of local Error
  migration: Remove unnecessary usage of local Error
  migration: Fix migration_channel_read_peek() error path
  migration/multifd: Remove error_setg() in migration_ioc_process_incoming()
  migration/multifd: Fix leaking of Error in TLS error flow
  migration/multifd: Simplify multifd_channel_connect() if else statement
  migration/multifd: Fix error message in multifd_recv_initial_packet()
  migration: Remove errp parameter in migration_fd_process_incoming()
  migration: Refactor migration_incoming_setup()
  migration: Remove nulling of hostname in migrate_init()
  migration: Remove migrate_max_downtime() declaration
  tests/qtest: postcopy migration with suspend
  tests/qtest: precopy migration with suspend
  tests/qtest: option to suspend during migration
  tests/qtest: migration events
  migration: preserve suspended for bg_migration
  migration: preserve suspended for snapshot
  migration: preserve suspended runstate
  migration: propagate suspended runstate
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-01-05 13:35:25 +00:00
Steve Sistare
b12635ff08 migration: fix coverity migrate_mode finding
Coverity diagnoses a possible out-of-range array index here ...

    static GSList *migration_blockers[MIG_MODE__MAX];

    fill_source_migration_info() {
        GSList *cur_blocker = migration_blockers[migrate_mode()];

... because it does not know that MIG_MODE__MAX will never be returned as
a migration mode.  To fix, assert so in migrate_mode().

Fixes: fa3673e497 ("migration: per-mode blockers")

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1699907025-215450-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
3fc58efa93 migration/multifd: Remove unnecessary usage of local Error
According to Error API, usage of ERRP_GUARD() or a local Error instead
of errp is needed if errp is passed to void functions, where it is later
dereferenced to see if an error occurred.

There are several places in multifd.c that use local Error although it
is not needed. Change these places to use errp directly.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-12-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
b6f4c0c788 migration: Remove unnecessary usage of local Error
According to Error API, usage of ERRP_GUARD() or a local Error instead
of errp is needed if errp is passed to void functions, where it is later
dereferenced to see if an error occurred.

There are several places in migration.c that use local Error although it
is not needed. Change these places to use errp directly.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-11-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
4f8cf323e8 migration: Fix migration_channel_read_peek() error path
migration_channel_read_peek() calls qio_channel_readv_full() and handles
both cases of return value == 0 and return value < 0 the same way, by
calling error_setg() with errp. However, if return value < 0, errp is
already set, so calling error_setg() with errp will lead to an assert.

Fix it by handling these cases separately, calling error_setg() with
errp only in return value == 0 case.

Fixes: 6720c2b327 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-10-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
1d3886f837 migration/multifd: Remove error_setg() in migration_ioc_process_incoming()
If multifd_load_setup() fails in migration_ioc_process_incoming(),
error_setg() is called with errp. This will lead to an assert because in
that case errp already contains an error.

Fix it by removing the redundant error_setg().

Fixes: 6720c2b327 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-9-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
6ae208ce96 migration/multifd: Fix leaking of Error in TLS error flow
If there is an error in multifd TLS handshake task,
multifd_tls_outgoing_handshake() retrieves the error with
qio_task_propagate_error() but never frees it.

Fix it by freeing the obtained Error.

In addition, the error is not reported at all, so report it with
migrate_set_error().

Fixes: 2964714015 ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-8-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
a4395f5d3c migration/multifd: Simplify multifd_channel_connect() if else statement
The else branch in multifd_channel_connect() is redundant because when
the if branch is taken the function returns.

Simplify the code by removing the else branch.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-7-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
c77b40859a migration/multifd: Fix error message in multifd_recv_initial_packet()
In multifd_recv_initial_packet(), if MultiFDInit_t->id is greater than
the configured number of multifd channels, an irrelevant error message
about multifd version is printed.

Change the error message to a relevant one about the channel id.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-6-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
b0cf3bfc69 migration: Remove errp parameter in migration_fd_process_incoming()
Errp parameter in migration_fd_process_incoming() is unused.
Remove it.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-5-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
c606f3baa2 migration: Refactor migration_incoming_setup()
Commit 6720c2b327 ("migration: check magic value for deciding the
mapping of channels") extracted the only code that could fail in
migration_incoming_setup().

Now migration_incoming_setup() can't fail, so refactor it to return void
and remove errp parameter.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-4-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
80caba6955 migration: Remove nulling of hostname in migrate_init()
MigrationState->hostname is set to NULL in migrate_init(). This is
redundant because it is already freed and set to NULL in
migrade_fd_cleanup(). Remove it.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-3-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
17b9483baa migration: Remove migrate_max_downtime() declaration
migrate_max_downtime() has been removed long ago, but its declaration
was mistakenly left. Remove it.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
49a5020697 migration: preserve suspended for bg_migration
Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running.  Allow the additional
state transitions that occur.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
58b105703e migration: preserve suspended for snapshot
Restoring a snapshot can break a suspended guest.  Snapshots suffer from
the same suspended-state issues that affect live migration, plus they must
handle an additional problematic scenario, which is that a running vm must
remain running if it loads a suspended snapshot.

To save, the existing vm_stop call now completely stops the suspended
state.  Finish with vm_resume to leave the vm in the state it had prior
to the save, correctly restoring the suspended state.

To load, if the snapshot is not suspended, then vm_stop + vm_resume
correctly handles all states, and leaves the vm in the state it had prior
to the load.  However, if the snapshot is suspended, restoration is
trickier.  First, call vm_resume to restore the state to suspended so the
current state matches the saved state.  Then, if the pre-load state is
running, call wakeup to resume running.

Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and
RUN_STATE_RESTORE_VM did not change runstate if the current state was
suspended, but now it does, so allow these transitions.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
b4e9ddccd1 migration: preserve suspended runstate
A guest that is migrated in the suspended state automaticaly wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
d3c86c99f3 migration: propagate suspended runstate
If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state.  To maintain backward and forward compatibility, reclaim
some space from the runstate member.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Richard Henderson
a77ffe9595 migration: Constify VMState
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-67-richard.henderson@linaro.org>
2023-12-30 07:38:06 +11:00
Richard Henderson
2027001919 migration: Make VMStateDescription.subsections const
Allow the array of pointers to itself be const.
Propagate this through the copies of this field.

Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-2-richard.henderson@linaro.org>
2023-12-29 11:17:30 +11:00
Stefan Hajnoczi
b0839e87af dirtylimit dirtyrate pull request 20231225
Nitpick about an unused parameter
 Please apply, thanks,
 Yong
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEaF0CINwmSCgVLlfC3/Ij1rP+y5wFAmWJVtkWHHlvbmcuaHVh
 bmdAc21hcnR4LmNvbQAKCRDf8iPWs/7LnLexEADjuQ7dtbv2vq1ksqDq6fct++FK
 sGPe6P9UAyJrB0qxW3jKRn2P4E/xiPHpmi+BbOOu8PppeZt86NFlzun681vMpkKu
 d3oAlwRtX8eklk3oub+yBXzkAfsGr+EHEWms/12ATsd6apItxi47zUUw/2aznujz
 TlZR7qOD1kjvPHOVnP07fQ3NYiPIITpzSqVgo3fxE+upmb+dnELIkyGTAvJY2Pjz
 Uh4CtG/sQq8OJJwUn1TFmhB1Ujw9oo8v/lOtIpsxfcKtQcAwyvgmyfndOMZ/h6O3
 ri3axypg8PyGyidYEh0JSoR3OVUIt5eoqPNp+sQTNeZsjTIwKjpBGUOLXbIA56nB
 Z/ZRbxSKj+JRD9qQZ2NODE1QurgEuT51aWkEeBEsx0OtUINcjZkwWeQmiasJQw1T
 4z4Hez0ipuET9MmeTbWkVh9rO3Bsxi9QxiXEiN6YPNqTqLhwomXU+MIlrCsZQzjB
 l4Jij9byeWL9ntx3qwtSWJGHV6Qql16UkPo+c2Co/q2rdx/FqiYrMgCtLee1yo5n
 lSCKmrVMLaLyQv7jnLY8txhU7/CXNnwij4YSgfi7nujOgM6L3uwYc9Tp5kcQHE/b
 o/Mr6oOi79Z+rdByk4pPUKaJV7FnbWMoUgjsqWPJgedGt0K9StRsXCGR1wYMUVdU
 R8Dv0IZyyGuSf6MhNw==
 =Y00D
 -----END PGP SIGNATURE-----

Merge tag 'dirtylimit-dirtyrate-pull-request-20231225' of https://github.com/newfriday/qemu into staging

dirtylimit dirtyrate pull request 20231225

Nitpick about an unused parameter
Please apply, thanks,
Yong

# -----BEGIN PGP SIGNATURE-----
#
# iQJKBAABCAA0FiEEaF0CINwmSCgVLlfC3/Ij1rP+y5wFAmWJVtkWHHlvbmcuaHVh
# bmdAc21hcnR4LmNvbQAKCRDf8iPWs/7LnLexEADjuQ7dtbv2vq1ksqDq6fct++FK
# sGPe6P9UAyJrB0qxW3jKRn2P4E/xiPHpmi+BbOOu8PppeZt86NFlzun681vMpkKu
# d3oAlwRtX8eklk3oub+yBXzkAfsGr+EHEWms/12ATsd6apItxi47zUUw/2aznujz
# TlZR7qOD1kjvPHOVnP07fQ3NYiPIITpzSqVgo3fxE+upmb+dnELIkyGTAvJY2Pjz
# Uh4CtG/sQq8OJJwUn1TFmhB1Ujw9oo8v/lOtIpsxfcKtQcAwyvgmyfndOMZ/h6O3
# ri3axypg8PyGyidYEh0JSoR3OVUIt5eoqPNp+sQTNeZsjTIwKjpBGUOLXbIA56nB
# Z/ZRbxSKj+JRD9qQZ2NODE1QurgEuT51aWkEeBEsx0OtUINcjZkwWeQmiasJQw1T
# 4z4Hez0ipuET9MmeTbWkVh9rO3Bsxi9QxiXEiN6YPNqTqLhwomXU+MIlrCsZQzjB
# l4Jij9byeWL9ntx3qwtSWJGHV6Qql16UkPo+c2Co/q2rdx/FqiYrMgCtLee1yo5n
# lSCKmrVMLaLyQv7jnLY8txhU7/CXNnwij4YSgfi7nujOgM6L3uwYc9Tp5kcQHE/b
# o/Mr6oOi79Z+rdByk4pPUKaJV7FnbWMoUgjsqWPJgedGt0K9StRsXCGR1wYMUVdU
# R8Dv0IZyyGuSf6MhNw==
# =Y00D
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 25 Dec 2023 05:18:01 EST
# gpg:                using RSA key 685D0220DC264828152E57C2DFF223D6B3FECB9C
# gpg:                issuer "yong.huang@smartx.com"
# gpg: Good signature from "Yong Huang <yong.huang@smartx.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 685D 0220 DC26 4828 152E  57C2 DFF2 23D6 B3FE CB9C

* tag 'dirtylimit-dirtyrate-pull-request-20231225' of https://github.com/newfriday/qemu:
  migration/dirtyrate: Remove an extra parameter

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-12-26 06:06:50 -05:00