Commit Graph

721 Commits

Author SHA1 Message Date
Fabiano Rosas
decdc76772 migration/multifd: Add mapped-ram support to fd: URI
If we receive a file descriptor that points to a regular file, there's
nothing stopping us from doing multifd migration with mapped-ram to
that file.

Enable the fd: URI to work with multifd + mapped-ram.

Note that the fds passed into multifd are duplicated because we want
to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The
original fd doesn't need to be duplicated because monitor_get_fd()
transfers ownership to the caller.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240229153017.2221-23-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
f427d90b98 migration/multifd: Support outgoing mapped-ram stream format
The new mapped-ram stream format uses a file transport and puts ram
pages in the migration file at their respective offsets and can be
done in parallel by using the pwritev system call which takes iovecs
and an offset.

Add support to enabling the new format along with multifd to make use
of the threading and page handling already in place.

This requires multifd to stop sending headers and leaving the stream
format to the mapped-ram code. When it comes time to write the data, we
need to call a version of qio_channel_write that can take an offset.

Usage on HMP is:

(qemu) stop
(qemu) migrate_set_capability multifd on
(qemu) migrate_set_capability mapped-ram on
(qemu) migrate_set_parameter max-bandwidth 0
(qemu) migrate_set_parameter multifd-channels 8
(qemu) migrate file:migfile

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-21-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
2dd7ee7a51 migration/multifd: Add incoming QIOChannelFile support
On the receiving side we don't need to differentiate between main
channel and threads, so whichever channel is defined first gets to be
the main one. And since there are no packets, use the atomic channel
count to index into the params array.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-19-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
8d9e0d4100 migration: Add mapped-ram URI compatibility check
The mapped-ram migration format needs a channel that supports seeking
to be able to write each page to an arbitrary offset in the migration
stream.

Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Fabiano Rosas
4ed49feb44 migration/ram: Introduce 'mapped-ram' migration capability
Add a new migration capability 'mapped-ram'.

The core of the feature is to ensure that RAM pages are mapped
directly to offsets in the resulting migration file instead of being
streamed at arbitrary points.

The reasons why we'd want such behavior are:

 - The resulting file will have a bounded size, since pages which are
   dirtied multiple times will always go to a fixed location in the
   file, rather than constantly being added to a sequential
   stream. This eliminates cases where a VM with, say, 1G of RAM can
   result in a migration file that's 10s of GBs, provided that the
   workload constantly redirties memory.

 - It paves the way to implement O_DIRECT-enabled save/restore of the
   migration stream as the pages are ensured to be written at aligned
   offsets.

 - It allows the usage of multifd so we can write RAM pages to the
   migration file in parallel.

For now, enabling the capability has no effect. The next couple of
patches implement the core functionality.

Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240229153017.2221-8-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-03-01 15:42:04 +08:00
Cédric Le Goater
9425ef3f99 migration: Use migrate_has_error() in close_return_path_on_source()
close_return_path_on_source() retrieves the migration error from the
the QEMUFile '->to_dst_file' to know if a shutdown is required. This
shutdown is required to exit the return-path thread.

Avoid relying on '->to_dst_file' and use migrate_has_error() instead.

(using to_dst_file is a heuristic to infer whether
rp_state.from_dst_file might be stuck on a recvmsg(). Using a generic
method for detecting errors is more reliable. We also want to reduce
dependency on QEMUFile::last_error)

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
[added some words about the motivation for this patch]
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240226203122.22894-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Fabiano Rosas
22b04245f0 migration: Join the return path thread before releasing to_dst_file
The return path thread might hang at a blocking system call. Before
joining the thread we might need to issue a shutdown() on the socket
file descriptor to release it. To determine whether the shutdown() is
necessary we look at the QEMUFile error.

Make sure we only clean up the QEMUFile after the return path has been
waited for.

This fixes a hang when qemu_savevm_state_setup() produced an error
that was detected by migration_detect_error(). That skips
migration_completion() so close_return_path_on_source() would get
stuck waiting for the RP thread to terminate.

Reported-by: Cédric Le Goater <clg@redhat.com>
Tested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240226203122.22894-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Fabiano Rosas
63f64d77f0 migration: Fix qmp_query_migrate mbps value
The QMP command query_migrate might see incorrect throughput numbers
if it runs after we've set the migration completion status but before
migration_calculate_complete() has updated s->total_time and s->mbps.

The migration status would show COMPLETED, but the throughput value
would be the one from the last iteration and not the one from the
whole migration. This will usually be a larger value due to the time
period being smaller (one iteration).

Move migration_calculate_complete() earlier so that the status
MIGRATION_STATUS_COMPLETED is only emitted after the final counters
update. Keep everything under the BQL so the QMP thread sees the
updates as atomic.

Rename migration_calculate_complete to migration_completion_end to
reflect its new purpose of also updating s->state.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240226143335.14282-1-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
cbdafc1b34 migration: options incompatible with cpr
Fail the migration request if options are set that are incompatible
with cpr.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-15-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
9867d4ddd0 migration: stop vm for cpr
When migration for cpr is initiated, stop the vm and set state
RUN_STATE_FINISH_MIGRATE before ram is saved.  This eliminates the
possibility of ram and device state being out of sync, and guarantees
that a guest in the suspended state remains suspended, because qmp_cont
rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
4af667f87c migration: notifier error checking
Check the status returned by migration notifiers for event type
MIG_EVENT_PRECOPY_SETUP, and report errors.  None of the notifiers
return an error status at this time.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-10-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
bf78a046b9 migration: refactor migrate_fd_connect failures
Move common code for the error path in migrate_fd_connect to a shared
fail label.  No functional change.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
6835f5a1bc migration: per-mode notifiers
Keep a separate list of migration notifiers for each migration mode.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
5663dd3f1a migration: MigrationNotifyFunc
Define MigrationNotifyFunc to improve type safety and simplify migration
notifiers.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
c763a23e41 migration: remove postcopy_after_devices
postcopy_after_devices and migration_in_postcopy_after_devices are no
longer used, so delete them.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
9d9babf78d migration: MigrationEvent for notifiers
Passing MigrationState to notifiers is unsound because they could access
unstable migration state internals or even modify the state.  Instead, pass
the minimal info needed in a new MigrationEvent struct, which could be
extended in the future if needed.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-5-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Steve Sistare
3e7757301c migration: convert to NotifierWithReturn
Change all migration notifiers to type NotifierWithReturn, so notifiers
can return an error status in a future patch.  For now, pass NULL for the
notifier error parameter, and do not check the return value.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/1708622920-68779-4-git-send-email-steven.sistare@oracle.com
[peterx: dropped unexpected update to roms/seabios-hppa]
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-28 11:31:28 +08:00
Fabiano Rosas
dd904bc13f migration/multifd: Move multifd_send_setup into migration thread
We currently have an unfavorable situation around multifd channels
creation and the migration thread execution.

We create the multifd channels with qio_channel_socket_connect_async
-> qio_task_run_in_thread, but only connect them at the
multifd_new_send_channel_async callback, called from
qio_task_complete, which is registered as a glib event.

So at multifd_send_setup() we create the channels, but they will only
be actually usable after the whole multifd_send_setup() calling stack
returns back to the main loop. Which means that the migration thread
is already up and running without any possibility for the multifd
channels to be ready on time.

We currently rely on the channels-ready semaphore blocking
multifd_send_sync_main() until channels start to come up and release
it. However there have been bugs recently found when a channel's
creation fails and multifd_send_cleanup() is allowed to run while
other channels are still being created.

Let's start to organize this situation by moving the
multifd_send_setup() call into the migration thread. That way we
unblock the main-loop to dispatch the completion callbacks and
actually have a chance of getting the multifd channels ready for when
the migration thread needs them.

The next patches will deal with the synchronization aspects.

Note that this takes multifd_send_setup() out of the BQL.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Fabiano Rosas
bd8b0a8f82 migration/multifd: Move multifd_send_setup error handling in to the function
Hide the error handling inside multifd_send_setup to make it cleaner
for the next patch to move the function around.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:18 +08:00
Avihai Horon
3205bebd4f migration: Fix logic of channels and transport compatibility check
The commit in the fixes line mistakenly modified the channels and
transport compatibility check logic so it now checks multi-channel
support only for socket transport type.

Thus, running multifd migration using a transport other than socket that
is incompatible with multi-channels (such as "exec") would lead to a
segmentation fault instead of an error message.
For example:
  (qemu) migrate_set_capability multifd on
  (qemu) migrate -d "exec:cat > /tmp/vm_state"
  Segmentation fault (core dumped)

Fix it by checking multi-channel compatibility for all transport types.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: d95533e1cd ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240125162528.7552-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:00 +08:00
Peter Xu
cde85c37ca migration/multifd: Stick with send/recv on function names
Most of the multifd code uses send/recv to represent the two sides, but
some rare cases use save/load.

Since send/recv is the majority, replacing the save/load use cases to use
send/recv globally.  Now we reach a consensus on the naming.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:11 +08:00
William Roche
06152b89db migration: prevent migration when VM has poisoned memory
A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start

To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.

Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:41:58 +08:00
Fabiano Rosas
44d0d456d7 migration: Centralize BH creation and dispatch
Now that the migration state reference counting is correct, further
wrap the bottom half dispatch process to avoid future issues.

Move BH creation and scheduling together and wrap the dispatch with an
intermediary function that will ensure we always keep the ref/unref
balanced.

Also move the responsibility of deleting the BH into the wrapper and
remove the now unnecessary pointers.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
699d9476a0 migration: Add a wrapper to qemu_bh_schedule
Wrap qemu_bh_schedule() to ensure we always hold a reference to the
current_migration object.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
59094cfa7a migration: Take reference to migration state around bg_migration_vm_start_bh
We need to hold a reference to the current_migration object around
async calls to avoid it been freed while still in use.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
27eb8499ed migration: Fix use-after-free of migration state object
We're currently allowing the process_incoming_migration_bh bottom-half
to run without holding a reference to the 'current_migration' object,
which leads to a segmentation fault if the BH is still live after
migration_shutdown() has dropped the last reference to
current_migration.

In my system the bug manifests as migrate_multifd() returning true
when it shouldn't and multifd_load_shutdown() calling
multifd_recv_terminate_threads() which crashes due to an uninitialized
multifd_recv_state.

Fix the issue by holding a reference to the object when scheduling the
BH and dropping it before returning from the BH. The same is already
done for the cleanup_bh at migrate_fd_cleanup_schedule().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1969
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
e3b8ad5c13 migration: Report error in incoming migration
We're not currently reporting the errors set with migrate_set_error()
when incoming migration fails.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Het Gala
0770ad438d migration: Simplify initial conditionals in migration for better readability
The inital conditional statements in qmp migration functions is harder
to understand than necessary. It is better to get all errors out of
the way in the beginning itself to have better readability and error
handling.

Signed-off-by: Het Gala <het.gala@nutanix.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231205080039.197615-1-het.gala@nutanix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Stefan Hajnoczi
a4a411fbaf Replace "iothread lock" with "BQL" in comments
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
in their names. Update the code comments to use "BQL" instead of
"iothread lock".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-id: 20240102153529.486531-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Stefan Hajnoczi
195801d700 system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Avihai Horon
b6f4c0c788 migration: Remove unnecessary usage of local Error
According to Error API, usage of ERRP_GUARD() or a local Error instead
of errp is needed if errp is passed to void functions, where it is later
dereferenced to see if an error occurred.

There are several places in migration.c that use local Error although it
is not needed. Change these places to use errp directly.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-11-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
1d3886f837 migration/multifd: Remove error_setg() in migration_ioc_process_incoming()
If multifd_load_setup() fails in migration_ioc_process_incoming(),
error_setg() is called with errp. This will lead to an assert because in
that case errp already contains an error.

Fix it by removing the redundant error_setg().

Fixes: 6720c2b327 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-9-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
b0cf3bfc69 migration: Remove errp parameter in migration_fd_process_incoming()
Errp parameter in migration_fd_process_incoming() is unused.
Remove it.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-5-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
c606f3baa2 migration: Refactor migration_incoming_setup()
Commit 6720c2b327 ("migration: check magic value for deciding the
mapping of channels") extracted the only code that could fail in
migration_incoming_setup().

Now migration_incoming_setup() can't fail, so refactor it to return void
and remove errp parameter.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-4-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
80caba6955 migration: Remove nulling of hostname in migrate_init()
MigrationState->hostname is set to NULL in migrate_init(). This is
redundant because it is already freed and set to NULL in
migrade_fd_cleanup(). Remove it.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-3-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
49a5020697 migration: preserve suspended for bg_migration
Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running.  Allow the additional
state transitions that occur.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
b4e9ddccd1 migration: preserve suspended runstate
A guest that is migrated in the suspended state automaticaly wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Stefan Hajnoczi
019f8c19df Migration patches for rc3:
- One more memleak regression fix from Het
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZWoLbRIccGV0ZXJ4QHJl
 ZGhhdC5jb20ACgkQO1/MzfOr1wahYwD+OsD7CaZYjkl9KSooRfblEenD6SdfhAdC
 oZc07f2UxocA/0s1keDBZUUcZOiGYPDFV5his4Jw4F+RRD1YIpVWZg4J
 =T0/r
 -----END PGP SIGNATURE-----

Merge tag 'migration-20231201-pull-request' of https://github.com/xzpeter/qemu into staging

Migration patches for rc3:

- One more memleak regression fix from Het

# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZWoLbRIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wahYwD+OsD7CaZYjkl9KSooRfblEenD6SdfhAdC
# oZc07f2UxocA/0s1keDBZUUcZOiGYPDFV5his4Jw4F+RRD1YIpVWZg4J
# =T0/r
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 01 Dec 2023 11:35:57 EST
# gpg:                using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg:                issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [full]
# gpg:                 aka "Peter Xu <peterx@redhat.com>" [full]
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D  D1A9 3B5F CCCD F3AB D706

* tag 'migration-20231201-pull-request' of https://github.com/xzpeter/qemu:
  migration: Plug memory leak with migration URIs

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-12-04 08:01:24 -05:00
Het Gala
bc1d54ee51 migration: Plug memory leak with migration URIs
migrate_uri_parse() allocates memory to 'channel' if the user
opts for old syntax - uri, which is leaked because there is no
code for freeing 'channel'.
So, free channel to avoid memory leak in case where 'channels'
is empty and uri parsing is required.

Fixes: 5994024f ("migration: Implement MigrateChannelList to qmp migration flow")
Signed-off-by: Het Gala <het.gala@nutanix.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Tested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20231129204301.131228-1-het.gala@nutanix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2023-12-01 11:01:28 -05:00
Zongmin Zhou
41581265aa migration: free 'saddr' since be no longer used
Since socket_parse() will allocate memory for 'saddr',and its value
will pass to 'addr' that allocated by migrate_uri_parse(),
then 'saddr' will no longer used,need to free.
But due to 'saddr->u' is shallow copying the contents of the union,
the members of this union containing allocated strings,and will be used after that.
So just free 'saddr' itself without doing a deep free on the contents of the SocketAddress.

Fixes: 72a8192e22 ("migration: convert migration 'uri' into 'MigrateAddress'")
Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231120031428.908295-1-zhouzongmin@kylinos.cn>
2023-11-30 09:51:24 +01:00
Het Gala
967f2de5c9 migration: Implement MigrateChannelList to hmp migration flow.
Integrate MigrateChannelList with all transport backends
(socket, exec and rdma) for both src and dest migration
endpoints for hmp migration.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20231023182053.8711-14-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-11-02 11:35:04 +01:00
Het Gala
5994024fd1 migration: Implement MigrateChannelList to qmp migration flow.
Integrate MigrateChannelList with all transport backends
(socket, exec and rdma) for both src and dest migration
endpoints for qmp migration.

For current series, limit the size of MigrateChannelList
to single element (single interface) as runtime check.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-13-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Het Gala
d95533e1cd migration: modify migration_channels_and_uri_compatible() for new QAPI syntax
migration_channels_and_uri_compatible() check for transport mechanism
suitable for multifd migration gets executed when the caller calls old
uri syntax. It needs it to be run when using the modern MigrateChannel
QAPI syntax too.

After URI -> 'MigrateChannel' :
migration_channels_and_uri_compatible() ->
migration_channels_and_transport_compatible() passes object as argument
and check for valid transport mechanism.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-12-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Het Gala
074dbce5fc migration: New migrate and migrate-incoming argument 'channels'
MigrateChannelList allows to connect accross multiple interfaces.
Add MigrateChannelList struct as argument to migration QAPIs.

We plan to include multiple channels in future, to connnect
multiple interfaces. Hence, we choose 'MigrateChannelList'
as the new argument over 'MigrateChannel' to make migration
QAPIs future proof.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-10-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Fabiano Rosas
02afba63e9 migration: Convert the file backend to the new QAPI syntax
Convert the file: URI to accept a FileMigrationArgs to be compatible
with the new migration QAPI.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-9-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Het Gala
cbab4face5 migration: convert exec backend to accept MigrateAddress.
Exec transport backend for 'migrate'/'migrate-incoming' QAPIs accept
new wire protocol of MigrateAddress struct.

It is achived by parsing 'uri' string and storing migration parameters
required for exec connection into strList struct.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-8-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Het Gala
3fa9642ff7 migration: convert rdma backend to accept MigrateAddress
RDMA based transport backend for 'migrate'/'migrate-incoming' QAPIs
accept new wire protocol of MigrateAddress struct.

It is achived by parsing 'uri' string and storing migration parameters
required for RDMA connection into well defined InetSocketAddress struct.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-7-farosas@suse.de>
2023-11-02 11:35:03 +01:00
Het Gala
34dfc5e407 migration: convert socket backend to accept MigrateAddress
Socket transport backend for 'migrate'/'migrate-incoming' QAPIs accept
new wire protocol of MigrateAddress struct.

It is achived by parsing 'uri' string and storing migration parameters
required for socket connection into well defined SocketAddress struct.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-6-farosas@suse.de>
2023-11-02 11:35:03 +01:00
Het Gala
72a8192e22 migration: convert migration 'uri' into 'MigrateAddress'
This patch parses 'migrate' and 'migrate-incoming' QAPI's 'uri'
string containing migration connection related information
and stores them inside well defined 'MigrateAddress' struct.

Fabiano fixed for "file" transport.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-4-farosas@suse.de>
Message-ID: <20231023182053.8711-5-farosas@suse.de>
2023-11-02 11:35:03 +01:00
Peter Xu
88577f3242 migration: Change ram_dirty_bitmap_reload() retval to bool
Now we have a Error** passed into the return path thread stack, which is
even clearer than an int retval.  Change ram_dirty_bitmap_reload() and the
callers to use a bool instead to replace errnos.

Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-5-peterx@redhat.com>
2023-11-02 11:35:03 +01:00