Commit Graph

2210 Commits

Author SHA1 Message Date
Avihai Horon
3205bebd4f migration: Fix logic of channels and transport compatibility check
The commit in the fixes line mistakenly modified the channels and
transport compatibility check logic so it now checks multi-channel
support only for socket transport type.

Thus, running multifd migration using a transport other than socket that
is incompatible with multi-channels (such as "exec") would lead to a
segmentation fault instead of an error message.
For example:
  (qemu) migrate_set_capability multifd on
  (qemu) migrate -d "exec:cat > /tmp/vm_state"
  Segmentation fault (core dumped)

Fix it by checking multi-channel compatibility for all transport types.

Cc: qemu-stable <qemu-stable@nongnu.org>
Fixes: d95533e1cd ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240125162528.7552-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 09:53:00 +08:00
Peter Xu
488c84acb4 migration/multifd: Optimize sender side to be lockless
When reviewing my attempt to refactor send_prepare(), Fabiano suggested we
try out with dropping the mutex in multifd code [1].

I thought about that before but I never tried to change the code.  Now
maybe it's time to give it a stab.  This only optimizes the sender side.

The trick here is multifd has a clear provider/consumer model, that the
migration main thread publishes requests (either pending_job/pending_sync),
while the multifd sender threads are consumers.  Here we don't have a lot
of complicated data sharing, and the jobs can logically be submitted
lockless.

Arm the code with atomic weapons.  Two things worth mentioning:

  - For multifd_send_pages(): we can use qatomic_load_acquire() when trying
  to find a free channel, but that's expensive if we attach one ACQUIRE per
  channel.  Instead, keep the qatomic_read() on reading the pending_job
  flag as we do already, meanwhile use one smp_mb_acquire() after the loop
  to guarantee the memory ordering.

  - For pending_sync: it doesn't have any extra data required since now
  p->flags are never touched, it should be safe to not use memory barrier.
  That's different from pending_job.

Provide rich comments for all the lockless operations to state how they are
paired.  With that, we can remove the mutex.

[1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-06 11:05:49 +08:00
Peter Xu
98ea497d8b migration/multifd: Fix MultiFDSendParams.packet_num race
As reported correctly by Fabiano [1] (while per Fabiano, it sourced back to
Elena's initial report in Oct 2023), MultiFDSendParams.packet_num is buggy
to be assigned and stored.  Consider two consequent operations of: (1)
queue a job into multifd send thread X, then (2) queue another sync request
to the same send thread X.  Then the MultiFDSendParams.packet_num will be
assigned twice, and the first assignment can get lost already.

To avoid that, we move the packet_num assignment from p->packet_num into
where the thread will fill in the packet.  Use atomic operations to protect
the field, making sure there's no race.

Note that atomic fetch_add() may not be good for scaling purposes, however
multifd should be fine as number of threads should normally not go beyond
16 threads.  Let's leave that concern for later but fix the issue first.

There's also a trick on how to make it always work even on 32 bit hosts for
uint64_t packet number.  Switching to uintptr_t as of now to simply the
case.  It will cause packet number to overflow easier on 32 bit, but that
shouldn't be a major concern for now as 32 bit systems is not the major
audience for any performance concerns like what multifd wants to address.

We also need to move multifd_send_state definition upper, so that
multifd_send_fill_packet() can reference it.

[1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:11 +08:00
Peter Xu
cde85c37ca migration/multifd: Stick with send/recv on function names
Most of the multifd code uses send/recv to represent the two sides, but
some rare cases use save/load.

Since send/recv is the majority, replacing the save/load use cases to use
send/recv globally.  Now we reach a consensus on the naming.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:11 +08:00
Peter Xu
5e6ea8a1d6 migration/multifd: Cleanup multifd_load_cleanup()
Use similar logic to cleanup the recv side.

Note that multifd_recv_terminate_threads() may need some similar rework
like the sender side, but let's leave that for later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
12808db3b8 migration/multifd: Cleanup multifd_save_cleanup()
Shrink the function by moving relevant works into helpers: move the thread
join()s into multifd_send_terminate_threads(), then create two more helpers
to cover channel/state cleanups.

Add a TODO entry for the thread terminate process because p->running is
still buggy.  We need to fix it at some point but not yet covered.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-20-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
f88f86c4ee migration/multifd: Rewrite multifd_queue_page()
The current multifd_queue_page() is not easy to read and follow.  It is not
good with a few reasons:

  - No helper at all to show what exactly does a condition mean; in short,
  readability is low.

  - Rely on pages->ramblock being cleared to detect an empty queue.  It's
  slightly an overload of the ramblock pointer, per Fabiano [1], which I
  also agree.

  - Contains a self recursion, even if not necessary..

Rewrite this function.  We add some comments to make it even clearer on
what it does.

[1] https://lore.kernel.org/r/87wmrpjzew.fsf@suse.de

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-19-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
3b40964a86 migration/multifd: Change retval of multifd_send_pages()
Using int is an overkill when there're only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-18-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
d6556d174a migration/multifd: Change retval of multifd_queue_page()
Using int is an overkill when there're only two options.  Change it to a
boolean.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
3ab4441d97 migration/multifd: Split multifd_send_terminate_threads()
Split multifd_send_terminate_threads() into two functions:

  - multifd_send_set_error(): used when an error happened on the sender
    side, set error and quit state only

  - multifd_send_terminate_threads(): used only by the main thread to kick
    all multifd send threads out of sleep, for the last recycling.

Use multifd_send_set_error() in the three old call sites where only the
error will be set.

Use multifd_send_terminate_threads() in the last one where the main thread
will kick the multifd threads at last in multifd_save_cleanup().

Both helpers will need to set quitting=1.

Suggested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
859ebaf346 migration/multifd: Forbid spurious wakeups
Now multifd's logic is designed to have no spurious wakeup.  I still
remember a talk to Juan and he seems to agree we should drop it now, and if
my memory was right it was there because multifd used to hit that when
still debugging.

Let's drop it and see what can explode; as long as it's not reaching
soft-freeze.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
25a1f87875 migration/multifd: Move header prepare/fill into send_prepare()
This patch redefines the interfacing of ->send_prepare().  It further
simplifies multifd_send_thread() especially on zero copy.

Now with the new interface, we require the hook to do all the work for
preparing the IOVs to send.  After it's completed, the IOVs should be ready
to be dumped into the specific multifd QIOChannel later.

So now the API looks like:

  p->pages ----------->  send_prepare() -------------> IOVs

This also prepares for the case where the input can be extended to even not
any p->pages.  But that's for later.

This patch will achieve similar goal of what Fabiano used to propose here:

https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de

However the send() interface may not be necessary.  I'm boldly attaching a
"Co-developed-by" for Fabiano.

Co-developed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
452b205702 migration/multifd: multifd_send_prepare_header()
Introduce a helper multifd_send_prepare_header() to setup the header packet
for multifd sender.

It's fine to setup the IOV[0] _before_ send_prepare() because the packet
buffer is already ready, even if the content is to be filled in.

With this helper, we can already slightly clean up the zero copy path.

Note that I explicitly put it into multifd.h, because I want it inlined
directly into multifd*.c where necessary later.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
8a9ef17380 migration/multifd: Move trace_multifd_send|recv()
Move them into fill/unfill of packets.  With that, we can further cleanup
the send/recv thread procedure, and remove one more temp var.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
db7e1cc510 migration/multifd: Move total_normal_pages accounting
Just like the previous patch, move the accounting for total_normal_pages on
both src/dst sides into the packet fill/unfill procedures.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
05b7ec1890 migration/multifd: Rename p->num_packets and clean it up
This field, no matter whether on src or dest, is only used for debugging
purpose.

They can even be removed already, unless it still more or less provide some
accounting on "how many packets are sent/recved for this thread".  The
other more important one is called packet_num, which is embeded in the
multifd packet headers (MultiFDPacket_t).

So let's keep them for now, but make them much easier to understand, by
doing below:

  - Rename both of them to packets_sent / packets_recved, the old
  name (num_packets) are waaay too confusing when we already have
  MultiFDPacket_t.packets_num.

  - Avoid worrying on the "initial packet": we know we will send it, that's
  good enough.  The accounting won't matter a great deal to start with 0 or
  with 1.

  - Move them to where we send/recv the packets.  They're:

    - multifd_send_fill_packet() for senders.
    - multifd_recv_unfill_packet() for receivers.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
83c560fb42 migration/multifd: Drop pages->num check in sender thread
Now with a split SYNC handler, we always have pages->num set for
pending_job==true.  Assert it instead.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
e3cce9af10 migration/multifd: Simplify locking in sender thread
The sender thread will yield the p->mutex before IO starts, trying to not
block the requester thread.  This may be unnecessary lock optimizations,
because the requester can already read pending_job safely even without the
lock, because the requester is currently the only one who can assign a
task.

Drop that lock complication on both sides:

  (1) in the sender thread, always take the mutex until job done
  (2) in the requester thread, check pending_job clear lockless

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-8-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
f5f48a7891 migration/multifd: Separate SYNC request with normal jobs
Multifd provide a threaded model for processing jobs.  On sender side,
there can be two kinds of job: (1) a list of pages to send, or (2) a sync
request.

The sync request is a very special kind of job.  It never contains a page
array, but only a multifd packet telling the dest side to synchronize with
sent pages.

Before this patch, both requests use the pending_job field, no matter what
the request is, it will boost pending_job, while multifd sender thread will
decrement it after it finishes one job.

However this should be racy, because SYNC is special in that it needs to
set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request.
Consider a sequence of operations where:

  - migration thread enqueue a job to send some pages, pending_job++ (0->1)

  - [...before the selected multifd sender thread wakes up...]

  - migration thread enqueue another job to sync, pending_job++ (1->2),
    setup p->flags=MULTIFD_FLAG_SYNC

  - multifd sender thread wakes up, found pending_job==2
    - send the 1st packet with MULTIFD_FLAG_SYNC and list of pages
    - send the 2nd packet with flags==0 and no pages

This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done
after all the pages are received.  Meanwhile, the 2nd packet will be
completely useless, which contains zero information.

I didn't verify above, but I think this issue is still benign in that at
least on the recv side we always receive pages before handling
MULTIFD_FLAG_SYNC.  However that's not always guaranteed and just tricky.

One other reason I want to separate it is using p->flags to communicate
between the two threads is also not clearly defined, it's very hard to read
and understand why accessing p->flags is always safe; see the current impl
of multifd_send_thread() where we tried to cache only p->flags.  It doesn't
need to be that complicated.

This patch introduces pending_sync, a separate flag just to show that the
requester needs a sync.  Alongside, we remove the tricky caching of
p->flags now because after this patch p->flags should only be used by
multifd sender thread now, which will be crystal clear.  So it is always
thread safe to access p->flags.

With that, we can also safely convert the pending_job into a boolean,
because we don't support >1 pending jobs anyway.

Always use atomic ops to access both flags to make sure no cache effect.
When at it, drop the initial setting of "pending_job = 0" because it's
always allocated using g_new0().

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
efd8c5439d migration/multifd: Drop MultiFDSendParams.normal[] array
This array is redundant when p->pages exists.  Now we extended the life of
p->pages to the whole period where pending_job is set, it should be safe to
always use p->pages->offset[] rather than p->normal[].  Drop the array.

Alongside, the normal_num is also redundant, which is the same to
p->pages->num.

This doesn't apply to recv side, because there's no extra buffering on recv
side, so p->normal[] array is still needed.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
836eca47f6 migration/multifd: Postpone reset of MultiFDPages_t
Now we reset MultiFDPages_t object in the multifd sender thread in the
middle of the sending job.  That's not necessary, because the "*pages"
struct will not be reused anyway until pending_job is cleared.

Move that to the end after the job is completed, provide a helper to reset
a "*pages" object.  Use that same helper when free the object too.

This prepares us to keep using p->pages in the follow up patches, where we
may drop p->normal[].

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
15f3f21d59 migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths
Multifd send side has two fields to indicate error quits:

  - MultiFDSendParams.quit
  - &multifd_send_state->exiting

Merge them into the global one.  The replacement is done by changing all
p->quit checks into the global var check.  The global check doesn't need
any lock.

A few more things done on top of this altogether:

  - multifd_send_terminate_threads()

    Moving the xchg() of &multifd_send_state->exiting upper, so as to cover
    the tracepoint, migrate_set_error() and migrate_set_state().

  - multifd_send_sync_main()

    In the 2nd loop, add one more check over the global var to make sure we
    don't keep the looping if QEMU already decided to quit.

  - multifd_tls_outgoing_handshake()

    Use multifd_send_terminate_threads() to set the error state.  That has
    a benefit of updating MigrationState.error to that error too, so we can
    persist that 1st error we hit in that specific channel.

  - multifd_new_send_channel_async()

    Take similar approach like above, drop the migrate_set_error() because
    multifd_send_terminate_threads() already covers that.  Unwrap the helper
    multifd_new_send_channel_cleanup() along the way; not really needed.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
48c0f5d56f migration/multifd: multifd_send_kick_main()
When a multifd sender thread hit errors, it always needs to kick the main
thread by kicking all the semaphores that it can be waiting upon.

Provide a helper for it and deduplicate the code.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Peter Xu
8888a552bf migration/multifd: Drop stale comment for multifd zero copy
We've already done that with multifd_flush_after_each_section, for multifd
in general.  Drop the stale "TODO-like" comment.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
William Roche
06152b89db migration: prevent migration when VM has poisoned memory
A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start

To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.

Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:41:58 +08:00
Fabiano Rosas
44d0d456d7 migration: Centralize BH creation and dispatch
Now that the migration state reference counting is correct, further
wrap the bottom half dispatch process to avoid future issues.

Move BH creation and scheduling together and wrap the dispatch with an
intermediary function that will ensure we always keep the ref/unref
balanced.

Also move the responsibility of deleting the BH into the wrapper and
remove the now unnecessary pointers.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
699d9476a0 migration: Add a wrapper to qemu_bh_schedule
Wrap qemu_bh_schedule() to ensure we always hold a reference to the
current_migration object.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
9cf268965d migration: Reference migration state around loadvm_postcopy_handle_run_bh
We need to hold a reference to the current_migration object around
async calls to avoid it been freed while still in use. Even on this
load-side function, we might still use the MigrationState, e.g to
check for capabilities.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
59094cfa7a migration: Take reference to migration state around bg_migration_vm_start_bh
We need to hold a reference to the current_migration object around
async calls to avoid it been freed while still in use.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
27eb8499ed migration: Fix use-after-free of migration state object
We're currently allowing the process_incoming_migration_bh bottom-half
to run without holding a reference to the 'current_migration' object,
which leads to a segmentation fault if the BH is still live after
migration_shutdown() has dropped the last reference to
current_migration.

In my system the bug manifests as migrate_multifd() returning true
when it shouldn't and multifd_load_shutdown() calling
multifd_recv_terminate_threads() which crashes due to an uninitialized
multifd_recv_state.

Fix the issue by holding a reference to the object when scheduling the
BH and dropping it before returning from the BH. The same is already
done for the cleanup_bh at migrate_fd_cleanup_schedule().

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1969
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240119233922.32588-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Fabiano Rosas
0a5d1108ab migration/yank: Use channel features
Stop using outside knowledge about the io channels when registering
yank functions. Query for features instead.

The yank method for all channels used with migration code currently is
to call the qio_channel_shutdown() function, so query for
QIO_CHANNEL_FEATURE_SHUTDOWN. We could add a separate feature in the
future for indicating whether a channel supports yanking, but that
seems overkill at the moment.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20230911171320.24372-9-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Peter Xu
b0504edd40 migration: Drop unnecessary check in ram's pending_exact()
When the migration frameworks fetches the exact pending sizes, it means
this check:

  remaining_size < s->threshold_size

Must have been done already, actually at migration_iteration_run():

    if (must_precopy <= s->threshold_size) {
        qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);

That should be after one round of ram_state_pending_estimate().  It makes
the 2nd check meaningless and can be dropped.

To say it in another way, when reaching ->state_pending_exact(), we
unconditionally sync dirty bits for precopy.

Then we can drop migrate_get_current() there too.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Peter Xu
a8629e0c2f migration: Make threshold_size an uint64_t
It's always used to compare against another uint64_t.  Make it always clear
that it's never a negative.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240117075848.139045-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Markus Armbruster
918f620d30 migration: Plug memory leak on HMP migrate error path
hmp_migrate() leaks @caps when qmp_migrate() fails.  Plug the leak
with g_autoptr().

Fixes: 967f2de5c9 (migration: Implement MigrateChannelList to hmp migration flow.) v8.2.0-rc0
Fixes: CID 1533125
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20240117140722.3979657-1-armbru@redhat.com
[peterx: fix CID number as reported by Peter Maydell]
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Paolo Bonzini
73b4987858 userfaultfd: use 1ULL to build ioctl masks
There is no need to use the Linux-internal __u64 type, 1ULL is
guaranteed to be wide enough.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20240117160313.175609-1-pbonzini@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-29 11:02:12 +08:00
Nick Briggs
44ce1b5d2f migration/rdma: define htonll/ntohll only if not predefined
Solaris has #defines for htonll and ntohll which cause syntax errors
when compiling code that attempts to (re)define these functions..

Signed-off-by: Nick Briggs <nicholas.h.briggs@gmail.com>
Link: https://lore.kernel.org/r/65a04a7d.497ab3.3e7bef1f@gateway.sonic.net
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:10 +08:00
Fabiano Rosas
e3b8ad5c13 migration: Report error in incoming migration
We're not currently reporting the errors set with migrate_set_error()
when incoming migration fails.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-5-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Fabiano Rosas
6074f81625 migration/multifd: Change multifd_pages_init argument
The 'size' argument is actually the number of pages that fit in a
multifd packet. Change it to uint32_t and rename.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Fabiano Rosas
9346fa1870 migration/multifd: Remove QEMUFile from where it is not needed
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Fabiano Rosas
dca1bc7f24 migration/multifd: Remove MultiFDPages_t::packet_num
This was introduced by commit 34c55a94b1 ("migration: Create multipage
support") and never used.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Het Gala
0770ad438d migration: Simplify initial conditionals in migration for better readability
The inital conditional statements in qmp migration functions is harder
to understand than necessary. It is better to get all errors out of
the way in the beginning itself to have better readability and error
handling.

Signed-off-by: Het Gala <het.gala@nutanix.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231205080039.197615-1-het.gala@nutanix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Stefan Hajnoczi
a4a411fbaf Replace "iothread lock" with "BQL" in comments
The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
in their names. Update the code comments to use "BQL" instead of
"iothread lock".

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Message-id: 20240102153529.486531-5-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Stefan Hajnoczi
195801d700 system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Peter Maydell
c8193acc07 migration 1st pull for 9.0
- We lost Juan and Leo in the maintainers file
 - Steven's suspend state fix
 - Steven's fix for coverity on migrate_mode
 - Avihai's migration cleanup series
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl
 ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+
 Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D
 =sfuM
 -----END PGP SIGNATURE-----

Merge tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu into staging

migration 1st pull for 9.0

- We lost Juan and Leo in the maintainers file
- Steven's suspend state fix
- Steven's fix for coverity on migrate_mode
- Avihai's migration cleanup series

# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+
# Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D
# =sfuM
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 04 Jan 2024 04:30:07 GMT
# gpg:                using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg:                issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [unknown]
# gpg:                 aka "Peter Xu <peterx@redhat.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D  D1A9 3B5F CCCD F3AB D706

* tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu: (26 commits)
  migration: fix coverity migrate_mode finding
  migration/multifd: Remove unnecessary usage of local Error
  migration: Remove unnecessary usage of local Error
  migration: Fix migration_channel_read_peek() error path
  migration/multifd: Remove error_setg() in migration_ioc_process_incoming()
  migration/multifd: Fix leaking of Error in TLS error flow
  migration/multifd: Simplify multifd_channel_connect() if else statement
  migration/multifd: Fix error message in multifd_recv_initial_packet()
  migration: Remove errp parameter in migration_fd_process_incoming()
  migration: Refactor migration_incoming_setup()
  migration: Remove nulling of hostname in migrate_init()
  migration: Remove migrate_max_downtime() declaration
  tests/qtest: postcopy migration with suspend
  tests/qtest: precopy migration with suspend
  tests/qtest: option to suspend during migration
  tests/qtest: migration events
  migration: preserve suspended for bg_migration
  migration: preserve suspended for snapshot
  migration: preserve suspended runstate
  migration: propagate suspended runstate
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-01-05 13:35:25 +00:00
Steve Sistare
b12635ff08 migration: fix coverity migrate_mode finding
Coverity diagnoses a possible out-of-range array index here ...

    static GSList *migration_blockers[MIG_MODE__MAX];

    fill_source_migration_info() {
        GSList *cur_blocker = migration_blockers[migrate_mode()];

... because it does not know that MIG_MODE__MAX will never be returned as
a migration mode.  To fix, assert so in migrate_mode().

Fixes: fa3673e497 ("migration: per-mode blockers")

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/1699907025-215450-1-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
3fc58efa93 migration/multifd: Remove unnecessary usage of local Error
According to Error API, usage of ERRP_GUARD() or a local Error instead
of errp is needed if errp is passed to void functions, where it is later
dereferenced to see if an error occurred.

There are several places in multifd.c that use local Error although it
is not needed. Change these places to use errp directly.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-12-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
b6f4c0c788 migration: Remove unnecessary usage of local Error
According to Error API, usage of ERRP_GUARD() or a local Error instead
of errp is needed if errp is passed to void functions, where it is later
dereferenced to see if an error occurred.

There are several places in migration.c that use local Error although it
is not needed. Change these places to use errp directly.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-11-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
4f8cf323e8 migration: Fix migration_channel_read_peek() error path
migration_channel_read_peek() calls qio_channel_readv_full() and handles
both cases of return value == 0 and return value < 0 the same way, by
calling error_setg() with errp. However, if return value < 0, errp is
already set, so calling error_setg() with errp will lead to an assert.

Fix it by handling these cases separately, calling error_setg() with
errp only in return value == 0 case.

Fixes: 6720c2b327 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-10-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
1d3886f837 migration/multifd: Remove error_setg() in migration_ioc_process_incoming()
If multifd_load_setup() fails in migration_ioc_process_incoming(),
error_setg() is called with errp. This will lead to an assert because in
that case errp already contains an error.

Fix it by removing the redundant error_setg().

Fixes: 6720c2b327 ("migration: check magic value for deciding the mapping of channels")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-9-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
6ae208ce96 migration/multifd: Fix leaking of Error in TLS error flow
If there is an error in multifd TLS handshake task,
multifd_tls_outgoing_handshake() retrieves the error with
qio_task_propagate_error() but never frees it.

Fix it by freeing the obtained Error.

In addition, the error is not reported at all, so report it with
migrate_set_error().

Fixes: 2964714015 ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-8-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
a4395f5d3c migration/multifd: Simplify multifd_channel_connect() if else statement
The else branch in multifd_channel_connect() is redundant because when
the if branch is taken the function returns.

Simplify the code by removing the else branch.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-7-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
c77b40859a migration/multifd: Fix error message in multifd_recv_initial_packet()
In multifd_recv_initial_packet(), if MultiFDInit_t->id is greater than
the configured number of multifd channels, an irrelevant error message
about multifd version is printed.

Change the error message to a relevant one about the channel id.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-6-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
b0cf3bfc69 migration: Remove errp parameter in migration_fd_process_incoming()
Errp parameter in migration_fd_process_incoming() is unused.
Remove it.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-5-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
c606f3baa2 migration: Refactor migration_incoming_setup()
Commit 6720c2b327 ("migration: check magic value for deciding the
mapping of channels") extracted the only code that could fail in
migration_incoming_setup().

Now migration_incoming_setup() can't fail, so refactor it to return void
and remove errp parameter.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-4-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
80caba6955 migration: Remove nulling of hostname in migrate_init()
MigrationState->hostname is set to NULL in migrate_init(). This is
redundant because it is already freed and set to NULL in
migrade_fd_cleanup(). Remove it.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-3-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
17b9483baa migration: Remove migrate_max_downtime() declaration
migrate_max_downtime() has been removed long ago, but its declaration
was mistakenly left. Remove it.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-2-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
49a5020697 migration: preserve suspended for bg_migration
Do not wake a suspended guest during bg_migration, and restore the prior
state at finish rather than unconditionally running.  Allow the additional
state transitions that occur.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-9-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
58b105703e migration: preserve suspended for snapshot
Restoring a snapshot can break a suspended guest.  Snapshots suffer from
the same suspended-state issues that affect live migration, plus they must
handle an additional problematic scenario, which is that a running vm must
remain running if it loads a suspended snapshot.

To save, the existing vm_stop call now completely stops the suspended
state.  Finish with vm_resume to leave the vm in the state it had prior
to the save, correctly restoring the suspended state.

To load, if the snapshot is not suspended, then vm_stop + vm_resume
correctly handles all states, and leaves the vm in the state it had prior
to the load.  However, if the snapshot is suspended, restoration is
trickier.  First, call vm_resume to restore the state to suspended so the
current state matches the saved state.  Then, if the pre-load state is
running, call wakeup to resume running.

Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and
RUN_STATE_RESTORE_VM did not change runstate if the current state was
suspended, but now it does, so allow these transitions.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-8-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
b4e9ddccd1 migration: preserve suspended runstate
A guest that is migrated in the suspended state automaticaly wakes and
continues execution.  This is wrong; the guest should end migration in
the same state it started.  The root cause is that the outgoing migration
code automatically wakes the guest, then saves the RUNNING runstate in
global_state_store(), hence the incoming migration code thinks the guest is
running and continues the guest if autostart is true.

On the outgoing side, delete the call to qemu_system_wakeup_request().
Now that vm_stop completely stops a vm in the suspended state (from the
preceding patches), the existing call to vm_stop_force_state is sufficient
to correctly migrate all vmstate.

On the incoming side, call vm_start if the pre-migration state was running
or suspended.  For the latter, vm_start correctly restores the suspended
state, and a future system_wakeup monitor request will cause the vm to
resume running.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-7-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Steve Sistare
d3c86c99f3 migration: propagate suspended runstate
If the outgoing machine was previously suspended, propagate that to the
incoming side via global_state, so a subsequent vm_start restores the
suspended state.  To maintain backward and forward compatibility, reclaim
some space from the runstate member.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/1704312341-66640-6-git-send-email-steven.sistare@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Richard Henderson
a77ffe9595 migration: Constify VMState
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-67-richard.henderson@linaro.org>
2023-12-30 07:38:06 +11:00
Richard Henderson
2027001919 migration: Make VMStateDescription.subsections const
Allow the array of pointers to itself be const.
Propagate this through the copies of this field.

Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231221031652.119827-2-richard.henderson@linaro.org>
2023-12-29 11:17:30 +11:00
Stefan Hajnoczi
b0839e87af dirtylimit dirtyrate pull request 20231225
Nitpick about an unused parameter
 Please apply, thanks,
 Yong
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEaF0CINwmSCgVLlfC3/Ij1rP+y5wFAmWJVtkWHHlvbmcuaHVh
 bmdAc21hcnR4LmNvbQAKCRDf8iPWs/7LnLexEADjuQ7dtbv2vq1ksqDq6fct++FK
 sGPe6P9UAyJrB0qxW3jKRn2P4E/xiPHpmi+BbOOu8PppeZt86NFlzun681vMpkKu
 d3oAlwRtX8eklk3oub+yBXzkAfsGr+EHEWms/12ATsd6apItxi47zUUw/2aznujz
 TlZR7qOD1kjvPHOVnP07fQ3NYiPIITpzSqVgo3fxE+upmb+dnELIkyGTAvJY2Pjz
 Uh4CtG/sQq8OJJwUn1TFmhB1Ujw9oo8v/lOtIpsxfcKtQcAwyvgmyfndOMZ/h6O3
 ri3axypg8PyGyidYEh0JSoR3OVUIt5eoqPNp+sQTNeZsjTIwKjpBGUOLXbIA56nB
 Z/ZRbxSKj+JRD9qQZ2NODE1QurgEuT51aWkEeBEsx0OtUINcjZkwWeQmiasJQw1T
 4z4Hez0ipuET9MmeTbWkVh9rO3Bsxi9QxiXEiN6YPNqTqLhwomXU+MIlrCsZQzjB
 l4Jij9byeWL9ntx3qwtSWJGHV6Qql16UkPo+c2Co/q2rdx/FqiYrMgCtLee1yo5n
 lSCKmrVMLaLyQv7jnLY8txhU7/CXNnwij4YSgfi7nujOgM6L3uwYc9Tp5kcQHE/b
 o/Mr6oOi79Z+rdByk4pPUKaJV7FnbWMoUgjsqWPJgedGt0K9StRsXCGR1wYMUVdU
 R8Dv0IZyyGuSf6MhNw==
 =Y00D
 -----END PGP SIGNATURE-----

Merge tag 'dirtylimit-dirtyrate-pull-request-20231225' of https://github.com/newfriday/qemu into staging

dirtylimit dirtyrate pull request 20231225

Nitpick about an unused parameter
Please apply, thanks,
Yong

# -----BEGIN PGP SIGNATURE-----
#
# iQJKBAABCAA0FiEEaF0CINwmSCgVLlfC3/Ij1rP+y5wFAmWJVtkWHHlvbmcuaHVh
# bmdAc21hcnR4LmNvbQAKCRDf8iPWs/7LnLexEADjuQ7dtbv2vq1ksqDq6fct++FK
# sGPe6P9UAyJrB0qxW3jKRn2P4E/xiPHpmi+BbOOu8PppeZt86NFlzun681vMpkKu
# d3oAlwRtX8eklk3oub+yBXzkAfsGr+EHEWms/12ATsd6apItxi47zUUw/2aznujz
# TlZR7qOD1kjvPHOVnP07fQ3NYiPIITpzSqVgo3fxE+upmb+dnELIkyGTAvJY2Pjz
# Uh4CtG/sQq8OJJwUn1TFmhB1Ujw9oo8v/lOtIpsxfcKtQcAwyvgmyfndOMZ/h6O3
# ri3axypg8PyGyidYEh0JSoR3OVUIt5eoqPNp+sQTNeZsjTIwKjpBGUOLXbIA56nB
# Z/ZRbxSKj+JRD9qQZ2NODE1QurgEuT51aWkEeBEsx0OtUINcjZkwWeQmiasJQw1T
# 4z4Hez0ipuET9MmeTbWkVh9rO3Bsxi9QxiXEiN6YPNqTqLhwomXU+MIlrCsZQzjB
# l4Jij9byeWL9ntx3qwtSWJGHV6Qql16UkPo+c2Co/q2rdx/FqiYrMgCtLee1yo5n
# lSCKmrVMLaLyQv7jnLY8txhU7/CXNnwij4YSgfi7nujOgM6L3uwYc9Tp5kcQHE/b
# o/Mr6oOi79Z+rdByk4pPUKaJV7FnbWMoUgjsqWPJgedGt0K9StRsXCGR1wYMUVdU
# R8Dv0IZyyGuSf6MhNw==
# =Y00D
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 25 Dec 2023 05:18:01 EST
# gpg:                using RSA key 685D0220DC264828152E57C2DFF223D6B3FECB9C
# gpg:                issuer "yong.huang@smartx.com"
# gpg: Good signature from "Yong Huang <yong.huang@smartx.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 685D 0220 DC26 4828 152E  57C2 DFF2 23D6 B3FE CB9C

* tag 'dirtylimit-dirtyrate-pull-request-20231225' of https://github.com/newfriday/qemu:
  migration/dirtyrate: Remove an extra parameter

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-12-26 06:06:50 -05:00
Wafer
4918712fb1 migration/dirtyrate: Remove an extra parameter
vcpu_dirty_stat_collect() has an unused parameter so remove it.

Signed-off-by: Wafer <wafer@jaguarmicro.com>
Reviewed-by: Hyman Huang <yong.huang@smartx.com>
Message-Id: <20231204012230.4123-1-wafer@jaguarmicro.com>
2023-12-25 18:05:47 +08:00
Stefan Hajnoczi
b49f4755c7 block: remove AioContext locking
This is the big patch that removes
aio_context_acquire()/aio_context_release() from the block layer and
affected block layer users.

There isn't a clean way to split this patch and the reviewers are likely
the same group of people, so I decided to do it in one patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-ID: <20231205182011.1976568-7-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21 22:49:27 +01:00
Stefan Hajnoczi
019f8c19df Migration patches for rc3:
- One more memleak regression fix from Het
 -----BEGIN PGP SIGNATURE-----
 
 iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZWoLbRIccGV0ZXJ4QHJl
 ZGhhdC5jb20ACgkQO1/MzfOr1wahYwD+OsD7CaZYjkl9KSooRfblEenD6SdfhAdC
 oZc07f2UxocA/0s1keDBZUUcZOiGYPDFV5his4Jw4F+RRD1YIpVWZg4J
 =T0/r
 -----END PGP SIGNATURE-----

Merge tag 'migration-20231201-pull-request' of https://github.com/xzpeter/qemu into staging

Migration patches for rc3:

- One more memleak regression fix from Het

# -----BEGIN PGP SIGNATURE-----
#
# iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZWoLbRIccGV0ZXJ4QHJl
# ZGhhdC5jb20ACgkQO1/MzfOr1wahYwD+OsD7CaZYjkl9KSooRfblEenD6SdfhAdC
# oZc07f2UxocA/0s1keDBZUUcZOiGYPDFV5his4Jw4F+RRD1YIpVWZg4J
# =T0/r
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 01 Dec 2023 11:35:57 EST
# gpg:                using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706
# gpg:                issuer "peterx@redhat.com"
# gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [full]
# gpg:                 aka "Peter Xu <peterx@redhat.com>" [full]
# Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D  D1A9 3B5F CCCD F3AB D706

* tag 'migration-20231201-pull-request' of https://github.com/xzpeter/qemu:
  migration: Plug memory leak with migration URIs

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-12-04 08:01:24 -05:00
Het Gala
bc1d54ee51 migration: Plug memory leak with migration URIs
migrate_uri_parse() allocates memory to 'channel' if the user
opts for old syntax - uri, which is leaked because there is no
code for freeing 'channel'.
So, free channel to avoid memory leak in case where 'channels'
is empty and uri parsing is required.

Fixes: 5994024f ("migration: Implement MigrateChannelList to qmp migration flow")
Signed-off-by: Het Gala <het.gala@nutanix.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Tested-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Link: https://lore.kernel.org/r/20231129204301.131228-1-het.gala@nutanix.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2023-12-01 11:01:28 -05:00
Zongmin Zhou
41581265aa migration: free 'saddr' since be no longer used
Since socket_parse() will allocate memory for 'saddr',and its value
will pass to 'addr' that allocated by migrate_uri_parse(),
then 'saddr' will no longer used,need to free.
But due to 'saddr->u' is shallow copying the contents of the union,
the members of this union containing allocated strings,and will be used after that.
So just free 'saddr' itself without doing a deep free on the contents of the SocketAddress.

Fixes: 72a8192e22 ("migration: convert migration 'uri' into 'MigrateAddress'")
Signed-off-by: Zongmin Zhou<zhouzongmin@kylinos.cn>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231120031428.908295-1-zhouzongmin@kylinos.cn>
2023-11-30 09:51:24 +01:00
Fabiano Rosas
0a08c7947b migration/multifd: Stop setting p->ioc before connecting
This is being shadowed but the assignments at
multifd_channel_connect() and multifd_tls_channel_connect() .

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20231110200241.20679-2-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-11-30 09:50:10 +01:00
Michael Tokarev
e3fc69343c migration/rdma.c: spelling fix: asume
Fixes: 67c31c9c1a "migration: Don't abuse qemu_file transferred for RDMA"
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-11-15 11:59:54 +03:00
Kevin Wolf
f5a3a270fe block: Mark bdrv_filter_bs() and callers GRAPH_RDLOCK
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_filter_bs() need to hold a reader lock for the graph because
it calls bdrv_filter_child(), which accesses bs->file/backing.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-4-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-07 19:14:19 +01:00
Stefan Hajnoczi
bb59f3548f vfio queue:
* Support for non 64b IOVA space
 * Introduction of a PCIIOMMUOps callback structure to ease future
   extensions
 * Fix for a buffer overrun when writing the VF token
 * PPC cleanups preparing ground for IOMMUFD support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmVI+bIACgkQUaNDx8/7
 7KHW4g/9FmgX0k2Elm1BAul3slJtuBT8/iHKfK19rhXICxhxS5xBWJA8FmosTWAT
 91YqQJhOHARxLd9VROfv8Fq8sAo+Ys8bP3PTXh5satjY5gR9YtmMSVqvsAVLn7lv
 a/0xp7wPJt2UeKzvRNUqFXNr7yHPwxFxbJbmmAJbNte8p+TfE2qvojbJnu7BjJbg
 sTtS/vFWNJwtuNYTkMRoiZaUKEoEZ8LnslOqKUjgeO59g4i3Dq8e2JCmHANPFWUK
 cWmr7AqcXgXEnLSDWTtfN53bjcSCYkFVb4WV4Wv1/7hUF5jQ4UR0l3B64xWe0M3/
 Prak3bWOM/o7JwLBsgaWPngXA9V0WFBTXVF4x5qTwhuR1sSV8MxUvTKxI+qqiEzA
 FjU89oSZ+zXId/hEUuTL6vn1Th8/6mwD0L9ORchNOQUKzCjBzI4MVPB09nM3AdPC
 LGThlufsZktdoU2KjMHpc+gMIXQYsxkgvm07K5iZTZ5eJ4tV5KB0aPvTZppGUxe1
 YY9og9F3hxjDHQtEuSY2rzBQI7nrUpd1ZI5ut/3ZgDWkqD6aGRtMme4n4GsGsYb2
 Ht9+d2RL9S8uPUh+7rV8K/N3+vXgXRaEYTuAScKtflEbA7YnZA5nUdMng8x0kMTQ
 Y73XCd4UGWDfSSZsgaIHGkM/MRIHgmlrfcwPkWqWW9vF+92O6Hw=
 =/Du0
 -----END PGP SIGNATURE-----

Merge tag 'pull-vfio-20231106' of https://github.com/legoater/qemu into staging

vfio queue:

* Support for non 64b IOVA space
* Introduction of a PCIIOMMUOps callback structure to ease future
  extensions
* Fix for a buffer overrun when writing the VF token
* PPC cleanups preparing ground for IOMMUFD support

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmVI+bIACgkQUaNDx8/7
# 7KHW4g/9FmgX0k2Elm1BAul3slJtuBT8/iHKfK19rhXICxhxS5xBWJA8FmosTWAT
# 91YqQJhOHARxLd9VROfv8Fq8sAo+Ys8bP3PTXh5satjY5gR9YtmMSVqvsAVLn7lv
# a/0xp7wPJt2UeKzvRNUqFXNr7yHPwxFxbJbmmAJbNte8p+TfE2qvojbJnu7BjJbg
# sTtS/vFWNJwtuNYTkMRoiZaUKEoEZ8LnslOqKUjgeO59g4i3Dq8e2JCmHANPFWUK
# cWmr7AqcXgXEnLSDWTtfN53bjcSCYkFVb4WV4Wv1/7hUF5jQ4UR0l3B64xWe0M3/
# Prak3bWOM/o7JwLBsgaWPngXA9V0WFBTXVF4x5qTwhuR1sSV8MxUvTKxI+qqiEzA
# FjU89oSZ+zXId/hEUuTL6vn1Th8/6mwD0L9ORchNOQUKzCjBzI4MVPB09nM3AdPC
# LGThlufsZktdoU2KjMHpc+gMIXQYsxkgvm07K5iZTZ5eJ4tV5KB0aPvTZppGUxe1
# YY9og9F3hxjDHQtEuSY2rzBQI7nrUpd1ZI5ut/3ZgDWkqD6aGRtMme4n4GsGsYb2
# Ht9+d2RL9S8uPUh+7rV8K/N3+vXgXRaEYTuAScKtflEbA7YnZA5nUdMng8x0kMTQ
# Y73XCd4UGWDfSSZsgaIHGkM/MRIHgmlrfcwPkWqWW9vF+92O6Hw=
# =/Du0
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 06 Nov 2023 22:35:30 HKT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [unknown]
# gpg:                 aka "Cédric Le Goater <clg@kaod.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* tag 'pull-vfio-20231106' of https://github.com/legoater/qemu: (22 commits)
  vfio/common: Move vfio_host_win_add/del into spapr.c
  vfio/spapr: Make vfio_spapr_create/remove_window static
  vfio/container: Move spapr specific init/deinit into spapr.c
  vfio/container: Move vfio_container_add/del_section_window into spapr.c
  vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
  util/uuid: Define UUID_STR_LEN from UUID_NONE string
  util/uuid: Remove UUID_FMT_LEN
  vfio/pci: Fix buffer overrun when writing the VF token
  util/uuid: Add UUID_STR_LEN definition
  hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  test: Add some tests for range and resv-mem helpers
  virtio-iommu: Consolidate host reserved regions and property set ones
  virtio-iommu: Implement set_iova_ranges() callback
  virtio-iommu: Record whether a probe request has been issued
  range: Introduce range_inverse_array()
  virtio-iommu: Introduce per IOMMUDevice reserved regions
  util/reserved-region: Add new ReservedRegion helpers
  range: Make range_compare() public
  virtio-iommu: Rename reserved_regions into prop_resv_regions
  vfio: Collect container iova range info
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-11-07 09:41:52 +08:00
Juan Quintela
0983125b40 migration: Unlock mutex in error case
We were not unlocking bitmap mutex on the error case.  To fix it
forever change to enclose the code with WITH_QEMU_LOCK_GUARD().
Coverity CID 1523750.

Fixes: a2326705e5 ("migration: Stop migration immediately in RDMA error paths")
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231103074245.55166-1-quintela@redhat.com>
2023-11-03 10:48:37 +01:00
Cédric Le Goater
721da0396c util/uuid: Add UUID_STR_LEN definition
qemu_uuid_unparse() includes a trailing NUL when writing the uuid
string and the buffer size should be UUID_FMT_LEN + 1 bytes. Add a
define for this size and use it where required.

Cc: Fam Zheng <fam@euphon.net>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: "Denis V. Lunev" <den@openvz.org>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03 09:20:31 +01:00
Het Gala
967f2de5c9 migration: Implement MigrateChannelList to hmp migration flow.
Integrate MigrateChannelList with all transport backends
(socket, exec and rdma) for both src and dest migration
endpoints for hmp migration.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20231023182053.8711-14-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-11-02 11:35:04 +01:00
Het Gala
5994024fd1 migration: Implement MigrateChannelList to qmp migration flow.
Integrate MigrateChannelList with all transport backends
(socket, exec and rdma) for both src and dest migration
endpoints for qmp migration.

For current series, limit the size of MigrateChannelList
to single element (single interface) as runtime check.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-13-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Het Gala
d95533e1cd migration: modify migration_channels_and_uri_compatible() for new QAPI syntax
migration_channels_and_uri_compatible() check for transport mechanism
suitable for multifd migration gets executed when the caller calls old
uri syntax. It needs it to be run when using the modern MigrateChannel
QAPI syntax too.

After URI -> 'MigrateChannel' :
migration_channels_and_uri_compatible() ->
migration_channels_and_transport_compatible() passes object as argument
and check for valid transport mechanism.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-12-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Het Gala
074dbce5fc migration: New migrate and migrate-incoming argument 'channels'
MigrateChannelList allows to connect accross multiple interfaces.
Add MigrateChannelList struct as argument to migration QAPIs.

We plan to include multiple channels in future, to connnect
multiple interfaces. Hence, we choose 'MigrateChannelList'
as the new argument over 'MigrateChannel' to make migration
QAPIs future proof.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-10-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Fabiano Rosas
02afba63e9 migration: Convert the file backend to the new QAPI syntax
Convert the file: URI to accept a FileMigrationArgs to be compatible
with the new migration QAPI.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-9-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Het Gala
cbab4face5 migration: convert exec backend to accept MigrateAddress.
Exec transport backend for 'migrate'/'migrate-incoming' QAPIs accept
new wire protocol of MigrateAddress struct.

It is achived by parsing 'uri' string and storing migration parameters
required for exec connection into strList struct.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-8-farosas@suse.de>
2023-11-02 11:35:04 +01:00
Het Gala
3fa9642ff7 migration: convert rdma backend to accept MigrateAddress
RDMA based transport backend for 'migrate'/'migrate-incoming' QAPIs
accept new wire protocol of MigrateAddress struct.

It is achived by parsing 'uri' string and storing migration parameters
required for RDMA connection into well defined InetSocketAddress struct.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-7-farosas@suse.de>
2023-11-02 11:35:03 +01:00
Het Gala
34dfc5e407 migration: convert socket backend to accept MigrateAddress
Socket transport backend for 'migrate'/'migrate-incoming' QAPIs accept
new wire protocol of MigrateAddress struct.

It is achived by parsing 'uri' string and storing migration parameters
required for socket connection into well defined SocketAddress struct.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-6-farosas@suse.de>
2023-11-02 11:35:03 +01:00
Het Gala
72a8192e22 migration: convert migration 'uri' into 'MigrateAddress'
This patch parses 'migrate' and 'migrate-incoming' QAPI's 'uri'
string containing migration connection related information
and stores them inside well defined 'MigrateAddress' struct.

Fabiano fixed for "file" transport.

Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
Signed-off-by: Het Gala <het.gala@nutanix.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231023182053.8711-4-farosas@suse.de>
Message-ID: <20231023182053.8711-5-farosas@suse.de>
2023-11-02 11:35:03 +01:00
Peter Xu
88577f3242 migration: Change ram_dirty_bitmap_reload() retval to bool
Now we have a Error** passed into the return path thread stack, which is
even clearer than an int retval.  Change ram_dirty_bitmap_reload() and the
callers to use a bool instead to replace errnos.

Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-5-peterx@redhat.com>
2023-11-02 11:35:03 +01:00
Peter Xu
f8c543e808 migration: Allow network to fail even during recovery
Normally the postcopy recover phase should only exist for a super short
period, that's the duration when QEMU is trying to recover from an
interrupted postcopy migration, during which handshake will be carried out
for continuing the procedure with state changes from PAUSED -> RECOVER ->
POSTCOPY_ACTIVE again.

Here RECOVER phase should be super small, that happens right after the
admin specified a new but working network link for QEMU to reconnect to
dest QEMU.

However there can still be case where the channel is broken in this small
RECOVER window.

If it happens, with current code there's no way the src QEMU can got kicked
out of RECOVER stage. No way either to retry the recover in another channel
when established.

This patch allows the RECOVER phase to fail itself too - we're mostly
ready, just some small things missing, e.g. properly kick the main
migration thread out when sleeping on rp_sem when we found that we're at
RECOVER stage.  When this happens, it fails the RECOVER itself, and
rollback to PAUSED stage.  Then the user can retry another round of
recovery.

To make it even stronger, teach QMP command migrate-pause to explicitly
kick src/dst QEMU out when needed, so even if for some reason the migration
thread didn't got kicked out already by a failing rethrn-path thread, the
admin can also kick it out.

This will be an super, super corner case, but still try to cover that.

One can try to test this with two proxy channels for migration:

  (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
  (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock

So the migration channel will be:

                      (a)          (b)
  src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst

Then to make QEMU hang at RECOVER stage, one can do below:

  (1) stop the postcopy using QMP command postcopy-pause
  (2) kill the 2nd proxy (b)
  (3) try to recover the postcopy using /tmp/src.sock on src
  (4) src QEMU will go into RECOVER stage but won't be able to continue
      from there, because the channel is actually broken at (b)

Before this patch, step (4) will make src QEMU stuck in RECOVER stage,
without a way to kick the QEMU out or continue the postcopy again.  After
this patch, (4) will quickly fail qemu and bounce back to PAUSED stage.

Admin can also kick QEMU from (4) into PAUSED when needed using
migrate-pause when needed.

After bouncing back to PAUSED stage, one can recover again.

Reported-by: Xiaohui Li <xiaohli@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-3-peterx@redhat.com>
2023-11-02 11:35:03 +01:00
Peter Xu
7aa6070d09 migration: Refactor error handling in source return path
rp_state.error was a boolean used to show error happened in return path
thread.  That's not only duplicating error reporting (migrate_set_error),
but also not good enough in that we only do error_report() and set it to
true, we never can keep a history of the exact error and show it in
query-migrate.

To make this better, a few things done:

  - Use error_setg() rather than error_report() across the whole lifecycle
    of return path thread, keeping the error in an Error*.

  - With above, no need to have mark_source_rp_bad(), remove it, alongside
    with rp_state.error itself.

  - Use migrate_set_error() to apply that captured error to the global
    migration object when error occured in this thread.

  - Do the same when detected qemufile error in source return path

We need to re-export qemu_file_get_error_obj() to do the last one.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231017202633.296756-2-peterx@redhat.com>
2023-11-02 11:35:03 +01:00
Steve Sistare
fa3673e497 migration: per-mode blockers
Extend the blocker interface so that a blocker can be registered for
one or more migration modes.  The existing interfaces register a
blocker for all modes, and the new interfaces take a varargs list
of modes.

Internally, maintain a separate blocker list per mode.  The same Error
object may be added to multiple lists.  When a block is deleted, it is
removed from every list, and the Error is freed.

No functional change until a new mode is added.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-3-git-send-email-steven.sistare@oracle.com>
2023-11-01 16:13:59 +01:00
Steve Sistare
eea1e5c9d6 migration: mode parameter
Create a mode migration parameter that can be used to select alternate
migration algorithms.  The default mode is normal, representing the
current migration algorithm, and does not need to be explicitly set.

No functional change until a new mode is added, except that the mode is
shown by the 'info migrate' command.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <1698263069-406971-2-git-send-email-steven.sistare@oracle.com>
2023-11-01 16:13:58 +01:00
Peter Xu
3e5f3bcdc2 migration: Add tracepoints for downtime checkpoints
This patch is inspired by Joao Martin's patch here:

https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com

Add tracepoints for major downtime checkpoints on both src and dst.  They
share the same tracepoint with a string showing its stage.

Besides the checkpoints in the previous patch, this patch also added
destination checkpoints.

On src, we have these checkpoints added:

  - src-downtime-start: right before vm stops on src
  - src-vm-stopped: after vm is fully stopped
  - src-iterable-saved: after all iterables saved (END sections)
  - src-non-iterable-saved: after all non-iterable saved (FULL sections)
  - src-downtime-stop: migration fully completed

On dst, we have these checkpoints added:

  - dst-precopy-loadvm-completes: after loadvm all done for precopy
  - dst-precopy-bh-*: record BH steps to resume VM for precopy
  - dst-postcopy-bh-*: record BH steps to resume VM for postcopy

On dst side, we don't have a good way to trace total time consumed by
iterable or non-iterable for now.  We can mark it by 1st time receiving a
FULL / END section, but rather than that let's just rely on the other
tracepoints added for vmstates to back up the information.

With this patch, one can enable "vmstate_downtime*" tracepoints and it'll
enable all tracepoints for downtime measurements necessary.

Drop loadvm_postcopy_handle_run_bh() tracepoint alongside, because they
service the same purpose, which was only for postcopy.  We then have
unified prefix for all downtime relevant tracepoints.

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-6-peterx@redhat.com>
2023-11-01 16:13:58 +01:00
Peter Xu
93bdf888fa migration: migration_stop_vm() helper
Provide a helper for non-COLO use case of migration to stop a VM.  This
prepares for adding some downtime relevant tracepoints to migration, where
they may or may not apply to COLO.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-5-peterx@redhat.com>
2023-11-01 16:13:58 +01:00
Peter Xu
3c80f14272 migration: Add per vmstate downtime tracepoints
We have a bunch of savevm_section* tracepoints, they're good to analyze
migration stream, but not always suitable if someone would like to analyze
the migration downtime.  Two major problems:

  - savevm_section* tracepoints are dumping all sections, we only care
    about the sections that contribute to the downtime

  - They don't have an identifier to show the type of sections, so no way
    to filter downtime information either easily.

We can add type into the tracepoints, but instead of doing so, this patch
kept them untouched, instead of adding a bunch of downtime specific
tracepoints, so one can enable "vmstate_downtime*" tracepoints and get a
full picture of how the downtime is distributed across iterative and
non-iterative vmstate save/load.

Note that here both save() and load() need to be traced, because both of
them may contribute to the downtime.  The contribution is not a simple "add
them together", though: consider when the src is doing a save() of device1
while the dest can be load()ing for device2, so they can happen
concurrently.

Tracking both sides make sense because device load() and save() can be
imbalanced, one device can save() super fast, but load() super slow, vice
versa.  We can't figure that out without tracing both.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-4-peterx@redhat.com>
2023-11-01 16:13:58 +01:00
Peter Xu
e22ffad03a migration: Add migration_downtime_start|end() helpers
Unify the three users on recording downtimes with the same pair of helpers.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-3-peterx@redhat.com>
2023-11-01 16:13:58 +01:00
Peter Xu
62f5da7dd1 migration: Set downtime_start even for postcopy
Postcopy calculates its downtime separately.  It always sets
MigrationState.downtime properly, but not MigrationState.downtime_start.

Make postcopy do the same as other modes on properly recording the
timestamp when the VM is going to be stopped.  Drop the temporary variable
in postcopy_start() along the way.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231030163346.765724-2-peterx@redhat.com>
2023-11-01 16:13:58 +01:00
Peter Xu
caa91b3c44 migration: Check in savevm_state_handler_insert for dups
Before finally register one SaveStateEntry, we detect for duplicated
entries.  This could be helpful to notify us asap instead of get
silent migration failures which could be hard to diagnose.

For example, this patch will generate a message like this (if without
previous fixes on x2apic) as long as we wants to boot a VM instance
with "-smp 200,maxcpus=288,sockets=2,cores=72,threads=2" and QEMU will
bail out even before VM starts:

savevm_state_handler_insert: Detected duplicate SaveStateEntry: id=apic, instance_id=0x0

Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-10-quintela@redhat.com>
2023-11-01 16:13:58 +01:00
Juan Quintela
485fb95546 migration: Hack to maintain backwards compatibility for ppc
Current code does:
- register pre_2_10_vmstate_dummy_icp with "icp/server" and instance
  dependinfg on cpu number
- for newer machines, it register vmstate_icp with "icp/server" name
  and instance 0
- now it unregisters "icp/server" for the 1st instance.

This is wrong at many levels:
- we shouldn't have two VMSTATEDescriptions with the same name
- In case this is the only solution that we can came with, it needs to
  be:
  * register pre_2_10_vmstate_dummy_icp
  * unregister pre_2_10_vmstate_dummy_icp
  * register real vmstate_icp

Created vmstate_replace_hack_for_ppc() with warnings left and right
that it is a hack.

CC: Cedric Le Goater <clg@kaod.org>
CC: Daniel Henrique Barboza <danielhb413@gmail.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Greg Kurz <groug@kaod.org>

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231020090731.28701-8-quintela@redhat.com>
2023-11-01 16:13:58 +01:00
Juan Quintela
be07a0ed22 qemu-file: Make qemu_fflush() return errors
This let us simplify code of this shape.

   qemu_fflush(f);
   int ret = qemu_file_get_error(f);
   if (ret) {
      return ret;
   }

into:

   int ret = qemu_fflush(f);
   if (ret) {
      return ret;
   }

I updated all callers where there is any error check.
qemu_fclose() don't need to check for f->last_error because
qemu_fflush() returns it at the beggining of the function.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Message-ID: <20231025091117.6342-13-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-10-31 08:44:33 +01:00
Juan Quintela
0f8596180a migration: Remove transferred atomic counter
After last commit, it is a write only variable.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-12-quintela@redhat.com>
2023-10-31 08:44:33 +01:00
Juan Quintela
897fd8bdce migration: Use migration_transferred_bytes()
There are only two differnces with the old value:

- the amount of QEMUFile that hasn't yet been flushed.  It can be
  discussed what is more exact, the new or the old one.
- the amount of transferred bytes that we forgot to account for (the
  newer is better, i.e. exact).

Notice that this two values are used to:
a - present to the user
b - calculate the rate_limit

So a few KB here and there is not going to make a difference.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-11-quintela@redhat.com>
2023-10-31 08:44:33 +01:00
Juan Quintela
fc55cf318a qemu-file: Simplify qemu_file_get_error()
If we pass a NULL error is the same that returning directly the value.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-10-quintela@redhat.com>
2023-10-31 08:44:33 +01:00
Juan Quintela
0743f41fd2 migration: migration_rate_limit_reset() don't need the QEMUFile
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-9-quintela@redhat.com>
2023-10-31 08:44:33 +01:00