Commit Graph

155 Commits

Author SHA1 Message Date
Peter Xu
8888a552bf migration/multifd: Drop stale comment for multifd zero copy
We've already done that with multifd_flush_after_each_section, for multifd
in general.  Drop the stale "TODO-like" comment.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240202102857.110210-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:42:10 +08:00
Fabiano Rosas
6074f81625 migration/multifd: Change multifd_pages_init argument
The 'size' argument is actually the number of pages that fit in a
multifd packet. Change it to uint32_t and rename.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-4-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Fabiano Rosas
9346fa1870 migration/multifd: Remove QEMUFile from where it is not needed
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-3-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Fabiano Rosas
dca1bc7f24 migration/multifd: Remove MultiFDPages_t::packet_num
This was introduced by commit 34c55a94b1 ("migration: Create multipage
support") and never used.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240104142144.9680-2-farosas@suse.de
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-16 11:16:09 +08:00
Avihai Horon
3fc58efa93 migration/multifd: Remove unnecessary usage of local Error
According to Error API, usage of ERRP_GUARD() or a local Error instead
of errp is needed if errp is passed to void functions, where it is later
dereferenced to see if an error occurred.

There are several places in multifd.c that use local Error although it
is not needed. Change these places to use errp directly.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-12-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
6ae208ce96 migration/multifd: Fix leaking of Error in TLS error flow
If there is an error in multifd TLS handshake task,
multifd_tls_outgoing_handshake() retrieves the error with
qio_task_propagate_error() but never frees it.

Fix it by freeing the obtained Error.

In addition, the error is not reported at all, so report it with
migrate_set_error().

Fixes: 2964714015 ("migration/tls: add support for multifd tls-handshake")
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-8-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
a4395f5d3c migration/multifd: Simplify multifd_channel_connect() if else statement
The else branch in multifd_channel_connect() is redundant because when
the if branch is taken the function returns.

Simplify the code by removing the else branch.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20231231093016.14204-7-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Avihai Horon
c77b40859a migration/multifd: Fix error message in multifd_recv_initial_packet()
In multifd_recv_initial_packet(), if MultiFDInit_t->id is greater than
the configured number of multifd channels, an irrelevant error message
about multifd version is printed.

Change the error message to a relevant one about the channel id.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20231231093016.14204-6-avihaih@nvidia.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-04 09:52:42 +08:00
Fabiano Rosas
0a08c7947b migration/multifd: Stop setting p->ioc before connecting
This is being shadowed but the assignments at
multifd_channel_connect() and multifd_tls_channel_connect() .

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Message-ID: <20231110200241.20679-2-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-11-30 09:50:10 +01:00
Juan Quintela
0f8596180a migration: Remove transferred atomic counter
After last commit, it is a write only variable.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231025091117.6342-12-quintela@redhat.com>
2023-10-31 08:44:33 +01:00
Fabiano Rosas
175e63c982 migration/multifd: Stop checking p->quit in multifd_send_thread
We don't need to check p->quit in the multifd_send_thread() because it
is shadowed by the 'exiting' flag. Ever since that flag was added
p->quit became obsolete as a way to stop the thread.

Since p->quit is set at multifd_send_terminate_threads() under the
p->mutex lock, the thread will only see it once it loops, so 'exiting'
will always be seen first.

Note that setting p->quit at multifd_send_terminate_threads() still
makes sense because we need a way to inform multifd_send_pages() that
the channel has stopped.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012140651.13122-3-farosas@suse.de>
2023-10-20 08:51:41 +02:00
Fabiano Rosas
967e388987 migration/multifd: Clarify Error usage in multifd_channel_connect
The function is currently called from two sites, one always gives it a
NULL Error and the other always gives it a non-NULL Error.

In the non-NULL case, all it does it trace the error and return. One
of the callers already have tracing, add a tracepoint to the other and
stop passing the error into the function.

Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-4-farosas@suse.de>
2023-10-17 09:25:14 +02:00
Fabiano Rosas
ee8a7c9c46 migration/multifd: Unify multifd_send_thread error paths
The preferred usage of the Error type is to always set both the return
code and the error when a failure happens. As all code called from the
send thread follows this pattern, we'll always have the return code
and the error set at the same time.

Aside from the convention, in this piece of code this must be the
case, otherwise the if (ret != 0) would be exiting the thread without
calling multifd_send_terminate_threads() which is incorrect.

Unify both paths to make it clear that both are taken when there's an
error.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-3-farosas@suse.de>
2023-10-17 09:25:14 +02:00
Fabiano Rosas
0e92f64448 migration/multifd: Remove direct "socket" references
We're about to enable support for other transports in multifd, so
remove direct references to sockets.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231012134343.23757-2-farosas@suse.de>
2023-10-17 09:25:14 +02:00
Elena Ufimtseva
1618f55221 multifd: reset next_packet_len after sending pages
Sometimes multifd sends just sync packet with no pages
(normal_num is 0). In this case the old value is being
preserved and being accounted for while only packet_len
is being transferred.
Reset it to 0 after sending and accounting for.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184358.97349-5-elena.ufimtseva@oracle.com>
2023-10-17 09:25:13 +02:00
Elena Ufimtseva
68b6e00048 multifd: fix counters in multifd_send_thread
Previous commit cbec7eb768
"migration/multifd: Compute transferred bytes correctly"
removed accounting for packet_len in non-rdma
case, but the next_packet_size only accounts for pages, not for
the header packet (normal_pages * PAGE_SIZE) that is being sent
as iov[0]. The packet_len part should be added to account for
the size of MultiFDPacket and the array of the offsets.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231011184358.97349-4-elena.ufimtseva@oracle.com>
2023-10-17 09:25:13 +02:00
Fabiano Rosas
788fa68041 migration/multifd: Rename threadinfo.c functions
We're about to add more functions to this file so make it use the same
coding style as the rest of the code.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230607161306.31425-2-farosas@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-07-26 10:55:56 +02:00
Michael Tokarev
d8b71d96b3 migration: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
2023-07-25 17:13:20 +03:00
Philippe Mathieu-Daudé
7d5b0d6864 bulk: Remove pointless QOM casts
Mechanical change running Coccinelle spatch with content
generated from the qom-cast-macro-clean-cocci-gen.py added
in the previous commit.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230601093452.38972-3-philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-06-05 20:48:34 +02:00
Juan Quintela
cbec7eb768 migration/multifd: Compute transferred bytes correctly
In the past, we had to put the in the main thread all the operations
related with sizes due to qemu_file not beeing thread safe.  As now
all counters are atomic, we can update the counters just after the
do the write.  As an aditional bonus, we are able to use the right
value for the compression methods.  Right now we were assuming that
there were no compression at all.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230515195709.63843-17-quintela@redhat.com>
2023-05-18 18:41:46 +02:00
Juan Quintela
bd7ceaf6d5 migration: We don't need the field rate_limit_used anymore
Since previous commit, we calculate how much data we have send with
migration_transferred_bytes() so no need to maintain this counter and
remember to always update it.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20230515195709.63843-10-quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Juan Quintela
e1fde0e038 migration: Move rate_limit_max and rate_limit_used to migration_stats
These way we can make them atomic and use this functions from any
place.  I also moved all functions that use rate_limit to
migration-stats.

Functions got renamed, they are not qemu_file anymore.

qemu_file_rate_limit -> migration_rate_exceeded
qemu_file_set_rate_limit -> migration_rate_set
qemu_file_get_rate_limit -> migration_rate_get
qemu_file_reset_rate_limit -> migration_rate_reset
qemu_file_acct_rate_limit -> migration_rate_account.

Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20230515195709.63843-6-quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-18 18:40:51 +02:00
Lukas Straub
5d1d1fcf43 multifd: Add the ramblock to MultiFDRecvParams
This will be used in the next commits to add colo support to multifd.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <88135197411df1a71d7832962b39abf60faf0021.1683572883.git.lukasstraub2@web.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-05-10 18:48:11 +02:00
Juan Quintela
0deb7e9b6c migration: Drop unused parameter for migration_tls_client_create()
It is not needed since we moved the accessor for tls properties to
options.c.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-05-03 11:24:20 +02:00
Juan Quintela
aff3f6606d migration: Rename ram_counters to mig_stats
migration_stats is just too long, and it is going to have more than
ram counters in the near future.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:20 +02:00
Juan Quintela
947701cc1a migration: Move ram_stats to its own file migration-stats.[ch]
There is already include/qemu/stats.h, so stats.h was a bad idea.
We want this file to not depend on anything else, we will move all the
migration counters/stats to this struct.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:19 +02:00
Juan Quintela
e232199aad multifd: We already account for this packet on the multifd thread
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
2023-05-03 11:24:19 +02:00
Juan Quintela
d2026ee117 multifd: Fix the number of channels ready
We don't wait in the sem when we are doing a sync_main.  Make it wait
there.  To make things clearer, we mark the channel ready at the
begining of the thread loop.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
2023-04-27 16:37:28 +02:00
Juan Quintela
b4bc342c76 migration: Move migrate_use_zero_copy_send() to options.c
Once that we are there, we rename the function to
migrate_zero_copy_send() to be consistent with all other capabilities.

We can remove the CONFIG_LINUX guard.  We already check that we can't
setup this capability in migrate_caps_check().

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
51b07548f7 migration: Move migrate_use_multifd() to options.c
Once that we are there, we rename the function to migrate_multifd()
to be consistent with all other capabilities.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-04-24 15:01:46 +02:00
Juan Quintela
4291823694 migration: Make dirty_sync_missed_zero_copy atomic
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2023-04-24 11:28:57 +02:00
Juan Quintela
cf671116fa migration: Make multifd_bytes atomic
In the spirit of:

commit 394d323bc3451e4d07f13341cb8817fac8dfbadd
Author: Peter Xu <peterx@redhat.com>
Date:   Tue Oct 11 17:55:51 2022 -0400

    migration: Use atomic ops properly for page accountings

Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 11:28:57 +02:00
Juan Quintela
30fb22cda4 migration: Update atomic stats out of the mutex
Reviewed-by: David Edmondson <david.edmondson@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-04-24 11:28:56 +02:00
Juan Quintela
abce5fa16d migration: Merge ram_counters and ram_atomic_counters
Using MgrationStats as type for ram_counters mean that we didn't have
to re-declare each value in another struct. The need of atomic
counters have make us to create MigrationAtomicStats for this atomic
counters.

Create RAMStats type which is a merge of MigrationStats and
MigrationAtomicStats removing unused members.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

---

Fix typos found by David Edmondson
2023-04-24 11:28:56 +02:00
Wei Wang
ff1585d1d8 migration/multifd: correct multifd_send_thread to trace the flags
The p->flags could be updated via the send_prepare callback, e.g. OR-ed
with MULTIFD_FLAG_ZLIB via zlib_send_prepare. Assign p->flags to the
local "flags" before the send_prepare callback could only get partial of
p->flags. Fix it by moving the assignment of p->flags to the local flags
after the callback, so that the correct flags can be traced.

Fixes: ab7cbb0b9a ("multifd: Make no compression operations into its own structure")
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-03-16 16:07:07 +01:00
Leonardo Bras
cfc3bcf373 migration/multifd: Move load_cleanup inside incoming_state_destroy
Currently running migration_incoming_state_destroy() without first running
multifd_load_cleanup() will cause a yank error:

qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance:
Assertion `QLIST_EMPTY(&entry->yankfns)' failed.
(core dumped)

The above error happens in the target host, when multifd is being used
for precopy, and then postcopy is triggered and the migration finishes.
This will crash the VM in the target host.

To avoid that, move multifd_load_cleanup() inside
migration_incoming_state_destroy(), so that the load cleanup becomes part
of the incoming state destroying process.

Running multifd_load_cleanup() twice can become an issue, though, but the
only scenario it could be ran twice is on process_incoming_migration_bh().
So removing this extra call is necessary.

On the other hand, this multifd_load_cleanup() call happens way before the
migration_incoming_state_destroy() and having this happening before
dirty_bitmap_mig_before_vm_start() and vm_start() may be a need.

So introduce a new function multifd_load_shutdown() that will mainly stop
all multifd threads and close their QIOChannels. Then use this function
instead of multifd_load_cleanup() to make sure nothing else is received
before dirty_bitmap_mig_before_vm_start().

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13 03:45:40 +01:00
Leonardo Bras
10351fbad1 migration/multifd: Join all multifd threads in order to avoid leaks
Current approach will only join threads that are still running.

For the threads not joined, resources or private memory are always kept in
the process space and never reclaimed before process end, and this risks
serious memory leaks.

This should usually not represent a big problem, since multifd migration
is usually just ran at most a few times, and after it succeeds there is
not much to be done before exiting the process.

Yet still, it should not hurt performance to join all of them.

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13 03:45:34 +01:00
Leonardo Bras
d926f3bb2a migration/multifd: Remove unnecessary assignment on multifd_load_cleanup()
Before assigning "p->quit = true" for every multifd channel,
multifd_load_cleanup() will call multifd_recv_terminate_threads() which
already does the same assignment, while protected by a mutex.

So there is no point doing the same assignment again.

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13 03:45:28 +01:00
Leonardo Bras
e5bac1f525 migration/multifd: Change multifd_load_cleanup() signature and usage
Since it's introduction in commit f986c3d256 ("migration: Create multifd
migration threads"), multifd_load_cleanup() never returned any value
different than 0, neither set up any error on errp.

Even though, on process_incoming_migration_bh() an if clause uses it's
return value to decide on setting autostart = false, which will never
happen.

In order to simplify the codebase, change multifd_load_cleanup() signature
to 'void multifd_load_cleanup(void)', and for every usage remove error
handling or decision made based on return value != 0.

Fixes: b5eea99ec2 ("migration: Add yank feature")
Reported-by: Li Xiaohui <xiaohli@redhat.com>
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-13 03:44:44 +01:00
Peter Xu
d6f74fd12e migration: Rework multi-channel checks on URI
The whole idea of multi-channel checks was not properly done, IMHO.

Currently we check multi-channel in a lot of places, but actually that's
not needed because we only need to check it right after we get the URI and
that should be it.

If the URI check succeeded, we should never need to check it again because
we must have it.  If it check fails, we should fail immediately on either
the qmp_migrate or qmp_migrate_incoming, instead of failingg it later after
the connection established.

Neither should we fail any set capabiliities like what we used to do here:

5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19)

Because logically the URI will only be set later after the capability is
set, so it doesn't make a lot of sense to check the URI type when setting
the capability, because we're checking the cap with an old URI passed in,
and that may not even be the URI we're going to use later.

This patch mostly reverted all such checks for before, dropping the
variable migrate_allow_multi_channels and helpers.  Instead, add a common
helper to check URI for multi-channels for either qmp_migrate and
qmp_migrate_incoming and that should do all the proper checks.  The failure
will only trigger with the "migrate" or "migrate_incoming" command, or when
user specified "-incoming xxx" where "xxx" is not "defer".

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11 16:51:09 +01:00
Li Zhang
bca762c2b9 multifd: Remove some redundant code
Clean up some unnecessary code

Signed-off-by: Li Zhang <lizhang@suse.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11 16:51:09 +01:00
Li Zhang
e3f37b2ce6 multifd: cleanup the function multifd_channel_connect
Cleanup multifd_channel_connect

Signed-off-by: Li Zhang <lizhang@suse.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-11 16:51:09 +01:00
Jiang Jiacheng
1b1f4ab69c migration: save/delete migration thread info
To support query migration thread infomation, save and delete
thread(live_migration and multifdsend) information at thread
creation and finish.

Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 19:22:57 +01:00
Zhenzhong Duan
ebfc578715 multifd: Fix flush of zero copy page send request
Make IO channel flush call after the inflight request has been drained
in multifd thread, or else we may missed to flush the inflight request.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 19:22:57 +01:00
Zhenzhong Duan
ddbe628c97 multifd: Fix a race on reading MultiFDPages_t.block
In multifd_queue_page() MultiFDPages_t.block is checked twice.
Between the two checks, MultiFDPages_t.block may be reset to NULL
by multifd thread. This lead to the 2nd check always true then a
redundant page submitted to multifd thread again.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 19:22:57 +01:00
manish.mishra
6720c2b327 migration: check magic value for deciding the mapping of channels
Current logic assumes that channel connections on the destination side are
always established in the same order as the source and the first one will
always be the main channel followed by the multifid or post-copy
preemption channel. This may not be always true, as even if a channel has a
connection established on the source side it can be in the pending state on
the destination side and a newer connection can be established first.
Basically causing out of order mapping of channels on the destination side.
Currently, all channels except post-copy preempt send a magic number, this
patch uses that magic number to decide the type of channel. This logic is
applicable only for precopy(multifd) live migration, as mentioned, the
post-copy preempt channel does not send any magic number. Also, tls live
migrations already does tls handshake before creating other channels, so
this issue is not possible with tls, hence this logic is avoided for tls
live migrations. This patch uses read peek to check the magic number of
channels so that current data/control stream management remains
un-effected.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: manish.mishra <manish.mishra@nutanix.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2023-02-06 19:22:57 +01:00
Peter Xu
23b7576d78 migration: Use atomic ops properly for page accountings
To prepare for thread-safety on page accountings, at least below counters
need to be accessed only atomically, they are:

        ram_counters.transferred
        ram_counters.duplicate
        ram_counters.normal
        ram_counters.postcopy_bytes

There are a lot of other counters but they won't be accessed outside
migration thread, then they're still safe to be accessed without atomic
ops.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-12-15 10:30:37 +01:00
Juan Quintela
d6f45eba2b multifd: Create page_count fields into both MultiFD{Recv,Send}Params
We were recalculating it left and right.  We plan to change that
values on next patches.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
2022-12-15 10:30:37 +01:00
Juan Quintela
ddec20f858 multifd: Create page_size fields into both MultiFD{Recv,Send}Params
We were calling qemu_target_page_size() left and right.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
2022-12-15 10:30:37 +01:00
Leonardo Bras
4cc47b4395 migration/multifd/zero-copy: Create helper function for flushing
Move flushing code from multifd_send_sync_main() to a new helper, and call
it in multifd_send_sync_main().

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-11-21 11:56:12 +01:00
Leonardo Bras
d59c40cc48 migration/multifd: Report to user when zerocopy not working
Some errors, like the lack of Scatter-Gather support by the network
interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
zero-copy, which causes it to fall back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
increment dirty_sync_missed_zero_copy migration stat to let the user know
about it.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20 12:15:09 +01:00
Peter Xu
85a8578ea5 migration: Add helpers to detect TLS capability
Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS, leveraging the recently introduced migrate_use_tls().  No
functional change intended.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185513.27421-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-07-20 12:15:08 +01:00
Daniel P. Berrangé
bc698c367d migration: rename qemu_file_update_transfer to qemu_file_acct_rate_limit
The qemu_file_update_transfer name doesn't give a clear guide on what
its purpose is, and how it differs from the qemu_file_credit_transfer
method. The latter is specifically for accumulating for total migration
traffic, while the former is specifically for accounting in thue rate
limit calculations. The new name give better guidance on its usage.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-06-22 19:33:43 +01:00
Leonardo Bras
5b1d9bab2d multifd: Implement zero copy write in multifd migration (multifd-zero-copy)
Implement zero copy send on nocomp_send_write(), by making use of QIOChannel
writev + flags & flush interface.

Change multifd_send_sync_main() so flush_zero_copy() can be called
after each iteration in order to make sure all dirty pages are sent before
a new iteration is started. It will also flush at the beginning and at the
end of migration.

Also make it return -1 if flush_zero_copy() fails, in order to cancel
the migration process, and avoid resuming the guest in the target host
without receiving all current RAM.

This will work fine on RAM migration because the RAM pages are not usually freed,
and there is no problem on changing the pages content between writev_zero_copy() and
the actual sending of the buffer, because this change will dirty the page and
cause it to be re-sent on a next iteration anyway.

A lot of locked memory may be needed in order to use multifd migration
with zero-copy enabled, so disabling the feature should be necessary for
low-privileged users trying to perform multifd migrations.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220513062836.965425-9-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16 13:56:24 +01:00
Leonardo Bras
b7dbdd8e76 multifd: Send header packet without flags if zero-copy-send is enabled
Since d48c3a0445 ("multifd: Use a single writev on the send side"),
sending the header packet and the memory pages happens in the same
writev, which can potentially make the migration faster.

Using channel-socket as example, this works well with the default copying
mechanism of sendmsg(), but with zero-copy-send=true, it will cause
the migration to often break.

This happens because the header packet buffer gets reused quite often,
and there is a high chance that by the time the MSG_ZEROCOPY mechanism get
to send the buffer, it has already changed, sending the wrong data and
causing the migration to abort.

It means that, as it is, the buffer for the header packet is not suitable
for sending with MSG_ZEROCOPY.

In order to enable zero copy for multifd, send the header packet on an
individual write(), without any flags, and the remanining pages with a
writev(), as it was happening before. This only changes how a migration
with zero-copy-send=true works, not changing any current behavior for
migrations with zero-copy-send=false.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220513062836.965425-8-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16 13:56:24 +01:00
Leonardo Bras
33d70973a3 multifd: multifd_send_sync_main now returns negative on error
Even though multifd_send_sync_main() currently emits error_reports, it's
callers don't really check it before continuing.

Change multifd_send_sync_main() to return -1 on error and 0 on success.
Also change all it's callers to make use of this change and possibly fail
earlier.

(This change is important to next patch on  multifd zero copy
implementation, to make it sure an error in zero-copy flush does not go
unnoticed.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220513062836.965425-7-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16 13:56:24 +01:00
Leonardo Bras
d2fafb6a68 migration: Add migrate_use_tls() helper
A lot of places check parameters.tls_creds in order to evaluate if TLS is
in use, and sometimes call migrate_get_current() just for that test.

Add new helper function migrate_use_tls() in order to simplify testing
for TLS usage.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220513062836.965425-6-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-05-16 13:56:24 +01:00
Peter Xu
f444eeda71 migration: Move migrate_allow_multifd and helpers into migration.c
This variable, along with its helpers, is used to detect whether multiple
channel will be supported for migration.  In follow up patches, there'll be
other capability that requires multi-channels.  Hence move it outside multifd
specific code and make it public.  Meanwhile rename it from "multifd" to
"multi_channels" to show its real meaning.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-5-peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 19:36:46 +01:00
Peter Xu
7f692ec79a migration: Drop multifd tls_hostname cache
The hostname is cached N times, N equals to the multifd channels.

Drop that cache because after previous patch we've got s->hostname
being alive for the whole lifecycle of migration procedure.

Cc: Juan Quintela <quintela@redhat.com>
Cc: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220331150857.74406-3-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-04-21 19:36:46 +01:00
Juan Quintela
8c0ec0b2b0 multifd: Rename pages_used to normal_pages
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 15:38:23 +01:00
Juan Quintela
faf60935df multifd: recv side only needs the RAMBlock host address
So we can remove the MultiFDPages.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 15:38:23 +01:00
Juan Quintela
cf2d4aa8a2 multifd: Use normal pages array on the recv side
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---

Rename num_normal_pages to total_normal_pages (peter)
2022-01-28 15:38:23 +01:00
Juan Quintela
815956f039 multifd: Use normal pages array on the send side
We are only sending normal pages through multifd channels.
Later on this series, we are going to also send zero pages.
We are going to detect if a page is zero or non zero in the multifd
channel thread, not on the main thread.

So we receive an array of pages page->offset[N]

And we will end with:

p->normal[N - zero_pages]
p->zero[zero_pages].

In this patch, we just copy all the pages in offset to normal.

for (i = 0; i < pages->num; i++) {
    p->narmal[p->normal_num] = pages->offset[i];
    p->normal_num++:
}

Later in the series this becomes:

for (i = 0; i < pages->num; i++) {
    if (buffer_is_zero(page->offset[i])) {
        p->zerol[p->zero_num] = pages->offset[i];
        p->zero_num++:
    } else {
        p->narmal[p->normal_num] = pages->offset[i];
        p->normal_num++:
    }
}

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

---

Improving comment (dave)
Renaming num_normal_pages to total_normal_pages (peter)
2022-01-28 15:38:23 +01:00
Juan Quintela
c27779a215 multifd: Unfold "used" variable by its value
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 15:38:23 +01:00
Juan Quintela
d48c3a0445 multifd: Use a single writev on the send side
Until now, we wrote the packet header with write(), and the rest of the
pages with writev().  Just increase the size of the iovec and do a
single writev().

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 15:38:23 +01:00
Juan Quintela
468fcb5dd0 multifd: Remove send_write() method
Everything use now iov's.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2022-01-28 15:38:23 +01:00
Juan Quintela
226468ba3d multifd: Move iov from pages to params
This will allow us to reduce the number of system calls on the next patch.

Signed-off-by: Juan Quintela <quintela@redhat.com>
2022-01-28 15:38:23 +01:00
Juan Quintela
04e1140494 migration: All this fields are unsigned
So printing it as %d is wrong.  Notice that for the channel id, that
is an uint8_t, but I changed it anyways for consistency.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2022-01-28 15:38:22 +01:00
Li Zhang
077fbb5942 multifd: Shut down the QIO channels to avoid blocking the send threads when they are terminated.
When doing live migration with multifd channels 8, 16 or larger number,
the guest hangs in the presence of the network errors such as missing TCP ACKs.

At sender's side:
The main thread is blocked on qemu_thread_join, migration_fd_cleanup
is called because one thread fails on qio_channel_write_all when
the network problem happens and other send threads are blocked on sendmsg.
They could not be terminated. So the main thread is blocked on qemu_thread_join
to wait for the threads terminated.

(gdb) bt
0  0x00007f30c8dcffc0 in __pthread_clockjoin_ex () at /lib64/libpthread.so.0
1  0x000055cbb716084b in qemu_thread_join (thread=0x55cbb881f418) at ../util/qemu-thread-posix.c:627
2  0x000055cbb6b54e40 in multifd_save_cleanup () at ../migration/multifd.c:542
3  0x000055cbb6b4de06 in migrate_fd_cleanup (s=0x55cbb8024000) at ../migration/migration.c:1808
4  0x000055cbb6b4dfb4 in migrate_fd_cleanup_bh (opaque=0x55cbb8024000) at ../migration/migration.c:1850
5  0x000055cbb7173ac1 in aio_bh_call (bh=0x55cbb7eb98e0) at ../util/async.c:141
6  0x000055cbb7173bcb in aio_bh_poll (ctx=0x55cbb7ebba80) at ../util/async.c:169
7  0x000055cbb715ba4b in aio_dispatch (ctx=0x55cbb7ebba80) at ../util/aio-posix.c:381
8  0x000055cbb7173ffe in aio_ctx_dispatch (source=0x55cbb7ebba80, callback=0x0, user_data=0x0) at ../util/async.c:311
9  0x00007f30c9c8cdf4 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
10 0x000055cbb71851a2 in glib_pollfds_poll () at ../util/main-loop.c:232
11 0x000055cbb718521c in os_host_main_loop_wait (timeout=42251070366) at ../util/main-loop.c:255
12 0x000055cbb7185321 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
13 0x000055cbb6e6ba27 in qemu_main_loop () at ../softmmu/runstate.c:726
14 0x000055cbb6ad6fd7 in main (argc=68, argv=0x7ffc0c578888, envp=0x7ffc0c578ab0) at ../softmmu/main.c:50

To make sure that the send threads could be terminated, IO channels should be
shut down to avoid waiting IO.

Signed-off-by: Li Zhang <lizhang@suse.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-12-15 10:31:42 +01:00
Juan Quintela
01102a2ef6 multifd: Fill offset and block for reception
We were using the iov directly, but we will need this info on the
following patch.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 10:31:42 +01:00
Juan Quintela
40a4bfe9d3 multifd: remove used parameter from send_recv_pages() method
It is already there as p->pages->num.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 10:31:42 +01:00
Juan Quintela
02fb81043e multifd: remove used parameter from send_prepare() method
It is already there as p->pages->num.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 10:31:42 +01:00
Juan Quintela
1943c11a62 multifd: The variable is only used inside the loop
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 10:31:42 +01:00
Juan Quintela
18ede636bc multifd: Add missing documention
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 10:31:42 +01:00
Juan Quintela
90a3d2f9d5 multifd: Rename used field to num
We will need to split it later in zero_num (number of zero pages) and
normal_num (number of normal pages).  This name is better.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 10:31:42 +01:00
Juan Quintela
144fa06b34 migration: Never call twice qemu_target_page_size()
Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-12-15 10:31:42 +01:00
Li Zhijian
5ad15e8614 migration: allow enabling mutilfd for specific protocol only
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
CC: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Sat, 31 Jul 2021 22:05:52 +0800 (5 weeks, 4 days, 17 hours ago)

And change the default to true so that in '-incoming defer' case, user is able
to change multifd capability.

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 08:39:04 +02:00
Li Zhijian
b7acd65707 migration: allow multifd for socket protocol only
To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org>
CC: Li Zhijian <lizhijian@cn.fujitsu.com>
Date: Sat, 31 Jul 2021 22:05:51 +0800 (5 weeks, 4 days, 17 hours ago)

multifd with unsupported protocol will cause a segment fault.
(gdb) bt
 #0  0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190
 #1 0x0000563b4a797a03 in qio_channel_socket_connect_sync
(ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at
../io/channel-socket.c:145
 #2  0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168
 #3  0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124
 #4  0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541
 #5  0x00007f7fe9b5b3f9 in ?? ()
 #6  0x0000000000000000 in ?? ()

It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and
multifd setup() though there are so many other places using migrate_use_multifd().

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 08:39:04 +02:00
Lukas Straub
e9ab82b858 multifd: Unconditionally unregister yank function
To: qemu-devel <qemu-devel@nongnu.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela
 <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares
 Passos <lsoaresp@redhat.com>
Date: Wed, 4 Aug 2021 21:26:32 +0200 (5 weeks, 11 hours, 52 minutes ago)

[[PGP Signed Part:No public key for 35AB0B289C5DB258 created at 2021-08-04T21:26:32+0200 using RSA]]
Unconditionally unregister yank function in multifd_load_cleanup().
If it is not unregistered here, it will leak and cause a crash
in yank_unregister_instance(). Now if the ioc is still in use
afterwards, it will only lead to qemu not being able to recover
from a hang related to that ioc.

After checking the code, i am pretty sure that ref is always 1
when arriving here. So all this currently does is remove the
unneeded check.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 08:39:04 +02:00
Lukas Straub
20171ea895 multifd: Implement yank for multifd send side
To: qemu-devel <qemu-devel@nongnu.org>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela
 <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares
 Passos <lsoaresp@redhat.com>
Date: Wed, 1 Sep 2021 17:58:57 +0200 (1 week, 15 hours, 17 minutes ago)

[[PGP Signed Part:No public key for 35AB0B289C5DB258 created at 2021-09-01T17:58:57+0200 using RSA]]
When introducing yank functionality in the migration code I forgot
to cover the multifd send side.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-10-19 08:39:04 +02:00
Peter Xu
18711405b5 migration: Introduce migration_ioc_[un]register_yank()
There're plenty of places in migration/* that checks against either socket or
tls typed ioc for yank operations.  Provide two helpers to hide all these
information.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210722175841.938739-4-peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-26 12:44:54 +01:00
Dr. David Alan Gilbert
a59136f3b1 migration/socket: Close the listener at the end
Delay closing the listener until the cleanup hook at the end; mptcp
needs the listener to stay open while the other paths come in.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210421112834.107651-5-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-08 19:36:19 +01:00
Leonardo Bras
7de2e85653 yank: Unregister function when using TLS migration
After yank feature was introduced in migration, whenever migration
is started using TLS, the following error happens in both source and
destination hosts:

(qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance:
Assertion `QLIST_EMPTY(&entry->yankfns)' failed.

This happens because of a missing yank_unregister_function() when using
qio-channel-tls.

Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to perform
yank_unregister_function() in channel_close() and multifd_load_cleanup().

Also, inside migration_channel_connect() and
migration_channel_process_incoming() move yank_register_function() so
it only runs once on a TLS migration.

Fixes: b5eea99ec2 ("migration: Add yank feature", 2021-01-13)
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326
Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Peter Xu <peterx@redhat.com>

--
Changes since v2:
- Dropped all references to ioc->master
- yank_register_function() and yank_unregister_function() now only run
  once in a TLS migration.

Changes since v1:
- Cast p->c to QIOChannelTLS into multifd_load_cleanup()
Message-Id: <20210601054030.1153249-1-leobras.c@gmail.com>

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-08 18:50:03 +01:00
David Hildenbrand
c1668bde5c migration/multifd: Print used_length of memory block
We actually want to print the used_length, against which we check.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210429112708.12291-10-david@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-13 18:21:14 +01:00
Lukas Straub
1a92d6d500 yank: Remove dependency on qiochannel
Remove dependency on qiochannel by removing yank_generic_iochannel and
letting migration and chardev use their own yank function for
iochannel.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20ff143fc2db23e27cd41d38043e481376c9cec1.1616521341.git.lukasstraub2@web.de>
2021-04-01 15:27:44 +04:00
Hao Wang
fca676429c migration/tls: add error handling in multifd_tls_handshake_thread
If any error happens during multifd send thread creating (e.g. channel broke
because new domain is destroyed by the dst), multifd_tls_handshake_thread
may exit silently, leaving main migration thread hanging (ram_save_setup ->
multifd_send_sync_main -> qemu_sem_wait(&p->sem_sync)).
Fix that by adding error handling in multifd_tls_handshake_thread.

Signed-off-by: Hao Wang <wanghao232@huawei.com>
Message-Id: <20210209104237.2250941-3-wanghao232@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-15 20:01:55 +00:00
Hao Wang
a339149afa migration/tls: fix inverted semantics in multifd_channel_connect
Function multifd_channel_connect() return "true" to indicate failure,
which is rather confusing. Fix that.

Signed-off-by: Hao Wang <wanghao232@huawei.com>
Message-Id: <20210209104237.2250941-2-wanghao232@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-15 20:01:55 +00:00
Lukas Straub
b5eea99ec2 migration: Add yank feature
Register yank functions on sockets to shut them down.

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <484c6a14cc2506bebedd5a237259b91363ff8f88.1609167865.git.lukasstraub2@web.de>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-13 10:21:17 +01:00
Chuan Zheng
9e8424088c multifd/tls: fix memoryleak of the QIOChannelSocket object when cancelling migration
When creating new tls client, the tioc->master will be referenced which results in socket
leaking after multifd_save_cleanup if we cancel migration.
Fix it by do object_unref() after tls client creation.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Message-Id: <1605104763-118687-1-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12 15:52:20 +00:00
Chuan Zheng
a1af605bd5 migration/multifd: fix hangup with TLS-Multifd due to blocking handshake
The qemu main loop could hang up forever when we enable TLS+Multifd.
The Src multifd_send_0 invokes tls handshake, it sends hello to sever
and wait response.
However, the Dst main qemu loop has been waiting recvmsg() for multifd_recv_1.
Both of Src and Dst main qemu loop are blocking and waiting for reponse which
results in hanging up forever.

Src: (multifd_send_0)                                              Dst: (multifd_recv_1)
multifd_channel_connect                                            migration_channel_process_incoming
  multifd_tls_channel_connect                                        migration_tls_channel_process_incoming
    multifd_tls_channel_connect                                        qio_channel_tls_handshake_task
       qio_channel_tls_handshake                                         gnutls_handshake
          qio_channel_tls_handshake_task                                       ...
            qcrypto_tls_session_handshake                                      ...
              gnutls_handshake                                                 ...
                   ...                                                         ...
                recvmsg (Blocking I/O waiting for response)                recvmsg (Blocking I/O waiting for response)

Fix this by offloadinig handshake work to a background thread.

Reported-by: Yan Jin <jinyan12@huawei.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Message-Id: <1604643893-8223-1-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12 14:35:29 +00:00
Peter Maydell
92d0950267 Trivial Patches Pull request 20200928
-----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl9xqZQSHGxhdXJlbnRA
 dml2aWVyLmV1AAoJEPMMOL0/L748nFQQAJL/5ydfeo+t9RVTfatMI6bRP+loqQzc
 Y/t/xYgOfCQsMjXRU5HHNre+FtltqP0lVU2Ey2zAon7MjJDXfQOCv6aOK+vsOGXi
 FITC6vZ050VrVy7iPfXaJR5aIbkJme4NXLgJj9mqaFZELgoTAMCuCkGN+km3n/Uw
 pf0lI43VSkLt3pvHvGjy2UT51OjH6/LxaXcgY2w67nIH+KcLgxdh7fXlD+5Gxdug
 458gbMbtqAPb6qNV7jBbrgUMRx5hpUKoa5QvL0DWkIsboemPJGsTlw0nhON5ZPQ7
 XYNPyb9ELPYE5V8I9Ki+ESzsWFMVMdu0Hj/MNbnSDdg+2uR0xgCSIRnnuEwpSmLB
 jbhKa8b3B3nPFFNQAqQ6FPpOW76PpwFVKRAPT3p3rnDqrtXLUJpGdGzTz8ltMZry
 pOxdbSkuEl+79D1i5Lt5mfCqRNqOjYk1awPO4K/JdmhJxk9dFmY5X21edaIe6lN+
 GiZlE43fF+GM3HelplnTCIwlRAjhUX/PRSDBkeLuPYj0qFob27MFauMcsGvC28FI
 CQIY3CmIFCmzf8c1DUE3TVYWpJj0e+OnKU02D89/FF4M4TOGTa1/CtpNcpSDgE5A
 TCEw4cyEG1LEvDtw4DRdLBKVDnFW8XiUPz2xVC87/dZSC88CTllLnxtsaSfDHuui
 0Y0BBZ3MJxgs
 =t5Q1
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.2-pull-request' into staging

Trivial Patches Pull request 20200928

# gpg: Signature made Mon 28 Sep 2020 10:15:00 BST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/trivial-branch-for-5.2-pull-request:
  docs/system/deprecated: Move lm32 and unicore32 to the right section
  migration/multifd: Remove superfluous semicolons
  timer: Fix timer_mod_anticipate() documentation
  vhost-vdpa: remove useless variable
  Add *.pyc back to the .gitignore file
  virtio: vdpa: omit check return of g_malloc
  meson: fix static flag summary
  vhost-vdpa: fix indentation in vdpa_ops

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-09-28 14:03:09 +01:00
Chuan Zheng
894f021411 migration/tls: add trace points for multifd-tls
add trace points for multifd-tls for debug.

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <1600139042-104593-7-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 12:45:58 +01:00
Chuan Zheng
2964714015 migration/tls: add support for multifd tls-handshake
Similar like migration main thread, we need to do handshake
for each multifd thread.

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <1600139042-104593-6-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 12:45:58 +01:00
Chuan Zheng
03c7a42d0d migration/tls: extract cleanup function for common-use
multifd channel cleanup is need if multifd handshake failed,
let's extract it.

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <1600139042-104593-5-git-send-email-zhengchuan@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 12:45:58 +01:00
Chuan Zheng
8e5fa05932 migration/tls: add tls_hostname into MultiFDSendParams
Since multifd creation is async with migration_channel_connect, we should
pass the hostname from MigrationState to MultiFDSendParams.

Signed-off-by: Chuan Zheng <zhengchuan@huawei.com>
Signed-off-by: Yan Jin <jinyan12@huawei.com>
Message-Id: <1600139042-104593-4-git-send-email-zhengchuan@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25 12:45:58 +01:00
Philippe Mathieu-Daudé
df55509470 migration/multifd: Remove superfluous semicolons
checkpatch.pl report superfluous semicolons since commit
ee0f3c09e0, but this one was missed:

  scripts/checkpatch.pl d32ca5ad798~..d32ca5ad798
  ERROR: superfluous trailing semicolon
  #498: FILE: migration/multifd.c:308:
  +    ram_counters.transferred += transferred;;
  total: 1 errors, 1 warnings, 2073 lines checked

Fixes: d32ca5ad79 ("multifd: Split multifd code into its own file")
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <20200921040231.437653-1-f4bug@amsat.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-09-23 19:13:56 +02:00
Stefan Hajnoczi
d73415a315 qemu/atomic.h: rename atomic_ to qatomic_
clang's C11 atomic_fetch_*() functions only take a C11 atomic type
pointer argument. QEMU uses direct types (int, etc) and this causes a
compiler error when a QEMU code calls these functions in a source file
that also included <stdatomic.h> via a system header file:

  $ CC=clang CXX=clang++ ./configure ... && make
  ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)

Avoid using atomic_*() names in QEMU's atomic.h since that namespace is
used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h
and <stdatomic.h> can co-exist. I checked /usr/include on my machine and
searched GitHub for existing "qatomic_" users but there seem to be none.

This patch was generated using:

  $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
    sort -u >/tmp/changed_identifiers
  $ for identifier in $(</tmp/changed_identifiers); do
        sed -i "s%\<$identifier\>%q$identifier%g" \
            $(git grep -I -l "\<$identifier\>")
    done

I manually fixed line-wrap issues and misaligned rST tables.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
2020-09-23 16:07:44 +01:00
zhaolichang
3a4452d896 migration/: fix some comment spelling errors
I found that there are many spelling errors in the comments of qemu,
so I used the spellcheck tool to check the spelling errors
and finally found some spelling errors in the migration folder.

Signed-off-by: zhaolichang <zhaolichang@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20200917075029.313-3-zhaolichang@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-09-17 20:36:32 +02:00
Laurent Vivier
7e89a1401a migration: fix multifd_send_pages() next channel
multifd_send_pages() loops around the available channels,
the next channel to use between two calls to multifd_send_pages() is stored
inside a local static variable, next_channel.

It works well, except if the number of channels decreases between two calls
to multifd_send_pages(). In this case, the loop can try to access the
data of a channel that doesn't exist anymore.

The problem can be triggered if we start a migration with a given number of
channels and then we cancel the migration to restart it with a lower number.
This ends generally with an error like:
qemu-system-ppc64: .../util/qemu-thread-posix.c:77: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.

This patch fixes the error by capping next_channel with the current number
of channels before using it.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20200617113154.593233-1-lvivier@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-06-17 17:48:39 +01:00
Pan Nengyuan
13f2cb21e5 migration/multifd: Do error_free after migrate_set_error to avoid memleaks
When error happen in multifd_send_thread, it use error_copy to set migrate error in
multifd_send_terminate_threads(). We should call error_free after it.

Similarly, fix another two places in multifd_recv_thread/multifd_save_cleanup.

The leak stack:
Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x7f781af07cf0 in calloc (/lib64/libasan.so.5+0xefcf0)
    #1 0x7f781a2ce22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d)
    #2 0x55ee1d075c17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61
    #3 0x55ee1d076464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
    #4 0x55ee1cef066e in qio_channel_socket_writev /mnt/sdb/backup/qemu/io/channel-socket.c:569
    #5 0x55ee1cee806b in qio_channel_writev /mnt/sdb/backup/qemu/io/channel.c:207
    #6 0x55ee1cee806b in qio_channel_writev_all /mnt/sdb/backup/qemu/io/channel.c:171
    #7 0x55ee1cee8248 in qio_channel_write_all /mnt/sdb/backup/qemu/io/channel.c:257
    #8 0x55ee1ca12c9a in multifd_send_thread /mnt/sdb/backup/qemu/migration/multifd.c:657
    #9 0x55ee1d0607fc in qemu_thread_start /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519
    #10 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
    #11 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2)

Indirect leak of 52 byte(s) in 1 object(s) allocated from:
    #0 0x7f781af07f28 in __interceptor_realloc (/lib64/libasan.so.5+0xeff28)
    #1 0x7f78156f07d9 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d7d9)
    #2 0x7f781a30ea6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c)
    #3 0x7f781a2e7cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0)
    #4 0x7f781a2e7d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c)
    #5 0x55ee1d075c86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65
    #6 0x55ee1d076464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109
    #7 0x55ee1cef066e in qio_channel_socket_writev /mnt/sdb/backup/qemu/io/channel-socket.c:569
    #8 0x55ee1cee806b in qio_channel_writev /mnt/sdb/backup/qemu/io/channel.c:207
    #9 0x55ee1cee806b in qio_channel_writev_all /mnt/sdb/backup/qemu/io/channel.c:171
    #10 0x55ee1cee8248 in qio_channel_write_all /mnt/sdb/backup/qemu/io/channel.c:257
    #11 0x55ee1ca12c9a in multifd_send_thread /mnt/sdb/backup/qemu/migration/multifd.c:657
    #12 0x55ee1d0607fc in qemu_thread_start /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519
    #13 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd)
    #14 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2)

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com>
Message-Id: <20200506095416.26099-3-pannengyuan@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-05-07 17:40:24 +01:00