mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Zhao Liu	35e83a9f61	migration/option: Fix missing ERRP_GUARD() for error_prepend() As the comment in qapi/error, passing @errp to error_prepend() requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: ... * - It should not be passed to error_prepend(), error_vprepend() or * error_append_hint(), because that doesn't work with &error_fatal. * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. ERRP_GUARD() could avoid the case when @errp is &error_fatal, the user can't see this additional information, because exit() happens in error_setg earlier than information is added [1]. The migrate_params_check() passes @errp to error_prepend() without ERRP_GUARD(), and it could be called from migration_object_init(), where the passed @errp points to @error_fatal. Therefore, the error message echoed in error_prepend() will be lost because of the above issue. To fix this, add missing ERRP_GUARD() at the beginning of this function. [1]: Issue description in the commit message of commit `ae7c80a7bd` ("error: New macro ERRP_GUARD()"). Cc: Peter Xu <peterx@redhat.com> Cc: Fabiano Rosas <farosas@suse.de> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Acked-by: Peter Xu <peterx@redhat.com> Message-ID: <20240311033822.3142585-28-zhao1.liu@linux.intel.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-03-12 11:45:45 +01:00
Peter Maydell	7d4e29ef80	QAPI patches patches for 2024-03-04 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmXlaSISHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTdZ8P/iMgqLoAFkCCjwfkUc/rqZUezK52Ynr7 LYwOPI/xcYD7EnVogdRgFgjWFNoivQLP5yKsU/eRTk29pwdDzTscFm/0ztTQX/Gb ypWV+GBcu5J8mKbp1KF5w68aDD8Bat4WRfEgDQ1DV7v6CoMiUzTiF3CGXkYzqK5Y kYNq97vdEkBFvFdOl/7scs/XXN2jG27egDhMp68RTxnPHlXZiAO9/2Bul3uVe3x0 fzQ2ViYv0qLnjE/PwENDqqE3Thv3Sxp5iEeQQ6GWi07EVh07UtHpOM3RYyrTU0Sb VrTApSrg0oxlkOuR0CBd9Fi+timtbokBL0DWyUpXNTfIEZfLtA9H+8riUg3EOcDp r7a4SI/27VdPxX6Kc6zA3bi+/j1o7CLTW2LGEwuZs52nmixoo1HTWPIFdyh13g/V QjNbun0fViHb0FVLiyDlXF/7Y+EWUWIyqwwGqbvve1DyUHQmo3CUQAKGOpkeKSBe 4eGciVDgpBoKhtw9Kv6LCDj2cwZKC8DxBMibf7GHkOnAsX2mnyuHcey7HvYNCoF+ yYz7oIEXdlL2eWqg7CfBZK7lniCDln50RI4Ll1v+J4r1v1kRZGMLesTYXCdNc4ku yb4kpU4t22/RODffLE7K+fc3Onwze3fcfxlZMN66F+wFtk4KdPR2aQBE66bB8J99 vuSKlTbT4cGL =s9AR -----END PGP SIGNATURE----- Merge tag 'pull-qapi-2024-03-04' of https://repo.or.cz/qemu/armbru into staging QAPI patches patches for 2024-03-04 # -----BEGIN PGP SIGNATURE----- # # iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmXlaSISHGFybWJydUBy # ZWRoYXQuY29tAAoJEDhwtADrkYZTdZ8P/iMgqLoAFkCCjwfkUc/rqZUezK52Ynr7 # LYwOPI/xcYD7EnVogdRgFgjWFNoivQLP5yKsU/eRTk29pwdDzTscFm/0ztTQX/Gb # ypWV+GBcu5J8mKbp1KF5w68aDD8Bat4WRfEgDQ1DV7v6CoMiUzTiF3CGXkYzqK5Y # kYNq97vdEkBFvFdOl/7scs/XXN2jG27egDhMp68RTxnPHlXZiAO9/2Bul3uVe3x0 # fzQ2ViYv0qLnjE/PwENDqqE3Thv3Sxp5iEeQQ6GWi07EVh07UtHpOM3RYyrTU0Sb # VrTApSrg0oxlkOuR0CBd9Fi+timtbokBL0DWyUpXNTfIEZfLtA9H+8riUg3EOcDp # r7a4SI/27VdPxX6Kc6zA3bi+/j1o7CLTW2LGEwuZs52nmixoo1HTWPIFdyh13g/V # QjNbun0fViHb0FVLiyDlXF/7Y+EWUWIyqwwGqbvve1DyUHQmo3CUQAKGOpkeKSBe # 4eGciVDgpBoKhtw9Kv6LCDj2cwZKC8DxBMibf7GHkOnAsX2mnyuHcey7HvYNCoF+ # yYz7oIEXdlL2eWqg7CfBZK7lniCDln50RI4Ll1v+J4r1v1kRZGMLesTYXCdNc4ku # yb4kpU4t22/RODffLE7K+fc3Onwze3fcfxlZMN66F+wFtk4KdPR2aQBE66bB8J99 # vuSKlTbT4cGL # =s9AR # -----END PGP SIGNATURE----- # gpg: Signature made Mon 04 Mar 2024 06:24:34 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * tag 'pull-qapi-2024-03-04' of https://repo.or.cz/qemu/armbru: migration: simplify exec migration functions qapi: New strv_from_str_list() qapi: New QAPI_LIST_LENGTH() docs/devel/writing-monitor-commands: Minor improvements docs/devel/writing-monitor-commands: Repair a decade of rot qapi: Reject "Returns" section when command doesn't return anything qga/qapi-schema: Fix guest-set-memory-blocks documentation qga/qapi-schema: Tweak documentation of fsfreeze commands qga/qapi-schema: Clean up "Returns" sections qga/qapi-schema: Delete useless "Returns" sections qga/qapi-schema: Move error documentation to new "Errors" sections qapi/yank: Tweak @yank's error description for consistency qapi: Clean up "Returns" sections qapi: Delete useless "Returns" sections qapi: Move error documentation to new "Errors" sections qapi: New documentation section tag "Errors" qapi: Slightly clearer error message for invalid "Returns" section qapi: Memorize since & returns sections Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2024-03-05 11:20:15 +00:00
Peter Maydell	c90cfb5294	Migartion pull request for 20240304 - Bryan's fix on multifd compression level API - Fabiano's mapped-ram series (base + multifd only) - Steve's amend on cpr document in qapi/ -----BEGIN PGP SIGNATURE----- iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZeUjKhIccGV0ZXJ4QHJl ZGhhdC5jb20ACgkQO1/MzfOr1wbv5QD/ZexBUsmZA5qyxgGvZ2yvlUBEGNOvtmKY kRdiYPU7khMA/0N43rn4LcqKCoq4+T+EAnYizGjIyhH/7BRUyn4DUxgO =AeEn -----END PGP SIGNATURE----- Merge tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu into staging Migartion pull request for 20240304 - Bryan's fix on multifd compression level API - Fabiano's mapped-ram series (base + multifd only) - Steve's amend on cpr document in qapi/ # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZeUjKhIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wbv5QD/ZexBUsmZA5qyxgGvZ2yvlUBEGNOvtmKY # kRdiYPU7khMA/0N43rn4LcqKCoq4+T+EAnYizGjIyhH/7BRUyn4DUxgO # =AeEn # -----END PGP SIGNATURE----- # gpg: Signature made Mon 04 Mar 2024 01:26:02 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal] # gpg: aka "Peter Xu <peterx@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'migration-next-pull-request' of https://gitlab.com/peterx/qemu: (27 commits) migration/multifd: Document two places for mapped-ram tests/qtest/migration: Add a multifd + mapped-ram migration test migration/multifd: Add mapped-ram support to fd: URI migration/multifd: Support incoming mapped-ram stream format migration/multifd: Support outgoing mapped-ram stream format migration/multifd: Prepare multifd sync for mapped-ram migration migration/multifd: Add incoming QIOChannelFile support migration/multifd: Add outgoing QIOChannelFile support migration/multifd: Add a wrapper for channels_created migration/multifd: Allow receiving pages without packets migration/multifd: Allow multifd without packets migration/multifd: Decouple recv method from pages migration/multifd: Rename MultiFDSend\|RecvParams::data to compress_data tests/qtest/migration: Add tests for mapped-ram file-based migration migration/ram: Add incoming 'mapped-ram' migration migration/ram: Add outgoing 'mapped-ram' migration migration: Add mapped-ram URI compatibility check migration/ram: Introduce 'mapped-ram' migration capability migration/qemu-file: add utility methods for working with seekable channels io: fsync before closing a file channel ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # migration/ram.c	2024-03-05 11:19:58 +00:00
Steve Sistare	018d5fb1f9	migration: simplify exec migration functions Simplify the exec migration code by using list utility functions. As a side effect, this also fixes a minor memory leak. On function return, "g_auto(GStrv) argv" frees argv and each element, which is wrong, because the function does not own the individual elements. To compensate, the code uses g_steal_pointer which NULLs argv and prevents the destructor from running, but argv is leaked. Fixes: `cbab4face5` ("migration: convert exec backend ...") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20240227153321.467343-4-armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2024-03-04 07:12:40 +01:00
Peter Xu	1a6e217c35	migration/multifd: Document two places for mapped-ram Add two documentations for mapped-ram migration on two spots that may not be extremely clear. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240301091524.39900-1-peterx@redhat.com Cc: Prasad Pandit <ppandit@redhat.com> [peterx: fix two English errors per Prasad] Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-04 08:31:11 +08:00
Fabiano Rosas	decdc76772	migration/multifd: Add mapped-ram support to fd: URI If we receive a file descriptor that points to a regular file, there's nothing stopping us from doing multifd migration with mapped-ram to that file. Enable the fd: URI to work with multifd + mapped-ram. Note that the fds passed into multifd are duplicated because we want to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The original fd doesn't need to be duplicated because monitor_get_fd() transfers ownership to the caller. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240229153017.2221-23-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	a49d15a38d	migration/multifd: Support incoming mapped-ram stream format For the incoming mapped-ram migration we need to read the ramblock headers, get the pages bitmap and send the host address of each non-zero page to the multifd channel thread for writing. Usage on HMP is: (qemu) migrate_set_capability multifd on (qemu) migrate_set_capability mapped-ram on (qemu) migrate_incoming file:migfile (the ram.h include needs to move because we've been previously relying on it being included from migration.c. Now file.h will start including multifd.h before migration.o is processed) Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-22-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	f427d90b98	migration/multifd: Support outgoing mapped-ram stream format The new mapped-ram stream format uses a file transport and puts ram pages in the migration file at their respective offsets and can be done in parallel by using the pwritev system call which takes iovecs and an offset. Add support to enabling the new format along with multifd to make use of the threading and page handling already in place. This requires multifd to stop sending headers and leaving the stream format to the mapped-ram code. When it comes time to write the data, we need to call a version of qio_channel_write that can take an offset. Usage on HMP is: (qemu) stop (qemu) migrate_set_capability multifd on (qemu) migrate_set_capability mapped-ram on (qemu) migrate_set_parameter max-bandwidth 0 (qemu) migrate_set_parameter multifd-channels 8 (qemu) migrate file:migfile Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-21-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	9d01778af8	migration/multifd: Prepare multifd sync for mapped-ram migration The mapped-ram migration can be performed live or non-live, but it is always asynchronous, i.e. the source machine and the destination machine are not migrating at the same time. We only need some pieces of the multifd sync operations. multifd_send_sync_main() ------------------------ Issued by the ram migration code on the migration thread, causes the multifd send channels to synchronize with the migration thread and makes the sending side emit a packet with the MULTIFD_FLUSH flag. With mapped-ram we want to maintain the sync on the sending side because that provides ordering between the rounds of dirty pages when migrating live. MULTIFD_FLUSH ------------- On the receiving side, the presence of the MULTIFD_FLUSH flag on a packet causes the receiving channels to start synchronizing with the main thread. We're not using packets with mapped-ram, so there's no MULTIFD_FLUSH flag and therefore no channel sync on the receiving side. multifd_recv_sync_main() ------------------------ Issued by the migration thread when the ram migration flag RAM_SAVE_FLAG_MULTIFD_FLUSH is received, causes the migration thread on the receiving side to start synchronizing with the recv channels. Due to compatibility, this is also issued when RAM_SAVE_FLAG_EOS is received. For mapped-ram we only need to synchronize the channels at the end of migration to avoid doing cleanup before the channels have finished their IO. Make sure the multifd syncs are only issued at the appropriate times. Note that due to pre-existing backward compatibility issues, we have the multifd_flush_after_each_section property that can cause a sync to happen at EOS. Since the EOS flag is needed on the stream, allow mapped-ram to just ignore it. Also emit an error if any other unexpected flags are found on the stream. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240229153017.2221-20-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	2dd7ee7a51	migration/multifd: Add incoming QIOChannelFile support On the receiving side we don't need to differentiate between main channel and threads, so whichever channel is defined first gets to be the main one. And since there are no packets, use the atomic channel count to index into the params array. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-19-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	b7b03eb614	migration/multifd: Add outgoing QIOChannelFile support Allow multifd to open file-backed channels. This will be used when enabling the mapped-ram migration stream format which expects a seekable transport. The QIOChannel read and write methods will use the preadv/pwritev versions which don't update the file offset at each call so we can reuse the fd without re-opening for every channel. Contrary to the socket migration, the file migration doesn't need an asynchronous channel creation process, so expose multifd_channel_connect() and call it directly. Note that this is just setup code and multifd cannot yet make use of the file channels. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240229153017.2221-18-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	a8a3e7102c	migration/multifd: Add a wrapper for channels_created We'll need to access multifd_send_state->channels_created from outside multifd.c, so introduce a helper for that. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-17-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	d117ed0699	migration/multifd: Allow receiving pages without packets Currently multifd does not need to have knowledge of pages on the receiving side because all the information needed is within the packets that come in the stream. We're about to add support to mapped-ram migration, which cannot use packets because it expects the ramblock section in the migration file to contain only the guest pages data. Add a data structure to transfer pages between the ram migration code and the multifd receiving threads. We don't want to reuse MultiFDPages_t for two reasons: a) multifd threads don't really need to know about the data they're receiving. b) the receiving side has to be stopped to load the pages, which means we can experiment with larger granularities than page size when transferring data. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-16-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	06833d83f8	migration/multifd: Allow multifd without packets For the upcoming support to the new 'mapped-ram' migration stream format, we cannot use multifd packets because each write into the ramblock section in the migration file is expected to contain only the guest pages. They are written at their respective offsets relative to the ramblock section header. There is no space for the packet information and the expected gains from the new approach come partly from being able to write the pages sequentially without extraneous data in between. The new format also simply doesn't need the packets and all necessary information can be taken from the standard migration headers with some (future) changes to multifd code. Use the presence of the mapped-ram capability to decide whether to send packets. This only moves code under multifd_use_packets(), it has no effect for now as mapped-ram cannot yet be enabled with multifd. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-15-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	9db1912513	migration/multifd: Decouple recv method from pages Next patches will abstract the type of data being received by the channels, so do some cleanup now to remove references to pages and dependency on 'normal_num'. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-14-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	402dd7ac1c	migration/multifd: Rename MultiFDSend\|RecvParams::data to compress_data Use a more specific name for the compression data so we can use the generic for the multifd core code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-13-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	2f6b8826a5	migration/ram: Add incoming 'mapped-ram' migration Add the necessary code to parse the format changes for the 'mapped-ram' capability. One of the more notable changes in behavior is that in the 'mapped-ram' case ram pages are restored in one go rather than constantly looping through the migration stream. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-11-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	c2d5c4a7cb	migration/ram: Add outgoing 'mapped-ram' migration Implement the outgoing migration side for the 'mapped-ram' capability. A bitmap is introduced to track which pages have been written in the migration file. Pages are written at a fixed location for every ramblock. Zero pages are ignored as they'd be zero in the destination migration as well. The migration stream is altered to put the dirty pages for a ramblock after its header instead of having a sequential stream of pages that follow the ramblock headers. Without mapped-ram (current): With mapped-ram (new): --------------------- -------------------------------- \| ramblock 1 header \| \| ramblock 1 header \| --------------------- -------------------------------- \| ramblock 2 header \| \| ramblock 1 mapped-ram header \| --------------------- -------------------------------- \| ... \| \| padding to next 1MB boundary \| --------------------- \| ... \| \| ramblock n header \| -------------------------------- --------------------- \| ramblock 1 pages \| \| RAM_SAVE_FLAG_EOS \| \| ... \| --------------------- -------------------------------- \| stream of pages \| \| ramblock 2 header \| \| (iter 1) \| -------------------------------- \| ... \| \| ramblock 2 mapped-ram header \| --------------------- -------------------------------- \| RAM_SAVE_FLAG_EOS \| \| padding to next 1MB boundary \| --------------------- \| ... \| \| stream of pages \| -------------------------------- \| (iter 2) \| \| ramblock 2 pages \| \| ... \| \| ... \| --------------------- -------------------------------- \| ... \| \| ... \| --------------------- -------------------------------- \| RAM_SAVE_FLAG_EOS \| -------------------------------- \| ... \| -------------------------------- where: - ramblock header: the generic information for a ramblock, such as idstr, used_len, etc. - ramblock mapped-ram header: the new information added by this feature: bitmap of pages written, bitmap size and offset of pages in the migration file. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-10-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	8d9e0d4100	migration: Add mapped-ram URI compatibility check The mapped-ram migration format needs a channel that supports seeking to be able to write each page to an arbitrary offset in the migration stream. Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-9-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	4ed49feb44	migration/ram: Introduce 'mapped-ram' migration capability Add a new migration capability 'mapped-ram'. The core of the feature is to ensure that RAM pages are mapped directly to offsets in the resulting migration file instead of being streamed at arbitrary points. The reasons why we'd want such behavior are: - The resulting file will have a bounded size, since pages which are dirtied multiple times will always go to a fixed location in the file, rather than constantly being added to a sequential stream. This eliminates cases where a VM with, say, 1G of RAM can result in a migration file that's 10s of GBs, provided that the workload constantly redirties memory. - It paves the way to implement O_DIRECT-enabled save/restore of the migration stream as the pages are ensured to be written at aligned offsets. - It allows the usage of multifd so we can write RAM pages to the migration file in parallel. For now, enabling the capability has no effect. The next couple of patches implement the core functionality. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-8-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	7f5b50a401	migration/qemu-file: add utility methods for working with seekable channels Add utility methods that will be needed when implementing 'mapped-ram' migration capability. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240229153017.2221-7-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	4aac6b1e9b	migration/multifd: Cleanup multifd_recv_sync_main Some minor cleanups and documentation for multifd_recv_sync_main. Use thread_count as done in other parts of the code. Remove p->id from the multifd_recv_state sync, since that is global and not tied to a channel. Add documentation for the sync steps. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-2-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Bryan Zhang	b4014a2bf5	migration: Properly apply migration compression level parameters Some glue code was missing, so that using `qmp_migrate_set_parameters` to set `multifd-zstd-level` or `multifd-zlib-level` did not work. This commit adds the glue code to fix that. Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com> Link: https://lore.kernel.org/r/20240301035901.4006936-2-bryan.zhang@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 14:14:55 +08:00
Richard Henderson	5d2203691e	migration: Remove qemu_host_page_size Replace with the maximum of the real host page size and the target page size. This is an exact replacement. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: Helge Deller <deller@gmx.de> Message-Id: <20240102015808.132373-12-richard.henderson@linaro.org>	2024-02-29 11:35:36 -10:00
Cédric Le Goater	9425ef3f99	migration: Use migrate_has_error() in close_return_path_on_source() close_return_path_on_source() retrieves the migration error from the the QEMUFile '->to_dst_file' to know if a shutdown is required. This shutdown is required to exit the return-path thread. Avoid relying on '->to_dst_file' and use migrate_has_error() instead. (using to_dst_file is a heuristic to infer whether rp_state.from_dst_file might be stuck on a recvmsg(). Using a generic method for detecting errors is more reliable. We also want to reduce dependency on QEMUFile::last_error) Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> [added some words about the motivation for this patch] Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240226203122.22894-3-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Fabiano Rosas	22b04245f0	migration: Join the return path thread before releasing to_dst_file The return path thread might hang at a blocking system call. Before joining the thread we might need to issue a shutdown() on the socket file descriptor to release it. To determine whether the shutdown() is necessary we look at the QEMUFile error. Make sure we only clean up the QEMUFile after the return path has been waited for. This fixes a hang when qemu_savevm_state_setup() produced an error that was detected by migration_detect_error(). That skips migration_completion() so close_return_path_on_source() would get stuck waiting for the RP thread to terminate. Reported-by: Cédric Le Goater <clg@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240226203122.22894-2-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Fabiano Rosas	63f64d77f0	migration: Fix qmp_query_migrate mbps value The QMP command query_migrate might see incorrect throughput numbers if it runs after we've set the migration completion status but before migration_calculate_complete() has updated s->total_time and s->mbps. The migration status would show COMPLETED, but the throughput value would be the one from the last iteration and not the one from the whole migration. This will usually be a larger value due to the time period being smaller (one iteration). Move migration_calculate_complete() earlier so that the status MIGRATION_STATUS_COMPLETED is only emitted after the final counters update. Keep everything under the BQL so the QMP thread sees the updates as atomic. Rename migration_calculate_complete to migration_completion_end to reflect its new purpose of also updating s->state. Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240226143335.14282-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	cbdafc1b34	migration: options incompatible with cpr Fail the migration request if options are set that are incompatible with cpr. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-15-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	9867d4ddd0	migration: stop vm for cpr When migration for cpr is initiated, stop the vm and set state RUN_STATE_FINISH_MIGRATE before ram is saved. This eliminates the possibility of ram and device state being out of sync, and guarantees that a guest in the suspended state remains suspended, because qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	4af667f87c	migration: notifier error checking Check the status returned by migration notifiers for event type MIG_EVENT_PRECOPY_SETUP, and report errors. None of the notifiers return an error status at this time. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-10-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	bf78a046b9	migration: refactor migrate_fd_connect failures Move common code for the error path in migrate_fd_connect to a shared fail label. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-9-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	6835f5a1bc	migration: per-mode notifiers Keep a separate list of migration notifiers for each migration mode. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	5663dd3f1a	migration: MigrationNotifyFunc Define MigrationNotifyFunc to improve type safety and simplify migration notifiers. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-7-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	c763a23e41	migration: remove postcopy_after_devices postcopy_after_devices and migration_in_postcopy_after_devices are no longer used, so delete them. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-6-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	9d9babf78d	migration: MigrationEvent for notifiers Passing MigrationState to notifiers is unsound because they could access unstable migration state internals or even modify the state. Instead, pass the minimal info needed in a new MigrationEvent struct, which could be extended in the future if needed. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-5-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	3e7757301c	migration: convert to NotifierWithReturn Change all migration notifiers to type NotifierWithReturn, so notifiers can return an error status in a future patch. For now, pass NULL for the notifier error parameter, and do not check the return value. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-4-git-send-email-steven.sistare@oracle.com [peterx: dropped unexpected update to roms/seabios-hppa] Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	d91f33c72e	migration: remove error from notifier data Remove the error object from opaque data passed to notifiers. Use the new error parameter passed to the notifier instead. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Steve Sistare	be19d836cd	notify: pass error to notifier with return Pass an error object as the third parameter to "notifier with return" notifiers, so clients no longer need to bundle an error object in the opaque data. The new parameter is used in a later patch. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/1708622920-68779-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Peter Xu	c9a7e83c9d	migration/multifd: Drop unnecessary helper to destroy IOC Both socket_send_channel_destroy() and multifd_send_channel_destroy() are unnecessary wrappers to destroy an IOC, as the only thing to do is to release the final IOC reference. We have plenty of code that destroys an IOC using direct unref() already; keep that style. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Peter Xu	72b90b9687	migration/multifd: Cleanup outgoing_args in state destroy outgoing_args is a global cache of socket address to be reused in multifd. Freeing the cache in per-channel destructor is more or less a hack. Move it to multifd_send_cleanup_state() so it only get checked once. Use a small helper to do so because it's internal of socket.c. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Peter Xu	770de49c00	migration/multifd: Make multifd_channel_connect() return void It never fails, drop the retval and also the Error**. Suggested-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Peter Xu	0518b5d8d3	migration/multifd: Drop registered_yank With a clear definition of p->c protocol, where we only set it up if the channel is fully established (TLS or non-TLS), registered_yank boolean will have equal meaning of "p->c != NULL". Drop registered_yank by checking p->c instead. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Peter Xu	9221e3c6a2	migration/multifd: Cleanup TLS iochannel referencing Commit `a1af605bd5` ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") introduced a thread for TLS channels, which will resolve the issue on blocking the main thread. However in the same commit p->c is slightly abused just to be able to pass over the pointer "p" into the thread. That's the major reason we'll need to conditionally free the io channel in the fault paths. To clean it up, using a separate structure to pass over both "p" and "tioc" in the tls handshake thread. Then we can make it a rule that p->c will never be set until the channel is completely setup. With that, we can drop the tricky conditional unref of the io channel in the error path. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240222095301.171137-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Fabiano Rosas	d13f0026c7	migration/multifd: Release recv sem_sync earlier Now that multifd_recv_terminate_threads() is called only once, release the recv side sem_sync earlier like we do for the send side. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240220224138.24759-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Fabiano Rosas	11dd7be575	migration/multifd: Remove p->quit from recv side Like we did on the sending side, replace the p->quit per-channel flag with a global atomic 'exiting' flag. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240220224138.24759-5-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-28 11:31:28 +08:00
Fabiano Rosas	93fa9dc2e0	migration/multifd: Add a synchronization point for channel creation It is possible that one of the multifd channels fails to be created at multifd_new_send_channel_async() while the rest of the channel creation tasks are still in flight. This could lead to multifd_save_cleanup() executing the qemu_thread_join() loop too early and not waiting for the threads which haven't been created yet, leading to the freeing of resources that the newly created threads will try to access and crash. Add a synchronization point after which there will be no attempts at thread creation and therefore calling multifd_save_cleanup() past that point will ensure it properly waits for the threads. A note about performance: Prior to this patch, if a channel took too long to be established, other channels could finish connecting first and already start taking load. Now we're bounded by the slowest-connecting channel. Reported-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-07 09:53:18 +08:00
Fabiano Rosas	2576ae488e	migration/multifd: Unify multifd and TLS connection paths During multifd channel creation (multifd_send_new_channel_async) when TLS is enabled, the multifd_channel_connect function is called twice, once to create the TLS handshake thread and another time after the asynchrounous TLS handshake has finished. This creates a slightly confusing call stack where multifd_channel_connect() is called more times than the number of channels. It also splits error handling between the two callers of multifd_channel_connect() causing some code duplication. Lastly, it gets in the way of having a single point to determine whether all channel creation tasks have been initiated. Refactor the code to move the reentrancy one level up at the multifd_new_send_channel_async() level, de-duplicating the error handling and allowing for the next patch to introduce a synchronization point common to all the multifd channel creation, regardless of TLS. Note that the previous code would never fail once p->c had been set. This patch changes this assumption, which affects refcounting, so add comments around object_unref to explain the situation. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-07 09:53:18 +08:00
Fabiano Rosas	dd904bc13f	migration/multifd: Move multifd_send_setup into migration thread We currently have an unfavorable situation around multifd channels creation and the migration thread execution. We create the multifd channels with qio_channel_socket_connect_async -> qio_task_run_in_thread, but only connect them at the multifd_new_send_channel_async callback, called from qio_task_complete, which is registered as a glib event. So at multifd_send_setup() we create the channels, but they will only be actually usable after the whole multifd_send_setup() calling stack returns back to the main loop. Which means that the migration thread is already up and running without any possibility for the multifd channels to be ready on time. We currently rely on the channels-ready semaphore blocking multifd_send_sync_main() until channels start to come up and release it. However there have been bugs recently found when a channel's creation fails and multifd_send_cleanup() is allowed to run while other channels are still being created. Let's start to organize this situation by moving the multifd_send_setup() call into the migration thread. That way we unblock the main-loop to dispatch the completion callbacks and actually have a chance of getting the multifd channels ready for when the migration thread needs them. The next patches will deal with the synchronization aspects. Note that this takes multifd_send_setup() out of the BQL. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-07 09:53:18 +08:00
Fabiano Rosas	bd8b0a8f82	migration/multifd: Move multifd_send_setup error handling in to the function Hide the error handling inside multifd_send_setup to make it cleaner for the next patch to move the function around. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-07 09:53:18 +08:00
Fabiano Rosas	a2a63c4abd	migration/multifd: Remove p->running We currently only need p->running to avoid calling qemu_thread_join() on a non existent thread if the thread has never been created. However, there are at least two bugs in this logic: 1) On the sending side, p->running is set too early and qemu_thread_create() can be skipped due to an error during TLS handshake, leaving the flag set and leading to a crash when multifd_send_cleanup() calls qemu_thread_join(). 2) During exit, the multifd thread clears the flag while holding the channel lock. The counterpart at multifd_send_cleanup() reads the flag outside of the lock and might free the mutex while the multifd thread still has it locked. Fix the first issue by setting the flag right before creating the thread. Rename it from p->running to p->thread_created to clarify its usage. Fix the second issue by not clearing the flag at the multifd thread exit. We don't have any use for that. Note that these bugs are straight-forward logic issues and not race conditions. There is still a gap for races to affect this code due to multifd_send_cleanup() being allowed to run concurrently with the thread creation loop. This issue is solved in the next patches. Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: `2964714015` ("migration/tls: add support for multifd tls-handshake") Reported-by: Avihai Horon <avihaih@nvidia.com> Reported-by: chenyuhui5@huawei.com Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-07 09:53:18 +08:00

1 2 3 4 5 ...

2211 Commits