mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Markus Armbruster	fdac62dbd3	migration: Rephrase message on failure to save / load Xen device state Functions that use an Error **errp parameter to return errors should not also report them to the user, because reporting is the caller's job. When the caller does, the error is reported twice. When it doesn't (because it recovered from the error), there is no error to report, i.e. the report is bogus. qmp_xen_save_devices_state() and qmp_xen_load_devices_state() violate this principle: they call qemu_save_device_state() and qemu_loadvm_state(), which call error_report_err(). I wish I could clean this up now, but migration's error reporting is too complicated (confused?) for me to mess with it. Instead, I'm merely improving the error reported by qmp_xen_load_devices_state() and qmp_xen_load_devices_state() to the QMP core from An IO error has occurred to saving Xen device state failed and loading Xen device state failed respectively. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240513141703.549874-6-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Fabiano Rosas <farosas@suse.de> Acked-by: Peter Xu <peterx@redhat.com>	2024-05-27 12:42:52 +02:00
Fabiano Rosas	eef0bae3a7	migration: Remove block migration The block migration has been considered obsolete since QEMU 8.2 in favor of the more flexible storage migration provided by the blockdev-mirror driver. Two releases have passed so now it's time to remove it. Deprecation commit `66db46ca83` ("migration: Deprecate block migration"). Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>	2024-05-08 09:20:58 -03:00
Richard Henderson	1e0a7549e1	Error reporting patches for 2024-04-24 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmYouloSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTzLwP+wQjCWJHpTB+uQ3+U5Tb77BUJxuEjDMj txNIJBXHOo7erxTSCieLuQICm8e30z62QAK4nVStyMDcyGh1KfwdSDAxBFnuLpA2 7X5bXbvCrm4vXVASRTV1zKCYDlIXFfrMWLvN5KgM90RsodLcy0szlXg+qYyoIM3Z 8zp0Ug0fQPFHiOAQJi9ZTOsCYJBhZc2sbzgQEmf/g6q9bJaZHzPEHvVT4AQhTAtn 7BIJY+vGDZNZwbP/0obWy2lai3kbGak8OXpwq/bewdrxeRmvqmM7sk+V/P2tXQD+ kZe0/HWuDoO5J8L3KHiJnBJ0KCk8fbo4I0T6v9vf55Sj8K0r7O9sykgXXWv8q0lO GrQa0YcyWAckI41stYQpwEpIlRanuZv/p8OZFJIqsTAfaw7RlbIBYA9xZCUnTton FbHO/t2BLfo8eO9/xRD4r1u6vMbVozImPETuUMPyLHzlrdw2thxddKQNInHYYZ2U SvvaByceEP2UywOnOflZhVL2dIhhnrBztiW2Vqod1fQHpfBAcJn909PZIlPZyMkr gUnABI/rtC/lW3pBee6HmfzJ6Fah0e0XCpCY20qFe27Bi/z3xKi5NWYuyAUG5csp CuTsc4pXfPVj5Z+Mk4pyY8PK5k4jSa7vAVLCLTNzXJLZlJTb6yuf0HsJ7768nHDc hSEIjLwQWYtw =r8Rv -----END PGP SIGNATURE----- Merge tag 'pull-error-2024-04-24' of https://repo.or.cz/qemu/armbru into staging Error reporting patches for 2024-04-24 # -----BEGIN PGP SIGNATURE----- # # iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmYouloSHGFybWJydUBy # ZWRoYXQuY29tAAoJEDhwtADrkYZTzLwP+wQjCWJHpTB+uQ3+U5Tb77BUJxuEjDMj # txNIJBXHOo7erxTSCieLuQICm8e30z62QAK4nVStyMDcyGh1KfwdSDAxBFnuLpA2 # 7X5bXbvCrm4vXVASRTV1zKCYDlIXFfrMWLvN5KgM90RsodLcy0szlXg+qYyoIM3Z # 8zp0Ug0fQPFHiOAQJi9ZTOsCYJBhZc2sbzgQEmf/g6q9bJaZHzPEHvVT4AQhTAtn # 7BIJY+vGDZNZwbP/0obWy2lai3kbGak8OXpwq/bewdrxeRmvqmM7sk+V/P2tXQD+ # kZe0/HWuDoO5J8L3KHiJnBJ0KCk8fbo4I0T6v9vf55Sj8K0r7O9sykgXXWv8q0lO # GrQa0YcyWAckI41stYQpwEpIlRanuZv/p8OZFJIqsTAfaw7RlbIBYA9xZCUnTton # FbHO/t2BLfo8eO9/xRD4r1u6vMbVozImPETuUMPyLHzlrdw2thxddKQNInHYYZ2U # SvvaByceEP2UywOnOflZhVL2dIhhnrBztiW2Vqod1fQHpfBAcJn909PZIlPZyMkr # gUnABI/rtC/lW3pBee6HmfzJ6Fah0e0XCpCY20qFe27Bi/z3xKi5NWYuyAUG5csp # CuTsc4pXfPVj5Z+Mk4pyY8PK5k4jSa7vAVLCLTNzXJLZlJTb6yuf0HsJ7768nHDc # hSEIjLwQWYtw # =r8Rv # -----END PGP SIGNATURE----- # gpg: Signature made Wed 24 Apr 2024 12:52:58 AM PDT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [undefined] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * tag 'pull-error-2024-04-24' of https://repo.or.cz/qemu/armbru: qapi: Inline and remove QERR_PROPERTY_VALUE_BAD definition qapi: Inline and remove QERR_MIGRATION_ACTIVE definition qapi: Correct error message for 'vcpu_dirty_limit' parameter qapi: Inline and remove QERR_INVALID_PARAMETER_TYPE definition qapi: Inline QERR_INVALID_PARAMETER_TYPE definition (constant value) qapi: Inline and remove QERR_INVALID_PARAMETER definition qapi: Inline and remove QERR_DEVICE_NO_HOTPLUG definition qapi: Inline and remove QERR_DEVICE_HAS_NO_MEDIUM definition qapi: Inline and remove QERR_BUS_NO_HOTPLUG definition error: Drop superfluous #include "qapi/qmp/qerror.h" Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-04-24 09:22:42 -07:00
Philippe Mathieu-Daudé	005807860d	qapi: Inline and remove QERR_MIGRATION_ACTIVE definition Address the comment added in commit `4629ed1e98` ("qerror: Finally unused, clean up"), from 2015: /* * These macros will go away, please don't use * in new code, and do not add new ones! */ Mechanical transformation using sed, manually removing the definition in include/qapi/qmp/qerror.h. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240312141343.3168265-10-armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> [Straightforward conflict with commit `aeaafb1e59` (migration: export migration_is_running) resolved]	2024-04-24 09:50:58 +02:00
Cédric Le Goater	e4fa064d56	migration: Add Error** argument to .load_setup() handler This will be useful to report errors at a higher level, mostly in VFIO today. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-9-clg@redhat.com [peterx: drop comment for ERRP_GUARD, per Markus] Signed-off-by: Peter Xu <peterx@redhat.com>	2024-04-23 18:36:01 -04:00
Cédric Le Goater	01c3ac681b	migration: Add Error** argument to .save_setup() handler The purpose is to record a potential error in the migration stream if qemu_savevm_state_setup() fails. Most of the current .save_setup() handlers can be modified to use the Error argument instead of managing their own and calling locally error_report(). Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Harsh Prateek Bora <harshpb@linux.ibm.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Cc: John Snow <jsnow@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-8-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-04-23 18:36:01 -04:00
Cédric Le Goater	057a20099b	migration: Add Error argument to qemu_savevm_state_setup() This prepares ground for the changes coming next which add an Error argument to the .save_setup() handler. Callers of qemu_savevm_state_setup() now handle the error and fail earlier setting the migration state from MIGRATION_STATUS_SETUP to MIGRATION_STATUS_FAILED. In qemu_savevm_state(), move the cleanup to preserve the error reported by .save_setup() handlers. Since the previous behavior was to ignore errors at this step of migration, this change should be examined closely to check that cleanups are still correctly done. Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-7-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-04-23 18:36:01 -04:00
Cédric Le Goater	6138d43ab2	migration: Add Error argument to vmstate_save() This will prepare ground for future changes adding an Error argument to qemu_savevm_state_setup(). Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-6-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-04-23 18:36:01 -04:00
Wei Wang	7afbdada7e	migration/postcopy: ensure preempt channel is ready before loading states Before loading the guest states, ensure that the preempt channel has been ready to use, as some of the states (e.g. via virtio_load) might trigger page faults that will be handled through the preempt channel. So yield to the main thread in the case that the channel create event hasn't been dispatched. Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: `9358982744` ("migration: Send requested page directly in rp-return thread") Originally-by: Lei Wang <lei4.wang@intel.com> Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel.com/T/ Signed-off-by: Lei Wang <lei4.wang@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Link: https://lore.kernel.org/r/20240405034056.23933-1-wei.w.wang@intel.com [peterx: add a todo section, add Fixes and copy stable for 8.0+] Signed-off-by: Peter Xu <peterx@redhat.com>	2024-04-07 14:40:25 -04:00
Steve Sistare	aeaafb1e59	migration: export migration_is_running Delete the MigrationState parameter from migration_is_running and move it to the public API in misc.h. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1710179338-294359-5-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-11 16:28:59 -04:00
Cédric Le Goater	e6e08e8323	migration: Do not call PRECOPY_NOTIFY_SETUP notifiers in case of error When commit `bd2270608f` ("migration/ram.c: add a notifier chain for precopy") added PRECOPY_NOTIFY_SETUP notifiers at the end of qemu_savevm_state_setup(), it didn't take into account a possible error in the loop calling vmstate_save() or .save_setup() handlers. Check ret value before calling the notifiers. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240304122844.1888308-10-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-11 14:41:40 -04:00
Avihai Horon	4e1871c450	migration: Don't serialize devices in qemu_savevm_state_iterate() Commit `90697be889` ("live migration: Serialize vmstate saving in stage 2") introduced device serialization in qemu_savevm_state_iterate(). The rationale behind it was to first complete migration of slower changing block devices and only then migrate the RAM, to avoid sending fast changing RAM pages over and over. This commit was added a long time ago, and while it was useful back then, it is not the case anymore: 1. Block migration is deprecated, see commit `66db46ca83` ("migration: Deprecate block migration"). 2. Today there are other iterative devices besides RAM and block, such as VFIO, which are registered for migration after RAM. With current serialization behavior, a fast changing device can block other devices from sending their data, which may prevent migration from converging in some cases. The issue described in item 2 was observed in several VFIO migration scenarios with switchover-ack capability enabled, where some workload on the VM prevented RAM from ever reaching a hard zero, thus blocking VFIO initial pre-copy data from being sent. Hence, destination could not ack switchover and migration could not converge. Fix that by not serializing iterative devices in qemu_savevm_state_iterate(). Note that this still doesn't fully prevent device starvation. As correctly pointed out by Peter [1], a fast changing device might constantly consume all allocated bandwidth and block the following devices. However, this scenario is more likely to happen only if max-bandwidth is low. [1] https://lore.kernel.org/qemu-devel/Zd6iw9dBhW6wKNxx@x1n/ Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240304105339.20713-2-avihaih@nvidia.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-11 14:41:40 -04:00
Fabiano Rosas	4ed49feb44	migration/ram: Introduce 'mapped-ram' migration capability Add a new migration capability 'mapped-ram'. The core of the feature is to ensure that RAM pages are mapped directly to offsets in the resulting migration file instead of being streamed at arbitrary points. The reasons why we'd want such behavior are: - The resulting file will have a bounded size, since pages which are dirtied multiple times will always go to a fixed location in the file, rather than constantly being added to a sequential stream. This eliminates cases where a VM with, say, 1G of RAM can result in a migration file that's 10s of GBs, provided that the workload constantly redirties memory. - It paves the way to implement O_DIRECT-enabled save/restore of the migration stream as the pages are ensured to be written at aligned offsets. - It allows the usage of multifd so we can write RAM pages to the migration file in parallel. For now, enabling the capability has no effect. The next couple of patches implement the core functionality. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240229153017.2221-8-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-03-01 15:42:04 +08:00
Fabiano Rosas	44d0d456d7	migration: Centralize BH creation and dispatch Now that the migration state reference counting is correct, further wrap the bottom half dispatch process to avoid future issues. Move BH creation and scheduling together and wrap the dispatch with an intermediary function that will ensure we always keep the ref/unref balanced. Also move the responsibility of deleting the BH into the wrapper and remove the now unnecessary pointers. Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-01-29 11:02:12 +08:00
Fabiano Rosas	9cf268965d	migration: Reference migration state around loadvm_postcopy_handle_run_bh We need to hold a reference to the current_migration object around async calls to avoid it been freed while still in use. Even on this load-side function, we might still use the MigrationState, e.g to check for capabilities. Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240119233922.32588-4-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>	2024-01-29 11:02:12 +08:00
Peter Maydell	c8193acc07	migration 1st pull for 9.0 - We lost Juan and Leo in the maintainers file - Steven's suspend state fix - Steven's fix for coverity on migrate_mode - Avihai's migration cleanup series -----BEGIN PGP SIGNATURE----- iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+ Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D =sfuM -----END PGP SIGNATURE----- Merge tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu into staging migration 1st pull for 9.0 - We lost Juan and Leo in the maintainers file - Steven's suspend state fix - Steven's fix for coverity on migrate_mode - Avihai's migration cleanup series # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZZY0TxIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wbSxgEAoM5g3wkc22lpAlRpU+hJUqT9NVOVQSK+ # Fk7XJYTdSgABAKzykA6hAmU5Kj+yVI6jI874SVZbs2FWpFs4osvsKk4D # =sfuM # -----END PGP SIGNATURE----- # gpg: Signature made Thu 04 Jan 2024 04:30:07 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [unknown] # gpg: aka "Peter Xu <peterx@redhat.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'migration-20240104-pull-request' of https://gitlab.com/peterx/qemu: (26 commits) migration: fix coverity migrate_mode finding migration/multifd: Remove unnecessary usage of local Error migration: Remove unnecessary usage of local Error migration: Fix migration_channel_read_peek() error path migration/multifd: Remove error_setg() in migration_ioc_process_incoming() migration/multifd: Fix leaking of Error in TLS error flow migration/multifd: Simplify multifd_channel_connect() if else statement migration/multifd: Fix error message in multifd_recv_initial_packet() migration: Remove errp parameter in migration_fd_process_incoming() migration: Refactor migration_incoming_setup() migration: Remove nulling of hostname in migrate_init() migration: Remove migrate_max_downtime() declaration tests/qtest: postcopy migration with suspend tests/qtest: precopy migration with suspend tests/qtest: option to suspend during migration tests/qtest: migration events migration: preserve suspended for bg_migration migration: preserve suspended for snapshot migration: preserve suspended runstate migration: propagate suspended runstate ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2024-01-05 13:35:25 +00:00
Steve Sistare	58b105703e	migration: preserve suspended for snapshot Restoring a snapshot can break a suspended guest. Snapshots suffer from the same suspended-state issues that affect live migration, plus they must handle an additional problematic scenario, which is that a running vm must remain running if it loads a suspended snapshot. To save, the existing vm_stop call now completely stops the suspended state. Finish with vm_resume to leave the vm in the state it had prior to the save, correctly restoring the suspended state. To load, if the snapshot is not suspended, then vm_stop + vm_resume correctly handles all states, and leaves the vm in the state it had prior to the load. However, if the snapshot is suspended, restoration is trickier. First, call vm_resume to restore the state to suspended so the current state matches the saved state. Then, if the pre-load state is running, call wakeup to resume running. Prior to these changes, the vm_stop to RUN_STATE_SAVE_VM and RUN_STATE_RESTORE_VM did not change runstate if the current state was suspended, but now it does, so allow these transitions. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1704312341-66640-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-01-04 09:52:42 +08:00
Richard Henderson	a77ffe9595	migration: Constify VMState Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231221031652.119827-67-richard.henderson@linaro.org>	2023-12-30 07:38:06 +11:00
Richard Henderson	2027001919	migration: Make VMStateDescription.subsections const Allow the array of pointers to itself be const. Propagate this through the copies of this field. Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231221031652.119827-2-richard.henderson@linaro.org>	2023-12-29 11:17:30 +11:00
Stefan Hajnoczi	b49f4755c7	block: remove AioContext locking This is the big patch that removes aio_context_acquire()/aio_context_release() from the block layer and affected block layer users. There isn't a clean way to split this patch and the reviewers are likely the same group of people, so I decided to do it in one patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-ID: <20231205182011.1976568-7-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-12-21 22:49:27 +01:00
Cédric Le Goater	721da0396c	util/uuid: Add UUID_STR_LEN definition qemu_uuid_unparse() includes a trailing NUL when writing the uuid string and the buffer size should be UUID_FMT_LEN + 1 bytes. Add a define for this size and use it where required. Cc: Fam Zheng <fam@euphon.net> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: "Denis V. Lunev" <den@openvz.org> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-11-03 09:20:31 +01:00
Peter Xu	3e5f3bcdc2	migration: Add tracepoints for downtime checkpoints This patch is inspired by Joao Martin's patch here: https://lore.kernel.org/r/20230926161841.98464-1-joao.m.martins@oracle.com Add tracepoints for major downtime checkpoints on both src and dst. They share the same tracepoint with a string showing its stage. Besides the checkpoints in the previous patch, this patch also added destination checkpoints. On src, we have these checkpoints added: - src-downtime-start: right before vm stops on src - src-vm-stopped: after vm is fully stopped - src-iterable-saved: after all iterables saved (END sections) - src-non-iterable-saved: after all non-iterable saved (FULL sections) - src-downtime-stop: migration fully completed On dst, we have these checkpoints added: - dst-precopy-loadvm-completes: after loadvm all done for precopy - dst-precopy-bh-: record BH steps to resume VM for precopy - dst-postcopy-bh-: record BH steps to resume VM for postcopy On dst side, we don't have a good way to trace total time consumed by iterable or non-iterable for now. We can mark it by 1st time receiving a FULL / END section, but rather than that let's just rely on the other tracepoints added for vmstates to back up the information. With this patch, one can enable "vmstate_downtime*" tracepoints and it'll enable all tracepoints for downtime measurements necessary. Drop loadvm_postcopy_handle_run_bh() tracepoint alongside, because they service the same purpose, which was only for postcopy. We then have unified prefix for all downtime relevant tracepoints. Co-developed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-6-peterx@redhat.com>	2023-11-01 16:13:58 +01:00
Peter Xu	3c80f14272	migration: Add per vmstate downtime tracepoints We have a bunch of savevm_section* tracepoints, they're good to analyze migration stream, but not always suitable if someone would like to analyze the migration downtime. Two major problems: - savevm_section* tracepoints are dumping all sections, we only care about the sections that contribute to the downtime - They don't have an identifier to show the type of sections, so no way to filter downtime information either easily. We can add type into the tracepoints, but instead of doing so, this patch kept them untouched, instead of adding a bunch of downtime specific tracepoints, so one can enable "vmstate_downtime*" tracepoints and get a full picture of how the downtime is distributed across iterative and non-iterative vmstate save/load. Note that here both save() and load() need to be traced, because both of them may contribute to the downtime. The contribution is not a simple "add them together", though: consider when the src is doing a save() of device1 while the dest can be load()ing for device2, so they can happen concurrently. Tracking both sides make sense because device load() and save() can be imbalanced, one device can save() super fast, but load() super slow, vice versa. We can't figure that out without tracing both. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231030163346.765724-4-peterx@redhat.com>	2023-11-01 16:13:58 +01:00
Peter Xu	caa91b3c44	migration: Check in savevm_state_handler_insert for dups Before finally register one SaveStateEntry, we detect for duplicated entries. This could be helpful to notify us asap instead of get silent migration failures which could be hard to diagnose. For example, this patch will generate a message like this (if without previous fixes on x2apic) as long as we wants to boot a VM instance with "-smp 200,maxcpus=288,sockets=2,cores=72,threads=2" and QEMU will bail out even before VM starts: savevm_state_handler_insert: Detected duplicate SaveStateEntry: id=apic, instance_id=0x0 Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231020090731.28701-10-quintela@redhat.com>	2023-11-01 16:13:58 +01:00
Juan Quintela	485fb95546	migration: Hack to maintain backwards compatibility for ppc Current code does: - register pre_2_10_vmstate_dummy_icp with "icp/server" and instance dependinfg on cpu number - for newer machines, it register vmstate_icp with "icp/server" name and instance 0 - now it unregisters "icp/server" for the 1st instance. This is wrong at many levels: - we shouldn't have two VMSTATEDescriptions with the same name - In case this is the only solution that we can came with, it needs to be: * register pre_2_10_vmstate_dummy_icp * unregister pre_2_10_vmstate_dummy_icp * register real vmstate_icp Created vmstate_replace_hack_for_ppc() with warnings left and right that it is a hack. CC: Cedric Le Goater <clg@kaod.org> CC: Daniel Henrique Barboza <danielhb413@gmail.com> CC: David Gibson <david@gibson.dropbear.id.au> CC: Greg Kurz <groug@kaod.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231020090731.28701-8-quintela@redhat.com>	2023-11-01 16:13:58 +01:00
Juan Quintela	be07a0ed22	qemu-file: Make qemu_fflush() return errors This let us simplify code of this shape. qemu_fflush(f); int ret = qemu_file_get_error(f); if (ret) { return ret; } into: int ret = qemu_fflush(f); if (ret) { return ret; } I updated all callers where there is any error check. qemu_fclose() don't need to check for f->last_error because qemu_fflush() returns it at the beggining of the function. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231025091117.6342-13-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-10-31 08:44:33 +01:00
Juan Quintela	e9c0eed7c2	qemu-file: Remove _noflush from qemu_file_transferred_noflush() qemu_file_transferred() don't exist anymore, so we can reuse the name. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231025091117.6342-7-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-10-31 08:44:33 +01:00
Marc-André Lureau	4d8bdc2ae0	migration: rename vmstate_save_needed->vmstate_section_needed The function is used on save at this point. The following commits will use it on load. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231024084043.2926316-5-marcandre.lureau@redhat.com>	2023-10-30 17:44:03 +01:00
Fiona Ebner	930e239d11	migration: hold the BQL during setup This is intended to be a semantic revert of commit `9b09503752` ("migration: run setup callbacks out of big lock"). There have been so many changes since that commit (e.g. a new setup callback dirty_bitmap_save_setup() that also needs to be adapted now), it's easier to do the revert manually. For snapshots, the bdrv_writev_vmstate() function is used during setup (in QIOChannelBlock backing the QEMUFile), but not holding the BQL while calling it could lead to an assertion failure. To understand how, first note the following: 1. Generated coroutine wrappers for block layer functions spawn the coroutine and use AIO_WAIT_WHILE()/aio_poll() to wait for it. 2. If the host OS switches threads at an inconvenient time, it can happen that a bottom half scheduled for the main thread's AioContext is executed as part of a vCPU thread's aio_poll(). An example leading to the assertion failure is as follows: main thread: 1. A snapshot-save QMP command gets issued. 2. snapshot_save_job_bh() is scheduled. vCPU thread: 3. aio_poll() for the main thread's AioContext is called (e.g. when the guest writes to a pflash device, as part of blk_pwrite which is a generated coroutine wrapper). 4. snapshot_save_job_bh() is executed as part of aio_poll(). 3. qemu_savevm_state() is called. 4. qemu_mutex_unlock_iothread() is called. Now qemu_get_current_aio_context() returns 0x0. 5. bdrv_writev_vmstate() is executed during the usual savevm setup via qemu_fflush(). But this function is a generated coroutine wrapper, so it uses AIO_WAIT_WHILE. There, the assertion assert(qemu_get_current_aio_context() == qemu_get_aio_context()); will fail. To fix it, ensure that the BQL is held during setup. While it would only be needed for snapshots, adapting migration too avoids additional logic for conditional locking/unlocking in the setup callbacks. Writing the header could (in theory) also trigger qemu_fflush() and thus bdrv_writev_vmstate(), so the locked section also covers the qemu_savevm_state_header() call, even for migration for consistency. The section around multifd_send_sync_main() needs to be unlocked to avoid a deadlock. In particular, the multifd_save_setup() function calls socket_send_channel_create() using multifd_new_send_channel_async() as a callback and then waits for the callback to signal via the channels_ready semaphore. The connection happens via qio_task_run_in_thread(), but the callback is only executed via qio_task_thread_result() which is scheduled for the main event loop. Without unlocking the section, the main thread would never get to process the task result and the callback meaning there would be no signal via the channels_ready semaphore. The comment in ram_init_bitmaps() was introduced by `4987783400` ("migration: fix incorrect memory_global_dirty_log_start outside BQL") and is removed, because it referred to the qemu_mutex_lock_iothread() call. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231013105839.415989-1-f.ebner@proxmox.com>	2023-10-17 09:25:13 +02:00
Nikolay Borisov	2aae1eb8da	migration: Add the configuration vmstate to the json writer Make the migration json writer part of MigrationState struct, allowing the 'configuration' object be serialized to json. This will facilitate the parsing of the 'configuration' object in the next patch that fixes analyze-migration.py for arm. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231009184326.15777-2-farosas@suse.de>	2023-10-17 09:14:32 +02:00
Peter Xu	0e99bb8f54	migration: Allow RECOVER->PAUSED convertion for dest qemu There's a bug on dest that if a double fault triggered on dest qemu (a network issue during postcopy-recover), we won't set PAUSED correctly because we assumed we always came from ACTIVE. Fix that by always overwriting the state to PAUSE. We could also check for these two states, but maybe it's an overkill. We did the same on the src QEMU to unconditionally switch to PAUSE anyway. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231004220240.167175-10-peterx@redhat.com>	2023-10-11 11:17:02 +02:00
Tejus GK	848a050342	migration: Update error description outside migration.c A few code paths exist in the source code,where a migration is marked as failed via MIGRATION_STATUS_FAILED, but the failure happens outside of migration.c In such cases, an error_report() call is made, however the current MigrationState is never updated with the error description, and hence clients like libvirt never know the actual reason for the failure. This patch covers such cases outside of migration.c and updates the error description at the appropriate places. Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231003065538.244752-3-tejus.gk@nutanix.com>	2023-10-04 10:54:40 +02:00
Tejus GK	969298f9d7	migration/vmstate: Introduce vmstate_save_state_with_err Currently, a few code paths exist in the function vmstate_save_state_v, which ultimately leads to a migration failure. However, an update in the current MigrationState for the error description is never done. vmstate.c somehow doesn't seem to allow the use of migrate_set_error due to some dependencies for unit tests. Hence, this patch introduces a new function vmstate_save_state_with_err, which will eventually propagate the error message to savevm.c where a migrate_set_error call can be eventually done. Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231003065538.244752-2-tejus.gk@nutanix.com>	2023-10-04 10:54:40 +02:00
Avihai Horon	08fc4cb517	migration: Add .save_prepare() handler to struct SaveVMHandlers Add a new .save_prepare() handler to struct SaveVMHandlers. This handler is called early, even before migration starts, and can be used by devices to perform early checks. Refactor migrate_init() to be able to return errors and call .save_prepare() from there. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	f543aa222d	migration: Move more initializations to migrate_init() Initialization of mig_stats, compression_counters and VFIO bytes transferred is hard-coded in migration code path and snapshot code path. Make the code cleaner by initializing them in migrate_init(). Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Avihai Horon	38c482b477	migration: Add migration prefix to functions in target.c The functions in target.c are not static, yet they don't have a proper migration prefix. Add such prefix. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-09-11 08:34:06 +02:00
Juan Quintela	cf786549ce	migration: Change qemu_file_transferred to noflush We do a qemu_fclose() just after that, that also does a qemu_fflush(), so remove one qemu_fflush(). Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230530183941.7223-3-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Juan Quintela	fc95c63b60	qemu-file: Rename qemu_file_transferred_ fast -> noflush Fast don't say much. Noflush indicates more clearly that it is like qemu_file_transferred but without the flush. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230530183941.7223-2-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Michael Tokarev	d8b71d96b3	migration: spelling fixes Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-07-25 17:13:20 +03:00
Avihai Horon	808642a2f6	vfio/migration: Reset bytes_transferred properly Currently, VFIO bytes_transferred is not reset properly: 1. bytes_transferred is not reset after a VM snapshot (so a migration following a snapshot will report incorrect value). 2. bytes_transferred is a single counter for all VFIO devices, however upon migration failure it is reset multiple times, by each VFIO device. Fix it by introducing a new function vfio_reset_bytes_transferred() and calling it during migration and snapshot start. Remove existing bytes_transferred reset in VFIO migration state notifier, which is not needed anymore. Fixes: `3710586caa` ("qapi: Add VFIO devices migration stats in Migration stats") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	1b4adb10f8	migration: Implement switchover ack logic Implement switchover ack logic. This prevents the source from stopping the VM and completing the migration until an ACK is received from the destination that it's OK to do so. To achieve this, a new SaveVMHandlers handler switchover_ack_needed() and a new return path message MIG_RP_MSG_SWITCHOVER_ACK are added. The switchover_ack_needed() handler is called during migration setup in the destination to check if switchover ack is used by the migrated device. When switchover is approved by all migrated devices in the destination that support this capability, the MIG_RP_MSG_SWITCHOVER_ACK return path message is sent to the source to notify it that it's OK to do switchover. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Vladimir Sementsov-Ogievskiy	c33f1829f8	migration: never fail in global_state_store() Actually global_state_store() can never fail. Let's get rid of extra error paths. To make things clear, use new runstate_get() and use same approach for global_state_store() and global_state_store_running(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230517123752.21615-3-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-06-02 01:03:19 +02:00
Juan Quintela	e1fde0e038	migration: Move rate_limit_max and rate_limit_used to migration_stats These way we can make them atomic and use this functions from any place. I also moved all functions that use rate_limit to migration-stats. Functions got renamed, they are not qemu_file anymore. qemu_file_rate_limit -> migration_rate_exceeded qemu_file_set_rate_limit -> migration_rate_set qemu_file_get_rate_limit -> migration_rate_get qemu_file_reset_rate_limit -> migration_rate_reset qemu_file_acct_rate_limit -> migration_rate_account. Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230515195709.63843-6-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-05-18 18:40:51 +02:00
Juan Quintela	6da835d42a	qemu-file: Remove total from qemu_file_total_transferred_*() Function is already quite long. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20230508130909.65420-7-quintela@redhat.com>	2023-05-15 13:46:14 +02:00
Juan Quintela	61abf1ebdc	qemu-file: Make total_transferred an uint64_t Change all the functions that use it. It was already passed as uint64_t. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20230504113841.23130-8-quintela@redhat.com>	2023-05-05 02:01:59 +02:00
Juan Quintela	aff3f6606d	migration: Rename ram_counters to mig_stats migration_stats is just too long, and it is going to have more than ram counters in the near future. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>	2023-05-03 11:24:20 +02:00
Juan Quintela	947701cc1a	migration: Move ram_stats to its own file migration-stats.[ch] There is already include/qemu/stats.h, so stats.h was a bad idea. We want this file to not depend on anything else, we will move all the migration counters/stats to this struct. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>	2023-05-03 11:24:19 +02:00
Peter Xu	12c81e5ae9	migration/vmstate-dump: Dump array size too as "num" For VMS_ARRAY typed vmsd fields, also dump the number of entries in the array in -vmstate-dump. Without such information, vmstate static checker can report false negatives of incompatible vmsd on VMS_ARRAY typed fields, when the src/dst do not have the same type of array defined. It's because in the checker we only check against size of fields within a VMSD field. One example: e1000e used to have a field defined as a boolean array with 5 entries, then removed it and replaced it with UNUSED (in `31e3f318c8`): - VMSTATE_BOOL_ARRAY(core.eitr_intr_pending, E1000EState, - E1000E_MSIX_VEC_NUM), + VMSTATE_UNUSED(E1000E_MSIX_VEC_NUM), It's a legal replacement but vmstate static checker is not happy with it, because it checks only against the "size" field between the two fields (here one is BOOL_ARRAY, the other is UNUSED): For BOOL_ARRAY: { "field": "core.eitr_intr_pending", "version_id": 0, "field_exists": false, "size": 1 }, For UNUSED: { "field": "unused", "version_id": 0, "field_exists": false, "size": 5 }, It's not the script to blame because there's just not enough information dumped to show the total size of the entry for an array. Add it. Note that this will not break old vmstate checker because the field will just be ignored. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-27 10:18:25 +02:00
Peter Xu	74c38cf7fd	migration: Allow postcopy_ram_supported_by_host() to report err Instead of print it to STDERR, bring the error upwards so that it can be reported via QMP responses. E.g.: { "execute": "migrate-set-capabilities" , "arguments": { "capabilities": [ { "capability": "postcopy-ram", "state": true } ] } } { "error": { "class": "GenericError", "desc": "Postcopy is not supported: Host backend files need to be TMPFS or HUGETLBFS only" } } Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-27 10:18:25 +02:00
Juan Quintela	9d4b1e5f22	migration: Move migrate_use_block() to options.c Once that we are there, we rename the function to migrate_block() to be consistent with all other capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00

1 2 3 4 5 ...

360 Commits