mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Cédric Le Goater	6ede082daf	util/uuid: Add UUID_STR_LEN definition qemu_uuid_unparse() includes a trailing NUL when writing the uuid string and the buffer size should be UUID_FMT_LEN + 1 bytes. Add a define for this size and use it where required. Cc: Fam Zheng <fam@euphon.net> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: "Denis V. Lunev" <den@openvz.org> Signed-off-by: Cédric Le Goater <clg@redhat.com> (cherry picked from commit `721da0396c`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-11-09 16:39:13 +03:00
Juan Quintela	95b3854bf7	migration: Non multifd migration don't care about multifd flushes RDMA was having trouble because migrate_multifd_flush_after_each_section() can only be true or false, but we don't want to send any flush when we are not in multifd migration. CC: Fabiano Rosas <farosas@suse.de Fixes: `294e5a4034` ("multifd: Only flush once each full round of memory") Reported-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231011205548.10571-2-quintela@redhat.com> (cherry picked from commit `d4f34485ca`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-19 14:52:59 +03:00
Peter Xu	b6170717ea	migration/qmp: Fix crash on setting tls-authz with null QEMU will crash if anyone tries to set tls-authz (which is a type StrOrNull) with 'null' value. Fix it in the easy way by converting it to qstring just like the other two tls parameters. Cc: qemu-stable@nongnu.org # v4.0+ Fixes: `d2f1d29b95` ("migration: add support for a "tls-authz" migration parameter") Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230905162335.235619-2-peterx@redhat.com> (cherry picked from commit `86dec715a7`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-12 00:57:19 +03:00
Fabiano Rosas	4ade907b30	migration: Move return path cleanup to main migration thread Now that the return path thread is allowed to finish during a paused migration, we can move the cleanup of the QEMUFiles to the main migration thread. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-9-farosas@suse.de> (cherry picked from commit `36e9aab3c5`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-03 02:00:54 +03:00
Fabiano Rosas	dec7785fab	migration: Replace the return path retry logic Replace the return path retry logic with finishing and restarting the thread. This fixes a race when resuming the migration that leads to a segfault. Currently when doing postcopy we consider that an IO error on the return path file could be due to a network intermittency. We then keep the thread alive but have it do cleanup of the 'from_dst_file' and wait on the 'postcopy_pause_rp' semaphore. When the user issues a migrate resume, a new return path is opened and the thread is allowed to continue. There's a race condition in the above mechanism. It is possible for the new return path file to be setup before the cleanup code in the return path thread has had a chance to run, leading to the new file being closed and the pointer set to NULL. When the thread is released after the resume, it tries to dereference 'from_dst_file' and crashes: Thread 7 "return path" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffd1dbf700 (LWP 9611)] 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154 154 return f->last_error; (gdb) bt #0 0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154 #1 0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206 #2 0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876 #3 0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541 #4 0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477 #5 0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Here's the race (important bit is open_return_path happening before migration_release_dst_files): migration \| qmp \| return path --------------------------+-----------------------------+--------------------------------- qmp_migrate_pause() shutdown(ms->to_dst_file) f->last_error = -EIO migrate_detect_error() postcopy_pause() set_state(PAUSED) wait(postcopy_pause_sem) qmp_migrate(resume) migrate_fd_connect() resume = state == PAUSED open_return_path <-- TOO SOON! set_state(RECOVER) post(postcopy_pause_sem) (incoming closes to_src_file) res = qemu_file_get_error(rp) migration_release_dst_files() ms->rp_state.from_dst_file = NULL post(postcopy_pause_rp_sem) postcopy_pause_return_path_thread() wait(postcopy_pause_rp_sem) rp = ms->rp_state.from_dst_file goto retry qemu_file_get_error(rp) SIGSEGV ------------------------------------------------------------------------------------------- We can keep the retry logic without having the thread alive and waiting. The only piece of data used by it is the 'from_dst_file' and it is only allowed to proceed after a migrate resume is issued and the semaphore released at migrate_fd_connect(). Move the retry logic to outside the thread by waiting for the thread to finish before pausing the migration. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-8-farosas@suse.de> (cherry picked from commit `ef796ee93b`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-03 02:00:54 +03:00
Fabiano Rosas	1ad3fa152c	migration: Consolidate return path closing code We'll start calling the await_return_path_close_on_source() function from other parts of the code, so move all of the related checks and tracepoints into it. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-7-farosas@suse.de> (cherry picked from commit `d50f5dc075`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-03 02:00:54 +03:00
Fabiano Rosas	d37260b9f0	migration: Remove redundant cleanup of postcopy_qemufile_src This file is owned by the return path thread which is already doing cleanup. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-6-farosas@suse.de> (cherry picked from commit `b3b101157d`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-03 02:00:54 +03:00
Fabiano Rosas	73393af917	migration: Fix possible race when shutting down to_dst_file It's not safe to call qemu_file_shutdown() on the to_dst_file without first checking for the file's presence under the lock. The cleanup of this file happens at postcopy_pause() and migrate_fd_cleanup() which are not necessarily running in the same thread as migrate_fd_cancel(). Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-5-farosas@suse.de> (cherry picked from commit `7478fb0df9`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-03 02:00:54 +03:00
Fabiano Rosas	f5480c4d82	migration: Fix possible races when shutting down the return path We cannot call qemu_file_shutdown() on the return path file without taking the file lock. The return path thread could be running it's cleanup code and have just cleared the from_dst_file pointer. Checking ms->to_dst_file for errors could also race with migrate_fd_cleanup() which clears the to_dst_file pointer. Protect both accesses by taking the file lock. This was caught by inspection, it should be rare, but the next patches will start calling this code from other places, so let's do the correct thing. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-4-farosas@suse.de> (cherry picked from commit `639decf529`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-03 02:00:54 +03:00
Fabiano Rosas	cc3a33400c	migration: Fix possible race when setting rp_state.error We don't need to set the rp_state.error right after a shutdown because qemu_file_shutdown() always sets the QEMUFile error, so the return path thread would have seen it and set the rp error itself. Setting the error outside of the thread is also racy because the thread could clear it after we set it. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-3-farosas@suse.de> (cherry picked from commit `28a8347281`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-03 02:00:54 +03:00
Peter Xu	0b246f8e9e	migration: Fix race that dest preempt thread close too early We hit intermit CI issue on failing at migration-test over the unit test preempt/plain: qemu-system-x86_64: Unable to read from socket: Connection reset by peer Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1 ** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0) (test program exited with status code -6) Fabiano debugged into it and found that the preempt thread can quit even without receiving all the pages, which can cause guest not receiving all the pages and corrupt the guest memory. To make sure preempt thread finished receiving all the pages, we can rely on the page_requested_count being zero because preempt channel will only receive requested page faults. Note, not all the faulted pages are required to be sent via the preempt channel/thread; imagine the case when a requested page is just queued into the background main channel for migration, the src qemu will just still send it via the background channel. Here instead of spinning over reading the count, we add a condvar so the main thread can wait on it if that unusual case happened, without burning the cpu for no good reason, even if the duration is short; so even if we spin in this rare case is probably fine. It's just better to not do so. The condvar is only used when that special case is triggered. Some memory ordering trick is needed to guarantee it from happening (against the preempt thread status field), so the main thread will always get a kick when that triggers correctly. Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886 Debugged-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-2-farosas@suse.de> (cherry picked from commit `cf02f29e1e`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-10-03 02:00:54 +03:00
Fabiano Rosas	86d7b08d71	block-migration: Ensure we don't crash during migration cleanup We can fail the blk_insert_bs() at init_blk_migration(), leaving the BlkMigDevState without a dirty_bitmap and BlockDriverState. Account for the possibly missing elements when doing cleanup. Fix the following crashes: Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359 359 BlockDriverState *bs = bitmap->bs; #0 0x0000555555ec83ef in bdrv_release_dirty_bitmap (bitmap=0x0) at ../block/dirty-bitmap.c:359 #1 0x0000555555bba331 in unset_dirty_tracking () at ../migration/block.c:371 #2 0x0000555555bbad98 in block_migration_cleanup_bmds () at ../migration/block.c:681 Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault. 0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073 7073 QLIST_FOREACH_SAFE(blocker, &bs->op_blockers[op], list, next) { #0 0x0000555555e971ff in bdrv_op_unblock (bs=0x0, op=BLOCK_OP_TYPE_BACKUP_SOURCE, reason=0x0) at ../block.c:7073 #1 0x0000555555e9734a in bdrv_op_unblock_all (bs=0x0, reason=0x0) at ../block.c:7095 #2 0x0000555555bbae13 in block_migration_cleanup_bmds () at ../migration/block.c:690 Signed-off-by: Fabiano Rosas <farosas@suse.de> Message-id: 20230731203338.27581-1-farosas@suse.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> (cherry picked from commit `f187609f27`) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-09-21 19:35:19 +03:00
Juan Quintela	697c4c86ab	migration/rdma: Split qemu_fopen_rdma() into input/output functions This is how everything else in QEMUFile is structured. As a bonus they are three less lines of code. Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20230530183941.7223-17-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Juan Quintela	ac6f48e15d	qemu-file: Make qemu_file_get_error_obj() static It was not used outside of qemu_file.c anyways. Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20230530183941.7223-21-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Juan Quintela	8c5ee0bfb8	qemu-file: Simplify qemu_file_shutdown() Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20230530183941.7223-20-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Juan Quintela	9ccf83f486	qemu_file: Make qemu_file_is_writable() static It is not used outside of qemu_file, and it shouldn't. Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230530183941.7223-19-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Juan Quintela	cf786549ce	migration: Change qemu_file_transferred to noflush We do a qemu_fclose() just after that, that also does a qemu_fflush(), so remove one qemu_fflush(). Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230530183941.7223-3-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Juan Quintela	fc95c63b60	qemu-file: Rename qemu_file_transferred_ fast -> noflush Fast don't say much. Noflush indicates more clearly that it is like qemu_file_transferred but without the flush. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230530183941.7223-2-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Wei Wang	82137e6c8c	migration: enforce multifd and postcopy preempt to be set before incoming qemu_start_incoming_migration needs to check the number of multifd channels or postcopy ram channels to configure the backlog parameter (i.e. the maximum length to which the queue of pending connections for sockfd may grow) of listen(). So enforce the usage of postcopy-preempt and multifd as below: - need to use "-incoming defer" on the destination; and - set_capability and set_parameter need to be done before migrate_incoming Otherwise, disable the use of the features and report error messages to remind users to adjust the commands. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-ID: <20230606101910.20456-2-wei.w.wang@intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Tejus GK	908927db28	migration: Update error description whenever migration fails There are places in migration.c where the migration is marked failed with MIGRATION_STATUS_FAILED, but the failure reason is never updated. Hence libvirt doesn't know why the migration failed when it queries for it. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Tejus GK <tejus.gk@nutanix.com> Message-ID: <20230621130940.178659-2-tejus.gk@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)	15699cf542	migration: Extend query-migrate to provide dirty page limit info Extend query-migrate to provide throttle time and estimated ring full time with dirty-limit capability enabled, through which we can observe if dirty limit take effect during live migration. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-ID: <168733225273.5845.15871826788879741674-8@git.sr.ht> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)	acac51ba24	migration: Implement dirty-limit convergence algo Implement dirty-limit convergence algo for live migration, which is kind of like auto-converge algo but using dirty-limit instead of cpu throttle to make migration convergent. Enable dirty page limit if dirty_rate_high_cnt greater than 2 when dirty-limit capability enabled, Disable dirty-limit if migration be canceled. Note that "set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit" commands are not allowed during dirty-limit live migration. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-ID: <168733225273.5845.15871826788879741674-7@git.sr.ht> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)	310ad5625e	migration: Put the detection logic before auto-converge checking This commit is prepared for the implementation of dirty-limit convergence algo. The detection logic of throttling condition can apply to both auto-converge and dirty-limit algo, putting it's position before the checking logic for auto-converge feature. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-ID: <168733225273.5845.15871826788879741674-6@git.sr.ht> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)	bb9993c672	migration: Refactor auto-converge capability logic Check if block migration is running before throttling guest down in auto-converge way. Note that this modification is kind of like code clean, because block migration does not depend on auto-converge capability, so the order of checks can be adjusted. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <168618975839.6361.17407633874747688653-5@git.sr.ht> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)	dc62395557	migration: Introduce dirty-limit capability Introduce migration dirty-limit capability, which can be turned on before live migration and limit dirty page rate durty live migration. Introduce migrate_dirty_limit function to help check if dirty-limit capability enabled during live migration. Meanwhile, refactor vcpu_dirty_rate_stat_collect so that period can be configured instead of hardcoded. dirty-limit capability is kind of like auto-converge but using dirty limit instead of traditional cpu-throttle to throttle guest down. To enable this feature, turn on the dirty-limit capability before live migration using migrate-set-capabilities, and set the parameters "x-vcpu-dirty-limit-period", "vcpu-dirty-limit" suitably to speed up convergence. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <168618975839.6361.17407633874747688653-4@git.sr.ht> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)	09f9ec9913	qapi/migration: Introduce vcpu-dirty-limit parameters Introduce "vcpu-dirty-limit" migration parameter used to limit dirty page rate during live migration. "vcpu-dirty-limit" and "x-vcpu-dirty-limit-period" are two dirty-limit-related migration parameters, which can be set before and during live migration by qmp migrate-set-parameters. This two parameters are used to help implement the dirty page rate limit algo of migration. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <168618975839.6361.17407633874747688653-3@git.sr.ht> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Hyman Huang(黄勇)	4d80785719	qapi/migration: Introduce x-vcpu-dirty-limit-period parameter Introduce "x-vcpu-dirty-limit-period" migration experimental parameter, which is in the range of 1 to 1000ms and used to make dirtyrate calculation period configurable. Currently with the "x-vcpu-dirty-limit-period" varies, the total time of live migration changes, test results show the optimal value of "x-vcpu-dirty-limit-period" ranges from 500ms to 1000 ms. "x-vcpu-dirty-limit-period" should be made stable once it proves best value can not be determined with developer's experiments. Signed-off-by: Hyman Huang(黄勇) <yong.huang@smartx.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <168618975839.6361.17407633874747688653-2@git.sr.ht> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Fabiano Rosas	01ec0f3a92	migration/multifd: Protect accesses to migration_threads This doubly linked list is common for all the multifd and migration threads so we need to avoid concurrent access. Add a mutex to protect the data from concurrent access. This fixes a crash when removing two MigrationThread objects from the list at the same time during cleanup of multifd threads. Fixes: `671326201d` ("migration: Introduce interface query-migrationthreads") Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230607161306.31425-3-farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Fabiano Rosas	788fa68041	migration/multifd: Rename threadinfo.c functions We're about to add more functions to this file so make it use the same coding style as the rest of the code. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20230607161306.31425-2-farosas@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-07-26 10:55:56 +02:00
Michael Tokarev	d8b71d96b3	migration: spelling fixes Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-07-25 17:13:20 +03:00
David Hildenbrand	f161c88a03	migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored() virtio-mem wants to know whether it should not mess with the RAMBlock content (e.g., discard RAM, preallocate memory) on incoming migration. So let's expose that function as migrate_ram_is_ignored() in migration/misc.h Message-ID: <20230706075612.67404-4-david@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2023-07-12 09:25:37 +02:00
Laszlo Ersek	aaf26bd382	migration: unexport migrate_fd_error() The only migrate_fd_error() call sites are in "migration/migration.c", which is also where we define migrate_fd_error(). Make the function static, and remove its declaration from "migration/migration.h". Cc: Juan Quintela <quintela@redhat.com> (maintainer:Migration) Cc: Leonardo Bras <leobras@redhat.com> (reviewer:Migration) Cc: Peter Xu <peterx@redhat.com> (reviewer:Migration) Cc: qemu-trivial@nongnu.org Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-07-08 07:24:38 +03:00
Laszlo Ersek	8c69ae9eff	migration: factor out "resume_requested" in qmp_migrate() It cuts back on those awkward, duplicated !(has_resume && resume) expressions. Cc: Juan Quintela <quintela@redhat.com> (maintainer:Migration) Cc: Leonardo Bras <leobras@redhat.com> (reviewer:Migration) Cc: Peter Xu <peterx@redhat.com> (reviewer:Migration) Cc: qemu-trivial@nongnu.org Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404 Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2023-07-08 07:24:38 +03:00
Avihai Horon	808642a2f6	vfio/migration: Reset bytes_transferred properly Currently, VFIO bytes_transferred is not reset properly: 1. bytes_transferred is not reset after a VM snapshot (so a migration following a snapshot will report incorrect value). 2. bytes_transferred is a single counter for all VFIO devices, however upon migration failure it is reset multiple times, by each VFIO device. Fix it by introducing a new function vfio_reset_bytes_transferred() and calling it during migration and snapshot start. Remove existing bytes_transferred reset in VFIO migration state notifier, which is not needed anymore. Fixes: `3710586caa` ("qapi: Add VFIO devices migration stats in Migration stats") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	538ef4fe2f	migration: Enable switchover ack capability Now that switchover ack logic has been implemented, enable the capability. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	1b4adb10f8	migration: Implement switchover ack logic Implement switchover ack logic. This prevents the source from stopping the VM and completing the migration until an ACK is received from the destination that it's OK to do so. To achieve this, a new SaveVMHandlers handler switchover_ack_needed() and a new return path message MIG_RP_MSG_SWITCHOVER_ACK are added. The switchover_ack_needed() handler is called during migration setup in the destination to check if switchover ack is used by the migrated device. When switchover is approved by all migrated devices in the destination that support this capability, the MIG_RP_MSG_SWITCHOVER_ACK return path message is sent to the source to notify it that it's OK to do switchover. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Avihai Horon	6574232fff	migration: Add switchover ack capability Migration downtime estimation is calculated based on bandwidth and remaining migration data. This assumes that loading of migration data in the destination takes a negligible amount of time and that downtime depends only on network speed. While this may be true for RAM, it's not necessarily true for other migrated devices. For example, loading the data of a VFIO device in the destination might require from the device to allocate resources, prepare internal data structures and so on. These operations can take a significant amount of time which can increase migration downtime. This patch adds a new capability "switchover ack" that prevents the source from stopping the VM and completing the migration until an ACK is received from the destination that it's OK to do so. This can be used by migrated devices in various ways to reduce downtime. For example, a device can send initial precopy metadata to pre-allocate resources in the destination and use this capability to make sure that the pre-allocation is completed before the source VM is stopped, so it will have full effect. This new capability relies on the return path capability to communicate from the destination back to the source. The actual implementation of the capability will be added in the following patches. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Tested-by: YangHang Liu <yanghliu@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>	2023-06-30 06:02:51 +02:00
Philippe Mathieu-Daudé	de6cd7599b	meson: Replace softmmu_ss -> system_ss We use the user_ss[] array to hold the user emulation sources, and the softmmu_ss[] array to hold the system emulation ones. Hold the latter in the 'system_ss[]' array for parity with user emulation. Mechanical change doing: $ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss) Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230613133347.82210-10-philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-06-20 10:01:30 +02:00
Philippe Mathieu-Daudé	c7b64948f8	meson: Replace CONFIG_SOFTMMU -> CONFIG_SYSTEM_ONLY Since we might have user emulation with softmmu, use the clearer 'CONFIG_SYSTEM_ONLY' key to check for system emulation. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230613133347.82210-9-philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-06-20 10:01:30 +02:00
Steve Sistare	b0182e537e	exec/memory: Introduce RAM_NAMED_FILE flag migrate_ignore_shared() is an optimization that avoids copying memory that is visible and can be mapped on the target. However, a memory-backend-ram or a memory-backend-memfd block with the RAM_SHARED flag set is not migrated when migrate_ignore_shared() is true. This is wrong, because the block has no named backing store, and its contents will be lost. To fix, ignore shared memory iff it is a named file. Define a new flag RAM_NAMED_FILE to distinguish this case. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1686151116-253260-1-git-send-email-steven.sistare@oracle.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-06-13 11:28:58 +02:00
Philippe Mathieu-Daudé	7d5b0d6864	bulk: Remove pointless QOM casts Mechanical change running Coccinelle spatch with content generated from the qom-cast-macro-clean-cocci-gen.py added in the previous commit. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230601093452.38972-3-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>	2023-06-05 20:48:34 +02:00
Fiona Ebner	3a8b81f2e6	migration: stop tracking ram writes when cancelling background migration Currently, it is only done when the iteration finishes successfully. Not cleaning up the userfaultfd write protection can lead to symptoms/issues such as the process hanging in memmove or GDB not being able to attach. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-Id: <20230526115908.196171-1-f.ebner@proxmox.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-06-02 01:03:19 +02:00
Vladimir Sementsov-Ogievskiy	a4c6275aa1	migration: restore vmstate on migration failure 1. Otherwise failed migration just drops guest-panicked state, which is not good for management software. 2. We do keep different paused states like guest-panicked during migration with help of global_state state. 3. We do restore running state on source when migration is cancelled or failed. 4. "postmigrate" state is documented as "guest is paused following a successful 'migrate'", so originally it's only for successful path and we never documented current behavior. Let's restore paused states like guest-panicked in case of cancel or fail too. Allow same transitions like for inmigrate state. This commit changes the behavior that was introduced by commit `42da5550d6` "migration: set state to post-migrate on failure" and provides a bit different fix on related https://bugzilla.redhat.com/show_bug.cgi?id=1355683 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230517123752.21615-6-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-06-02 01:03:19 +02:00
Vladimir Sementsov-Ogievskiy	f4584076fc	migration: switch from .vm_was_running to .vm_old_state No logic change here, only refactoring. That's a preparation for next commit where we finally restore the stopped vm state on migration failure or cancellation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230517123752.21615-5-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-06-02 01:03:19 +02:00
Vladimir Sementsov-Ogievskiy	c33f1829f8	migration: never fail in global_state_store() Actually global_state_store() can never fail. Let's get rid of extra error paths. To make things clear, use new runstate_get() and use same approach for global_state_store() and global_state_store_running(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230517123752.21615-3-vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-06-02 01:03:19 +02:00
Stefan Hajnoczi	60f782b6b7	aio: remove aio_disable_external() API All callers now pass is_external=false to aio_set_fd_handler() and aio_set_event_notifier(). The aio_disable_external() API that temporarily disables fd handlers that were registered is_external=true is therefore dead code. Remove aio_disable_external(), aio_enable_external(), and the is_external arguments to aio_set_fd_handler() and aio_set_event_notifier(). The entire test-fdmon-epoll test is removed because its sole purpose was testing aio_disable_external(). Parts of this patch were generated using the following coccinelle (https://coccinelle.lip6.fr/) semantic patch: @@ expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque; @@ - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque) + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque) @@ expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready; @@ - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready) + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready) Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-21-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-05-30 17:37:26 +02:00
Richard Henderson	b5c0d842d6	migration: Build migration_files once The items in migration_files are built for libmigration and included info softmmu_ss from there; no need to also include them directly. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-05-23 16:51:18 -07:00
Richard Henderson	7ba7db9fa1	migration/xbzrle: Use i386 host/cpuinfo.h Perform the function selection once, and only if CONFIG_AVX512_OPT is enabled. Centralize the selection to xbzrle.c, instead of spreading the init across 3 files. Remove xbzrle-bench.c. The benefit of being able to benchmark the different implementations is less important than not peeking into the internals of the implementation. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-05-23 16:51:18 -07:00
Richard Henderson	1b48d0abdf	migration/xbzrle: Shuffle function order Place the CONFIG_AVX512BW_OPT block at the top, which will aid function selection in the next patch. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-05-23 16:51:18 -07:00
Richard Henderson	146f515110	Migration Pull request Hi Based on latest reviewed parts of migration: - Disable colo (vladimir) - Migration atomic counters (juan) Please apply. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmRmXJUACgkQ9IfvGFhy 1yNRAxAAjDYJELL34Qovt/WE9qKhYJEvIUGTl1IMWJ22YMFnqIFKRdka57dWoU3P 7EK1BHmokEEtzGT7Fe1ecERXsOwQIJDIkDTJ5g8Oc8Jt1iqY1AC8h5T+LghijCar mbZ6qWHaSjsg2lmek/xc9quymzFGGK36PSyB5WkaLRviKQn4RIkEDpUaWny7nDbA Q8zJJpBqNFqKfC5/DN0ePa3QQscXQJhey3nxqFd8hYp8RFNIV5UJVW5Lf6ombtK7 atgdWC4ckkfO2z3OsghKeo/UiMFWpPktgBVVMhDLmk+P/E6czc2gfzD6SCvrPKTj XowI8hro22HVmq9bEY8PtbjMOfpxrAxer+tM2KR/0O9l3UzUacFsi7KGqCJ1/trQ 1tSDjlgyczb8GOgLwwxj8XE+jPHPfVrzCNfDqrBKBNxz6nnZSdZUwhV5mG8FdVtm oVVV96BIrNXLl/lIxYIFD/Zyvl8/lrSWQdLkEHTzihYQeXaQfyvPVbV/dOLT4sii YUuGCuEhF+DW/qz43G1krwq5/bfxsiZoQzrMV/Odtf0wYQKkabA3KNBIda/vxBCR dsLQ7QtmOwKmCzjqw4LUov9vDNYOYr98o7ZqwJ3qeKL4QgFwtEZUFO3VW6UR8fnF arVXiTn9wVlkTpu4sT5hLm9400iadhX4Fppji7Ce0tUpLbWbghA= =3x32 -----END PGP SIGNATURE----- Merge tag 'migration-20230518-pull-request' of https://gitlab.com/juan.quintela/qemu into staging Migration Pull request Hi Based on latest reviewed parts of migration: - Disable colo (vladimir) - Migration atomic counters (juan) Please apply. # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmRmXJUACgkQ9IfvGFhy # 1yNRAxAAjDYJELL34Qovt/WE9qKhYJEvIUGTl1IMWJ22YMFnqIFKRdka57dWoU3P # 7EK1BHmokEEtzGT7Fe1ecERXsOwQIJDIkDTJ5g8Oc8Jt1iqY1AC8h5T+LghijCar # mbZ6qWHaSjsg2lmek/xc9quymzFGGK36PSyB5WkaLRviKQn4RIkEDpUaWny7nDbA # Q8zJJpBqNFqKfC5/DN0ePa3QQscXQJhey3nxqFd8hYp8RFNIV5UJVW5Lf6ombtK7 # atgdWC4ckkfO2z3OsghKeo/UiMFWpPktgBVVMhDLmk+P/E6czc2gfzD6SCvrPKTj # XowI8hro22HVmq9bEY8PtbjMOfpxrAxer+tM2KR/0O9l3UzUacFsi7KGqCJ1/trQ # 1tSDjlgyczb8GOgLwwxj8XE+jPHPfVrzCNfDqrBKBNxz6nnZSdZUwhV5mG8FdVtm # oVVV96BIrNXLl/lIxYIFD/Zyvl8/lrSWQdLkEHTzihYQeXaQfyvPVbV/dOLT4sii # YUuGCuEhF+DW/qz43G1krwq5/bfxsiZoQzrMV/Odtf0wYQKkabA3KNBIda/vxBCR # dsLQ7QtmOwKmCzjqw4LUov9vDNYOYr98o7ZqwJ3qeKL4QgFwtEZUFO3VW6UR8fnF # arVXiTn9wVlkTpu4sT5hLm9400iadhX4Fppji7Ce0tUpLbWbghA= # =3x32 # -----END PGP SIGNATURE----- # gpg: Signature made Thu 18 May 2023 10:12:53 AM PDT # gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [undefined] # gpg: aka "Juan Quintela <quintela@trasno.org>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * tag 'migration-20230518-pull-request' of https://gitlab.com/juan.quintela/qemu: migration: Fix duplicated included in meson.build migration/multifd: Compute transferred bytes correctly migration: We don't need the field rate_limit_used anymore migration: Use migration_transferred_bytes() to calculate rate_limit migration: Add a trace for migration_transferred_bytes migration: Move migration_total_bytes() to migration-stats.c migration: Move rate_limit_max and rate_limit_used to migration_stats qemu-file: Account for rate_limit usage on qemu_fflush() migration: Don't use INT64_MAX for unlimited rate migration: process_incoming_migration_co(): move colo part to colo migration: split migration_incoming_co configure: add --disable-colo-proxy option Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-05-18 11:07:06 -07:00

1 2 3 4 5 ...

1902 Commits