mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Juan Quintela	b02c7fc9ef	migration: Pass migrate_caps_check() the old and new caps We used to pass the old capabilities array and the new capabilities as a list. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 11:29:02 +02:00
Juan Quintela	0cec2056ff	migration: rename enabled_capabilities to capabilities It is clear from the context what that means, and such a long name with the extra long names of the capabilities make very difficilut to stay inside the 80 columns limit. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 11:29:01 +02:00
Peter Xu	ae30b9b289	migration/postcopy: Detect file system on dest host Postcopy requires the memory support userfaultfd to work. Right now we check it but it's a bit too late (when switching to postcopy migration). Do that early right at enabling of postcopy. Note that this is still only a best effort because ramblocks can be dynamically created. We can add check in hostmem creations and fail if postcopy enabled, but maybe that's too aggressive. Still, we have chance to fail the most obvious where we know there's an existing unsupported ramblock. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:29:01 +02:00
Eric Blake	403d18ae38	migration: Handle block device inactivation failures better Consider what happens when performing a migration between two host machines connected to an NFS server serving multiple block devices to the guest, when the NFS server becomes unavailable. The migration attempts to inactivate all block devices on the source (a necessary step before the destination can take over); but if the NFS server is non-responsive, the attempt to inactivate can itself fail. When that happens, the destination fails to get the migrated guest (good, because the source wasn't able to flush everything properly): (qemu) qemu-kvm: load of migration failed: Input/output error at which point, our only hope for the guest is for the source to take back control. With the current code base, the host outputs a message, but then appears to resume: (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1) (src qemu)info status VM status: running but a second migration attempt now asserts: (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed. Whether the guest is recoverable on the source after the first failure is debatable, but what we do not want is to have qemu itself fail due to an assertion. It looks like the problem is as follows: In migration.c:migration_completion(), the source sets 'inactivate' to true (since COLO is not enabled), then tries savevm.c:qemu_savevm_state_complete_precopy() with a request to inactivate block devices. In turn, this calls block.c:bdrv_inactivate_all(), which fails when flushing runs up against the non-responsive NFS server. With savevm failing, we are now left in a state where some, but not all, of the block devices have been inactivated; but migration_completion() then jumps to 'fail' rather than 'fail_invalidate' and skips an attempt to reclaim those those disks by calling bdrv_activate_all(). Even if we do attempt to reclaim disks, we aren't taking note of failure there, either. Thus, we have reached a state where the migration engine has forgotten all state about whether a block device is inactive, because we did not set s->block_inactive in enough places; so migration allows the source to reach vm_start() and resume execution, violating the block layer invariant that the guest CPUs should not be restarted while a device is inactive. Note that the code in migration.c:migrate_fd_cancel() will also try to reactivate all block devices if s->block_inactive was set, but because we failed to set that flag after the first failure, the source assumes it has reclaimed all devices, even though it still has remaining inactivated devices and does not try again. Normally, qmp_cont() will also try to reactivate all disks (or correctly fail if the disks are not reclaimable because NFS is not yet back up), but the auto-resumption of the source after a migration failure does not go through qmp_cont(). And because we have left the block layer in an inconsistent state with devices still inactivated, the later migration attempt is hitting the assertion failure. Since it is important to not resume the source with inactive disks, this patch marks s->block_inactive before attempting inactivation, rather than after succeeding, in order to prevent any vm_start() until it has successfully reactivated all devices. See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Acked-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:29:00 +02:00
Juan Quintela	8c0cda8fa0	migration: Rename normal to normal_pages Rest of counters that refer to pages has a _pages suffix. And historically, this showed the number of full pages transferred. The name "normal" refered to the fact that they were sent without any optimization (compression, xbzrle, zero_page, ...). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:29:00 +02:00
Juan Quintela	1a386e8de5	migration: Rename duplicate to zero_pages Rest of counters that refer to pages has a _pages suffix. And historically, this showed the number of pages composed of the same character, here comes the name "duplicated". But since years ago, it refers to the number of zero_pages. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:59 +02:00
Juan Quintela	3c764f9b2b	migration: Make postcopy_requests atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:59 +02:00
Juan Quintela	536b5a4e56	migration: Make dirty_sync_count atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:58 +02:00
Juan Quintela	296a4ac2aa	migration: Make downtime_bytes atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:58 +02:00
Juan Quintela	b013b5d1f3	migration: Make precopy_bytes atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:58 +02:00
Juan Quintela	4291823694	migration: Make dirty_sync_missed_zero_copy atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:57 +02:00
Juan Quintela	cf671116fa	migration: Make multifd_bytes atomic In the spirit of: commit 394d323bc3451e4d07f13341cb8817fac8dfbadd Author: Peter Xu <peterx@redhat.com> Date: Tue Oct 11 17:55:51 2022 -0400 migration: Use atomic ops properly for page accountings Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:28:57 +02:00
Juan Quintela	30fb22cda4	migration: Update atomic stats out of the mutex Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:28:56 +02:00
Juan Quintela	abce5fa16d	migration: Merge ram_counters and ram_atomic_counters Using MgrationStats as type for ram_counters mean that we didn't have to re-declare each value in another struct. The need of atomic counters have make us to create MigrationAtomicStats for this atomic counters. Create RAMStats type which is a merge of MigrationStats and MigrationAtomicStats removing unused members. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> --- Fix typos found by David Edmondson	2023-04-24 11:28:56 +02:00
李皆俊	8ebb6ecc37	migration: remove extra whitespace character for code style Fix code style. Signed-off-by: 李皆俊 <a_lijiejun@163.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:28:55 +02:00
Paolo Bonzini	4592eaf387	postcopy-ram: do not use qatomic_mb_read It does not even pair with a qatomic_mb_set(), so it is clearer to use load-acquire in this case; they are synonyms. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-04-20 11:17:35 +02:00
Paolo Bonzini	394b9407e4	migration: mark mixed functions that can suspend There should be no paths from a coroutine_fn to aio_poll, however in practice coroutine_mixed_fn will call aio_poll in the !qemu_in_coroutine() path. By marking mixed functions, we can track accurately the call paths that execute entirely in coroutine context, and find more missing coroutine_fn markers. This results in more accurate checks that coroutine code does not end up blocking. If the marking were extended transitively to all functions that call these ones, static analysis could be done much more efficiently. However, this is a start and makes it possible to use vrc's path-based searches to find potential bugs where coroutine_fns call blocking functions. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-04-20 11:17:35 +02:00
Juan Quintela	28ef5339c3	migration: fix ram_state_pending_exact() I removed that bit on commit: commit `c8df4a7aef` Author: Juan Quintela <quintela@redhat.com> Date: Mon Oct 3 02:00:03 2022 +0200 migration: Split save_live_pending() into state_pending_* Fixes: `c8df4a7aef` Suggested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-12 22:47:50 +02:00
Lukas Straub	37502df32c	migration/ram.c: Fix migration with compress enabled Since `ec6f3ab9`, migration with compress enabled was broken, because the compress threads use a dummy QEMUFile which just acts as a buffer and that commit accidentally changed it to use the outgoing migration channel instead. Fix this by using the dummy file again in the compress threads. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-12 21:51:34 +02:00
Peter Xu	06064a6715	migration: Recover behavior of preempt channel creation for pre-7.2 In 8.0 devel window we reworked preempt channel creation, so that there'll be no race condition when the migration channel and preempt channel got established in the wrong order in commit `5655aab079`. However no one noticed that the change will also be not compatible with older qemus, majorly 7.1/7.2 versions where preempt mode started to be supported. Leverage the same pre-7.2 flag introduced in the previous patch to recover the behavior hopefully before 8.0 releases, so we don't break migration when we migrate from 8.0 to older qemu binaries. Fixes: `5655aab079` ("migration: Postpone postcopy preempt channel to be after main") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-12 21:44:56 +02:00
Peter Xu	6621883f93	migration: Fix potential race on postcopy_qemufile_src postcopy_qemufile_src object should be owned by one thread, either the main thread (e.g. when at the beginning, or at the end of migration), or by the return path thread (when during a preempt enabled postcopy migration). If that's not the case the access to the object might be racy. postcopy_preempt_shutdown_file() can be potentially racy, because it's called at the end phase of migration on the main thread, however during which the return path thread hasn't yet been recycled; the recycle happens in await_return_path_close_on_source() which is after this point. It means, logically it's posslbe the main thread and the return path thread are both operating on the same qemufile. While I don't think qemufile is thread safe at all. postcopy_preempt_shutdown_file() used to be needed because that's where we send EOS to dest so that dest can safely shutdown the preempt thread. To avoid the possible race, remove this only place that a race can happen. Instead we figure out another way to safely close the preempt thread on dest. The core idea during postcopy on deciding "when to stop" is that dest will send a postcopy SHUT message to src, telling src that all data is there. Hence to shut the dest preempt thread maybe better to do it directly on dest node. This patch proposed such a way that we change postcopy_prio_thread_created into PreemptThreadStatus, so that we kick the preempt thread on dest qemu by a sequence of: mis->preempt_thread_status = PREEMPT_THREAD_QUIT; qemu_file_shutdown(mis->postcopy_qemufile_dst); While here shutdown() is probably so far the easiest way to kick preempt thread from a blocked qemu_get_be64(). Then it reads preempt_thread_status to make sure it's not a network failure but a willingness to quit the thread. We could have avoided that extra status but just rely on migration status. The problem is postcopy_ram_incoming_cleanup() is just called early enough so we're still during POSTCOPY_ACTIVE no matter what.. So just make it simple to have the status introduced. One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of postcopy preempt. Fixes: `9358982744` ("migration: Send requested page directly in rp-return thread") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-12 21:44:38 +02:00
Paolo Bonzini	2c5451ca52	migration/block: replace uses of blk_nb_sectors that do not check result Uses of blk_nb_sectors must check whether the result is negative. Otherwise, underflow can happen. Fortunately, alloc_aio_bitmap() and bmds_aio_inflight() both have an alternative way to retrieve the number of sectors in the file. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20230407153303.391121-6-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-04-11 16:40:53 +02:00
Richard Henderson	cc37d98bfb	*: Add missing includes of qemu/error-report.h This had been pulled in via qemu/plugin.h from hw/core/cpu.h, but that will be removed. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230310195252.210956-5-richard.henderson@linaro.org> [AJB: add various additional cases shown by CI] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230315174331.2959-15-alex.bennee@linaro.org> Reviewed-by: Emilio Cota <cota@braap.org>	2023-03-22 15:06:57 +00:00
Steve Sistare	fa76c854ae	migration: fix populate_vfio_info Include CONFIG_DEVICES so that populate_vfio_info is instantiated for CONFIG_VFIO. Without it, the 'info migrate' command never returns info about vfio. Fixes: `43bd0bf30f` ("migration: Move populate_vfio_info() into a separate file") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Wei Wang	ff1585d1d8	migration/multifd: correct multifd_send_thread to trace the flags The p->flags could be updated via the send_prepare callback, e.g. OR-ed with MULTIFD_FLAG_ZLIB via zlib_send_prepare. Assign p->flags to the local "flags" before the send_prepare callback could only get partial of p->flags. Fix it by moving the assignment of p->flags to the local flags after the callback, so that the correct flags can be traced. Fixes: `ab7cbb0b9a` ("multifd: Make no compression operations into its own structure") Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Li Zhijian	bf0274192a	migration/rdma: Remove deprecated variable rdma_return_path It's no longer needed since commit `44bcfd45e9` ("migration/rdma: destination: create the return patch after the first accept") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Matheus Tavares Bernardino	1776b70f55	migration/xbzrle: fix out-of-bounds write with axv512 xbzrle_encode_buffer_avx512() checks for overflows too scarcely in its outer loop, causing out-of-bounds writes: $ ../configure --target-list=aarch64-softmmu --enable-sanitizers --enable-avx512bw $ make tests/unit/test-xbzrle && ./tests/unit/test-xbzrle ==5518==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000b100 at pc 0x561109a7714d bp 0x7ffed712a440 sp 0x7ffed712a430 WRITE of size 1 at 0x62100000b100 thread T0 #0 0x561109a7714c in uleb128_encode_small ../util/cutils.c:831 #1 0x561109b67f6a in xbzrle_encode_buffer_avx512 ../migration/xbzrle.c:275 #2 0x5611099a7428 in test_encode_decode_overflow ../tests/unit/test-xbzrle.c:153 #3 0x7fb2fb65a58d (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d) #4 0x7fb2fb65a333 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a333) #5 0x7fb2fb65aa79 in g_test_run_suite (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa79) #6 0x7fb2fb65aa94 in g_test_run (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa94) #7 0x5611099a3a23 in main ../tests/unit/test-xbzrle.c:218 #8 0x7fb2fa78c082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) #9 0x5611099a608d in _start (/qemu/build/tests/unit/test-xbzrle+0x28408d) 0x62100000b100 is located 0 bytes to the right of 4096-byte region [0x62100000a100,0x62100000b100) allocated by thread T0 here: #0 0x7fb2fb823a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153 #1 0x7fb2fb637ef0 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57ef0) Fix that by performing the overflow check in the inner loop, instead. Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Matheus Tavares Bernardino	d84a78d15d	migration/xbzrle: use ctz64 to avoid undefined result __builtin_ctzll() produces undefined results when the argument is 0. This can be seen through test-xbzrle, which produces the following warning: ../migration/xbzrle.c:265: runtime error: passing zero to ctz(), which is not a valid argument Replace __builtin_ctzll() with our ctz64() wrapper which properly handles 0. Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Dr. David Alan Gilbert	a5382214d8	migration/rdma: Fix return-path case The RDMA code has return-path handling code, but it's only enabled if postcopy is enabled; if the 'return-path' migration capability is enabled, the return path is NOT setup but the core migration code still tries to use it and breaks. Enable the RDMA return path if either postcopy or the return-path capability is enabled. bz: https://bugzilla.redhat.com/show_bug.cgi?id=2063615 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Peter Xu	a5d35dc7e0	migration: Wait on preempt channel in preempt thread QEMU main thread will wait until dest preempt channel established during processing the LISTEN command (within the whole postcopy PACKAGED data), by waiting on the semaphore postcopy_qemufile_dst_done. That's racy, because it's possible that the dest QEMU main thread hasn't yet accept()ed the new connection when processing the LISTEN event. The sem_wait() will yield the main thread without being able to run anything else including the accept() of the new socket, which can cause deadlock within the main thread. To avoid the race, move the "wait channel" from main thread to the preempt thread right at the start. Reported-by: Peter Maydell <peter.maydell@linaro.org> Fixes: `5655aab079` ("migration: Postpone postcopy preempt channel to be after main") Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
John Berberian, Jr	c31772ad68	Fix exec migration on Windows (w32+w64). * Use cmd instead of /bin/sh on Windows. * Try to auto-detect cmd.exe's path, but default to a hard-coded path. Note that this will require that gspawn-win[32\|64]-helper.exe and gspawn-win[32\|64]-helper-console.exe are included in the Windows binary distributions (cc: Stefan Weil). Signed-off-by: "John Berberian, Jr" <jeb.study@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-02 17:06:27 +01:00
Markus Armbruster	43aef7e632	migration/colo: Improve an x-colo-lost-heartbeat error message The QERR_ macros are leftovers from the days of "rich" error objects. We've been trying to reduce their remaining use. Get rid of a use of QERR_FEATURE_DISABLED, and improve the somewhat imprecise error message (qemu) x_colo_lost_heartbeat Error: The feature 'colo' is not enabled to Error: VM is not in COLO mode Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230207075115.1525-12-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com>	2023-02-23 14:10:17 +01:00
Markus Armbruster	6f1e91f716	error: Drop superfluous #include "qapi/qmp/qerror.h" Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230207075115.1525-2-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>	2023-02-23 13:56:14 +01:00
Juan Quintela	24beea4efe	migration: Rename res_{postcopy,precopy}_only Once that res_compatible is removed, they don't make sense anymore. We remove the _only preffix. And to make things clearer we rename them to must_precopy and can_postcopy. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 20:04:30 +01:00
Juan Quintela	24f254ed79	migration: Remove unused res_compatible Nothing assigns to it after previous commit. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 20:04:30 +01:00
Juan Quintela	abbbd04da2	migration: In case of postcopy, the memory ends in res_postcopy_only So remove last assignation of res_compatible. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 20:04:30 +01:00
Philippe Mathieu-Daudé	163b8663b8	migration/block: Convert remaining DPRINTF() debug macro to trace events Finish the conversion from commit `fe80c0241d` ("migration: using trace_ to replace DPRINTF"). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 19:09:25 +01:00
Avihai Horon	c7a7db4b51	migration/qemu-file: Add qemu_file_get_to_fd() Add new function qemu_file_get_to_fd() that allows reading data from QEMUFile and writing it straight into a given fd. This will be used later in VFIO migration code. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 19:09:25 +01:00
Juan Quintela	7b548761e5	ram: Document migration ram flags 0x80 is RAM_SAVE_FLAG_HOOK, it is in qemu-file now. Bigger usable flag is 0x200, noticing that. We can reuse RAM_SAVe_FLAG_FULL. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:45:47 +01:00
Leonardo Bras	cfc3bcf373	migration/multifd: Move load_cleanup inside incoming_state_destroy Currently running migration_incoming_state_destroy() without first running multifd_load_cleanup() will cause a yank error: qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. (core dumped) The above error happens in the target host, when multifd is being used for precopy, and then postcopy is triggered and the migration finishes. This will crash the VM in the target host. To avoid that, move multifd_load_cleanup() inside migration_incoming_state_destroy(), so that the load cleanup becomes part of the incoming state destroying process. Running multifd_load_cleanup() twice can become an issue, though, but the only scenario it could be ran twice is on process_incoming_migration_bh(). So removing this extra call is necessary. On the other hand, this multifd_load_cleanup() call happens way before the migration_incoming_state_destroy() and having this happening before dirty_bitmap_mig_before_vm_start() and vm_start() may be a need. So introduce a new function multifd_load_shutdown() that will mainly stop all multifd threads and close their QIOChannels. Then use this function instead of multifd_load_cleanup() to make sure nothing else is received before dirty_bitmap_mig_before_vm_start(). Fixes: `b5eea99ec2` ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:45:40 +01:00
Leonardo Bras	10351fbad1	migration/multifd: Join all multifd threads in order to avoid leaks Current approach will only join threads that are still running. For the threads not joined, resources or private memory are always kept in the process space and never reclaimed before process end, and this risks serious memory leaks. This should usually not represent a big problem, since multifd migration is usually just ran at most a few times, and after it succeeds there is not much to be done before exiting the process. Yet still, it should not hurt performance to join all of them. Fixes: `b5eea99ec2` ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:45:34 +01:00
Leonardo Bras	d926f3bb2a	migration/multifd: Remove unnecessary assignment on multifd_load_cleanup() Before assigning "p->quit = true" for every multifd channel, multifd_load_cleanup() will call multifd_recv_terminate_threads() which already does the same assignment, while protected by a mutex. So there is no point doing the same assignment again. Fixes: `b5eea99ec2` ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:45:28 +01:00
Leonardo Bras	e5bac1f525	migration/multifd: Change multifd_load_cleanup() signature and usage Since it's introduction in commit `f986c3d256` ("migration: Create multifd migration threads"), multifd_load_cleanup() never returned any value different than 0, neither set up any error on errp. Even though, on process_incoming_migration_bh() an if clause uses it's return value to decide on setting autostart = false, which will never happen. In order to simplify the codebase, change multifd_load_cleanup() signature to 'void multifd_load_cleanup(void)', and for every usage remove error handling or decision made based on return value != 0. Fixes: `b5eea99ec2` ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:44:44 +01:00
Peter Xu	5655aab079	migration: Postpone postcopy preempt channel to be after main Postcopy with preempt-mode enabled needs two channels to communicate. The order of channel establishment is not guaranteed. It can happen that the dest QEMU got the preempt channel connection request before the main channel is established, then the migration may make no progress even during precopy due to the wrong order. To fix it, create the preempt channel only if we know the main channel is established. For a general postcopy migration, we delay it until postcopy_start(), that's where we already went through some part of precopy on the main channel. To make sure dest QEMU has already established the channel, we wait until we got the first PONG received. That's something we do at the start of precopy when postcopy enabled so it's guaranteed to happen sooner or later. For a postcopy recovery, we delay it to qemu_savevm_state_resume_prepare() where we'll have round trips of data on bitmap synchronizations, which means the main channel must have been established. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Peter Xu	b28fb58227	migration: Add a semaphore to count PONGs This is mostly useless, but useful for us to know whether the main channel is correctly established without changing the migration protocol. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Peter Xu	fc063a7b8a	migration: Cleanup postcopy_preempt_setup() Since we just dropped the only case where postcopy_preempt_setup() can return an error, it doesn't need a retval anymore because it never fails. Move the preempt check to the caller, preparing it to be used elsewhere to do nothing but as simple as kicking the async connection. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Peter Xu	d6f74fd12e	migration: Rework multi-channel checks on URI The whole idea of multi-channel checks was not properly done, IMHO. Currently we check multi-channel in a lot of places, but actually that's not needed because we only need to check it right after we get the URI and that should be it. If the URI check succeeded, we should never need to check it again because we must have it. If it check fails, we should fail immediately on either the qmp_migrate or qmp_migrate_incoming, instead of failingg it later after the connection established. Neither should we fail any set capabiliities like what we used to do here: `5ad15e8614` ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19) Because logically the URI will only be set later after the capability is set, so it doesn't make a lot of sense to check the URI type when setting the capability, because we're checking the cap with an old URI passed in, and that may not even be the URI we're going to use later. This patch mostly reverted all such checks for before, dropping the variable migrate_allow_multi_channels and helpers. Instead, add a common helper to check URI for multi-channels for either qmp_migrate and qmp_migrate_incoming and that should do all the proper checks. The failure will only trigger with the "migrate" or "migrate_incoming" command, or when user specified "-incoming xxx" where "xxx" is not "defer". Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
ling xu	04ffce137b	AVX512 support for xbzrle_encode_buffer This commit is the same with [PATCH v6 1/2], and provides avx512 support for xbzrle_encode_buffer function to accelerate xbzrle encoding speed. Runtime check of avx512 support and benchmark for this feature are added. Compared with C version of xbzrle_encode_buffer function, avx512 version can achieve 50%-70% performance improvement on benchmarking. In addition, if dirty data is randomly located in 4K page, the avx512 version can achieve almost 140% performance gain. Signed-off-by: ling xu <ling1.xu@intel.com> Co-authored-by: Zhou Zhao <zhou.zhao@intel.com> Co-authored-by: Jun Jin <jun.i.jin@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Juan Quintela	e264705012	migration: I messed state_pending_exact/estimate I called the helper function from the wrong top level function. This code was introduced in: commit `c8df4a7aef` Author: Juan Quintela <quintela@redhat.com> Date: Mon Oct 3 02:00:03 2022 +0200 migration: Split save_live_pending() into state_pending_* We split the function into to: - state_pending_estimate: We estimate the remaining state size without stopping the machine. - state pending_exact: We calculate the exact amount of remaining state. Thanks to Avihai Horon <avihaih@nvidia.com> for finding it. Fixes:c8df4a7aeffcb46020f610526eea621fa5b0cd47 When we introduced that patch, we enden calling state_pending_estimate() helper from qemu_savevm_statepending_exact() and state_pending_exact() helper from qemu_savevm_statepending_estimate() This patch fixes it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Juan Quintela	4010ba388d	migration: Make ram_save_target_page() a pointer We are going to create a new function for multifd latest in the series. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-11 16:51:09 +01:00

1 2 3 4 5 ...

1730 Commits