mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Juan Quintela	3f461a0c0b	migration: Drop unused parameter for migration_tls_get_creds() It is not needed since we moved the accessor for tls properties to options.c. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-05-03 11:24:20 +02:00
Juan Quintela	5690756d7c	migration/rdma: Unfold last user of acct_update_position() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>	2023-05-03 11:24:20 +02:00
Juan Quintela	c61d2faa93	migration/rdma: Split the zero page case from acct_update_position Now that we have atomic counters, we can do it on the place that we need it, no need to do it inside ram.c. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>	2023-05-03 11:24:20 +02:00
Juan Quintela	96820df24e	migration: Rename RAMStats to MigrationAtomicStats It is loosely based on MigrationStats, but that name is taken, so this is the best one that I came up with. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> --- If you have any good suggestion for the name, I am all ears.	2023-05-03 11:24:20 +02:00
Juan Quintela	aff3f6606d	migration: Rename ram_counters to mig_stats migration_stats is just too long, and it is going to have more than ram counters in the near future. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>	2023-05-03 11:24:20 +02:00
Juan Quintela	947701cc1a	migration: Move ram_stats to its own file migration-stats.[ch] There is already include/qemu/stats.h, so stats.h was a bad idea. We want this file to not depend on anything else, we will move all the migration counters/stats to this struct. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>	2023-05-03 11:24:19 +02:00
Juan Quintela	e232199aad	multifd: We already account for this packet on the multifd thread Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de>	2023-05-03 11:24:19 +02:00
Richard Henderson	dc165fcd4e	migration/xbzrle: Use __attribute__((target)) for avx512 Use the attribute, which is supported by clang, instead of the #pragma, which is not supported and, for some reason, also not detected by the meson probe, so we fail by -Werror. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230501210555.289806-1-richard.henderson@linaro.org>	2023-05-02 13:05:45 -07:00
Juan Quintela	73208a336e	migration: Make dirty_bytes_last_sync atomic As we set its value, it needs to be operated with atomics. We rename it from remaining to better reflect its meaning. Statistics always return the real remaining bytes. This was used to store how many pages were dirty on the previous generation, so we can calculate the expected downtime as: dirty_bytes_last_sync / current_bandwidth. If we use the actual remaining bytes, we would see a very small value at the end of the iteration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> --- I am open to use ram_bytes_remaining() in its only use and be more "optimistic" about the downtime. Don't use __nocheck() functions. Use stat64_get() now that it exists.	2023-04-27 16:39:54 +02:00
Juan Quintela	72f8e58707	migration: Make dirty_pages_rate atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> --- Don't use __nocheck() variants Use stat64_get()	2023-04-27 16:39:49 +02:00
Juan Quintela	294e5a4034	multifd: Only flush once each full round of memory We need to add a new flag to mean to flush at that point. Notice that we still flush at the end of setup and at the end of complete stages. Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> --- Add missing qemu_fflush(), now it passes all tests always. In the previous version, the check that changes the default value to false got lost in some rebase. Get it back.	2023-04-27 16:37:28 +02:00
Juan Quintela	b05292c237	multifd: Protect multifd_send_sync_main() calls We only need to do that on the ram_save_iterate() call on sending and on destination when we get a RAM_SAVE_FLAG_EOS. In setup() and complete() we need to synch in both new and old cases, so don't add a check there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> --- Remove the wrappers that we take out on patch 5.	2023-04-27 16:37:28 +02:00
Juan Quintela	77c259a4cb	multifd: Create property multifd-flush-after-each-section We used to flush all channels at the end of each RAM section sent. That is not needed, so preparing to only flush after a full iteration through all the RAM. Default value of the property is false. But we return "true" in migrate_multifd_flush_after_each_section() until we implement the code in following patches. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> --- Rename each-iteration to after-each-section Rename multifd-sync-after-each-section to multifd-flush-after-each-section Move to machine-8.0 (peter)	2023-04-27 16:37:28 +02:00
Juan Quintela	f9436522c8	migration: Move migration_properties to options.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-27 16:37:28 +02:00
Juan Quintela	b804b35b1c	migration: Create migrate_block_bitmap_mapping() function Notice that we changed the test of ->has_block_bitmap_mapping for the test that block_bitmap_mapping is not NULL. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- Make it return const (vladimir)	2023-04-27 16:37:28 +02:00
Juan Quintela	1f2f366c32	migration: Create migrate_tls_hostname() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- Moved the type to const char * (vladimir)	2023-04-27 16:37:28 +02:00
Juan Quintela	2eb0308bbd	migration: Create migrate_tls_authz() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- Moved the type to const char * (vladimir)	2023-04-27 16:37:28 +02:00
Juan Quintela	d5c3e1959c	migration: Create migrate_tls_creds() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- Moved the type to const char * (vladimir)	2023-04-27 16:37:28 +02:00
Juan Quintela	b1a8795654	migration: Remove MigrationState from block_cleanup_parameters() This makes the function more regular with everything else. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-27 16:37:28 +02:00
Juan Quintela	b7b73122dd	migration: Move block_cleanup_parameters() to options.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-27 16:37:28 +02:00
Juan Quintela	87c2290109	migration: Move migrate_set_block_incremental() to options.c Once there, make it more regular and remove the need for MigrationState parameter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-27 16:37:28 +02:00
Juan Quintela	f5da8ba477	migration: Create migrate_downtime_limit() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-27 16:37:28 +02:00
Juan Quintela	8f9c532756	migration: Make all functions check have the same format Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-27 16:37:28 +02:00
Juan Quintela	61a174e227	migration: Create migrate_params_init() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-27 16:37:28 +02:00
Juan Quintela	d2026ee117	multifd: Fix the number of channels ready We don't wait in the sem when we are doing a sync_main. Make it wait there. To make things clearer, we mark the channel ready at the beginning of the thread loop. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-27 16:37:28 +02:00
Peter Xu	12c81e5ae9	migration/vmstate-dump: Dump array size too as "num" For VMS_ARRAY typed vmsd fields, also dump the number of entries in the array in -vmstate-dump. Without such information, vmstate static checker can report false negatives of incompatible vmsd on VMS_ARRAY typed fields, when the src/dst do not have the same type of array defined. It's because in the checker we only check against size of fields within a VMSD field. One example: e1000e used to have a field defined as a boolean array with 5 entries, then removed it and replaced it with UNUSED (in `31e3f318c8`): - VMSTATE_BOOL_ARRAY(core.eitr_intr_pending, E1000EState, - E1000E_MSIX_VEC_NUM), + VMSTATE_UNUSED(E1000E_MSIX_VEC_NUM), It's a legal replacement but vmstate static checker is not happy with it, because it checks only against the "size" field between the two fields (here one is BOOL_ARRAY, the other is UNUSED): For BOOL_ARRAY: { "field": "core.eitr_intr_pending", "version_id": 0, "field_exists": false, "size": 1 }, For UNUSED: { "field": "unused", "version_id": 0, "field_exists": false, "size": 5 }, It's not the script to blame because there's just not enough information dumped to show the total size of the entry for an array. Add it. Note that this will not break old vmstate checker because the field will just be ignored. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-27 10:18:25 +02:00
Peter Xu	74c38cf7fd	migration: Allow postcopy_ram_supported_by_host() to report err Instead of printing it to STDERR, bring the error upwards so that it can be reported via QMP responses. E.g.: { "execute": "migrate-set-capabilities" , "arguments": { "capabilities": [ { "capability": "postcopy-ram", "state": true } ] } } { "error": { "class": "GenericError", "desc": "Postcopy is not supported: Host backend files need to be TMPFS or HUGETLBFS only" } } Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-27 10:18:25 +02:00
Juan Quintela	09d6c96584	migration: Move qmp_migrate_set_parameters() to options.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-27 10:18:25 +02:00
Juan Quintela	10d4703be5	migration: Move migrate_use_tls() to options.c Once there, rename it to migrate_tls() and make it return bool for consistency. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- Fix typos found by fabiano	2023-04-27 10:18:25 +02:00
Leonardo Bras	b405dfff1e	migration: Disable postcopy + multifd migration Since the introduction of multifd, it's possible to perform a multifd migration and finish it using postcopy. A bug introduced by yank (fixed on `cfc3bcf373`) was previously preventing a successful use of this migration scenario, and now things should be working in most scenarios. But since there is not enough testing/support nor any reported users for this scenario, we should disable this combination before it may cause any problems for users. Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-27 10:18:25 +02:00
Juan Quintela	9c894df3a3	migration: Create migrate_max_bandwidth() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:47 +02:00
Juan Quintela	f774fde5d4	migration: Move migrate_postcopy() to options.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:47 +02:00
Juan Quintela	873f674c55	migration: Create migrate_cpu_throttle_tailslow() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:47 +02:00
Juan Quintela	9605c2ac28	migration: Create migrate_cpu_throttle_increment() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:47 +02:00
Juan Quintela	2a8ec38082	migration: Create migrate_cpu_throttle_initial() to option.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:47 +02:00
Juan Quintela	2682c4eea7	migration: Move migrate_announce_params() to option.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> --- Fix extra whitespace (fabiano)	2023-04-24 15:01:46 +02:00
Juan Quintela	24155bd052	migration: Create migrate_max_cpu_throttle() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:46 +02:00
Juan Quintela	f94a858fa3	migration: Create migrate_checkpoint_delay() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:46 +02:00
Juan Quintela	6499efdb16	migration: Create migrate_throttle_trigger_threshold() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:46 +02:00
Juan Quintela	6f8be7080a	migration: Move migrate_use_block_incremental() to option.c To be consistent with every other parameter, rename to migrate_block_incremental(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	5390adec03	migration: Use migrate_max_postcopy_bandwidth() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	1dfc4b9e19	migration: Move parameters functions to option.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	f80196b772	migration: Move migrate_cap_set() to options.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	45c1de13f0	migration: Move qmp_migrate_set_capabilities() to options.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	4d0c6b695b	migration: Move qmp_query_migrate_capabilities() to options.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	7760870645	migration: Move migrate_caps_check() to options.c Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	17cba690cd	migration: Create migrate_rdma_pin_all() function Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> --- Fixed missing space after comma (fabiano)	2023-04-24 15:01:46 +02:00
Juan Quintela	38ad1110e3	migration: Move migrate_use_return() to options.c Once that we are there, we rename the function to migrate_return_path() to be consistent with all other capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	9d4b1e5f22	migration: Move migrate_use_block() to options.c Once that we are there, we rename the function to migrate_block() to be consistent with all other capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	87dca0c9bb	migration: Move migrate_use_xbzrle() to options.c Once that we are there, we rename the function to migrate_xbzrle() to be consistent with all other capabilities. We change the type to return bool also for consistency. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	b4bc342c76	migration: Move migrate_use_zero_copy_send() to options.c Once that we are there, we rename the function to migrate_zero_copy_send() to be consistent with all other capabilities. We can remove the CONFIG_LINUX guard. We already check that we can't setup this capability in migrate_caps_check(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	51b07548f7	migration: Move migrate_use_multifd() to options.c Once that we are there, we rename the function to migrate_multifd() to be consistent with all other capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	b890902c9c	migration: Move migrate_use_events() to options.c Once that we are there, we rename the function to migrate_events() to be consistent with all other capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	a7a94d1435	migration: Move migrate_use_compression() to options.c Once that we are there, we rename the function to migrate_compress() to be consistent with all other capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	5e80464455	migration: Move migrate_colo_enabled() to options.c Once that we are there, we rename the function to migrate_colo() to be consistent with all other capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 15:01:46 +02:00
Juan Quintela	1f0776f1c0	migration: Create options.c We move there all capabilities helpers from migration.c. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Following David's advice: - looked through the history, capabilities are newer than 2012, so we can remove that bit of the header. - This part is posterior to Anthony. Original Author is Orit. Once there, I put myself. Peter Xu also did quite a bit of work here. Anyone else wants/needs to be there? I didn't search too hard because nobody asked before to be added. What do you think?	2023-04-24 15:01:46 +02:00
Juan Quintela	9eb1109cfb	migration: Create migrate_cap_set() And remove the convoluted use of qmp_migrate_set_capabilities() to enable disable MIGRATION_CAPABILITY_BLOCK. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de>	2023-04-24 15:01:46 +02:00
Juan Quintela	f9e1ef7482	spice: move client_migrate_info command to ui/ It has nothing to do with migration, except for the "migrate" in the name of the command. Move it with the rest of the ui commands. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-04-24 15:01:46 +02:00
Juan Quintela	c938157713	migration: move migration_global_dump() to migration-hmp-cmds.c It is only used there, so we can make it static. Once there, remove spice.h, which is not used. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> --- fix David Edmondson ui/qemu-spice.h unintended removal	2023-04-24 15:01:46 +02:00
Eric Blake	5d39f44d7a	migration: Minor control flow simplification No need to declare a temporary variable. Suggested-by: Juan Quintela <quintela@redhat.com> Fixes: 1df36e8c6289 ("migration: Handle block device inactivation failures better") Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 15:01:46 +02:00
Juan Quintela	b02c7fc9ef	migration: Pass migrate_caps_check() the old and new caps We used to pass the old capabilities array and the new capabilities as a list. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 11:29:02 +02:00
Juan Quintela	0cec2056ff	migration: rename enabled_capabilities to capabilities It is clear from the context what that means, and such a long name with the extra long names of the capabilities make it very difficult to stay inside the 80 columns limit. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>	2023-04-24 11:29:01 +02:00
Peter Xu	ae30b9b289	migration/postcopy: Detect file system on dest host Postcopy requires the memory to support userfaultfd to work. Right now we check it but it's a bit too late (when switching to postcopy migration). Do that early right at enabling of postcopy. Note that this is still only a best effort because ramblocks can be dynamically created. We can add check in hostmem creations and fail if postcopy enabled, but maybe that's too aggressive. Still, we have chance to fail the most obvious where we know there's an existing unsupported ramblock. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:29:01 +02:00
Eric Blake	403d18ae38	migration: Handle block device inactivation failures better Consider what happens when performing a migration between two host machines connected to an NFS server serving multiple block devices to the guest, when the NFS server becomes unavailable. The migration attempts to inactivate all block devices on the source (a necessary step before the destination can take over); but if the NFS server is non-responsive, the attempt to inactivate can itself fail. When that happens, the destination fails to get the migrated guest (good, because the source wasn't able to flush everything properly): (qemu) qemu-kvm: load of migration failed: Input/output error at which point, our only hope for the guest is for the source to take back control. With the current code base, the host outputs a message, but then appears to resume: (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1) (src qemu)info status VM status: running but a second migration attempt now asserts: (src qemu) qemu-kvm: ../block.c:6738: int bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed. Whether the guest is recoverable on the source after the first failure is debatable, but what we do not want is to have qemu itself fail due to an assertion. It looks like the problem is as follows: In migration.c:migration_completion(), the source sets 'inactivate' to true (since COLO is not enabled), then tries savevm.c:qemu_savevm_state_complete_precopy() with a request to inactivate block devices. In turn, this calls block.c:bdrv_inactivate_all(), which fails when flushing runs up against the non-responsive NFS server. With savevm failing, we are now left in a state where some, but not all, of the block devices have been inactivated; but migration_completion() then jumps to 'fail' rather than 'fail_invalidate' and skips an attempt to reclaim those disks by calling bdrv_activate_all().
Even if we do attempt to reclaim disks, we aren't taking note of failure there, either. Thus, we have reached a state where the migration engine has forgotten all state about whether a block device is inactive, because we did not set s->block_inactive in enough places; so migration allows the source to reach vm_start() and resume execution, violating the block layer invariant that the guest CPUs should not be restarted while a device is inactive. Note that the code in migration.c:migrate_fd_cancel() will also try to reactivate all block devices if s->block_inactive was set, but because we failed to set that flag after the first failure, the source assumes it has reclaimed all devices, even though it still has remaining inactivated devices and does not try again. Normally, qmp_cont() will also try to reactivate all disks (or correctly fail if the disks are not reclaimable because NFS is not yet back up), but the auto-resumption of the source after a migration failure does not go through qmp_cont(). And because we have left the block layer in an inconsistent state with devices still inactivated, the later migration attempt is hitting the assertion failure. Since it is important to not resume the source with inactive disks, this patch marks s->block_inactive before attempting inactivation, rather than after succeeding, in order to prevent any vm_start() until it has successfully reactivated all devices. See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Acked-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:29:00 +02:00
Juan Quintela	8c0cda8fa0	migration: Rename normal to normal_pages The rest of the counters that refer to pages have a _pages suffix. And historically, this showed the number of full pages transferred. The name "normal" referred to the fact that they were sent without any optimization (compression, xbzrle, zero_page, ...). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:29:00 +02:00
Juan Quintela	1a386e8de5	migration: Rename duplicate to zero_pages The rest of the counters that refer to pages have a _pages suffix. And historically, this showed the number of pages composed of the same character, hence the name "duplicated". But for years now, it has referred to the number of zero_pages. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:59 +02:00
Juan Quintela	3c764f9b2b	migration: Make postcopy_requests atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:59 +02:00
Juan Quintela	536b5a4e56	migration: Make dirty_sync_count atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:58 +02:00
Juan Quintela	296a4ac2aa	migration: Make downtime_bytes atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:58 +02:00
Juan Quintela	b013b5d1f3	migration: Make precopy_bytes atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:58 +02:00
Juan Quintela	4291823694	migration: Make dirty_sync_missed_zero_copy atomic Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2023-04-24 11:28:57 +02:00
Juan Quintela	cf671116fa	migration: Make multifd_bytes atomic In the spirit of: commit 394d323bc3451e4d07f13341cb8817fac8dfbadd Author: Peter Xu <peterx@redhat.com> Date: Tue Oct 11 17:55:51 2022 -0400 migration: Use atomic ops properly for page accountings Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:28:57 +02:00
Juan Quintela	30fb22cda4	migration: Update atomic stats out of the mutex Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:28:56 +02:00
Juan Quintela	abce5fa16d	migration: Merge ram_counters and ram_atomic_counters Using MigrationStats as type for ram_counters means that we didn't have to re-declare each value in another struct. The need for atomic counters has made us create MigrationAtomicStats for these atomic counters. Create RAMStats type which is a merge of MigrationStats and MigrationAtomicStats removing unused members. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> --- Fix typos found by David Edmondson	2023-04-24 11:28:56 +02:00
李皆俊	8ebb6ecc37	migration: remove extra whitespace character for code style Fix code style. Signed-off-by: 李皆俊 <a_lijiejun@163.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-24 11:28:55 +02:00
Paolo Bonzini	4592eaf387	postcopy-ram: do not use qatomic_mb_read It does not even pair with a qatomic_mb_set(), so it is clearer to use load-acquire in this case; they are synonyms. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-04-20 11:17:35 +02:00
Paolo Bonzini	394b9407e4	migration: mark mixed functions that can suspend There should be no paths from a coroutine_fn to aio_poll, however in practice coroutine_mixed_fn will call aio_poll in the !qemu_in_coroutine() path. By marking mixed functions, we can track accurately the call paths that execute entirely in coroutine context, and find more missing coroutine_fn markers. This results in more accurate checks that coroutine code does not end up blocking. If the marking were extended transitively to all functions that call these ones, static analysis could be done much more efficiently. However, this is a start and makes it possible to use vrc's path-based searches to find potential bugs where coroutine_fns call blocking functions. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-04-20 11:17:35 +02:00
Juan Quintela	28ef5339c3	migration: fix ram_state_pending_exact() I removed that bit on commit: commit `c8df4a7aef` Author: Juan Quintela <quintela@redhat.com> Date: Mon Oct 3 02:00:03 2022 +0200 migration: Split save_live_pending() into state_pending_* Fixes: `c8df4a7aef` Suggested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-12 22:47:50 +02:00
Lukas Straub	37502df32c	migration/ram.c: Fix migration with compress enabled Since `ec6f3ab9`, migration with compress enabled was broken, because the compress threads use a dummy QEMUFile which just acts as a buffer and that commit accidentally changed it to use the outgoing migration channel instead. Fix this by using the dummy file again in the compress threads. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-12 21:51:34 +02:00
Peter Xu	06064a6715	migration: Recover behavior of preempt channel creation for pre-7.2 In 8.0 devel window we reworked preempt channel creation, so that there'll be no race condition when the migration channel and preempt channel got established in the wrong order in commit `5655aab079`. However no one noticed that the change will also be not compatible with older qemus, majorly 7.1/7.2 versions where preempt mode started to be supported. Leverage the same pre-7.2 flag introduced in the previous patch to recover the behavior hopefully before 8.0 releases, so we don't break migration when we migrate from 8.0 to older qemu binaries. Fixes: `5655aab079` ("migration: Postpone postcopy preempt channel to be after main") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-12 21:44:56 +02:00
Peter Xu	6621883f93	migration: Fix potential race on postcopy_qemufile_src postcopy_qemufile_src object should be owned by one thread, either the main thread (e.g. when at the beginning, or at the end of migration), or by the return path thread (when during a preempt enabled postcopy migration). If that's not the case the access to the object might be racy. postcopy_preempt_shutdown_file() can be potentially racy, because it's called at the end phase of migration on the main thread, however during which the return path thread hasn't yet been recycled; the recycle happens in await_return_path_close_on_source() which is after this point. It means, logically it's possible the main thread and the return path thread are both operating on the same qemufile. While I don't think qemufile is thread safe at all. postcopy_preempt_shutdown_file() used to be needed because that's where we send EOS to dest so that dest can safely shutdown the preempt thread. To avoid the possible race, remove this only place that a race can happen. Instead we figure out another way to safely close the preempt thread on dest. The core idea during postcopy on deciding "when to stop" is that dest will send a postcopy SHUT message to src, telling src that all data is there. Hence to shut the dest preempt thread maybe better to do it directly on dest node. This patch proposed such a way that we change postcopy_prio_thread_created into PreemptThreadStatus, so that we kick the preempt thread on dest qemu by a sequence of: mis->preempt_thread_status = PREEMPT_THREAD_QUIT; qemu_file_shutdown(mis->postcopy_qemufile_dst); While here shutdown() is probably so far the easiest way to kick preempt thread from a blocked qemu_get_be64(). Then it reads preempt_thread_status to make sure it's not a network failure but a willingness to quit the thread. We could have avoided that extra status but just rely on migration status. The problem is postcopy_ram_incoming_cleanup() is just called early enough so we're still during POSTCOPY_ACTIVE no matter what. So just make it simple to have the status introduced. One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of postcopy preempt. Fixes: `9358982744` ("migration: Send requested page directly in rp-return thread") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-04-12 21:44:38 +02:00
Paolo Bonzini	2c5451ca52	migration/block: replace uses of blk_nb_sectors that do not check result Uses of blk_nb_sectors must check whether the result is negative. Otherwise, underflow can happen. Fortunately, alloc_aio_bitmap() and bmds_aio_inflight() both have an alternative way to retrieve the number of sectors in the file. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20230407153303.391121-6-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2023-04-11 16:40:53 +02:00
Richard Henderson	cc37d98bfb	*: Add missing includes of qemu/error-report.h This had been pulled in via qemu/plugin.h from hw/core/cpu.h, but that will be removed. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230310195252.210956-5-richard.henderson@linaro.org> [AJB: add various additional cases shown by CI] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230315174331.2959-15-alex.bennee@linaro.org> Reviewed-by: Emilio Cota <cota@braap.org>	2023-03-22 15:06:57 +00:00
Steve Sistare	fa76c854ae	migration: fix populate_vfio_info Include CONFIG_DEVICES so that populate_vfio_info is instantiated for CONFIG_VFIO. Without it, the 'info migrate' command never returns info about vfio. Fixes: `43bd0bf30f` ("migration: Move populate_vfio_info() into a separate file") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Wei Wang	ff1585d1d8	migration/multifd: correct multifd_send_thread to trace the flags The p->flags could be updated via the send_prepare callback, e.g. OR-ed with MULTIFD_FLAG_ZLIB via zlib_send_prepare. Assign p->flags to the local "flags" before the send_prepare callback could only get partial of p->flags. Fix it by moving the assignment of p->flags to the local flags after the callback, so that the correct flags can be traced. Fixes: `ab7cbb0b9a` ("multifd: Make no compression operations into its own structure") Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Li Zhijian	bf0274192a	migration/rdma: Remove deprecated variable rdma_return_path It's no longer needed since commit `44bcfd45e9` ("migration/rdma: destination: create the return patch after the first accept") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Matheus Tavares Bernardino	1776b70f55	migration/xbzrle: fix out-of-bounds write with avx512 xbzrle_encode_buffer_avx512() checks for overflows too scarcely in its outer loop, causing out-of-bounds writes: $ ../configure --target-list=aarch64-softmmu --enable-sanitizers --enable-avx512bw $ make tests/unit/test-xbzrle && ./tests/unit/test-xbzrle ==5518==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000b100 at pc 0x561109a7714d bp 0x7ffed712a440 sp 0x7ffed712a430 WRITE of size 1 at 0x62100000b100 thread T0 #0 0x561109a7714c in uleb128_encode_small ../util/cutils.c:831 #1 0x561109b67f6a in xbzrle_encode_buffer_avx512 ../migration/xbzrle.c:275 #2 0x5611099a7428 in test_encode_decode_overflow ../tests/unit/test-xbzrle.c:153 #3 0x7fb2fb65a58d (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d) #4 0x7fb2fb65a333 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a333) #5 0x7fb2fb65aa79 in g_test_run_suite (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa79) #6 0x7fb2fb65aa94 in g_test_run (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa94) #7 0x5611099a3a23 in main ../tests/unit/test-xbzrle.c:218 #8 0x7fb2fa78c082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) #9 0x5611099a608d in _start (/qemu/build/tests/unit/test-xbzrle+0x28408d) 0x62100000b100 is located 0 bytes to the right of 4096-byte region [0x62100000a100,0x62100000b100) allocated by thread T0 here: #0 0x7fb2fb823a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153 #1 0x7fb2fb637ef0 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57ef0) Fix that by performing the overflow check in the inner loop, instead. Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Matheus Tavares Bernardino	d84a78d15d	migration/xbzrle: use ctz64 to avoid undefined result __builtin_ctzll() produces undefined results when the argument is 0. This can be seen through test-xbzrle, which produces the following warning: ../migration/xbzrle.c:265: runtime error: passing zero to ctz(), which is not a valid argument Replace __builtin_ctzll() with our ctz64() wrapper which properly handles 0. Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Dr. David Alan Gilbert	a5382214d8	migration/rdma: Fix return-path case The RDMA code has return-path handling code, but it's only enabled if postcopy is enabled; if the 'return-path' migration capability is enabled, the return path is NOT setup but the core migration code still tries to use it and breaks. Enable the RDMA return path if either postcopy or the return-path capability is enabled. bz: https://bugzilla.redhat.com/show_bug.cgi?id=2063615 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
Peter Xu	a5d35dc7e0	migration: Wait on preempt channel in preempt thread QEMU main thread will wait until dest preempt channel established during processing the LISTEN command (within the whole postcopy PACKAGED data), by waiting on the semaphore postcopy_qemufile_dst_done. That's racy, because it's possible that the dest QEMU main thread hasn't yet accept()ed the new connection when processing the LISTEN event. The sem_wait() will yield the main thread without being able to run anything else including the accept() of the new socket, which can cause deadlock within the main thread. To avoid the race, move the "wait channel" from main thread to the preempt thread right at the start. Reported-by: Peter Maydell <peter.maydell@linaro.org> Fixes: `5655aab079` ("migration: Postpone postcopy preempt channel to be after main") Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-16 16:07:07 +01:00
John Berberian, Jr	c31772ad68	Fix exec migration on Windows (w32+w64). * Use cmd instead of /bin/sh on Windows. * Try to auto-detect cmd.exe's path, but default to a hard-coded path. Note that this will require that gspawn-win[32|64]-helper.exe and gspawn-win[32|64]-helper-console.exe are included in the Windows binary distributions (cc: Stefan Weil). Signed-off-by: "John Berberian, Jr" <jeb.study@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-03-02 17:06:27 +01:00
Markus Armbruster	43aef7e632	migration/colo: Improve an x-colo-lost-heartbeat error message The QERR_ macros are leftovers from the days of "rich" error objects. We've been trying to reduce their remaining use. Get rid of a use of QERR_FEATURE_DISABLED, and improve the somewhat imprecise error message (qemu) x_colo_lost_heartbeat Error: The feature 'colo' is not enabled to Error: VM is not in COLO mode Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230207075115.1525-12-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com>	2023-02-23 14:10:17 +01:00
Markus Armbruster	6f1e91f716	error: Drop superfluous #include "qapi/qmp/qerror.h" Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230207075115.1525-2-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>	2023-02-23 13:56:14 +01:00
Juan Quintela	24beea4efe	migration: Rename res_{postcopy,precopy}_only Once res_compatible is removed, they don't make sense anymore. We remove the _only suffix. And to make things clearer we rename them to must_precopy and can_postcopy. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 20:04:30 +01:00
Juan Quintela	24f254ed79	migration: Remove unused res_compatible Nothing assigns to it after previous commit. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 20:04:30 +01:00
Juan Quintela	abbbd04da2	migration: In case of postcopy, the memory ends in res_postcopy_only So remove last assignation of res_compatible. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 20:04:30 +01:00
Philippe Mathieu-Daudé	163b8663b8	migration/block: Convert remaining DPRINTF() debug macro to trace events Finish the conversion from commit `fe80c0241d` ("migration: using trace_ to replace DPRINTF"). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 19:09:25 +01:00
Avihai Horon	c7a7db4b51	migration/qemu-file: Add qemu_file_get_to_fd() Add new function qemu_file_get_to_fd() that allows reading data from QEMUFile and writing it straight into a given fd. This will be used later in VFIO migration code. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-15 19:09:25 +01:00
Juan Quintela	7b548761e5	ram: Document migration ram flags 0x80 is RAM_SAVE_FLAG_HOOK; it is in qemu-file now. The biggest usable flag is 0x200; note that. We can reuse RAM_SAVE_FLAG_FULL. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:45:47 +01:00
Leonardo Bras	cfc3bcf373	migration/multifd: Move load_cleanup inside incoming_state_destroy Currently running migration_incoming_state_destroy() without first running multifd_load_cleanup() will cause a yank error: qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. (core dumped) The above error happens in the target host, when multifd is being used for precopy, and then postcopy is triggered and the migration finishes. This will crash the VM in the target host. To avoid that, move multifd_load_cleanup() inside migration_incoming_state_destroy(), so that the load cleanup becomes part of the incoming state destroying process. Running multifd_load_cleanup() twice can become an issue, though, but the only scenario in which it could be run twice is on process_incoming_migration_bh(). So removing this extra call is necessary. On the other hand, this multifd_load_cleanup() call happens way before the migration_incoming_state_destroy() and having this happen before dirty_bitmap_mig_before_vm_start() and vm_start() may be needed. So introduce a new function multifd_load_shutdown() that will mainly stop all multifd threads and close their QIOChannels. Then use this function instead of multifd_load_cleanup() to make sure nothing else is received before dirty_bitmap_mig_before_vm_start(). Fixes: `b5eea99ec2` ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:45:40 +01:00
Leonardo Bras	10351fbad1	migration/multifd: Join all multifd threads in order to avoid leaks Current approach will only join threads that are still running. For the threads not joined, resources or private memory are always kept in the process space and never reclaimed before process end, and this risks serious memory leaks. This should usually not represent a big problem, since multifd migration is usually just run at most a few times, and after it succeeds there is not much to be done before exiting the process. Yet still, it should not hurt performance to join all of them. Fixes: `b5eea99ec2` ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:45:34 +01:00
Leonardo Bras	d926f3bb2a	migration/multifd: Remove unnecessary assignment on multifd_load_cleanup() Before assigning "p->quit = true" for every multifd channel, multifd_load_cleanup() will call multifd_recv_terminate_threads() which already does the same assignment, while protected by a mutex. So there is no point doing the same assignment again. Fixes: `b5eea99ec2` ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:45:28 +01:00
Leonardo Bras	e5bac1f525	migration/multifd: Change multifd_load_cleanup() signature and usage Since its introduction in commit `f986c3d256` ("migration: Create multifd migration threads"), multifd_load_cleanup() never returned any value different than 0, nor set up any error on errp. Even so, on process_incoming_migration_bh() an if clause uses its return value to decide on setting autostart = false, which will never happen. In order to simplify the codebase, change multifd_load_cleanup() signature to 'void multifd_load_cleanup(void)', and for every usage remove error handling or decision made based on return value != 0. Fixes: `b5eea99ec2` ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-13 03:44:44 +01:00
Peter Xu	5655aab079	migration: Postpone postcopy preempt channel to be after main Postcopy with preempt-mode enabled needs two channels to communicate. The order of channel establishment is not guaranteed. It can happen that the dest QEMU got the preempt channel connection request before the main channel is established, then the migration may make no progress even during precopy due to the wrong order. To fix it, create the preempt channel only if we know the main channel is established. For a general postcopy migration, we delay it until postcopy_start(), that's where we already went through some part of precopy on the main channel. To make sure dest QEMU has already established the channel, we wait until we got the first PONG received. That's something we do at the start of precopy when postcopy enabled so it's guaranteed to happen sooner or later. For a postcopy recovery, we delay it to qemu_savevm_state_resume_prepare() where we'll have round trips of data on bitmap synchronizations, which means the main channel must have been established. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Peter Xu	b28fb58227	migration: Add a semaphore to count PONGs This is mostly useless, but useful for us to know whether the main channel is correctly established without changing the migration protocol. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Peter Xu	fc063a7b8a	migration: Cleanup postcopy_preempt_setup() Since we just dropped the only case where postcopy_preempt_setup() can return an error, it doesn't need a retval anymore because it never fails. Move the preempt check to the caller, preparing it to be used elsewhere to do nothing but as simple as kicking the async connection. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Peter Xu	d6f74fd12e	migration: Rework multi-channel checks on URI The whole idea of multi-channel checks was not properly done, IMHO. Currently we check multi-channel in a lot of places, but actually that's not needed because we only need to check it right after we get the URI and that should be it. If the URI check succeeded, we should never need to check it again because we must have it. If the check fails, we should fail immediately on either the qmp_migrate or qmp_migrate_incoming, instead of failing it later after the connection is established. Neither should we fail any set capabilities like what we used to do here: `5ad15e8614` ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19) Because logically the URI will only be set later after the capability is set, so it doesn't make a lot of sense to check the URI type when setting the capability, because we're checking the cap with an old URI passed in, and that may not even be the URI we're going to use later. This patch mostly reverted all such checks from before, dropping the variable migrate_allow_multi_channels and helpers. Instead, add a common helper to check URI for multi-channels for both qmp_migrate and qmp_migrate_incoming and that should do all the proper checks. The failure will only trigger with the "migrate" or "migrate_incoming" command, or when user specified "-incoming xxx" where "xxx" is not "defer". Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
ling xu	04ffce137b	AVX512 support for xbzrle_encode_buffer This commit is the same with [PATCH v6 1/2], and provides avx512 support for xbzrle_encode_buffer function to accelerate xbzrle encoding speed. Runtime check of avx512 support and benchmark for this feature are added. Compared with C version of xbzrle_encode_buffer function, avx512 version can achieve 50%-70% performance improvement on benchmarking. In addition, if dirty data is randomly located in 4K page, the avx512 version can achieve almost 140% performance gain. Signed-off-by: ling xu <ling1.xu@intel.com> Co-authored-by: Zhou Zhao <zhou.zhao@intel.com> Co-authored-by: Jun Jin <jun.i.jin@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Juan Quintela	e264705012	migration: I messed state_pending_exact/estimate I called the helper function from the wrong top level function. This code was introduced in: commit `c8df4a7aef` Author: Juan Quintela <quintela@redhat.com> Date: Mon Oct 3 02:00:03 2022 +0200 migration: Split save_live_pending() into state_pending_* We split the function into two: - state_pending_estimate: We estimate the remaining state size without stopping the machine. - state_pending_exact: We calculate the exact amount of remaining state. Thanks to Avihai Horon <avihaih@nvidia.com> for finding it. Fixes: c8df4a7aeffcb46020f610526eea621fa5b0cd47 When we introduced that patch, we ended up calling the state_pending_estimate() helper from qemu_savevm_state_pending_exact() and the state_pending_exact() helper from qemu_savevm_state_pending_estimate(). This patch fixes it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Juan Quintela	4010ba388d	migration: Make ram_save_target_page() a pointer We are going to create a new function for multifd latest in the series. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-11 16:51:09 +01:00
Juan Quintela	8d80e1951e	migration: Calculate ram size once We are recalculating ram size continuously, when we know that it doesn't change during migration. Create a field in RAMState to track it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-02-11 16:51:09 +01:00
Juan Quintela	8008a272d6	migration: Split ram_bytes_total_common() in two functions It is just a big if in the middle of the function, and we need two functions anyway. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Juan Quintela <quintela@redhat.com> --- Reindent to make Philippe happy (and CODING_STYLE)	2023-02-11 16:51:09 +01:00
Juan Quintela	31e2ac742b	migration: Make find_dirty_block() return a single parameter We used to return two bools, just return a single int with the following meaning (old return / again / new return): false / false / PAGE_ALL_CLEAN; false / true / PAGE_TRY_AGAIN; true / true / PAGE_DIRTY_FOUND /* We don't care about again at all */ Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Juan Quintela	51efd36faf	migration: Simplify ram_find_and_save_block() We will need later that find_dirty_block() return errors, so simplify the loop. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Li Zhang	bca762c2b9	multifd: Remove some redundant code Clean up some unnecessary code Signed-off-by: Li Zhang <lizhang@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Li Zhang	e3f37b2ce6	multifd: cleanup the function multifd_channel_connect Cleanup multifd_channel_connect Signed-off-by: Li Zhang <lizhang@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Juan Quintela	b530ccde5d	migration: Remove spurious files I introduced spurious files on my tree during a rebase: commit `ebfc578715` Author: Zhenzhong Duan <zhenzhong.duan@intel.com> Date: Mon Oct 17 15:53:51 2022 +0800 multifd: Fix flush of zero copy page send request Make IO channel flush call after the inflight request has been drained in multifd thread, or else we may miss flushing the inflight request. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> To make things worse, it appears like Zhenzhong is the one to blame. for(int i=0; i < 1000000; i++) { printf("I will not do rebases when I am tired\n"); } Sorry, Juan. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-11 16:51:09 +01:00
Markus Armbruster	a67dfa660b	Drop duplicate #include Tracked down with the help of scripts/clean-includes. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230202133830.2152150-21-armbru@redhat.com>	2023-02-08 07:28:05 +01:00
Jiang Jiacheng	1b1f4ab69c	migration: save/delete migration thread info To support querying migration thread information, save and delete thread (live_migration and multifdsend) information at thread creation and finish. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
Jiang Jiacheng	671326201d	migration: Introduce interface query-migrationthreads Introduce interface query-migrationthreads. The interface is used to query information about migration threads and returns with migration thread's name and its id. Introduce threadinfo.c to manage threads with migration. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
Zhenzhong Duan	ebfc578715	multifd: Fix flush of zero copy page send request Make IO channel flush call after the inflight request has been drained in multifd thread, or else we may miss flushing the inflight request. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
Zhenzhong Duan	ddbe628c97	multifd: Fix a race on reading MultiFDPages_t.block In multifd_queue_page() MultiFDPages_t.block is checked twice. Between the two checks, MultiFDPages_t.block may be reset to NULL by multifd thread. This leads to the 2nd check always being true, and then a redundant page is submitted to the multifd thread again. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
manish.mishra	6720c2b327	migration: check magic value for deciding the mapping of channels Current logic assumes that channel connections on the destination side are always established in the same order as the source and the first one will always be the main channel followed by the multifd or post-copy preemption channel. This may not always be true, as even if a channel has a connection established on the source side it can be in the pending state on the destination side and a newer connection can be established first, causing out-of-order mapping of channels on the destination side. Currently, all channels except post-copy preempt send a magic number; this patch uses that magic number to decide the type of channel. This logic is applicable only for precopy (multifd) live migration; as mentioned, the post-copy preempt channel does not send any magic number. Also, tls live migrations already do the tls handshake before creating other channels, so this issue is not possible with tls, hence this logic is avoided for tls live migrations. This patch uses read peek to check the magic number of channels so that current data/control stream management remains unaffected. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:57 +01:00
manish.mishra	84615a19dd	io: Add support for MSG_PEEK for socket channel MSG_PEEK peeks at the channel. The data is treated as unread and the next read shall still return this data. This support is currently added only for the socket class. An extra parameter 'flags' is added to io_readv calls to pass extra read flags like MSG_PEEK. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Suggested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: manish.mishra <manish.mishra@nutanix.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Zhenzhong Duan	bd9510d385	migration/dirtyrate: Show sample pages only in page-sampling mode The value of "Sample Pages" is confusing in mode other than page-sampling. See below: (qemu) calc_dirty_rate -b 10 520 (qemu) info dirty_rate Status: measuring Start Time: 11646834 (ms) Sample Pages: 520 (per GB) Period: 10 (sec) Mode: dirty-bitmap Dirty rate: (not ready) (qemu) info dirty_rate Status: measured Start Time: 11646834 (ms) Sample Pages: 0 (per GB) Period: 10 (sec) Mode: dirty-bitmap Dirty rate: 2 (MB/s) While it's totally useless in dirty-ring and dirty-bitmap mode, fix to show it only in page-sampling mode. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Dr. David Alan Gilbert	bb25a72895	migration: Perform vmsd structure check during tests Perform a check on vmsd structures during test runs in the hope of catching any missing terminators and other simple screwups. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Dr. David Alan Gilbert	89c5684891	migration: Add canary to VMSTATE_END_OF_LIST We fairly regularly forget VMSTATE_END_OF_LIST markers off descriptions; given that the current check is only for ->name being NULL, sometimes we get unlucky and the code apparently works and no one spots the error. Explicitly add a flag, VMS_END that should be set, and assert it is set during the traversal. Note: This can't go in until we update the copy of vmstate.h in slirp. Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Fiona Ebner	74ecf6ac2b	migration/rdma: fix return value for qio_channel_rdma_{readv,writev} upon errors. As the documentation in include/io/channel.h states, only -1 and QIO_CHANNEL_ERR_BLOCK should be returned upon error. Other values have the potential to confuse the call sites. error_setg is used rather than error_setg_errno, because there are certain code paths where -1 (as a non-errno) is propagated up (e.g. starting from qemu_rdma_block_for_wrid or qemu_rdma_post_recv_control) all the way to qio_channel_rdma_{readv,writev}. Similar to `a216ec85b7` ("migration/channel-block: fix return value for qio_channel_block_{readv,writev}"). Suggested-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Peter Xu	db18dee7d7	migration: Show downtime during postcopy phase The downtime should be displayed during postcopy phase because the switchover phase is done. OTOH it's weird to show "expected downtime", whose meaning is confusing once the switchover has already happened. This is a slight ABI change on QMP, but I assume it shouldn't affect anyone. Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	80fe315c38	migration/ram: Factor out check for advised postcopy Let's factor out this check, to be used in virtio-mem context next. While at it, fix a spelling error in a related comment. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	62f42625d4	migration/savevm: Allow immutable device state to be migrated early (i.e., before RAM) For virtio-mem, we want to have the plugged/unplugged state of memory blocks available before migrating any actual RAM content, and perform sanity checks before touching anything on the destination. This information is immutable on the migration source while migration is active. We want to use this information for proper preallocation support with migration: currently, we don't preallocate memory on the migration target, and especially with hugetlb, we can easily run out of hugetlb pages during RAM migration and will crash (SIGBUS) instead of catching this gracefully via preallocation. Migrating device state via a VMSD before we start iterating is currently impossible: the only approach that would be possible is avoiding a VMSD and migrating state manually during save_setup(), to be restored during load_state(). Let's allow for migrating device state via a VMSD early, during the setup phase in qemu_savevm_state_setup(). To keep it simple, we indicate applicable VMSDs using an "early_setup" flag. Note that only very selected devices (i.e., ones seriously messing with RAM setup) are supposed to make use of such early state migration. While at it, also use a bool for the "unmigratable" member. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	e3bf5e68e2	migration/savevm: Prepare vmdesc json writer in qemu_savevm_state_setup() ... and store it in the migration state. This is a preparation for storing selected VMSDs already in qemu_savevm_state_setup(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	5e104f24e7	migration/savevm: Move more savevm handling into vmstate_save() Let's move more code into vmstate_save(), reducing code duplication and preparing for reuse of vmstate_save() in qemu_savevm_state_setup(). We have to move vmstate_save() to make the compiler happy. We'll now also trace from qemu_save_device_state(), triggering the same tracepoints as previously called from qemu_savevm_state_complete_precopy_non_iterable() only. Note that qemu_save_device_state() ignores iterable device state, such as RAM, and consequently doesn't trigger some other trace points (e.g., trace_savevm_state_setup()). Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	e41c57702e	migration/ram: Optimize ram_write_tracking_start() for RamDiscardManager ram_block_populate_read() already optimizes for RamDiscardManager. However, ram_write_tracking_start() will still try protecting discarded memory ranges. Let's optimize, because discarded ranges don't map any pages and (1) For anonymous memory, trying to protect using uffd-wp without a mapped page is ignored by the kernel and consequently a NOP. (2) For shared/file-backed memory, we will fill present page tables in the range with PTE markers. However, we will even allocate page tables just to fill them with unnecessary PTE markers and effectively waste memory. So let's exclude these ranges, just like ram_block_populate_read() already does. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	59bcc049c1	migration/ram: Rely on used_length for uffd_change_protection() ram_mig_ram_block_resized() will abort migration (including background snapshots) when resizing a RAMBlock. ram_block_populate_read() will only populate RAM up to used_length, so at least for anonymous memory protecting everything between used_length and max_length won't actually be protected and is just a NOP. So let's only protect everything up to used_length. Note: it still makes sense to register uffd-wp for max_length, such that RAM_UF_WRITEPROTECT is independent of a changing used_length. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	7cc8e9e0fa	migration/ram: Don't explicitly unprotect when unregistering uffd-wp When unregistering uffd-wp, older kernels before commit f369b07c86143 ("mm/uffd:reset write protection when unregister with wp-mode") won't clear the uffd-wp PTE bit. When re-registering uffd-wp, the previous uffd-wp PTE bits would trigger again. With above commit, the kernel will clear the uffd-wp PTE bits when unregistering itself. Consequently, we'll clear the uffd-wp PTE bits now twice -- whereby we don't care about clearing them at all: a new background snapshot will re-register uffd-wp and re-protect all memory either way. So let's skip the manual clearing of uffd-wp. If ever relevant, we could clear conditionally in uffd_unregister_memory() -- we just need a way to figure out more recent kernels. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	72ef3a3708	migration/ram: Fix error handling in ram_write_tracking_start() If something goes wrong during uffd_change_protection(), we would fail to unregister uffd-wp and not release our reference. Fix it by performing the uffd_change_protection(true) last. Note that a uffd_change_protection(false) on the recovery path without a prior uffd_change_protection(true) is fine. Fixes: `278e2f551a` ("migration: support UFFD write fault processing in ram_save_iterate()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
David Hildenbrand	5f19a44919	migration/ram: Fix populate_read_range() Unfortunately, commit `f7b9dcfbcf` broke populate_read_range(): the loop end condition is very wrong, resulting in that function not populating the full range. Let's fix that. Fixes: `f7b9dcfbcf` ("migration/ram: Factor out populating pages readable in ram_block_populate_pages()") Cc: qemu-stable@nongnu.org Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Peter Xu	d5890ea072	util/userfaultfd: Add uffd_open() Add a helper to create the uffd handle. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Juan Quintela	d9df92925e	migration: simplify migration_iteration_run() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-06 19:22:56 +01:00
Juan Quintela	fd70385d38	migration: Remove unused threshold_size parameter Until the previous commit, save_live_pending() was used for ram. Now with the split into state_pending_estimate() and state_pending_exact() it is not needed anymore, so remove it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-06 19:22:56 +01:00
Juan Quintela	c8df4a7aef	migration: Split save_live_pending() into state_pending_* We split the function into two: - state_pending_estimate: We estimate the remaining state size without stopping the machine. - state_pending_exact: We calculate the exact amount of remaining state. The only "device" that implements different functions for _estimate() and _exact() is ram. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-06 19:22:56 +01:00
Juan Quintela	255dc7af7e	migration: No save_live_pending() method uses the QEMUFile parameter So remove it everywhere. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2023-02-06 19:22:56 +01:00
Peter Xu	301d7ffe5f	migration: Fix migration crash when target psize larger than host Commit `d9e474ea56` overlooked the case where the target psize is even larger than the host psize. One example is Alpha, which has an 8K page size: migration will crash the source QEMU when running Alpha migration on x86. Fix it by detecting that case and setting host start/end just to cover the single page to be migrated. This will slightly optimize the common case where host psize equals guest psize so we don't even need to do the roundups, but that's trivial. Cc: qemu-stable@nongnu.org Reported-by: Thomas Huth <thuth@redhat.com> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1456 Fixes: `d9e474ea56` ("migration: Teach PSS about host page") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2023-02-06 19:22:56 +01:00
Markus Armbruster	27be86351e	migration: Move the QMP command from monitor/ to migration/ This moves the command from MAINTAINERS sections "Human Monitor (HMP)" and "QMP" to "Migration". Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-19-armbru@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>	2023-02-04 07:56:54 +01:00
Markus Armbruster	119f50ce30	migration: Move HMP commands from monitor/ to migration/ This moves these commands from MAINTAINERS sections "Human Monitor (HMP)" and "QMP" to "Migration". Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-18-armbru@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>	2023-02-04 07:56:54 +01:00
Markus Armbruster	e2c1c34f13	include/block: Untangle inclusion loops We have two inclusion loops: block/block.h -> block/block-global-state.h -> block/block-common.h -> block/blockjob.h -> block/block.h block/block.h -> block/block-io.h -> block/block-common.h -> block/blockjob.h -> block/block.h I believe these go back to Emanuele's reorganization of the block API, merged a few months ago in commit `d7e2fe4aac`. Fortunately, breaking them is merely a matter of deleting unnecessary includes from headers, and adding them back in places where they are now missing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221133551.3967339-2-armbru@redhat.com>	2023-01-20 07:24:28 +01:00
Peter Maydell	928eac9539	Migration patches for 8.0 Hi, these are the patches that I had to drop from the last PULL request because they weren't fixes: - AVX2 is dropped, intel posted a fix, I have to redo it - Fix for out of order channels is out Daniel nacked it and I need to redo it -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEGJn/jt6/WMzuA0uC9IfvGFhy1yMFAmOa6xUACgkQ9IfvGFhy 1yP13BAAj4GdlWCqgvv98qIf9dY5WjvrbzL+8qdUvt7VIsDgh18amjlBmvvBngmd tssPHqLTqs6CXYxo4PBwKsvhA1qBCg9Fr+RtMTJG4FoumFdeO/l4tcXs99Ww5o9p OnrMAshTRHMRapvvX0vIiR0dGUPXs6KOz2JLNX1oF5ZY1yqskLxp9x3ydL7iw2oN GikRUfd4bG8drvhrKl6WPZOMKt0fVRH/2j0TqKPtl/hh/F4Ie6AUSI7McYMwOeXx xUhFcm2PKY5US6uYhZpKo7envCmuxreZSAH/eRrlu5uNCCOKaZ9uWYwACMJGpfrB SqY5dCTDpfFoaOloFEOYDfWOwoCJl5u9vNwRK1ArSVCfjczq50itswFTQ3A/hyd2 1noMv60XcR3An3mUydQ3j/C+hfE3KVXdFPImOKjPrn8zU6f2Dfug3ALXiHi1xyov ZdpcZjCEhdSruYxIdlIKfzlYLy8R1G4mSFrBV3NuMrywlM2fWQgyCUAYwzRwQrJw oBiedgpNP/MCM4NPQKLpvz/sci6nxkrGV8QX44zg0LdViXkpCU5ZiaoPXQcbiQCC Xkkah3GLbVt6788qKja2U9ccdofAe5yUbjo6XYxdbXC7y9mSyvBS9FCHvWr4HY/8 TUavGrcjKqQ31WxiyWw5CEi/hqNftFUNtWmEzZuAjRwM2cw89sU= =zGNB -----END PGP SIGNATURE----- Merge tag 'next-8.0-pull-request' of https://gitlab.com/juan.quintela/qemu into staging Migration patches for 8.0 Hi, these are the patches that I had to drop from the last PULL request because they weren't fixes: - AVX2 is dropped, intel posted a fix, I have to redo it - Fix for out of order channels is out Daniel nacked it and I need to redo it # gpg: Signature made Thu 15 Dec 2022 09:38:29 GMT # gpg: using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full] # gpg: aka "Juan Quintela <quintela@trasno.org>" [full] # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * tag 'next-8.0-pull-request' of https://gitlab.com/juan.quintela/qemu: migration: Drop rs->f migration: Remove old preempt code around state maintainance migration: Send requested page directly in rp-return thread 
migration: Move last_sent_block into PageSearchStatus migration: Make PageSearchStatus part of RAMState migration: Add pss_init() migration: Introduce pss_channel migration: Teach PSS about host page migration: Use atomic ops properly for page accountings migration: Yield bitmap_mutex properly when sending/sleeping migration: Remove RAMState.f references in compression code migration: Trivial cleanup save_page_header() on same block check migration: Cleanup xbzrle zero page cache update logic migration: Add postcopy_preempt_active() migration: Take bitmap mutex when completing ram migration migration: Export ram_release_page() migration: Export ram_transferred_ram() multifd: Create page_count fields into both MultiFD{Recv,Send}Params multifd: Create page_size fields into both MultiFD{Recv,Send}Params Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-12-15 14:52:13 +00:00
Peter Maydell	48804eebd4	Miscellaneous patches for 2022-12-14 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmOZ6lYSHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZT6VEQAKynjWh3AIZ4/qOgrVqsP0oRspevLmfH BbuGoldjYpEE7RbwuCaZalZ7iy7TcSySxnPfUDVsFHd7NWffJVjwKHifGC0D/Ez0 +Ggyb1CBebN+mS7t+BNFUHdMM+wxFIlHwg4f4aTFbn2o0HKgj2a8tcNzNRonZbfa xURnvbD4G4u0VZEc3Jak+x193xbOJFsuuWq0BZnDuNk+XqjyW2RwfpXLPJVk+82a 4uy/YgYuqXUqBeULwcJj+shBL4SXR9GyajTFMS64przSUle0ADUmXkPtaS2agV7e Pym/UQuAcxvNyw34fJsiMZxx6rZI9YU30jQUMRLoYcPRR/Q/aiPeiiHtiD6Kaid7 IfOeH/EArXaQRFpD89xj4YcaTnRLQOEj0NXgXvAbQf6eD8JYyao/S/0lCsPZEoA2 nibLqEQ25ncDNXoSomuwtfjVff3w68lODFbhwqfA0gf3cPtCgVZ6xQ8P/McNY6K6 wqFHXMWTDHk1LOCTucjYz1z2TGzTnSG4iWi5Yt6FSxAc958AO+v5ALn/1pcYun+E azM/MF0AInKj2aJCT530zT0tpCs/Jo07YKC8k6ubi77S0ZdmGS1XLeXkRXfk1+yI OhuUgiVlSTHxD69DagT2vbnx1mDMM9X+OBIMvEi5nwvD9A/ghaCgkDeGFvbA1ud0 t0mxPBZJ+tiZ =JJjG -----END PGP SIGNATURE----- Merge tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru into staging Miscellaneous patches for 2022-12-14 # gpg: Signature made Wed 14 Dec 2022 15:23:02 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * tag 'pull-misc-2022-12-14' of https://repo.or.cz/qemu/armbru: ppc4xx_sdram: Simplify sdram_ddr_size() to return block/vmdk: Simplify vmdk_co_create() to return directly cleanup: Tweak and re-run return_directly.cocci io: Tidy up fat-fingered parameter name qapi: Use returned bool to check for failure (again) sockets: Use ERRP_GUARD() where obviously appropriate qemu-config: Use ERRP_GUARD() where obviously appropriate qemu-config: Make config_parse_qdict() return bool monitor: Use ERRP_GUARD() in monitor_init() monitor: Simplify monitor_fd_param()'s error handling error: Move ERRP_GUARD() to the 
beginning of the function error: Drop a few superfluous ERRP_GUARD() error: Drop some obviously superfluous error_propagate() Drop more useless casts from void * to pointer Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2022-12-15 10:13:46 +00:00
Peter Xu	7f401b8044	migration: Drop rs->f Now with rs->pss we can already cache channels in pss->pss_channel. That pss_channel contains more information than rs->f because it's per-channel. So rs->f could be replaced by rs->pss[RAM_CHANNEL_PRECOPY].pss_channel, while rs->f itself is a bit vague now. Note that vanilla postcopy still sends pages via pss[RAM_CHANNEL_PRECOPY], that's slightly confusing but it reflects the reality. Then, after the replacement we can safely drop rs->f. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2022-12-15 10:30:37 +01:00

1890 Commits