qemu/migration
Peter Xu 0b246f8e9e migration: Fix race that dest preempt thread close too early
We hit an intermittent CI failure in migration-test, in the unit test
preempt/plain:

qemu-system-x86_64: Unable to read from socket: Connection reset by peer
Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
**
ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
(test program exited with status code -6)

Fabiano debugged it and found that the preempt thread can quit before
receiving all the pages, which can leave the guest without some of its
pages and corrupt the guest memory.

To make sure the preempt thread has finished receiving all the pages, we
can rely on page_requested_count being zero, because the preempt channel
only receives requested (faulted) pages. Note that not every faulted page
has to be sent via the preempt channel/thread: if a requested page has
already been queued on the background main channel for migration, the src
qemu will still send it via the background channel.

Here, instead of spinning on the count, we add a condvar so the main
thread can wait on it if that unusual case happens, without burning the
cpu for no good reason. The wait should be short, so spinning in this
rare case would probably be fine; it's just better not to.

The condvar is only used when that special case is triggered.  Some memory
ordering trick is needed (against the preempt thread status field) to
guarantee that the main thread always gets a kick when that case
triggers.
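
The wait-until-zero idea described above can be sketched as follows. This
is a minimal illustration using plain pthreads, not the code from the
commit: the PageTracker type, its fields, and every function name below
are hypothetical. It also keeps the counter under a single mutex, which
sidesteps the memory ordering trick mentioned above rather than
reproducing it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the destination-side state (not a QEMU type). */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t drained;   /* signalled when pending drops to zero */
    int pending;              /* requested pages not yet received */
    int to_receive;           /* pages the receiver thread will deliver */
} PageTracker;

static void tracker_init(PageTracker *t, int pages)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->drained, NULL);
    t->pending = pages;
    t->to_receive = pages;
}

/* Receiver side: account for one requested page that has arrived. */
static void tracker_page_received(PageTracker *t)
{
    pthread_mutex_lock(&t->lock);
    if (--t->pending == 0) {
        pthread_cond_signal(&t->drained);   /* kick the waiter */
    }
    pthread_mutex_unlock(&t->lock);
}

/* Main-thread side: sleep (no spinning) until nothing is pending. */
static void tracker_wait_drained(PageTracker *t)
{
    pthread_mutex_lock(&t->lock);
    while (t->pending != 0) {               /* loop guards against spurious wakeups */
        pthread_cond_wait(&t->drained, &t->lock);
    }
    pthread_mutex_unlock(&t->lock);
}

/* Simulates the thread that receives the requested pages. */
static void *receiver_thread(void *opaque)
{
    PageTracker *t = opaque;
    for (int i = 0; i < t->to_receive; i++) {
        tracker_page_received(t);
    }
    return NULL;
}

/* Runs one wait/receive cycle and returns the pages still pending. */
static int run_demo(int pages)
{
    PageTracker t;
    pthread_t th;
    int rc, left;

    tracker_init(&t, pages);
    rc = pthread_create(&th, NULL, receiver_thread, &t);
    assert(rc == 0);
    (void)rc;

    tracker_wait_drained(&t);   /* only after this is shutdown safe */
    pthread_join(th, NULL);

    left = t.pending;
    pthread_cond_destroy(&t.drained);
    pthread_mutex_destroy(&t.lock);
    return left;
}
```

Calling run_demo(8) from a test program should return 0: the waiter only
proceeds once every requested page has been accounted for, and if the
count is already zero it never sleeps on the condvar at all.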

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
Debugged-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230918172822.19052-2-farosas@suse.de>
(cherry picked from commit cf02f29e1e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-10-03 02:00:54 +03:00
block-dirty-bitmap.c migration: Move rate_limit_max and rate_limit_used to migration_stats 2023-05-18 18:40:51 +02:00
block.c block-migration: Ensure we don't crash during migration cleanup 2023-09-21 19:35:19 +03:00
block.h migration: disable auto-converge during bulk block migration 2017-09-27 11:27:14 +01:00
channel-block.c io: Add support for MSG_PEEK for socket channel 2023-02-06 19:22:56 +01:00
channel-block.h migration: introduce a QIOChannel impl for BlockDriverState VMState 2022-06-22 19:33:43 +01:00
channel.c migration: check magic value for deciding the mapping of channels 2023-02-06 19:22:57 +01:00
channel.h migration: check magic value for deciding the mapping of channels 2023-02-06 19:22:57 +01:00
colo-failover.c migration/colo: Improve an x-colo-lost-heartbeat error message 2023-02-23 14:10:17 +01:00
colo.c migration: process_incoming_migration_co(): move colo part to colo 2023-05-18 18:40:51 +02:00
dirtyrate.c migration: Add last stage indicator to global dirty log 2023-05-18 08:53:50 +02:00
dirtyrate.h migration/dirtyrate: Refactor dirty page rate calculation 2022-07-20 12:15:08 +01:00
exec.c *: Add missing includes of qemu/error-report.h 2023-03-22 15:06:57 +00:00
exec.h migration: Export exec.c functions in its own file 2017-06-01 18:49:22 +02:00
fd.c bulk: Remove pointless QOM casts 2023-06-05 20:48:34 +02:00
fd.h migration: Fix fd protocol for incoming defer 2019-06-05 12:43:55 +02:00
global_state.c migration: never fail in global_state_store() 2023-06-02 01:03:19 +02:00
meson.build meson: Replace softmmu_ss -> system_ss 2023-06-20 10:01:30 +02:00
migration-hmp-cmds.c migration: Extend query-migrate to provide dirty page limit info 2023-07-26 10:55:56 +02:00
migration-stats.c migration: spelling fixes 2023-07-25 17:13:20 +03:00
migration-stats.h migration: We don't need the field rate_limit_used anymore 2023-05-18 18:40:51 +02:00
migration.c migration: Fix race that dest preempt thread close too early 2023-10-03 02:00:54 +03:00
migration.h migration: Fix race that dest preempt thread close too early 2023-10-03 02:00:54 +03:00
multifd-zlib.c migration: spelling fixes 2023-07-25 17:13:20 +03:00
multifd-zstd.c migration: spelling fixes 2023-07-25 17:13:20 +03:00
multifd.c migration/multifd: Rename threadinfo.c functions 2023-07-26 10:55:56 +02:00
multifd.h multifd: Add the ramblock to MultiFDRecvParams 2023-05-10 18:48:11 +02:00
options.c migration: enforce multifd and postcopy preempt to be set before incoming 2023-07-26 10:55:56 +02:00
options.h migration: Introduce dirty-limit capability 2023-07-26 10:55:56 +02:00
page_cache.c migration: Fix cache_init()'s "Failed to allocate" error messages 2021-02-08 11:19:51 +00:00
page_cache.h migration: Clean up signed vs. unsigned XBZRLE cache-size 2021-02-08 11:19:51 +00:00
postcopy-ram.c migration: Fix race that dest preempt thread close too early 2023-10-03 02:00:54 +03:00
postcopy-ram.h migration: Allow postcopy_ram_supported_by_host() to report err 2023-04-27 10:18:25 +02:00
qemu-file.c migration/rdma: Split qemu_fopen_rdma() into input/output functions 2023-07-26 10:55:56 +02:00
qemu-file.h migration/rdma: Split qemu_fopen_rdma() into input/output functions 2023-07-26 10:55:56 +02:00
ram-compress.c ram-compress.c: Make target independent 2023-05-08 15:25:26 +02:00
ram-compress.h ram.c: Move core decompression code into its own file 2023-05-08 15:25:26 +02:00
ram.c migration: Implement dirty-limit convergence algo 2023-07-26 10:55:56 +02:00
ram.h migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored() 2023-07-12 09:25:37 +02:00
rdma.c migration/rdma: Split qemu_fopen_rdma() into input/output functions 2023-07-26 10:55:56 +02:00
rdma.h migration: Export rdma.c functions in its own file 2017-06-01 18:49:23 +02:00
savevm.c migration: Change qemu_file_transferred to noflush 2023-07-26 10:55:56 +02:00
savevm.h migration: Implement switchover ack logic 2023-06-30 06:02:51 +02:00
socket.c migration: Move migrate_use_zero_copy_send() to options.c 2023-04-24 15:01:46 +02:00
socket.h migration: Postcopy preemption preparation on channel creation 2022-07-20 12:15:08 +01:00
target.c vfio/migration: Reset bytes_transferred properly 2023-06-30 06:02:51 +02:00
threadinfo.c migration/multifd: Protect accesses to migration_threads 2023-07-26 10:55:56 +02:00
threadinfo.h migration/multifd: Protect accesses to migration_threads 2023-07-26 10:55:56 +02:00
tls.c migration: Drop unused parameter for migration_tls_client_create() 2023-05-03 11:24:20 +02:00
tls.h migration: Drop unused parameter for migration_tls_client_create() 2023-05-03 11:24:20 +02:00
trace-events migration: Implement dirty-limit convergence algo 2023-07-26 10:55:56 +02:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vmstate-types.c Move CPU softfloat unions to cpu-float.h 2022-04-06 14:31:43 +02:00
vmstate.c qemu-file: Rename qemu_file_transferred_ fast -> noflush 2023-07-26 10:55:56 +02:00
xbzrle.c migration/xbzrle: Use i386 host/cpuinfo.h 2023-05-23 16:51:18 -07:00
xbzrle.h migration/xbzrle: Use i386 host/cpuinfo.h 2023-05-23 16:51:18 -07:00
yank_functions.c bulk: Remove pointless QOM casts 2023-06-05 20:48:34 +02:00
yank_functions.h migration: Move the yank unregister of channel_close out 2021-07-26 12:45:03 +01:00