qemu/migration
Peter Xu 53021ea165 migration: Fix missing join() of rp_thread
It's possible for the migration thread to skip the join() of the rp_thread in
the race below and then crash on the source right when migration finishes:

       migration_thread                     rp_thread
       ----------------                     ---------
    migration_completion()
                                        (before rp_thread quits)
                                        from_dst_file=NULL
                                        [thread got scheduled out]
      s->rp_state.from_dst_file==NULL
        (skip join() of rp_thread)
    migrate_fd_cleanup()
      qemu_fclose(s->to_dst_file)
      yank_unregister_instance()
        assert(yank_find_entry())  <------- crash

It is most likely to happen with postcopy, but postcopy shouldn't be required;
e.g., I think it could also trigger with MIGRATION_CAPABILITY_RETURN_PATH set.

The race above is suspected to be the root cause of a recent (but rare)
migration-test failure reported by either Dave or PMM:

https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/

The issue is that from_dst_file is reset in the rp_thread, so if the thread
resets it to NULL early enough, the migration thread will assume there's no
rp_thread at all and skip the join().

This could potentially cause more severe issues (e.g. a crash) now that the
yank code is in place.

Fix it by using a boolean to record whether we've created the rp_thread.
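The fix can be sketched with plain POSIX threads. This is not QEMU's code:
RpState, open_return_path(), await_return_path() and the rp_ran field are
hypothetical simplifications; only from_dst_file, rp_thread and the idea of a
"created" flag come from the description above. The point is that the
decision to join is based on a flag written only by the thread that creates
the rp_thread, never on from_dst_file, which the rp_thread clears itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for the return-path state;
 * only the fields relevant to the race are modelled. */
typedef struct {
    FILE *from_dst_file;     /* cleared by the rp thread before it exits */
    pthread_t rp_thread;
    bool rp_thread_created;  /* written only by the creating/joining thread */
    int rp_ran;              /* hypothetical: records that the rp body ran */
} RpState;

static void *rp_thread_fn(void *opaque)
{
    RpState *rs = opaque;

    rs->rp_ran = 1;
    /* The rp thread resets from_dst_file when it quits. The buggy code
     * checked this pointer to decide whether to join, so a fast rp thread
     * made the migration thread skip the join entirely. */
    rs->from_dst_file = NULL;
    return NULL;
}

/* Start the return-path thread; returns 0 on success, -1 on failure. */
static int open_return_path(RpState *rs, FILE *f)
{
    rs->from_dst_file = f;
    if (pthread_create(&rs->rp_thread, NULL, rp_thread_fn, rs) != 0) {
        rs->from_dst_file = NULL;
        return -1;
    }
    /* Remember that a thread exists and must be joined later. */
    rs->rp_thread_created = true;
    return 0;
}

/* Join the rp thread if one was created; returns true if it joined. */
static bool await_return_path(RpState *rs)
{
    /* Buggy variant: if (rs->from_dst_file == NULL) return false;
     * That reads a pointer the rp thread may already have cleared. */
    if (!rs->rp_thread_created) {
        return false;
    }
    pthread_join(rs->rp_thread, NULL);
    rs->rp_thread_created = false;  /* never join the same thread twice */
    return true;
}
```

Because the flag is only touched by the thread that creates and joins the
rp_thread, it needs no extra locking, and from_dst_file is only inspected
after pthread_join() has returned.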

Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20210722175841.938739-2-peterx@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-26 12:44:26 +01:00
block-dirty-bitmap.c migration/block-dirty-bitmap: make incoming disabled bitmaps busy 2021-03-24 13:41:19 +00:00
block.c migration: using trace_ to replace DPRINTF 2020-10-26 16:15:04 +00:00
block.h
channel.c yank: Unregister function when using TLS migration 2021-06-08 18:50:03 +01:00
channel.h migration: Route errors down through migration_channel_connect 2018-02-06 10:55:12 +00:00
colo-failover.c qemu/atomic.h: rename atomic_ to qatomic_ 2020-09-23 16:07:44 +01:00
colo.c Remove migrate_set_block_enabled in checkpoint 2021-06-11 10:30:13 +08:00
dirtyrate.c hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds 2021-06-08 20:18:26 +01:00
dirtyrate.h migration/dirtyrate: make sample page count configurable 2021-06-08 20:18:25 +01:00
exec.c migration: unify incoming processing 2018-07-10 12:48:53 +01:00
exec.h
fd.c monitor: Use getter/setter functions for cur_mon 2020-10-09 07:08:19 +02:00
fd.h migration: Fix fd protocol for incoming defer 2019-06-05 12:43:55 +02:00
global_state.c migration: Silence compiler warning in global_state_store_running() 2020-10-02 12:28:48 +01:00
meson.build migration: Move populate_vfio_info() into a separate file 2021-05-14 12:31:51 +02:00
migration.c migration: Fix missing join() of rp_thread 2021-07-26 12:44:26 +01:00
migration.h migration: Fix missing join() of rp_thread 2021-07-26 12:44:26 +01:00
multifd-zlib.c multifd: Add zlib compression multifd support 2020-02-28 09:24:43 +01:00
multifd-zstd.c multifd: Add zstd compression multifd support 2020-02-28 09:25:49 +01:00
multifd.c migration/socket: Close the listener at the end 2021-06-08 19:36:19 +01:00
multifd.h migration/tls: add tls_hostname into MultiFDSendParams 2020-09-25 12:45:58 +01:00
page_cache.c migration: Fix cache_init()'s "Failed to allocate" error messages 2021-02-08 11:19:51 +00:00
page_cache.h migration: Clean up signed vs. unsigned XBZRLE cache-size 2021-02-08 11:19:51 +00:00
postcopy-ram.c migration/ram: Handle RAM block resizes during postcopy 2021-05-13 18:21:14 +01:00
postcopy-ram.h migration/: fix some comment spelling errors 2020-09-17 20:36:32 +02:00
qemu-file-channel.c yank: Unregister function when using TLS migration 2021-06-08 18:50:03 +01:00
qemu-file-channel.h
qemu-file.c migration: fix the memory overwriting risk in add_to_iovec 2021-07-05 10:51:26 +01:00
qemu-file.h Header cleanup patches for 2019-08-13 2019-08-16 14:53:43 +01:00
ram.c migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() 2021-07-13 16:21:57 +01:00
ram.h migration: Pre-fault memory before starting background snasphot 2021-04-07 18:37:28 +01:00
rdma.c migration/rdma: prevent from double free the same mr 2021-07-13 16:21:57 +01:00
rdma.h
savevm.c migration: use GDateTime for formatting timestamp in snapshot names 2021-06-14 13:28:50 +01:00
savevm.h migration: Add blocker information 2021-02-08 11:19:51 +00:00
socket.c migration/socket: Close the listener at the end 2021-06-08 19:36:19 +01:00
socket.h migration: unify the framework of socket-type channel 2020-08-28 13:34:52 +01:00
target.c migration: Move populate_vfio_info() into a separate file 2021-05-14 12:31:51 +02:00
tls.c migration/tls: Use qcrypto_tls_creds_check_endpoint() 2021-06-29 18:30:20 +01:00
tls.h migration: Fix Lesser GPL version number 2020-11-15 16:43:28 +01:00
trace-events migration: add trace point when vm_stop_force_state fails 2021-06-14 13:28:50 +01:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vmstate-types.c migration: Replace migration's JSON writer by the general one 2020-12-19 10:39:16 +01:00
vmstate.c migration: Replace migration's JSON writer by the general one 2020-12-19 10:39:16 +01:00
xbzrle.c
xbzrle.h
yank_functions.c yank: Remove dependency on qiochannel 2021-04-01 15:27:44 +04:00
yank_functions.h yank: Remove dependency on qiochannel 2021-04-01 15:27:44 +04:00