mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Emanuele Giuseppe Esposito	68b88468f6	migration: add missing qemu_mutex_lock_iothread in migration_completion qemu_savevm_state_complete_postcopy assumes the iothread lock (BQL) to be held, but instead it isn't. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20211005080751.3797161-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-05 13:10:29 +02:00
Emanuele Giuseppe Esposito	3c158eba1e	migration: block-dirty-bitmap: add missing qemu_mutex_lock_iothread init_dirty_bitmap_migration assumes the iothread lock (BQL) to be held, but instead it isn't. Instead of adding the lock to qemu_savevm_state_setup(), follow the same pattern as the other ->save_setup callbacks and lock+unlock inside dirty_bitmap_save_setup(). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211005080751.3797161-2-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-10-05 13:10:29 +02:00
Markus Armbruster	7d6f6933aa	migration: Handle migration_incoming_setup() errors consistently Commit `b673eab4e2` "multifd: Make multifd_load_setup() get an Error parameter" changed migration_incoming_setup() to take an Error ** argument, and adjusted the callers accordingly. It neglected to change adjust multifd_load_setup(): it still exit()s on error. Clean that up. The error now gets propagated up two call chains: via migration_fd_process_incoming() to rdma_accept_incoming_migration(), and via migration_ioc_process_incoming() to migration_channel_process_incoming(). Both chain ends report the error with error_report_err(), but otherwise ignore it. Behavioral change: we no longer exit() on this error. This is consistent with how we handle other errors here, e.g. from multifd_recv_new_channel() via migration_ioc_process_incoming() to migration_channel_process_incoming(). Whether it's consistently right or consistently wrong I can't tell. Also clean up the return value from the unusual 0 on success, 1 on error to the more common true on success, false on error. Cc: Juan Quintela <quintela@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210720125408.387910-11-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2021-08-26 17:15:28 +02:00
Markus Armbruster	f9734d5d40	error: Use error_fatal to simplify obvious fatal errors (again) We did this with scripts/coccinelle/use-error_fatal.cocci before, in commit `50beeb6809` and `007b06578a`. This commit cleans up rarer variations that don't seem worth matching with Coccinelle. Cc: Thomas Huth <thuth@redhat.com> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Peter Xu <peterx@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Marc-André Lureau <marcandre.lureau@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210720125408.387910-2-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2021-08-26 17:15:28 +02:00
Wei Wang	3143577d6a	migration: clear the memory region dirty bitmap when skipping free pages When skipping free pages to send, their corresponding dirty bits in the memory region dirty bitmap need to be cleared. Otherwise the skipped pages will be sent in the next round after the migration thread syncs dirty bits from the memory region dirty bitmap. Cc: David Hildenbrand <david@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Reported-by: David Hildenbrand <david@redhat.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Message-Id: <20210722083055.23352-1-wei.w.wang@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:50:13 +01:00
Peter Xu	39675ffffb	migration: Move the yank unregister of channel_close out It's efficient, but hackish to call yank unregister calls in channel_close(), especially it'll be hard to debug when qemu crashed with some yank function leaked. Remove that hack, but instead explicitly unregister yank functions at the places where needed, they are: (on src) - migrate_fd_cleanup - postcopy_pause (on dst) - migration_incoming_state_destroy - postcopy_pause_incoming Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-6-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:45:03 +01:00
Peter Xu	c6ad5be7ae	migration: Teach QEMUFile to be QIOChannel-aware migration uses QIOChannel typed qemufiles. In follow up patches, we'll need the capability to identify this fact, so that we can get the backing QIOChannel from a QEMUFile. We can also define types for QEMUFile but so far since we only need to be able to identify QIOChannel, introduce a boolean which is simpler. Introduce another helper qemu_file_get_ioc() to return the ioc backend of a qemufile if has_ioc is set. No functional change. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-5-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:59 +01:00
Peter Xu	18711405b5	migration: Introduce migration_ioc_[un]register_yank() There're plenty of places in migration/* that checks against either socket or tls typed ioc for yank operations. Provide two helpers to hide all these information. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-4-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:54 +01:00
Peter Xu	43044ac0ee	migration: Make from_dst_file accesses thread-safe Accessing from_dst_file is potentially racy in current code base like below: if (s->from_dst_file) do_something(s->from_dst_file); Because from_dst_file can be reset right after the check in another thread (rp_thread). One example is migrate_fd_cancel(). Use the same qemu_file_lock to protect it too, just like to_dst_file. When it's safe to access without lock, comment it. There's one special reference in migration_thread() that can be replaced by the newly introduced rp_thread_created flag. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20210722175841.938739-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> with Peter's fixup	2021-07-26 12:44:46 +01:00
Peter Xu	53021ea165	migration: Fix missing join() of rp_thread It's possible that the migration thread skip the join() of the rp_thread in below race and crash on src right at finishing migration: migration_thread rp_thread ---------------- --------- migration_completion() (before rp_thread quits) from_dst_file=NULL [thread got scheduled out] s->rp_state.from_dst_file==NULL (skip join() of rp_thread) migrate_fd_cleanup() qemu_fclose(s->to_dst_file) yank_unregister_instance() assert(yank_find_entry()) <------- crash It could mostly happen with postcopy, but that shouldn't be required, e.g., I think it could also trigger with MIGRATION_CAPABILITY_RETURN_PATH set. It's suspected that above race could be the root cause of a recent (but rare) migration-test break reported by either Dave or PMM: https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/ The issue is: from_dst_file is reset in the rp_thread, so if the thread reset it to NULL fast enough then the migration thread will assume there's no rp_thread at all. This could potentially cause more severe issue (e.g. crash) after the yank code. Fix it by using a boolean to keep "whether we've created rp_thread". Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-2-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:26 +01:00
Peter Xu	63268c4970	migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() Taking the mutex every time for each dirty bit to clear is too slow, especially we'll take/release even if the dirty bit is cleared. So far it's only used to sync with special cases with qemu_guest_free_page_hint() against migration thread, nothing really that serious yet. Let's move the lock to be upper. There're two callers of migration_bitmap_clear_dirty(). For migration, move it into ram_save_iterate(). With the help of MAX_WAIT logic, we'll only run ram_save_iterate() for no more than 50ms-ish time, so taking the lock once there at the entry. It also means any call sites to qemu_guest_free_page_hint() can be delayed; but it should be very rare, only during migration, and I don't see a problem with it. For COLO, move it up to colo_flush_ram_cache(). I think COLO forgot to take that lock even when calling ramblock_sync_dirty_bitmap(), where another example is migration_bitmap_sync() who took it right. So let the mutex cover both the ramblock_sync_dirty_bitmap() and migration_bitmap_clear_dirty() calls. It's even possible to drop the lock so we use atomic operations upon rb->bmap and the variable migration_dirty_pages. I didn't do it just to still be safe, also not predictable whether the frequent atomic ops could bring overhead too e.g. on huge vms when it happens very often. When that really comes, we can keep a local counter and periodically call atomic ops. Keep it simple for now. Cc: Wei Wang <wei.w.wang@intel.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: Leonardo Bras Soares Passos <lsoaresp@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210630200805.280905-1-peterx@redhat.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	ca7bd0821b	migration: Clear error at entry of migrate_fd_connect() For each "migrate" command, remember to clear the s->error before going on. For one reason, when there's a new error it'll be still remembered; see migrate_set_error() who only sets the error if error==NULL. Meanwhile if a failed migration completes (e.g., postcopy recovered and finished), we shouldn't dump an error when calling migrate_fd_cleanup() at last. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	ca30f24d12	migration: Don't do migrate cleanup if during postcopy resume Below process could crash qemu with postcopy recovery: 1. (hmp) migrate -d .. 2. (hmp) migrate_start_postcopy 3. [network down, postcopy paused] 4. (hmp) migrate -r $WRONG_PORT when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared 5. (hmp) migrate -r $RIGHT_PORT [qemu crash on assert(cleanup_bh)] The thing is we shouldn't cleanup if it's postcopy resume; the error is set mostly because the channel is wrong, so we return directly waiting for the user to retry. migrate_fd_cleanup() should only be called when migration is cancelled or completed. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Peter Xu	2e3e3da3c2	migration: Release return path early for paused postcopy When postcopy pause triggered, we rely on the migration thread to cleanup the to_dst_file handle, and the return path thread to cleanup the from_dst_file handle (which is stored in the local variable "rp"). Within the process, from_dst_file cleanup (qemu_fclose) is postponed until it's setup again due to a postcopy recovery. It used to work before yank was born; after yank is introduced we rely on the refcount of IOC to correctly unregister yank function in channel_close(). If without the early and on-time release of from_dst_file handle the yank function will be leftover during paused postcopy. Without this patch, below steps (quoted from Xiaohui) could trigger qemu src crash: 1.Boot vm on src host 2.Boot vm on dst host 3.Enable postcopy on src&dst host 4.Load stressapptest in vm and set postcopy speed to 50M 5.Start migration from src to dst host, change into postcopy mode when migration is active. 6.When postcopy is active, down the network card(do migration via this network) on dst host. 7.Wait untill postcopy is paused on src&dst host. 8.Before up network card, recover migration on dst host, will get error like following. 9.Ignore the error of step 8, go on recovering migration on src host: After step 9, qemu on src host will core dump after some seconds: qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. 1.sh: line 38: 44662 Aborted (core dumped) Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-2-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Laurent Vivier	a51dcef08b	migration: failover: emit a warning when the card is not fully unplugged When the migration fails or is canceled we wait the end of the unplug operation to be able to plug it back. But if the unplug operation is never finished we stop to wait and QEMU emits a warning to inform the user. Based-on: 20210629155007.629086-1-lvivier@redhat.com Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20210701131458.112036-1-lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Li Zhijian	224f364a49	migration/rdma: prevent from double free the same mr backtrace: '0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 478 void *addr = mr->addr; (gdb) bt #0 0x00007ffff5f44ec2 in __ibv_dereg_mr_1_1 (mr=0x7fff1007d390) at /home/lizhijian/rdma-core/libibverbs/verbs.c:478 #1 0x0000555555891fcc in rdma_delete_block (block=<optimized out>, rdma=0x7fff38176010) at ../migration/rdma.c:691 #2 qemu_rdma_cleanup (rdma=0x7fff38176010) at ../migration/rdma.c:2365 #3 0x00005555558925b0 in qio_channel_rdma_close_rcu (rcu=0x555556b8b6c0) at ../migration/rdma.c:3073 #4 0x0000555555d652a3 in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:281 #5 0x0000555555d5edf9 in qemu_thread_start (args=0x7fffe88bb4d0) at ../util/qemu-thread-posix.c:541 #6 0x00007ffff54c73f9 in start_thread () at /lib64/libpthread.so.0 #7 0x00007ffff53f3b03 in clone () at /lib64/libc.so.6 ' Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210708144521.1959614-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-13 16:21:57 +01:00
Olaf Hering	179a808045	migration: fix typo in mig_throttle_guest_down comment Fixes commit `3d0684b2ad` ("ram: Update all functions comments") Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210708162159.18045-1-olaf@aepfle.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-07-09 18:42:46 +02:00
Li Zhijian	eb1960aac1	misc: Remove redundant new line in perror() Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210706094433.1766952-1-lizhijian@cn.fujitsu.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-07-09 18:42:46 +02:00
Li Zhijian	e5f607913c	migration/rdma: Use error_report to suppress errno message Since the prior calls are successful, in this case a errno doesn't indicate a real error which would just make us confused. before: (qemu) migrate -d rdma:192.168.22.23:8888 source_resolve_host RDMA Device opened: kernel name rxe_eth0 uverbs device name uverbs2, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs2, infiniband class device path /sys/class/infiniband/rxe_eth0, transport: (2) Ethernet rdma_get_cm_event != EVENT_ESTABLISHED after rdma_connect: No space left on device Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210628071959.23455-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Laurent Vivier	944bc52842	migration: failover: continue to wait card unplug on error If the user cancels the migration in the unplug-wait state, QEMU will try to plug back the card and this fails because the card is partially unplugged. To avoid the problem, continue to wait the card unplug, but to allow the migration to be canceled if the card never finishes to unplug use a timeout. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210629155007.629086-3-lvivier@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Laurent Vivier	fde93d99d9	migration: move wait-unplug loop to its own function The loop is used in migration_thread() and bg_migration_thread(), so we can move it to its own function and call it from these both places. Moreover, in migration_thread() we have a wrong state transition from SETUP to ACTIVE while state could be WAIT_UNPLUG. This is correctly managed in bg_migration_thread() so use this code instead. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20210629155007.629086-2-lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Peter Xu	b7f9afd48e	migration: Allow reset of postcopy_recover_triggered when failed It's possible qemu_start_incoming_migration() failed at any point, when it happens we should reset postcopy_recover_triggered to false so that the user can still retry with a saner incoming port. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210629181356.217312-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Peter Xu	cc48c587d2	migration: Move yank outside qemu_start_incoming_migration() Starting from commit `b5eea99ec2`, qmp_migrate_recover() calls unregister before calling qemu_start_incoming_migration(). I believe it wanted to mitigate the next call to yank_register_instance(), but I think that's wrong. Firstly, if during recover, we should keep the yank instance there, not "quickly removing and adding it back". Meanwhile, calling qmp_migrate_recover() twice with `b5eea99ec2` will directly crash the dest qemu (right now it can't; but it'll start to work right after the next patch) because the 1st call of qmp_migrate_recover() will unregister permanently when the channel failed to establish, then the 2nd call of qmp_migrate_recover() crashes at yank_unregister_instance(). This patch fixes it by moving yank ops out of qemu_start_incoming_migration() into qmp_migrate_incoming. For qmp_migrate_recover(), drop the unregister of yank instance too since we keep it there during the recovery phase. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210629181356.217312-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-05 10:51:26 +01:00
Feng Lin	c00d434ac6	migration: fix the memory overwriting risk in add_to_iovec When testing migration, a Segmentation fault qemu core is generated. 0 error_free (err=0x1) 1 0x00007f8b862df647 in qemu_fclose (f=f@entry=0x55e06c247640) 2 0x00007f8b8516d59a in migrate_fd_cleanup (s=s@entry=0x55e06c0e1ef0) 3 0x00007f8b8516d66c in migrate_fd_cleanup_bh (opaque=0x55e06c0e1ef0) 4 0x00007f8b8626a47f in aio_bh_poll (ctx=ctx@entry=0x55e06b5a16d0) 5 0x00007f8b8626e71f in aio_dispatch (ctx=0x55e06b5a16d0) 6 0x00007f8b8626a33d in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) 7 0x00007f8b866bdba4 in g_main_context_dispatch () 8 0x00007f8b8626cde9 in glib_pollfds_poll () 9 0x00007f8b8626ce62 in os_host_main_loop_wait (timeout=<optimized out>) 10 0x00007f8b8626cffd in main_loop_wait (nonblocking=nonblocking@entry=0) 11 0x00007f8b862ef01f in main_loop () Using gdb print the struct QEMUFile f = { ..., iovcnt = 65, last_error = 21984, last_error_obj = 0x1, shutdown = true } Well iovcnt is overflow, because the max size of MAX_IOV_SIZE is 64. struct QEMUFile { ...; struct iovec iov[MAX_IOV_SIZE]; unsigned int iovcnt; int last_error; Error *last_error_obj; bool shutdown; }; iovcnt and last_error is overwrited by add_to_iovec(). Right now, add_to_iovec() increase iovcnt before check the limit. And it seems that add_to_iovec() assumes that iovcnt will set to zero in qemu_fflush(). But qemu_fflush() will directly return when f->shutdown is true. The situation may occur when libvirtd restart during migration, after f->shutdown is set, before calling qemu_file_set_error() in qemu_file_shutdown(). So the safiest way is checking the iovcnt before increasing it. Signed-off-by: Feng Lin <linfeng23@huawei.com> Message-Id: <20210625062138.1899-1-linfeng23@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fix typo in 'writeable' which is actually misnamed 'writable'	2021-07-05 10:51:26 +01:00
Philippe Mathieu-Daudé	5590f65fac	migration/tls: Use qcrypto_tls_creds_check_endpoint() Avoid accessing QCryptoTLSCreds internals by using the qcrypto_tls_creds_check_endpoint() helper. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-29 18:30:20 +01:00
David Hildenbrand	8dbe22c686	memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap() Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The new flag has the following semantics: " RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge pages if applicable) is skipped: will bail out if not supported. When not set, the OS will do the reservation, if supported for the memory type. " Allow passing it into: - memory_region_init_ram_nomigrate() - memory_region_init_resizeable_ram() - memory_region_init_ram_from_file() ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag. Bail out if the flag is not supported, which is the case right now for both, POSIX and win32. We will add Linux support next and allow specifying RAM_NORESERVE via memory backends. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-9-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-15 20:27:38 +02:00
Daniel P. Berrangé	85cd1cc668	migration: use GDateTime for formatting timestamp in snapshot names The GDateTime APIs provided by GLib avoid portability pitfalls, such as some platforms where 'struct timeval.tv_sec' field is still 'long' instead of 'time_t'. When combined with automatic cleanup, GDateTime often results in simpler code too. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Daniel P. Berrangé	626ff6515d	migration: add trace point when vm_stop_force_state fails This is a critical failure scenario for migration that is hard to diagnose from existing probes. Most likely it is caused by an error from bdrv_flush(), but we're not logging the errno anywhere, hence this new probe. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2021-06-14 13:28:50 +01:00
Rao, Lei	3ba024457f	Remove migrate_set_block_enabled in checkpoint We can detect disk migration in migrate_prepare, if disk migration is enabled in COLO mode, we can directly report an error.and there is no need to disable block migration at every checkpoint. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Jason Wang <jasowang@redhat.com>	2021-06-11 10:30:13 +08:00
Peter Xu	a4a571d978	hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds These two commands are missing when adding the QMP sister commands. Add them, so developers can play with them easier. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <4cc0039fc3ad6145136770cf3b0f056c09a2910b.1623027729.git.huangy81@chinatelecom.cn> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 20:18:26 +01:00
Hyman Huang(黄勇)	7afa08cd8f	migration/dirtyrate: make sample page count configurable introduce optional sample-pages argument in calc-dirty-rate, making sample page count per GB configurable so that more accurate dirtyrate can be calculated. Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn> Message-Id: <3103453a3b2796f929269c99a6ad81a9a7f1f405.1623027729.git.huangy81@chinatelecom.cn> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Wrapped a couple of long lines	2021-06-08 20:18:25 +01:00
Dr. David Alan Gilbert	a59136f3b1	migration/socket: Close the listener at the end Delay closing the listener until the cleanup hook at the end; mptcp needs the listener to stay open while the other paths come in. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 19:36:19 +01:00
Dr. David Alan Gilbert	1df6ddb43b	migration: Add cleanup hook for inwards migration Add a cleanup hook for incoming migration that gets called at the end as a way for a transport to allow cleanup. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-4-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 19:36:18 +01:00
Li Zhijian	6b8c2eb5c6	migration/rdma: Fix cm event use after free Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210602023506.3821293-1-lizhijian@cn.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 18:55:43 +01:00
Leonardo Bras	7de2e85653	yank: Unregister function when using TLS migration After yank feature was introduced in migration, whenever migration is started using TLS, the following error happens in both source and destination hosts: (qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. This happens because of a missing yank_unregister_function() when using qio-channel-tls. Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to perform yank_unregister_function() in channel_close() and multifd_load_cleanup(). Also, inside migration_channel_connect() and migration_channel_process_incoming() move yank_register_function() so it only runs once on a TLS migration. Fixes: `b5eea99ec2` ("migration: Add yank feature", 2021-01-13) Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326 Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Peter Xu <peterx@redhat.com> -- Changes since v2: - Dropped all references to ioc->master - yank_register_function() and yank_unregister_function() now only run once in a TLS migration. Changes since v1: - Cast p->c to QIOChannelTLS into multifd_load_cleanup() Message-Id: <20210601054030.1153249-1-leobras.c@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 18:50:03 +01:00
Stefano Garzarella	d0fb9657a3	docs: fix references to docs/devel/tracing.rst Commit `e50caf4a5c` ("tracing: convert documentation to rST") converted docs/devel/tracing.txt to docs/devel/tracing.rst. We still have several references to the old file, so let's fix them with the following command: sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt) Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210517151702.109066-2-sgarzare@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-06-02 06:51:09 +02:00
Peter Maydell	c5847f5e4e	Virtiofs, migration and hmp pull 2021-05-26 Fixes for a loadvm regression from Kevin, some virtiofsd cleanups from Vivek and Mahmoud, and some RDMA migration fixups from Li. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEERfXHG0oMt/uXep+pBRYzHrxb/ecFAmCuiMIACgkQBRYzHrxb /edPvxAAimpeeMuCBKPswmuxyAqAoes6FKWNohzCyTCWXEmhvZGFl6SEgMxW0YWh DRLfyGUp9CODqNS2oVbj3I7/R5rDvpaqWFJ8GiBH99pc5B+pxYtXjGhjR8JWZ691 ZvoQsV/nN9dC7woIG5rcus09aPg1X8n8lD2GcV007Gacj3FGGHY6+DOmpfYmrdoG QY2OTQ2lMW3l1ACOhtdpWzVKgldYwTDLZsGWuFy+z5b64ijXyImpWa39Hu0HoWdS vds5D9GGMfI8kgOlIgsCoC26kre40952VqhOS5ie3axSZt48kufLMUaspiHMiA/B mwmrqMrNXoqC7pAWnv8Ur4NjEBlTkPtL4+kIMX0zeqJsRG24Fiee0e42CvNjFixF 1F419HHKFQPYOvCb/SPLv18gmcyfHdTEbB+yjYIO48ccrPfu/LtTTKG85D1lFw7C cy3AWKz6EzYuzs0wtia+0K+5UwOhm2h9Z77kyX5MIfyrb+KWAfupFW6l9NnfmBtp +sJBSkIrph6oNmZADx2eSwev+pdlcy0uTOULOiPTF4Iu826LQxoaV9pFcvA+/HXZ EbKNrYxDNNYCugLybtULIiFJ8f1i90/UtQlrqAr21deC+gJYliDZScXnzi6lPuv7 afn4QgMmFLus4ehB0TKcA7QbWL3yyrivXO4TSQNzcZaeXAvHOw0= =dIM2 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20210526a' into staging Virtiofs, migration and hmp pull 2021-05-26 Fixes for a loadvm regression from Kevin, some virtiofsd cleanups from Vivek and Mahmoud, and some RDMA migration fixups from Li. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Wed 26 May 2021 18:43:30 BST # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20210526a: migration/rdma: source: poll cm_event from return path migration/rdma: destination: create the return patch after the first accept migration/rdma: Fix rdma_addrinfo res leaks migration/rdma: cleanup rdma in rdma_start_incoming_migration error path migration/rdma: Fix cm_event used before being initialized tools/virtiofsd/fuse_opt.c: Replaced a malloc with GLib's g_try_malloc tools/virtiofsd/buffer.c: replaced a calloc call with GLib's g_try_new0 virtiofsd: Set req->reply_sent right after sending reply virtiofsd: Check EOF before short read virtiofsd: Simplify skip byte logic virtiofsd: get rid of in_sg_left variable virtiofsd: Use iov_discard_front() to skip bytes virtiofsd: Get rid of unreachable code in read virtiofsd: Check for EINTR in preadv() and retry hmp: Fix loadvm to resume the VM on success instead of failure Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-27 14:57:01 +01:00
Li Zhijian	e49e49dd73	migration/rdma: source: poll cm_event from return path source side always blocks if postcopy is only enabled at source side. users are not able to cancel this migration in this case. Let source side have chance to cancel this migration Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-4-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Typo fix	2021-05-26 18:39:32 +01:00
Li Zhijian	44bcfd45e9	migration/rdma: destination: create the return patch after the first accept destination side: $ build/qemu-system-x86_64 -enable-kvm -netdev tap,id=hn0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device e1000,netdev=hn0,mac=50:52:54:00:11:22 -boot c -drive if=none,file=./Fedora-rdma-server-migration.qcow2,id=drive-virtio-disk0 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -m 2048 -smp 2 -device piix3-usb-uhci -device usb-tablet -monitor stdio -vga qxl -spice streaming-video=filter,port=5902,disable-ticketing -incoming rdma:192.168.1.10:8888 (qemu) migrate_set_capability postcopy-ram on (qemu) dest_init RDMA Device opened: kernel name rocep1s0f0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/rocep1s0f0, transport: (2) Ethernet Segmentation fault (core dumped) (gdb) bt #0 qemu_rdma_accept (rdma=0x0) at ../migration/rdma.c:3272 #1 rdma_accept_incoming_migration (opaque=0x0) at ../migration/rdma.c:3986 #2 0x0000563c9e51f02a in aio_dispatch_handler (ctx=ctx@entry=0x563ca0606010, node=0x563ca12b2150) at ../util/aio-posix.c:329 #3 0x0000563c9e51f752 in aio_dispatch_handlers (ctx=0x563ca0606010) at ../util/aio-posix.c:372 #4 aio_dispatch (ctx=0x563ca0606010) at ../util/aio-posix.c:382 #5 0x0000563c9e4f4d9e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306 #6 0x00007fe96ef3fa9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0 #7 0x0000563c9e4ffeb8 in glib_pollfds_poll () at ../util/main-loop.c:231 #8 os_host_main_loop_wait (timeout=12188789) at ../util/main-loop.c:254 #9 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530 #10 0x0000563c9e3c7211 in qemu_main_loop () at ../softmmu/runstate.c:725 #11 0x0000563c9dfd46fe in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50 The rdma return path will not be created when qemu incoming is starting since migrate_copy() is false at that moment, then a NULL return path rdma was referenced if the user enabled postcopy later. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210525080552.28259-3-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	f53b450ada	migration/rdma: Fix rdma_addrinfo res leaks rdma_freeaddrinfo() is the reverse operation of rdma_getaddrinfo() Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210525080552.28259-2-lizhijian@cn.fujitsu.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	4e812d2338	migration/rdma: cleanup rdma in rdma_start_incoming_migration error path the error path after calling qemu_rdma_dest_init() should do rdma cleanup Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210520081148.17001-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Li Zhijian	efb208dc9c	migration/rdma: Fix cm_event used before being initialized A segmentation fault was triggered when i try to abort a postcopy + rdma migration. since rdma_ack_cm_event releases a uninitialized cm_event in these case. like below: 2496 ret = rdma_get_cm_event(rdma->channel, &cm_event); 2497 if (ret) { 2498 perror("rdma_get_cm_event after rdma_connect"); 2499 ERROR(errp, "connecting to destination!"); 2500 rdma_ack_cm_event(cm_event); <<<< cause segmentation fault 2501 goto err_rdma_source_connect; 2502 } Refer to the rdma_get_cm_event() code, cm_event will be updated/changed only if rdma_get_cm_event() returns 0. So it's okey to remove the ack in error patch. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Message-Id: <20210519064740.10828-1-lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-26 18:39:32 +01:00
Paolo Bonzini	b02629550d	replication: move include out of root directory The replication.h file is included from migration/colo.c and tests/unit/test-replication.c, so it should be in include/. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-26 14:49:46 +02:00
Peter Maydell	9b1e81d1c2	* Replace YAML anchors by extends in the gitlab-CI yaml files * Many small qtest fixes (e.g. to fix issues discovered by Coverity) * Poison more config switches in common code * Fix the failing Travis-CI and Cirrus-CI tasks -----BEGIN PGP SIGNATURE----- iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmCeXFMRHHRodXRoQHJl ZGhhdC5jb20ACgkQLtnXdP5wLbUV7g//VRxN74v2pO6zSsSNavYTkMa3jZE1Io3l w9lsOr3DM0HIMhyEMwSJsiygj0s8TNanUtFBHfDfo1k1gmZYd5FiM1sB7ZB/sB/S 9Q9wjmeV2NMG7pcr9zLmiJaEM2LGfGbso44/m0c+gpSyxzg2TeH7sF+38AUZI9/R J6/gOzIs4WWdSV4f8kp+YPPQCyOtxZsfxDuk1z1fKsBgXE5iEBUqYyyZUm4gYJkN 4ALmqc/NVeLszt08qkPHxwXQc488nCJP31psxx6MQ7toKfCAamT07Pp3oi70cakC y+HVfsIHc7SxaZyj8sbJCU3LJHwDd8N82ydJUrv216qDffb38fNb+m6y9mcYy2MH C25uGe9mucTuTP1WIttC6nCKg/MCgi4PWIzqEhkeKj3TxpTUJJP+BUIfSubV38Gc T+XUCNkWWW8sTeRiyE3m9pEZ+gz1MFubaIr/Owephch1SjRYn4zUwJFHHi3I4PY4 7XjEo8y2M9tKckW3pCfYi+aIDzC1DcRtzvXUGtdemX5xVjTAGlKkFLWl/brdCn2U y+JrwrL8cQ3Fy7bXzmK6m9mmhcIW6PcSOBE7RnSESyKzqM5VBSBgnjMqnHcP3FrV FYzihTCDcFKy3ELUe3Kbc1TgG/07stQs9xInGXN7ZFsIwadfHgoNVQ6JpOqq1oQa fvmLPBhq+a4= =QZuG -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-05-14' into staging * Replace YAML anchors by extends in the gitlab-CI yaml files * Many small qtest fixes (e.g. to fix issues discovered by Coverity) * Poison more config switches in common code * Fix the failing Travis-CI and Cirrus-CI tasks # gpg: Signature made Fri 14 May 2021 12:17:39 BST # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/thuth-gitlab/tags/pull-request-2021-05-14: cirrus.yml: Fix the MSYS2 task pc-bios/s390-ccw: Fix inline assembly for older versions of Clang tests/qtest/migration-test: Use g_autofree to avoid leaks on error paths configure: Poison all current target-specific #defines migration: Move populate_vfio_info() into a separate file include/sysemu: Poison all accelerator CONFIG switches in common code tests: Avoid side effects inside g_assert() arguments tests/qtest/rtc-test: Remove pointless NULL check tests/qtest/tpm-util.c: Free memory with correct free function tests/migration-test: Fix "true" vs true tests/qtest/npcm7xx_pwm-test.c: Avoid g_assert_true() for non-test assertions tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic util/compatfd.c: Replaced a malloc call with g_malloc. libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm docs/devel/qgraph: add troubleshooting information libqos/qgraph: fix "UNAVAILBLE" typo gitlab-ci: Replace YAML anchors by extends (native_test_job) gitlab-ci: Replace YAML anchors by extends (native_build_job) gitlab-ci: Replace YAML anchors by extends (container_job) tests/docker/dockerfiles: Add ccache to containers where it was missing Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-05-14 19:33:23 +01:00
Thomas Huth	43bd0bf30f	migration: Move populate_vfio_info() into a separate file The CONFIG_VFIO switch only works in target specific code. Since migration/migration.c is common code, the #ifdef does not have the intended behavior here. Move the related code to a separate file now which gets compiled via specific_ss instead. Fixes: `3710586caa` ("qapi: Add VFIO devices migration stats in Migration stats") Message-Id: <20210414112004.943383-3-thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2021-05-14 12:31:51 +02:00
David Hildenbrand	542147f4e5	migration/ram: Use offset_in_ramblock() in range checks We never read or write beyond the used_length of memory blocks when migrating. Make this clearer by using offset_in_ramblock() consistently. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-11-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	c1668bde5c	migration/multifd: Print used_length of memory block We actually want to print the used_length, against which we check. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-10-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	898ba906cc	migration/ram: Handle RAM block resizes during postcopy Resizing while migrating is dangerous and does not work as expected. The whole migration code works with the usable_length of a ram block and does not expect this value to change at random points in time. In the case of postcopy, relying on used_length is racy as soon as the guest is running. Also, when used_length changes we might leave the uffd handler registered for some memory regions, reject valid pages when migrating and fail when sending the recv bitmap to the source. Resizing can be trigger after (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Let's remember the original used_length in a separate variable and use it in relevant postcopy code. Make sure to update it when we resize during precopy, when synchronizing the RAM block sizes with the source. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-9-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
David Hildenbrand	6a23f6399a	migration/ram: Simplify host page handling in ram_load_postcopy() Add two new helper functions. This will come in come handy once we want to handle ram block resizes while postcopy is active. Note that ram_block_from_stream() will already print proper errors. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-8-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Added brackets in host_page_from_ram_block_offset to cause uintptr_t to cast the sum, to fix armhf-cross build	2021-05-13 18:21:13 +01:00
David Hildenbrand	cc61c703b6	migration/ram: Discard RAM when growing RAM blocks after ram_postcopy_incoming_init() In case we grow our RAM after ram_postcopy_incoming_init() (e.g., when synchronizing the RAM block state with the migration source), the resized part would not get discarded. Let's perform that when being notified about a resize while postcopy has been advised, but is not listening yet. With precopy, the process is as following: 1. VM created - RAM blocks are created 2. Incomming migration started - Postcopy is advised - All pages in RAM blocks are discarded 3. Precopy starts - RAM blocks are resized to match the size on the migration source. - RAM pages from precopy stream are loaded - Uffd handler is registered, postcopy starts listening 4. Guest started, postcopy running - Pagefaults get resolved, pages get placed Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-7-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:13 +01:00

1 2 3 4 5 ...

1439 Commits