mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Peter Maydell	17182bb47f	VFIO fixes 2018-08-23 - Fix coverity reported issue with use of realpath (Alex Williamson) - Cleanup file descriptor in error path (Alex Williamson) - Fix postcopy use of new balloon inhibitor (Alex Williamson) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJbfuTxAAoJECObm247sIsiO0sP/RJKVP52D5ZXChdv/Fxwo2N/ 5UulKRDVu1RAGKPFxww2JHN3hoo4Jz3keGSnK4l300pMBbxRLZsB8189IIb95ejH EzRFM73icK2//2zunFnv7V8CjcBS9Ol2ST6OdGmHk+kp5XFEb0wsysPn7lL+b0Uv +Ijext7xgPqqc6Lz7WiKcjcducD6Rm8ReIItvYw0S7ePVGviK8b/lM2BtfQOZY0W hlkp89za5l4fZigpHFy3XM9v1LEbtAUwnIWh+iHMelRU91og07cvnMuLINdaXyy9 lF52REiPL1jOTRQZMOuUUcuXBAHgeHtoVd837TRuZTI2kcresSC3b3SVgu7zG0To Z48fVIaFxmwjAcwJmzckash2Jhk3CfSE5HJvX1CODtWbSLh+o28MekxGGMPWpYM/ XJK8kTY7atace72j86f05INE4jPwQ3okak1jLb7FXz2LfRjpplPKeLEzSnhjm0dg 63e0c88D1eNTEad6H1WYrq9WZ7yjgEgu3jDBTtah+ZKKHPltUbOkc9j+ZkHWFsFK VIHNAD1X8TJCqj5vxSEAgiqT+XxIn8LfTrfCoeORJsoEfGJaRPTKMkdEGq7M1/k4 XMVJY7uK5C1p7loaUcqKs2J3RkIKuE4HvDWLXkeeAFHc1s9wtOzEBoKucREGt0UR V7L6J9paWJA4YQMy5SLw =aRcH -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/awilliam/tags/vfio-fixes-20180823.1' into staging VFIO fixes 2018-08-23 - Fix coverity reported issue with use of realpath (Alex Williamson) - Cleanup file descriptor in error path (Alex Williamson) - Fix postcopy use of new balloon inhibitor (Alex Williamson) # gpg: Signature made Thu 23 Aug 2018 17:46:41 BST # gpg: using RSA key 239B9B6E3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" # gpg: aka "Alex Williamson <alex@shazbot.org>" # gpg: aka "Alex Williamson <alwillia@redhat.com>" # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22 * remotes/awilliam/tags/vfio-fixes-20180823.1: postcopy: Synchronize usage of the balloon inhibitor vfio/pci: Fix failure to close file descriptor on error vfio/pci: Handle subsystem realpath() returning NULL Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-08-25 10:59:06 +01:00
Alex Williamson	154304cd6e	postcopy: Synchronize usage of the balloon inhibitor While the qemu_balloon_inhibit() interface appears rather general purpose, postcopy uses it in a last-caller-wins approach with no guarantee of balanced inhibits and de-inhibits. Wrap postcopy's usage of the inhibitor to give it one vote overall, using the same last-caller-wins approach as previously implemented at the balloon level. Fixes: `01ccbec7bd` ("balloon: Allow multiple inhibit users") Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-08-23 10:45:58 -06:00
Xiao Guangrong	ae526e32bd	migration: hold the lock only if it is really needed Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	5e5fdcff28	migration: move handle of zero page to the thread Detecting zero page is not a light work, moving it to the thread to speed the main thread up, btw, handling ram_release_pages() for the zero page is moved to the thread as well Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	6ef3771c0d	migration: drop the return value of do_compress_ram_page It is not used and cleans the code up a little Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	6c97ec5f5a	migration: introduce save_zero_page_to_file It will be used by the compression threads Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:10 +02:00
Xiao Guangrong	980a19a929	migration: fix counting normal page for compression The compressed page is not normal page Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:34:21 +02:00
Xiao Guangrong	1d58872a91	migration: do not wait for free thread Instead of putting the main thread to sleep state to wait for free compression thread, we can directly post it out as normal page that reduces the latency and uses CPUs more efficiently A parameter, compress-wait-thread, is introduced, it can be enabled if the user really wants the old behavior Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:34:11 +02:00
Lidong Chen	923709896b	migration: poll the cm event for destination qemu The destination qemu only poll the comp_channel->fd in qemu_rdma_wait_comp_channel. But when source qemu disconnnect the rdma connection, the destination qemu should be notified. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:17:43 +02:00
Lidong Chen	54db882f07	migration: implement the shutdown for RDMA QIOChannel Because RDMA QIOChannel not implement shutdown function, If the to_dst_file was set error, the return path thread will wait forever. and the migration thread will wait return path thread exit. the backtrace of return path thread is: (gdb) bt #0 0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6 #1 0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000) at qemu-timer.c:325 #2 0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000) at migration/rdma.c:1501 #3 0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000, byte_len=0x7ef7091d0640) at migration/rdma.c:1580 #4 0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000, head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726 #5 0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720, expecting=3) at migration/rdma.c:1903 #6 0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8, size=32768) at migration/rdma.c:2714 #7 0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232 #8 0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0) at migration/qemu-file.c:502 #9 0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515 #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591 #11 0x00000000006a46d3 in source_return_path_thread ( opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331 #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 #13 0x00007f372a77635d in clone () from /lib64/libc.so.6 the backtrace of migration thread is: (gdb) bt #0 0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0 #1 0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>) at util/qemu-thread-posix.c:504 #2 0x00000000006a4bc5 in await_return_path_close_on_source ( ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460 #3 0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>, current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980) at migration/migration.c:1695 #4 0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1837 #5 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 #6 0x00007f372a77635d in clone () from /lib64/libc.so.6 Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:14:45 +02:00
Lidong Chen	d5882995a1	migration: poll the cm event while wait RDMA work request completion If the peer qemu is crashed, the qemu_rdma_wait_comp_channel function maybe loop forever. so we should also poll the cm event fd, and when receive RDMA_CM_EVENT_DISCONNECTED and RDMA_CM_EVENT_DEVICE_REMOVAL, we consider some error happened. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: Gal Shachaf <galsha@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:14:19 +02:00
Lidong Chen	5d5f4d8436	migration: invoke qio_channel_yield only when qemu_in_coroutine() when qio_channel_read return QIO_CHANNEL_ERR_BLOCK, the source qemu crash. The backtrace is: (gdb) bt #0 0x00007fb20aba91d7 in raise () from /lib64/libc.so.6 #1 0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6 #2 0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6 #4 0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460 #5 0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768) at migration/qemu-file-channel.c:83 #6 0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299 #7 0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562 #8 0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575 #9 0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647 #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794 #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504 #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0 #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6 This patch fixed by invoke qio_channel_yield only when qemu_in_coroutine(). Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:13:59 +02:00
Lidong Chen	4d9f675bcb	migration: implement io_set_aio_fd_handler function for RDMA QIOChannel if qio_channel_rdma_readv return QIO_CHANNEL_ERR_BLOCK, the destination qemu crash. The backtrace is: (gdb) bt #0 0x0000000000000000 in ?? () #1 0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080, io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c: #2 0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438 #3 0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47 #4 0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327 at migration/qemu-file-channel.c:83 #5 0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299 #6 0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562 #7 0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575 #8 0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655 #9 0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126 #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366 #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1 #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6 #13 0x00007f96fe858760 in ?? () #14 0x0000000000000000 in ?? () RDMA QIOChannel not implement io_set_aio_fd_handler. so qio_channel_set_aio_fd_handler will access NULL pointer. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:13:11 +02:00
Lidong Chen	f5627c2af9	migration: Stop rdma yielding during incoming postcopy During incoming postcopy, the destination qemu will invoke qemu_rdma_wait_comp_channel in a seprate thread. So does not use rdma yield, and poll the completion channel fd instead. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:13:02 +02:00
Lidong Chen	74637e6f08	migration: implement bi-directional RDMA QIOChannel This patch implements bi-directional RDMA QIOChannel. Because different threads may access RDMAQIOChannel currently, this patch use RCU to protect it. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:12:26 +02:00
Lidong Chen	55cc1b5937	migration: create a dedicated connection for rdma return path If start a RDMA migration with postcopy enabled, the source qemu establish a dedicated connection for return path. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:12:16 +02:00
Lidong Chen	ccb7e1b5a6	migration: disable RDMA WRITE after postcopy started RDMA WRITE operations are performed with no notification to the destination qemu, then the destination qemu can not wakeup. This patch disable RDMA WRITE after postcopy started. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:12:07 +02:00
Li Qiang	4cbc9c7ffd	migrate/cpu-throttle: Add max-cpu-throttle migration parameter Currently, the default maximum CPU throttle for migration is 99(CPU_THROTTLE_PCT_MAX). This is too big and can make a remarkable performance effect for the guest. We see a lot of packets latency exceed 500ms when the CPU_THROTTLE_PCT_MAX reached. This patch set adds a new max-cpu-throttle parameter to limit the CPU throttle. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 11:42:34 +02:00
Peter Maydell	6f4923fcad	migration: Correctly handle subsections with no 'needed' function Currently the vmstate subsection handling code treats a subsection with no 'needed' function pointer as if it were the subsection list terminator, so the subsection is never transferred and nor is any subsection following it in the list. Handle NULL 'needed' function pointers in subsections in the same way that we do for top level VMStateDescription structures: treat the subsection as always being needed. This doesn't change behaviour for the current set of devices in the tree, because all subsections declare a 'needed' function. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 11:40:47 +02:00
Junyan He	56eb90af39	migration/ram: ensure write persistence on loading all data to PMEM. Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finish. This will make sure the data of pmem is safe and will not lose if power is off. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-08-10 13:29:39 +03:00
Junyan He	469dd51bc6	migration/ram: Add check and info message to nvdimm post copy. The nvdimm kind memory does not support post copy now. We disable post copy if we have nvdimm memory and print some log hint to user. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-08-10 13:29:39 +03:00
Lidong Chen	4b3fb65db9	migration: fix duplicate initialization for expected_downtime and cleanup_bh migrate_fd_connect duplicate initialize expected_downtime and cleanup_bh. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Message-Id: <1532434585-14732-2-git-send-email-lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:28:57 +01:00
Peter Xu	97ca211c62	migration: disallow recovery for release-ram Postcopy recovery won't work well with release-ram capability since release-ram will drop the page buffer as long as the page is put into the send buffer. So if there is a network failure happened, any page buffers that have not yet reached the destination VM but have already been sent from the source VM will be lost forever. Let's refuse the client from resuming such a postcopy migration. Luckily release-ram was designed to only be used when src and destination VMs are on the same host, so it should be fine. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180723123305.24792-3-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:10:59 +01:00
Peter Xu	814bb08f17	migration: update recv bitmap only on dest vm We shouldn't update the received bitmap if we're the source VM. This fixes a breakage when release-ram is enabled on postcopy. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180723123305.24792-2-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:10:41 +01:00
Dr. David Alan Gilbert	57225e5f32	migrate: Fix cancelling state warning We've been getting the warning: migration_iteration_finish: Unknown ending state 2 on a cancel. I think that's originally due to 39b9e17905c; although I've only seen the warning, I think that in some cases that we could find the VM stays paused after a cancel where it should restart. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180719092257.12703-1-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:00:45 +01:00
Peter Xu	4fcefd44a0	migration: fix potential overflow in multifd send I would guess it won't happen normally, but this should ease Coverity. >>> CID 1394385: Integer handling issues (OVERFLOW_BEFORE_WIDEN) >>> Potentially overflowing expression "pages->used * 8192U" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned). 854 transferred = pages->used * TARGET_PAGE_SIZE + p->packet_len; Fixes: CID 1394385 CC: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180720034713.11711-1-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 16:58:51 +01:00
Peter Xu	858b6d6224	migration: reorder MIG_CMD_POSTCOPY_RESUME It was accidently added before MIG_CMD_PACKAGED so it might break command compatibility when we run postcopy migration between old/new QEMUs. Fix that up quickly before the QEMU 3.0 release. Reported-by: Lukáš Doktor <ldoktor@redhat.com> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710094424.30754-1-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 15:23:23 +01:00
Peter Xu	3c9928d9f9	migration: show pause/recover state on dst host These two states will be missing when doing "query-migrate" on destination VM. Add these states so that we can get the query results as expected. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-5-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:56:37 +01:00
Peter Xu	a725ef9fe3	migration: fix incorrect bitmap size calculation The calculation on size of received bitmap is incorrect for postcopy recovery. Here we wanted to let the size to cover all the valid bits in the bitmap, we should use DIV_ROUND_UP() instead of a division. For example, a RAMBlock with size=4K (which contains only one single 4K page) will have nbits=1, then nbits/8=0, then the real bitmap won't be sent to source at all. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-4-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:56:18 +01:00
Peter Xu	fd037a656a	migration: loosen recovery check when load vm We were checking against -EIO, assuming that it will cover all IO failures. But actually it is not. One example is that in qemu_loadvm_section_start_full() we can have tons of places that will return -EINVAL even if the error is caused by IO failures on the network. Let's loosen the recovery check logic here to cover all the error cases happened by removing the explicit check against -EIO. After all we won't lose anything here if any other failure happened. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:56:07 +01:00
Peter Xu	1aa8367861	migration: simplify check to use qemu file buffer Firstly, renaming the old matching_page_sizes variable to matches_target_page_size, which suites more to what it did (it only checks against target page size rather than multiple page sizes). Meanwhile, simplify the check logic a bit, and enhance the comments. Should have no functional change. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:55:59 +01:00
Peter Xu	a429e7f488	migration: unify incoming processing This is the 2nd patch to unbreak postcopy recovery. Let's unify the migration_incoming_process() call at a single place rather than calling it in connection setup codes. This fixes a problem that we will go into incoming migration procedure even if we are trying to recovery from a paused postcopy migration. Fixes: `36c2f8be2c` ("migration: Delay start of migration main routines") Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-5-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
Peter Xu	884835fa1e	migration: unbreak postcopy recovery The whole postcopy recovery logic was accidentally broken. We need to fix it in two steps. This is the first step that we should do the recovery when needed. It was bypassed before after commit `36c2f8be2c`. Introduce postcopy_try_recovery() helper for the postcopy recovery logic. Call it both in migration_fd_process_incoming() and migration_ioc_process_incoming(). Fixes: `36c2f8be2c` ("migration: Delay start of migration main routines") Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-4-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
Peter Xu	81e620531f	migration: move income process out of multifd Move the call to migration_incoming_process() out of multifd code. It's a bit strange that we can migration generic calls in multifd code. Instead, let multifd_recv_new_channel() return a boolean showing whether it's ready to continue the incoming migration. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-3-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
Peter Xu	eed1cc7866	migration: delay postcopy paused state Before this patch we firstly setup the postcopy-paused state then we clean up the QEMUFile handles. That can be racy if there is a very fast "migrate-recover" command running in parallel. Fix that up. Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
Vladimir Sementsov-Ogievskiy	58f72b965e	dirty-bitmap: fix double lock on bitmap enabling Bitmap lock/unlock were added to bdrv_enable_dirty_bitmap in `8b1402ce80`, but some places were not updated correspondingly, which leads to trying to take this lock twice, which is dead-lock. Fix this. Actually, iotest 199 (about dirty bitmap postcopy migration) is broken now, and this fixes it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180625165745.25259-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-07-04 02:12:49 -04:00
Peter Maydell	4a83bf2f33	migration/next for 20180627 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJbM4jhAAoJEPSH7xhYctcjFkoP/RkE3RhO/3JV+DKYU5KQwcrV n6FcXwsMIJ9EsCWLaZl4R+7nlbPYb4xPVG0UJnG1ntqIhJ5gN6RHzfi4l1wGCKUT EBRq7C5mEUbUVup7RPO1bEDixydvsvNrSHCL8AzALaag4t5HWrjneLcnJhrSykHx WovFn3Zyi/3LzUKHTzSQWLUoEKSBqNVJ1Ar3kIKwNDfS+jOhiZ6/Bm29hLn9UH0X b/iFxvCcalL2y/ulacocaS2dMe6Dx3/TvhKde8sZuyez2JeGCFFicrty7TOFhpoV INHjOQ3P4KLCWvJw8vs0jQmw2kikFQhqBXbJPVi3/0hV0MH/Uj4cC78/aGTn+Dt3 bNa63eBi6//2KUzgdA+tlrTsVrkifjIhd69a0keTy1oH0zQHRjKvJrgSu0Mcy1x3 YLbPnsjdIUrzuhCQwZcBAuzJP73OW0Q/Y+RSulFUjqkO2IBtJSUkx+AeLews5uhz +q9xom4FjHZMaRLwz6tWYUXxyfRlRyDTiDxZtL/9I+6XZWRHKqA8+/QZf0l2whTm NfJz43J+uL7Dlq9u8aMX3e+qHGwGj6b7o3zB5+xBtiHYFZMg7qPuO42Lstubdexp Zngb+EZl1H4SmYnOLhSVSY+Gus3f2xI6JXpuZuy1wo/bgTwFXM2o6z16t9sobKYl RPusi0XEBu2/XkomJxYB =kc0K -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180627' into staging migration/next for 20180627 # gpg: Signature made Wed 27 Jun 2018 13:53:53 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180627: migration: fix crash in when incoming client channel setup fails postcopy: drop ram_pages parameter from postcopy_ram_incoming_init() migration: Stop sending whole pages through main channel migration: Remove not needed semaphore and quit migration: Wait for blocking IO migration: Start sending messages migration: Create ram_save_multifd_page migration: Create multifd_bytes ram_counter migration: Synchronize multifd threads with main thread migration: Add block where to send/receive packets migration: Multifd channels always wait on the sem migration: Add multifd traces for start/end thread migration: Abstract the number of bytes sent migration: Calculate mbps only during transfer time migration: Create multifd packet migration: Create multipage support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-28 15:31:42 +01:00
Daniel P. Berrangé	ca273df301	migration: fix crash in when incoming client channel setup fails The way we determine if we can start the incoming migration was changed to use migration_has_all_channels() in: commit `428d89084c` Author: Juan Quintela <quintela@redhat.com> Date: Mon Jul 24 13:06:25 2017 +0200 migration: Create migration_has_all_channels This method in turn calls multifd_recv_all_channels_created() which is hardcoded to always return 'true' when multifd is not in use. This is a latent bug... ...activated in a following commit where that return result ends up acting as the flag to indicate whether it is possible to start processing the migration: commit `36c2f8be2c` Author: Juan Quintela <quintela@redhat.com> Date: Wed Mar 7 08:40:52 2018 +0100 migration: Delay start of migration main routines This means that if channel initialization fails with normal migration, it'll never notice and attempt to start the incoming migration regardless and crash on a NULL pointer. This can be seen, for example, if a client connects to a server requiring TLS, but has an invalid x509 certificate: qemu-system-x86_64: The certificate hasn't got a known issuer qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed. #0 0x00007fffebd24f2b in raise () at /lib64/libc.so.6 #1 0x00007fffebd0f561 in abort () at /lib64/libc.so.6 #2 0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 #3 0x00007fffebd1d692 in () at /lib64/libc.so.6 #4 0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386 #5 0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116 #6 0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6 #7 0x0000000000000000 in () To handle the non-multifd case, we check whether mis->from_src_file is non-NULL. With this in place, the migration server drops the rejected client and stays around waiting for another, hopefully valid, client to arrive. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20180619163552.18206-1-berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-27 13:29:35 +02:00
David Hildenbrand	c136180c90	postcopy: drop ram_pages parameter from postcopy_ram_incoming_init() Not needed. Don't expose last_ram_page(). Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180620202736.21399-1-david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-27 13:28:31 +02:00
Juan Quintela	35374cbdff	migration: Stop sending whole pages through main channel We have to flush() the QEMUFile because now we sent really few data through that channel. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:30 +02:00
Juan Quintela	7a5cc33c48	migration: Remove not needed semaphore and quit We know quit with shutdwon in the QIO. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Add comment Use shutdown() instead of unref()	2018-06-27 13:28:21 +02:00
Juan Quintela	4d22c148c9	migration: Wait for blocking IO We have three conditions here: - channel fails -> error - we have to quit: we close the channel and reads fails - normal read that success, we are in bussiness So forget the complications of waiting in a semaphore. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	8b2db7f5fd	migration: Start sending messages Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	b9ee2f7d70	migration: Create ram_save_multifd_page The function still don't use multifd, but we have simplified ram_save_page, xbzrle and RDMA stuff is gone. We have added a new counter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Add last_page parameter Add commets for done and address Remove multifd field, it is the same than normal pages Merge next patch, now we send multiple pages at a time Remove counter for multifd pages, it is identical to normal pages Use iovec's instead of creating the equivalent. Clear memory used by pages (dave) Use g_new0(danp) define MULTIFD_CONTINUE now pages member is a pointer Fix off-by-one in number of pages in one packet Remove RAM_SAVE_FLAG_MULTIFD_PAGE s/multifd_pages_t/MultiFDPages_t/ add comment explaining what it means	2018-06-27 13:28:11 +02:00
Juan Quintela	a61c45bd22	migration: Create multifd_bytes ram_counter This will include how many bytes they are sent through multifd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	6df264ac5a	migration: Synchronize multifd threads with main thread We synchronize all threads each RAM_SAVE_FLAG_EOS. Bitmap synchronizations don't happen inside a ram section, so we are safe about two channels trying to overwrite the same memory. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- seq needs to be atomic now, will also be accessed from main thread. Fix the if (true \|\| ...) leftover We are back to non-atomics	2018-06-27 13:28:11 +02:00
Juan Quintela	0beb5ed327	migration: Add block where to send/receive packets Once there add tracepoints. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	d82628e4bd	migration: Multifd channels always wait on the sem Either for quit, sync or packet, we first wake them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	408ea6ae4c	migration: Add multifd traces for start/end thread We want to know how many pages/packets each channel has sent. Add counters for those. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- sort trace-events (dave)	2018-06-27 13:28:11 +02:00
Juan Quintela	0c8f0efdd4	migration: Abstract the number of bytes sent Right now we use the "position" inside the QEMUFile, but things like RDMA already do weird things to be able to maintain that counter right, and multifd will have some similar problems. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	6cde6fbe2b	migration: Calculate mbps only during transfer time We used to include in this calculation the setup time, but that can be quite big in rdma or multifd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	2a26c979b1	migration: Create multifd packet We still don't put anything there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- fix magic (dave) check offset/ramblock (dave) s/seq/packet_num/ and make it 64bit	2018-06-27 13:28:11 +02:00
Juan Quintela	34c55a94b1	migration: Create multipage support We only create/destry the page list here. We will use it later. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Stefan Hajnoczi	ec09f87753	trace: forbid floating point types Only one existing trace event uses a floating point type. Unfortunately float and double cannot be supported since SystemTap does not have floating point types. Remove float and double from the whitelist and document this limitation. Update the migrate_transferred trace event to use uint64_t instead of double. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20180621150254.4922-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-27 11:09:29 +01:00
Balamuruhan S	650af8907b	migration: calculate expected_downtime with ram_bytes_remaining() expected_downtime value is not accurate with dirty_pages_rate * page_size, using ram_bytes_remaining() would yeild it resonable. consider to read the remaining ram just after having updated the dirty pages count later migration_bitmap_sync_range() in migration_bitmap_sync() and reuse the `remaining` field in ram_counters to hold ram_bytes_remaining() for calculating expected_downtime. Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180612085009.17594-2-bala24@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	e03a34f8f3	migration/postcopy: Wake rate limit sleep on postcopy request Use the 'urgent request' mechanism added in the previous patch for entries added to the postcopy request queue for RAM. Ignore the rate limiting while we have requests. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-4-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	ad767bed5a	migration: Wake rate limiting for urgent requests Rate limiting sleeps the migration thread for a while when it runs out of bandwidth; but sometimes we want to wake up to get on with something more urgent (like a postcopy request). Here we use a semaphore with a timedwait instead of a simple sleep; Incrementing the sempahore will wake it up sooner. Anything that consumes these urgent events must decrement the sempahore. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	7e555c6c58	migration/postcopy: Add max-postcopy-bandwidth parameter Limit the background transfer bandwidth during the postcopy phase to the value set on this new parameter. The default, 0, corresponds to the existing behaviour which is unlimited bandwidth. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Xiao Guangrong	b734035b61	migration: introduce migration_update_rates It is used to slightly clean the code up, no logic is changed Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-5-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Xiao Guangrong	e0e7a45d7f	migration: fix counting xbzrle cache_miss_rate Sync up xbzrle_cache_miss_prev only after migration iteration goes forward Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-4-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Vladimir Sementsov-Ogievskiy	a36f6ff46f	migration/block-dirty-bitmap: fix dirty_bitmap_load dirty_bitmap_load_header return code is obtained but not handled. Fix this. Bug was introduced in `b35ebdf076` "migration: add postcopy migration of dirty bitmaps" with the whole function. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180530112424.204835-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	343f632c70	migration: Poison ramblock loops in migration The migration code should be using the RAMBLOCK_FOREACH_MIGRATABLE and qemu_ram_foreach_block_migratable not the all-block versions; poison them so that we can't accidentally use them. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	ff0769a4ad	migration: Fixes for non-migratable RAMBlocks There are still a few cases where migration code is using the macros and functions that do all RAMBlocks rather than just the migratable blocks; fix those up. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Greg Kurz	ea134caa08	typedefs: add QJSON Since commit `83ee768d62`, we now have two places that define the QJSON type: $ git grep 'typedef struct QJSON QJSON' include/migration/vmstate.h:typedef struct QJSON QJSON; migration/qjson.h:typedef struct QJSON QJSON; This breaks docker-test-build@centos6: In file included from /tmp/qemu-test/src/migration/savevm.c:59: /tmp/qemu-test/src/migration/qjson.h:16: error: redefinition of typedef 'QJSON' /tmp/qemu-test/src/include/migration/vmstate.h:30: note: previous declaration of 'QJSON' was here make: *** [migration/savevm.o] Error 1 This happens because CentOS 6 has an old GCC 4.4.7. Even if redefining a typedef with the same type is permitted since GCC 4.6, unless -pedantic is passed, we don't really need to do that on purpose. Let's have a single definition in <qemu/typedefs.h> instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <152844714981.11789.3657734445739553287.stgit@bahia.lan> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Peter Maydell	b74588a493	migration/next for 20180604 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJbFLygAAoJEPSH7xhYctcjszUP/j7/21sHXwPyB5y4CQHsivLL 18MnXTkjgSlue3fuPKI9ZzD2VZq19AtwarLuW3IFf15adKfeV0hhZsIMUmu0XTAz jgol8YYgoYlk3/on8Ux8PWV9RWKHh7vWAF7hjwJOnI5CFF7gXz6kVjUx03A7RSnB 1a92lvgl5P8AbK5HZ/i08nOaAxCUWIVIoQeZf3BlZ5WdJE0j+d0Mvp/S8NEq5VPD P6oYJZy64QFGotmV1vtYD4G/Hqe8XBw2LoSVgjPSmQpd1m4CXiq0iyCxtC8HT+iq 0P0eHATRglc5oGhRV6Mi5ixH2K2MvpshdyMKVAbkJchKBHn4k3m6jefhnUXuucwC fq9x7VIYzQ80S+44wJri683yUWP3Qmbb6mWYBb1L0jgbTkY1wx431IYMtDVv5q4R oXtbe77G6lWKHH0SFrU8TY/fqJvBwpat+QwOoNsluHVjzAzpOpcVokB8pLj592/1 bHkDwvv7i+B37muOF3PFy++BoliwpRiNbVC5Btw4mnbOoY+AeJHMRBuhJHU6EyJv GOFiQ5w5XEXEu98giNqHcc0AC8w7iTRGI472UzpS0AG20wpc1/2jF0yM99stauwD oIu3MU4dQlz6QRy8KFnJShqjYNYuOs59USLiYM79ABa9J+JxJxxTAaqRbUFojQR6 KN2i0QQJ+6MO4Skjbu/3 =K0eJ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180604' into staging migration/next for 20180604 # gpg: Signature made Mon 04 Jun 2018 05:14:24 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180604: migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect migration: remove unnecessary variables len in QIOChannelRDMA migration: Don't activate block devices if using -S migration: discard non-migratable RAMBlocks migration: introduce decompress-error-check Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 12:54:00 +01:00
Peter Maydell	f67c9b693a	acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM= =3+V0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 01 Jun 2018 17:25:19 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) vhost-blk: turn on pre-defined RO feature bit ACPI testing: test NFIT platform capabilities nvdimm, acpi: support NFIT platform capabilities tests/.gitignore: add entry for generated file arch_init: sort architectures ui: use local path for local headers qga: use local path for local headers colo: use local path for local headers migration: use local path for local headers usb: use local path for local headers sd: fix up include vhost-scsi: drop an unused include ppc: use local path for local headers rocker: drop an unused include e1000e: use local path for local headers ioapic: fix up includes ide: use local path for local headers display: use local path for local headers trace: use local path for local headers migration: drop an unused include ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 10:15:16 +01:00
Lidong Chen	c5e76115cc	migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect When cancel migration during RDMA precopy, the source qemu main thread hangs sometime. The backtrace is: (gdb) bt #0 0x00007f249eabd43d in write () from /lib64/libpthread.so.0 #1 0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189 #2 0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296 #3 0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999 #4 0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273 #5 0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98 #6 0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334 #7 0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162 #8 0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90 #9 0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118 #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436 #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0) at util/async.c:261 #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215 #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263 #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522 #16 0x00000000005bc6a5 in main_loop () at vl.c:1944 #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752 It does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect sometime. According to IB Spec once active side send DREQ message, it should wait for DREP message and only once it arrived it should trigger a DISCONNECT event. DREP message can be dropped due to network issues. For that case the spec defines a DREP_timeout state in the CM state machine, if the DREP is dropped we should get a timeout and a TIMEWAIT_EXIT event will be trigger. Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state and in above scenario we will not get DISCONNECT or TIMEWAIT_EXIT events. So it should not invoke rdma_get_cm_event which may hang forever, and the event channel is also destroyed in qemu_rdma_cleanup. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Lidong Chen	f38f6d4155	migration: remove unnecessary variables len in QIOChannelRDMA Because qio_channel_rdma_writev and qio_channel_rdma_readv maybe invoked by different threads concurrently, this patch removes unnecessary variables len in QIOChannelRDMA and use local variable instead. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Lidong Chen <jemmy858585@gmail.com>	2018-06-04 05:46:15 +02:00
Dr. David Alan Gilbert	0f073f44df	migration: Don't activate block devices if using -S Activating the block devices causes the locks to be taken on the backing file. If we're running with -S and the destination libvirt hasn't started the destination with 'cont', it's expecting the locks are still untaken. Don't activate the block devices if we're not going to autostart the VM; 'cont' already will do that anyway. This change is tied to the new migration capability 'late-block-activate' that defaults to off, keeping the old behaviour by default. bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Cédric Le Goater	b895de5027	migration: discard non-migratable RAMBlocks On the POWER9 processor, the XIVE interrupt controller can control interrupt sources using MMIO to trigger events, to EOI or to turn off the sources. Priority management and interrupt acknowledgment is also controlled by MMIO in the presenter sub-engine. These MMIO regions are exposed to guests in QEMU with a set of 'ram device' memory mappings, similarly to VFIO, and the VMAs are populated dynamically with the appropriate pages using a fault handler. But, these regions are an issue for migration. We need to discard the associated RAMBlocks from the RAM state on the source VM and let the destination VM rebuild the memory mappings on the new host in the post_load() operation just before resuming the system. To achieve this goal, the following introduces a new RAMBlock flag RAM_MIGRATABLE which is updated in the vmstate_register_ram() and vmstate_unregister_ram() routines. This flag is then used by the migration to identify RAMBlocks to discard on the source. Some checks are also performed on the destination to make sure nothing invalid was sent. This change impacts the boston, malta and jazz mips boards for which migration compatibility is broken. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Xiao Guangrong	f548222c24	migration: introduce decompress-error-check QEMU 3.0 enables strict check for compression & decompression to make the migration more robust, that depends on the source to fix the internal design which triggers the unexpected error conditions To make it work for migrating old version QEMU to 2.13 QEMU, we introduce this parameter to disable the error check on the destination which is the default behavior of the machine type which is older than 2.13, alternately, the strict check can be enabled explicitly as followings: -M pc-q35-2.11 -global migration.decompress-error-check=true Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Peter Maydell	afd76ffba9	* Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlsRd2UUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPG8Qf+M85E8xAQ/bhs90tAymuXkUUsTIFF uI76K8eM0K3b2B+vGckxh1gyN5O3GQaMEDL7vITfqbX+EOH5U2lv8V9JRzf2YvbG Zahjd4pOCYzR0b9JENA1r5U/J8RntNrBNXlKmGTaXOaw9VCXlZyvgVd9CE3z/e2M 0jSXMBdF4LB3UzECI24Va8ejJxdSiJcqXA2j3J+pJFxI698i+Z5eBBKnRdo5TVe5 jl0TYEsbS6CLwhmbLXmt3Qhq+ocZn7YH9X3HjkHEdqDUeYWyT9jwUpa7OHFrIEKC ikWm9er4YDzG/vOC0dqwKbShFzuTpTJuMz5Mj4v8JjM/iQQFrp4afjcW2g== =RS/B -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) # gpg: Signature made Fri 01 Jun 2018 17:42:13 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (56 commits) hw: make virtio devices configurable via default-configs/ hw: allow compiling out SCSI memory: Make operations using MemoryRegionIoeventfd struct pass by pointer. char: Remove unwanted crlf conversion qdev: Remove DeviceClass::init() and ::exit() qdev: Simplify the SysBusDeviceClass::init path hw/i2c: Use DeviceClass::realize instead of I2CSlaveClass::init hw/i2c/smbus: Use DeviceClass::realize instead of SMBusDeviceClass::init target/i386/kvm.c: Remove compatibility shim for KVM_HINTS_REALTIME Update Linux headers to 4.17-rc6 target/i386/kvm.c: Handle renaming of KVM_HINTS_DEDICATED scripts/update-linux-headers: Handle kernel license no longer being one file scripts/update-linux-headers: Handle __aligned_u64 virtio-gpu-3d: Define VIRTIO_GPU_CAPSET_VIRGL2 elsewhere gdbstub: Prevent fd leakage docs/interop: add "firmware.json" ipmi: Use proper struct reference for KCS vmstate vmstate: Add a VSTRUCT type tcg: remove softfloat from --disable-tcg builds qemu-options: Mark the non-functional -clock option as deprecated ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-01 18:24:16 +01:00
Michael S. Tsirkin	53d37d36ca	migration: use local path for local headers When pulling in headers that are in the same directory as the C file (as opposed to one in include/), we should use its relative path, without a directory. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>	2018-06-01 19:20:38 +03:00
Michael S. Tsirkin	83ee768d62	migration: drop an unused include In the vmstate.h file, we just need a struct name. Use a forward declaration instead of an include, then adjust the one affected .c file to include the file that is no longer implicit from the header. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-06-01 19:20:37 +03:00
Corey Minyard	2dc6660bd8	vmstate: Add a VSTRUCT type The VMS_STRUCT has no way to specify which version of a structure to use. Add a type and a new field to allow the specific version of a structure to be used. Signed-off-by: Corey Minyard <cminyard@mvista.com> Message-Id: <1524670052-28373-2-git-send-email-minyard@acm.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-01 15:13:46 +02:00
Peter Xu	bf269906f5	migration: use g_free for ram load bitmap Buffers allocated with bitmap_new() should be freed with g_free(). Both reported by Coverity: *** CID 1391300: API usage errors (ALLOC_FREE_MISMATCH) /migration/ram.c: 3517 in ram_dirty_bitmap_reload() 3511 * the last one to sync, we need to notify the main send thread. 3512 / 3513 ram_dirty_bitmap_reload_notify(s); 3514 3515 ret = 0; 3516 out: >>> CID 1391300: API usage errors (ALLOC_FREE_MISMATCH) >>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free". 3517 free(le_bitmap); 3518 return ret; 3519 } 3520 3521 static int ram_resume_prepare(MigrationState s, void opaque) 3522 { ** CID 1391292: API usage errors (ALLOC_FREE_MISMATCH) /migration/ram.c: 249 in ramblock_recv_bitmap_send() 243 * Mark as an end, in case the middle part is screwed up due to 244 * some "misterious" reason. 245 */ 246 qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING); 247 qemu_fflush(file); 248 >>> CID 1391292: API usage errors (ALLOC_FREE_MISMATCH) >>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free". 249 free(le_bitmap); 250 251 if (qemu_file_get_error(file)) { 252 return qemu_file_get_error(file); 253 } 254 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180525015042.31778-1-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-25 15:29:48 +02:00
Juan Quintela	0efc91423a	migration: fix exec/fd migrations Commit: commit `36c2f8be2c` Author: Juan Quintela <quintela@redhat.com> Date: Wed Mar 7 08:40:52 2018 +0100 migration: Delay start of migration main routines Missed tcp and fd transports. This fix its. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20180523091411.1073-1-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-25 15:29:47 +02:00
Dr. David Alan Gilbert	8b7bf2bada	Migration+TLS: Fix crash due to double cleanup During a TLS connect we see: migration_channel_connect calls migration_tls_channel_connect (calls after TLS setup) migration_channel_connect My previous error handling fix made migration_channel_connect call migrate_fd_connect in all cases; unfortunately the above means it gets called twice and crashes doing double cleanup. Fixes: `688a3dcba9` Reported-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20180430185943.35714-1-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 22:13:08 +02:00
Lidong Chen	71cd73061c	migration: update index field when delete or qsort RDMALocalBlock rdma_delete_block function deletes RDMALocalBlock base on index field, but not update the index field. So when next time invoke rdma_delete_block, it will not work correctly. If start and cancel migration repeatedly, some RDMALocalBlock not invoke ibv_dereg_mr to decrease kernel mm_struct vmpin. When vmpin is large than max locked memory limitation, ibv_reg_mr will failed, and migration can not start successfully again. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1525618499-1560-1-git-send-email-lidongchen@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Lidong Chen <jemmy858585@gmail.com>	2018-05-15 22:13:08 +02:00
Peter Xu	bfbf89c2b5	migration/qmp: add command migrate-pause It pauses an ongoing migration. Currently it only supports postcopy. Note that this command will work on either side of the migration. Basically when we trigger this on one side, it'll interrupt the other side as well since the other side will get notified on the disconnect event. However, it's still possible that the other side is not notified, for example, when the network is totally broken, or due to some firewall configuration changes. In that case, we will also need to run the same command on the other side so both sides will go into the paused state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-24-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- s/2.12/2.13/	2018-05-15 22:12:57 +02:00
Peter Xu	62df066fff	migration: introduce lock for to_dst_file Let's introduce a lock for that QEMUFile since we are going to operate on it in multiple threads. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-23-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 22:12:41 +02:00
Peter Xu	02affd41b1	qmp/migration: new command migrate-recover The first allow-oob=true command. It's used on destination side when the postcopy migration is paused and ready for a recovery. After execution, a new migration channel will be established for postcopy to continue. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-21-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- s/2.12/2.13/	2018-05-15 22:11:45 +02:00
Peter Xu	e1b1b1bc36	migration: init dst in migration_object_init too Though we may not need it, now we init both the src/dst migration objects in migration_object_init() so that even incoming migration object would be thread safe (it was not). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-20-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:57:01 +02:00
Peter Xu	9419069695	migration: final handshake for the resume Finish the last step to do the final handshake for the recovery. First source sends one MIG_CMD_RESUME to dst, telling that source is ready to resume. Then, dest replies with MIG_RP_MSG_RESUME_ACK to source, telling that dest is ready to resume (after switch to postcopy-active state). When source received the RESUME_ACK, it switches its state to postcopy-active, and finally the recovery is completed. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-19-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:57:00 +02:00
Peter Xu	08614f3497	migration: setup ramstate for resume After we updated the dirty bitmaps of ramblocks, we also need to update the critical fields in RAMState to make sure it is ready for a resume. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-18-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:59 +02:00
Peter Xu	edd090c728	migration: synchronize dirty bitmap for resume This patch implements the first part of core RAM resume logic for postcopy. ram_resume_prepare() is provided for the work. When the migration is interrupted by network failure, the dirty bitmap on the source side will be meaningless, because even the dirty bit is cleared, it is still possible that the sent page was lost along the way to destination. Here instead of continue the migration with the old dirty bitmap on source, we ask the destination side to send back its received bitmap, then invert it to be our initial dirty bitmap. The source side send thread will issue the MIG_CMD_RECV_BITMAP requests, once per ramblock, to ask for the received bitmap. On destination side, MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap. Data will be received on the return-path thread of source, and the main migration thread will be notified when all the ramblock bitmaps are synchronized. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-17-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:57 +02:00
Peter Xu	d1b8eadbc4	migration: introduce SaveVMHandlers.resume_prepare This is hook function to be called when a postcopy migration wants to resume from a failure. For each module, it should provide its own recovery logic before we switch to the postcopy-active state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-16-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:55 +02:00
Peter Xu	13955b89ce	migration: new message MIG_RP_MSG_RESUME_ACK Creating new message to reply for MIG_CMD_POSTCOPY_RESUME. One uint32_t is used as payload to let the source know whether destination is ready to continue the migration. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-15-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:53 +02:00
Peter Xu	3f5875eca5	migration: new cmd MIG_CMD_POSTCOPY_RESUME Introducing this new command to be sent when the source VM is ready to resume the paused migration. What the destination does here is basically release the fault thread to continue service page faults. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-14-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:52 +02:00
Peter Xu	a335debb35	migration: new message MIG_RP_MSG_RECV_BITMAP Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send received bitmap of ramblock back to source. This is the reply message of MIG_CMD_RECV_BITMAP, it contains not only the header (including the ramblock name), and it was appended with the whole ramblock received bitmap on the destination side. When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP), it parses it, convert it to the dirty bitmap by inverting the bits. One thing to mention is that, when we send the recv bitmap, we are doing these things in extra: - converting the bitmap to little endian, to support when hosts are using different endianess on src/dst. - do proper alignment for 8 bytes, to support when hosts are using different word size (32/64 bits) on src/dst. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-13-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:51 +02:00
Peter Xu	f25d42253c	migration: new cmd MIG_CMD_RECV_BITMAP Add a new vm command MIG_CMD_RECV_BITMAP to request received bitmap for one ramblock. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-12-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:49 +02:00
Peter Xu	d96c9e8d78	migration: wakeup dst ram-load-thread for recover On the destination side, we cannot wake up all the threads when we got reconnected. The first thing to do is to wake up the main load thread, so that we can continue to receive valid messages from source again and reply when needed. At this point, we switch the destination VM state from postcopy-paused back to postcopy-recover. Now we are finally ready to do the resume logic. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-11-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:48 +02:00
Peter Xu	135b87b4f0	migration: new state "postcopy-recover" Introducing new migration state "postcopy-recover". If a migration procedure is paused and the connection is rebuilt afterward successfully, we'll switch the source VM state from "postcopy-paused" to the new state "postcopy-recover", then we'll do the resume logic in the migration thread (along with the return path thread). This patch only do the state switch on source side. Another following up patch will handle the state switching on destination side using the same status bit. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-10-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- s/2.11/2.13/	2018-05-15 20:56:30 +02:00
Peter Xu	d3e35b8f62	migration: rebuild channel on source This patch detects the "resume" flag of migration command, rebuild the channels only if the flag is set. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-9-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:55:19 +02:00
Peter Xu	7a4da28b26	qmp: hmp: add migrate "resume" option It will be used when we want to resume one paused migration. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-8-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- s/2.12/2.13/	2018-05-15 20:54:49 +02:00
Peter Xu	3a7804c306	migration: allow fault thread to pause Allows the fault thread to stop handling page faults temporarily. When network failure happened (and if we expect a recovery afterwards), we should not allow the fault thread to continue sending things to source, instead, it should halt for a while until the connection is rebuilt. When the dest main thread noticed the failure, it kicks the fault thread to switch to pause state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-7-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	14b1742eaa	migration: allow src return path to pause Let the thread pause for network issues. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-6-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	b411b844fb	migration: allow dst vm pause on postcopy When there is IO error on the incoming channel (e.g., network down), instead of bailing out immediately, we allow the dst vm to switch to the new POSTCOPY_PAUSE state. Currently it is still simple - it waits the new semaphore, until someone poke it for another attempt. One note is that here on ram loading thread we cannot detect the POSTCOPY_ACTIVE state, but we need to detect the more specific POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all the device states. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-5-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	b23c2ade25	migration: implement "postcopy-pause" src logic Now when network down for postcopy, the source side will not fail the migration. Instead we convert the status into this new paused state, and we will try to wait for a rescue in the future. If a recovery is detected, migration_thread() will reset its local variables to prepare for that. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-4-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	a688d2c1ab	migration: new postcopy-pause state Introducing a new state "postcopy-paused", which can be used when the postcopy migration is paused. It is targeted for postcopy network failure recovery. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-3-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00

1 2 3 4 5 ...

947 Commits