mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Wei Wang	386a907b37	migration: use bitmap_mutex in migration_bitmap_clear_dirty The bitmap mutex is used to synchronize threads to update the dirty bitmap and the migration_dirty_pages counter. For example, the free page optimization clears bits of free pages from the bitmap in an iothread context. This patch makes migration_bitmap_clear_dirty update the bitmap and counter under the mutex. Signed-off-by: Wei Wang <wei.w.wang@intel.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1544516693-5395-4-git-send-email-wei.w.wang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:18 +00:00
Yury Kotov	fbd162e629	migration: Add an ability to ignore shared RAM blocks If ignore-shared capability is set then skip shared RAMBlocks during the RAM migration. Also, move qemu_ram_foreach_migratable_block (and rename) to the migration code, because it requires access to the migration capabilities. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Message-Id: <20190215174548.2630-4-yury-kotov@yandex-team.ru> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:17 +00:00
Xiao Guangrong	aecbfe9c64	migration: introduce pages-per-second It introduces a new statistic, pages-per-second, as bandwidth or mbps is not enough to measure the performance of posting pages out as we have compression, xbzrle, which can significantly reduce the amount of the data size, instead, pages-per-second is the one we want Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20190111063732.10484-2-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> With typo's Eric spotted fixed	2019-01-23 15:51:47 +00:00
Fei Li	1398b2e3fe	migration: multifd_save_cleanup() can't fail, simplify multifd_save_cleanup() takes an Error ** argument and returns an error code even though it can't actually fail. Its callers dutifully check for failure. Remove the useless argument and return value, and simplify the callers. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fei Li <fli@suse.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190113140849.38339-4-lifei1214@126.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-01-23 15:02:07 +00:00
Fei Li	49ed0d24a4	migration: fix the multifd code when receiving less channels In our current code, when multifd is used during migration, if there is an error before the destination receives all new channels, the source keeps running, however the destination does not exit but keeps waiting until the source is killed deliberately. Fix this by dumping the specific error and let users decide whether to quit from the destination side when failing to receive packet via some channel. And update the comment for multifd_recv_new_channel(). Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fei Li <fli@suse.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20190113140849.38339-3-lifei1214@126.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-01-23 15:02:07 +00:00
Paolo Bonzini	b58deb344d	qemu/queue.h: leave head structs anonymous unless necessary Most list head structs need not be given a name. In most cases the name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV or reverse iteration, but this does not apply to lists of other kinds, and even for QTAILQ in practice this is only rarely needed. In addition, we will soon reimplement those macros completely so that they do not need a name for the head struct. So clean up everything, not giving a name except in the rare case where it is necessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:55 +01:00
zhanghailiang	d1955d2219	COLO: flush host dirty ram from cache Don't need to flush all VM's ram from cache, only flush the dirty pages since last checkpoint Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	e6f4aa188c	COLO: Flush memory data from ram cache During the time of VM's running, PVM may dirty some pages, we will transfer PVM's dirty pages to SVM and store them into SVM's RAM cache at next checkpoint time. So, the content of SVM's RAM cache will always be same with PVM's memory after checkpoint. Instead of flushing all content of PVM's RAM cache into SVM's MEMORY, we do this in a more efficient way: Only flush any page that dirtied by PVM since last checkpoint. In this way, we can ensure SVM's memory same with PVM's. Besides, we must ensure flush RAM cache before load device state. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	7d9acafa2c	ram/COLO: Record the dirty pages that SVM received We record the address of the dirty pages that received, it will help flushing pages that cached into SVM. Here, it is a trick, we record dirty pages by re-using migration dirty bitmap. In the later patch, we will start the dirty log for SVM, just like migration, in this way, we can record both the dirty pages caused by PVM and SVM, we only flush those dirty pages from RAM cache while do checkpoint. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	13af18f222	COLO: Load dirty pages into SVM's RAM cache firstly We should not load PVM's state directly into SVM, because there maybe some errors happen when SVM is receving data, which will break SVM. We need to ensure receving all data before load the state into SVM. We use an extra memory to cache these data (PVM's ram). The ram cache in secondary side is initially the same as SVM/PVM's memory. And in the process of checkpoint, we cache the dirty pages of PVM into this ram cache firstly, so this ram cache always the same as PVM's memory at every checkpoint, then we flush this cached ram to SVM after we receive all PVM's state. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Peter Maydell	341ba0df4c	migration/ram.c: Avoid taking address of fields in packed MultiFDInit_t struct Taking the address of a field in a packed struct is a bad idea, because it might not be actually aligned enough for that pointer type (and thus cause a crash on dereference on some host architectures). Newer versions of clang warn about this: migration/ram.c:651:19: warning: taking address of packed member 'magic' of class or structure 'MultiFDInit_t' may result in an unaligned pointer value [-Waddress-of-packed-member] migration/ram.c:652:19: warning: taking address of packed member 'version' of class or structure 'MultiFDInit_t' may result in an unaligned pointer value [-Waddress-of-packed-member] migration/ram.c:737:19: warning: taking address of packed member 'magic' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member] migration/ram.c:745:19: warning: taking address of packed member 'version' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member] migration/ram.c:755:19: warning: taking address of packed member 'size' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member] Avoid the bug by not using the "modify in place" byteswapping functions. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180925161924.7832-1-peter.maydell@linaro.org> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 17:29:01 +01:00
Fei Li	05306935b1	migration: fix the compression code Add judgement in compress_threads_save_cleanup() to check whether the static CompressParam *comp_param has been allocated. If not, just return; or else segmentation fault will occur when using the NULL comp_param's parameters. One test case can reproduce this is: set the compression on and migrate to a wrong nonexistent host IP address. Our current code does not judge before handling comp_param[idx]'s quit and cond that whether they have been initialized. If not initialized, "qemu_mutex_lock_impl: Assertion `mutex->initialized' failed." will occur. Fix this by squashing the terminate_compression_threads() into compress_threads_save_cleanup() and employing the existing judgement condition. One test case can reproduce this error is: set the compression on and fail to fully setup the default eight compression thread in compress_threads_save_setup(). Signed-off-by: Fei Li <fli@suse.com> Message-Id: <20180925091440.18910-1-fli@suse.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 17:29:01 +01:00
Xiao Guangrong	32b054954f	migration: use save_page_use_compression in flush_compressed_data It avoids to touch compression locks if xbzrle and compression are both enabled Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180906070101.27280-4-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:27:43 +01:00
Xiao Guangrong	76e030004f	migration: show the statistics of compression Currently, it includes: pages: amount of pages compressed and transferred to the target VM busy: amount of count that no free thread to compress data busy-rate: rate of thread busy compressed-size: amount of bytes after compression compression-rate: rate of compressed size Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180906070101.27280-3-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:27:27 +01:00
Xiao Guangrong	48df9d8002	migration: do not flush_compressed_data at the end of iteration flush_compressed_data() needs to wait all compression threads to finish their work, after that all threads are free until the migration feeds new request to them, reducing its call can improve the throughput and use CPU resource more effectively We do not need to flush all threads at the end of iteration, the data can be kept locally until the memory block is changed or memory migration starts over in that case we will meet a dirtied page which may still exists in compression threads's ring Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180906070101.27280-2-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:26:58 +01:00
Xiao Guangrong	e8f3735fa3	migration: handle the error condition properly ram_find_and_save_block() can return negative if any error hanppens, however, it is completely ignored in current code Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180903092644.25812-5-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:22:21 +01:00
Xiao Guangrong	be8b02edae	migration: fix calculating xbzrle_counters.cache_miss_rate As Peter pointed out: \| - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's \| per-guest-page granularity \| \| - RAMState.iterations is done for each ram_find_and_save_block(), so \| it's per-host-page granularity \| \| An example is that when we migrate a 2M huge page in the guest, we \| will only increase the RAMState.iterations by 1 (since \| ram_find_and_save_block() will be called once), but we might increase \| xbzrle_counters.cache_miss for 2M/4K=512 times (we'll call \| save_xbzrle_page() that many times) if all the pages got cache miss. \| Then IMHO the cache miss rate will be 512/1=51200% (while it should \| actually be just 100% cache miss). And he also suggested as xbzrle_counters.cache_miss_rate is the only user of rs->iterations we can adapt it to count target guest page numbers After that, rename 'iterations' to 'target_page_count' to better reflect its meaning Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180903092644.25812-3-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:21:56 +01:00
Peter Xu	3ab72385b2	qapi: Drop qapi_event_send_FOO()'s Error argument The generated qapi_event_send_FOO() take an Error argument. They can't actually fail, because all they do with the argument is passing it to functions that can't fail: the QObject output visitor, and the @qmp_emit callback, which is either monitor_qapi_event_queue() or event_test_emit(). Drop the argument, and pass &error_abort to the QObject output visitor and @qmp_emit instead. Suggested-by: Eric Blake <eblake@redhat.com> Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815133747.25032-4-peterx@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message rewritten, update to qapi-code-gen.txt corrected] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-08-28 18:21:38 +02:00
Xiao Guangrong	ae526e32bd	migration: hold the lock only if it is really needed Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	5e5fdcff28	migration: move handle of zero page to the thread Detecting zero page is not a light work, moving it to the thread to speed the main thread up, btw, handling ram_release_pages() for the zero page is moved to the thread as well Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	6ef3771c0d	migration: drop the return value of do_compress_ram_page It is not used and cleans the code up a little Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	6c97ec5f5a	migration: introduce save_zero_page_to_file It will be used by the compression threads Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:10 +02:00
Xiao Guangrong	980a19a929	migration: fix counting normal page for compression The compressed page is not normal page Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:34:21 +02:00
Xiao Guangrong	1d58872a91	migration: do not wait for free thread Instead of putting the main thread to sleep state to wait for free compression thread, we can directly post it out as normal page that reduces the latency and uses CPUs more efficiently A parameter, compress-wait-thread, is introduced, it can be enabled if the user really wants the old behavior Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:34:11 +02:00
Lidong Chen	74637e6f08	migration: implement bi-directional RDMA QIOChannel This patch implements bi-directional RDMA QIOChannel. Because different threads may access RDMAQIOChannel currently, this patch use RCU to protect it. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:12:26 +02:00
Li Qiang	4cbc9c7ffd	migrate/cpu-throttle: Add max-cpu-throttle migration parameter Currently, the default maximum CPU throttle for migration is 99(CPU_THROTTLE_PCT_MAX). This is too big and can make a remarkable performance effect for the guest. We see a lot of packets latency exceed 500ms when the CPU_THROTTLE_PCT_MAX reached. This patch set adds a new max-cpu-throttle parameter to limit the CPU throttle. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 11:42:34 +02:00
Junyan He	56eb90af39	migration/ram: ensure write persistence on loading all data to PMEM. Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finish. This will make sure the data of pmem is safe and will not lose if power is off. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-08-10 13:29:39 +03:00
Junyan He	469dd51bc6	migration/ram: Add check and info message to nvdimm post copy. The nvdimm kind memory does not support post copy now. We disable post copy if we have nvdimm memory and print some log hint to user. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-08-10 13:29:39 +03:00
Peter Xu	814bb08f17	migration: update recv bitmap only on dest vm We shouldn't update the received bitmap if we're the source VM. This fixes a breakage when release-ram is enabled on postcopy. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180723123305.24792-2-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:10:41 +01:00
Peter Xu	4fcefd44a0	migration: fix potential overflow in multifd send I would guess it won't happen normally, but this should ease Coverity. >>> CID 1394385: Integer handling issues (OVERFLOW_BEFORE_WIDEN) >>> Potentially overflowing expression "pages->used * 8192U" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned). 854 transferred = pages->used * TARGET_PAGE_SIZE + p->packet_len; Fixes: CID 1394385 CC: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180720034713.11711-1-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 16:58:51 +01:00
Peter Xu	a725ef9fe3	migration: fix incorrect bitmap size calculation The calculation on size of received bitmap is incorrect for postcopy recovery. Here we wanted to let the size to cover all the valid bits in the bitmap, we should use DIV_ROUND_UP() instead of a division. For example, a RAMBlock with size=4K (which contains only one single 4K page) will have nbits=1, then nbits/8=0, then the real bitmap won't be sent to source at all. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-4-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:56:18 +01:00
Peter Xu	1aa8367861	migration: simplify check to use qemu file buffer Firstly, renaming the old matching_page_sizes variable to matches_target_page_size, which suites more to what it did (it only checks against target page size rather than multiple page sizes). Meanwhile, simplify the check logic a bit, and enhance the comments. Should have no functional change. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:55:59 +01:00
Peter Xu	81e620531f	migration: move income process out of multifd Move the call to migration_incoming_process() out of multifd code. It's a bit strange that we can migration generic calls in multifd code. Instead, let multifd_recv_new_channel() return a boolean showing whether it's ready to continue the incoming migration. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-3-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
David Hildenbrand	c136180c90	postcopy: drop ram_pages parameter from postcopy_ram_incoming_init() Not needed. Don't expose last_ram_page(). Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180620202736.21399-1-david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-27 13:28:31 +02:00
Juan Quintela	35374cbdff	migration: Stop sending whole pages through main channel We have to flush() the QEMUFile because now we sent really few data through that channel. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:30 +02:00
Juan Quintela	7a5cc33c48	migration: Remove not needed semaphore and quit We know quit with shutdwon in the QIO. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Add comment Use shutdown() instead of unref()	2018-06-27 13:28:21 +02:00
Juan Quintela	4d22c148c9	migration: Wait for blocking IO We have three conditions here: - channel fails -> error - we have to quit: we close the channel and reads fails - normal read that success, we are in bussiness So forget the complications of waiting in a semaphore. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	8b2db7f5fd	migration: Start sending messages Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	b9ee2f7d70	migration: Create ram_save_multifd_page The function still don't use multifd, but we have simplified ram_save_page, xbzrle and RDMA stuff is gone. We have added a new counter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Add last_page parameter Add commets for done and address Remove multifd field, it is the same than normal pages Merge next patch, now we send multiple pages at a time Remove counter for multifd pages, it is identical to normal pages Use iovec's instead of creating the equivalent. Clear memory used by pages (dave) Use g_new0(danp) define MULTIFD_CONTINUE now pages member is a pointer Fix off-by-one in number of pages in one packet Remove RAM_SAVE_FLAG_MULTIFD_PAGE s/multifd_pages_t/MultiFDPages_t/ add comment explaining what it means	2018-06-27 13:28:11 +02:00
Juan Quintela	6df264ac5a	migration: Synchronize multifd threads with main thread We synchronize all threads each RAM_SAVE_FLAG_EOS. Bitmap synchronizations don't happen inside a ram section, so we are safe about two channels trying to overwrite the same memory. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- seq needs to be atomic now, will also be accessed from main thread. Fix the if (true \|\| ...) leftover We are back to non-atomics	2018-06-27 13:28:11 +02:00
Juan Quintela	0beb5ed327	migration: Add block where to send/receive packets Once there add tracepoints. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	d82628e4bd	migration: Multifd channels always wait on the sem Either for quit, sync or packet, we first wake them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	408ea6ae4c	migration: Add multifd traces for start/end thread We want to know how many pages/packets each channel has sent. Add counters for those. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- sort trace-events (dave)	2018-06-27 13:28:11 +02:00
Juan Quintela	2a26c979b1	migration: Create multifd packet We still don't put anything there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- fix magic (dave) check offset/ramblock (dave) s/seq/packet_num/ and make it 64bit	2018-06-27 13:28:11 +02:00
Juan Quintela	34c55a94b1	migration: Create multipage support We only create/destry the page list here. We will use it later. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Balamuruhan S	650af8907b	migration: calculate expected_downtime with ram_bytes_remaining() expected_downtime value is not accurate with dirty_pages_rate * page_size, using ram_bytes_remaining() would yeild it resonable. consider to read the remaining ram just after having updated the dirty pages count later migration_bitmap_sync_range() in migration_bitmap_sync() and reuse the `remaining` field in ram_counters to hold ram_bytes_remaining() for calculating expected_downtime. Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180612085009.17594-2-bala24@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	e03a34f8f3	migration/postcopy: Wake rate limit sleep on postcopy request Use the 'urgent request' mechanism added in the previous patch for entries added to the postcopy request queue for RAM. Ignore the rate limiting while we have requests. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-4-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Xiao Guangrong	b734035b61	migration: introduce migration_update_rates It is used to slightly clean the code up, no logic is changed Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-5-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Xiao Guangrong	e0e7a45d7f	migration: fix counting xbzrle cache_miss_rate Sync up xbzrle_cache_miss_prev only after migration iteration goes forward Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-4-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	343f632c70	migration: Poison ramblock loops in migration The migration code should be using the RAMBLOCK_FOREACH_MIGRATABLE and qemu_ram_foreach_block_migratable not the all-block versions; poison them so that we can't accidentally use them. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	ff0769a4ad	migration: Fixes for non-migratable RAMBlocks There are still a few cases where migration code is using the macros and functions that do all RAMBlocks rather than just the migratable blocks; fix those up. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Peter Maydell	b74588a493	migration/next for 20180604 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJbFLygAAoJEPSH7xhYctcjszUP/j7/21sHXwPyB5y4CQHsivLL 18MnXTkjgSlue3fuPKI9ZzD2VZq19AtwarLuW3IFf15adKfeV0hhZsIMUmu0XTAz jgol8YYgoYlk3/on8Ux8PWV9RWKHh7vWAF7hjwJOnI5CFF7gXz6kVjUx03A7RSnB 1a92lvgl5P8AbK5HZ/i08nOaAxCUWIVIoQeZf3BlZ5WdJE0j+d0Mvp/S8NEq5VPD P6oYJZy64QFGotmV1vtYD4G/Hqe8XBw2LoSVgjPSmQpd1m4CXiq0iyCxtC8HT+iq 0P0eHATRglc5oGhRV6Mi5ixH2K2MvpshdyMKVAbkJchKBHn4k3m6jefhnUXuucwC fq9x7VIYzQ80S+44wJri683yUWP3Qmbb6mWYBb1L0jgbTkY1wx431IYMtDVv5q4R oXtbe77G6lWKHH0SFrU8TY/fqJvBwpat+QwOoNsluHVjzAzpOpcVokB8pLj592/1 bHkDwvv7i+B37muOF3PFy++BoliwpRiNbVC5Btw4mnbOoY+AeJHMRBuhJHU6EyJv GOFiQ5w5XEXEu98giNqHcc0AC8w7iTRGI472UzpS0AG20wpc1/2jF0yM99stauwD oIu3MU4dQlz6QRy8KFnJShqjYNYuOs59USLiYM79ABa9J+JxJxxTAaqRbUFojQR6 KN2i0QQJ+6MO4Skjbu/3 =K0eJ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180604' into staging migration/next for 20180604 # gpg: Signature made Mon 04 Jun 2018 05:14:24 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180604: migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect migration: remove unnecessary variables len in QIOChannelRDMA migration: Don't activate block devices if using -S migration: discard non-migratable RAMBlocks migration: introduce decompress-error-check Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 12:54:00 +01:00
Peter Maydell	f67c9b693a	acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM= =3+V0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 01 Jun 2018 17:25:19 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) vhost-blk: turn on pre-defined RO feature bit ACPI testing: test NFIT platform capabilities nvdimm, acpi: support NFIT platform capabilities tests/.gitignore: add entry for generated file arch_init: sort architectures ui: use local path for local headers qga: use local path for local headers colo: use local path for local headers migration: use local path for local headers usb: use local path for local headers sd: fix up include vhost-scsi: drop an unused include ppc: use local path for local headers rocker: drop an unused include e1000e: use local path for local headers ioapic: fix up includes ide: use local path for local headers display: use local path for local headers trace: use local path for local headers migration: drop an unused include ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 10:15:16 +01:00
Cédric Le Goater	b895de5027	migration: discard non-migratable RAMBlocks On the POWER9 processor, the XIVE interrupt controller can control interrupt sources using MMIO to trigger events, to EOI or to turn off the sources. Priority management and interrupt acknowledgment is also controlled by MMIO in the presenter sub-engine. These MMIO regions are exposed to guests in QEMU with a set of 'ram device' memory mappings, similarly to VFIO, and the VMAs are populated dynamically with the appropriate pages using a fault handler. But, these regions are an issue for migration. We need to discard the associated RAMBlocks from the RAM state on the source VM and let the destination VM rebuild the memory mappings on the new host in the post_load() operation just before resuming the system. To achieve this goal, the following introduces a new RAMBlock flag RAM_MIGRATABLE which is updated in the vmstate_register_ram() and vmstate_unregister_ram() routines. This flag is then used by the migration to identify RAMBlocks to discard on the source. Some checks are also performed on the destination to make sure nothing invalid was sent. This change impacts the boston, malta and jazz mips boards for which migration compatibility is broken. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Xiao Guangrong	f548222c24	migration: introduce decompress-error-check QEMU 3.0 enables strict check for compression & decompression to make the migration more robust, that depends on the source to fix the internal design which triggers the unexpected error conditions To make it work for migrating old version QEMU to 2.13 QEMU, we introduce this parameter to disable the error check on the destination which is the default behavior of the machine type which is older than 2.13, alternately, the strict check can be enabled explicitly as followings: -M pc-q35-2.11 -global migration.decompress-error-check=true Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Michael S. Tsirkin	53d37d36ca	migration: use local path for local headers When pulling in headers that are in the same directory as the C file (as opposed to one in include/), we should use its relative path, without a directory. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>	2018-06-01 19:20:38 +03:00
Peter Xu	bf269906f5	migration: use g_free for ram load bitmap Buffers allocated with bitmap_new() should be freed with g_free(). Both reported by Coverity: *** CID 1391300: API usage errors (ALLOC_FREE_MISMATCH) /migration/ram.c: 3517 in ram_dirty_bitmap_reload() 3511 * the last one to sync, we need to notify the main send thread. 3512 / 3513 ram_dirty_bitmap_reload_notify(s); 3514 3515 ret = 0; 3516 out: >>> CID 1391300: API usage errors (ALLOC_FREE_MISMATCH) >>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free". 3517 free(le_bitmap); 3518 return ret; 3519 } 3520 3521 static int ram_resume_prepare(MigrationState s, void opaque) 3522 { ** CID 1391292: API usage errors (ALLOC_FREE_MISMATCH) /migration/ram.c: 249 in ramblock_recv_bitmap_send() 243 * Mark as an end, in case the middle part is screwed up due to 244 * some "misterious" reason. 245 */ 246 qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING); 247 qemu_fflush(file); 248 >>> CID 1391292: API usage errors (ALLOC_FREE_MISMATCH) >>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free". 249 free(le_bitmap); 250 251 if (qemu_file_get_error(file)) { 252 return qemu_file_get_error(file); 253 } 254 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180525015042.31778-1-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-25 15:29:48 +02:00
Peter Xu	08614f3497	migration: setup ramstate for resume After we updated the dirty bitmaps of ramblocks, we also need to update the critical fields in RAMState to make sure it is ready for a resume. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-18-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:59 +02:00
Peter Xu	edd090c728	migration: synchronize dirty bitmap for resume This patch implements the first part of core RAM resume logic for postcopy. ram_resume_prepare() is provided for the work. When the migration is interrupted by network failure, the dirty bitmap on the source side will be meaningless, because even the dirty bit is cleared, it is still possible that the sent page was lost along the way to destination. Here instead of continue the migration with the old dirty bitmap on source, we ask the destination side to send back its received bitmap, then invert it to be our initial dirty bitmap. The source side send thread will issue the MIG_CMD_RECV_BITMAP requests, once per ramblock, to ask for the received bitmap. On destination side, MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap. Data will be received on the return-path thread of source, and the main migration thread will be notified when all the ramblock bitmaps are synchronized. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-17-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:57 +02:00
Peter Xu	a335debb35	migration: new message MIG_RP_MSG_RECV_BITMAP Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send received bitmap of ramblock back to source. This is the reply message of MIG_CMD_RECV_BITMAP, it contains not only the header (including the ramblock name), and it was appended with the whole ramblock received bitmap on the destination side. When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP), it parses it, convert it to the dirty bitmap by inverting the bits. One thing to mention is that, when we send the recv bitmap, we are doing these things in extra: - converting the bitmap to little endian, to support when hosts are using different endianess on src/dst. - do proper alignment for 8 bytes, to support when hosts are using different word size (32/64 bits) on src/dst. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-13-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:51 +02:00
Juan Quintela	8c4598f2b1	migration: Define MultifdRecvParams sooner Once there, we don't need the struct names anywhere, just the typedefs. And now also document all fields. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	af8b7d2b09	migration: Transmit initial package through the multifd channels Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> -- Be network agnostic. Add error checking for all values.	2018-05-15 20:24:27 +02:00
Juan Quintela	36c2f8be2c	migration: Delay start of migration main routines We need to make sure that we have started all the multifd threads. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	60df2d4ae5	migration: Create multifd channels In both sides. We still don't transmit anything through them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	62c1e0ca73	migration: Be sure all recv channels are created We need them before we start migration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	667707078d	migration: terminate_* can be called for other threads Once there, make count field to always be accessed with atomic operations. To make blocking operations, we need to know that the thread is running, so create a bool to indicate that. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> -- Once here, s/terminate_multifd_-threads/multifd__terminate_threads/ This is consistente with every other function	2018-05-15 20:24:27 +02:00
Juan Quintela	71bb07dbfc	migration: Introduce multifd_recv_new_channel() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	7a169d745c	migration: Set error state in case of error Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Xiao Guangrong	701b1876c0	migration: fix saving normal page even if it's been compressed Fix the bug introduced by `da3f56cb2e` (migration: remove ram_save_compressed_page()), It should be 'return' rather than 'res' Sorry for this stupid mistake :( Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180428081045.8878-1-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:00 +02:00
Xiao Guangrong	da3f56cb2e	migration: remove ram_save_compressed_page() Now, we can reuse the path in ram_save_page() to post the page out as normal, then the only thing remained in ram_save_compressed_page() is compression that we can move it out to the caller Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-11-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:14 +01:00
Xiao Guangrong	65dacaa04f	migration: introduce save_normal_page() It directly sends the page to the stream neither checking zero nor using xbzrle or compression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-10-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:12 +01:00
Xiao Guangrong	d7400a3409	migration: move calling save_zero_page to the common place save_zero_page() is always our first approach to try, move it to the common place before calling ram_save_compressed_page and ram_save_page Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-9-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:11 +01:00
Xiao Guangrong	a8ec91f941	migration: move calling control_save_page to the common place The function is called by both ram_save_page and ram_save_target_page, so move it to the common caller to cleanup the code Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-8-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:10 +01:00
Xiao Guangrong	1faa5665c0	migration: move some code to ram_save_host_page Move some code from ram_save_target_page() to ram_save_host_page() to make it be more readable for latter patches that dramatically clean ram_save_target_page() up Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-7-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:09 +01:00
Xiao Guangrong	059ff0fb29	migration: introduce control_save_page() Abstract the common function control_save_page() to cleanup the code, no logic is changed Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-6-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:09 +01:00
Xiao Guangrong	34ab9e9743	migration: detect compression and decompression errors Currently the page being compressed is allowed to be updated by the VM on the source QEMU, correspondingly the destination QEMU just ignores the decompression error. However, we completely miss the chance to catch real errors, then the VM is corrupted silently To make the migration more robuster, we copy the page to a buffer first to avoid it being written by VM, then detect and handle the errors of both compression and decompression errors properly Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-5-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:08 +01:00
Xiao Guangrong	797ca154b4	migration: stop decompression to allocate and free memory frequently Current code uses uncompress() to decompress memory which manages memory internally, that causes huge memory is allocated and freed very frequently, more worse, frequently returning memory to kernel will flush TLBs So, we maintain the memory by ourselves and reuse it for each decompression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-4-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:07 +01:00
Xiao Guangrong	dcaf446ebd	migration: stop compression to allocate and free memory frequently Current code uses compress2() to compress memory which manages memory internally, that causes huge memory is allocated and freed very frequently More worse, frequently returning memory to kernel will flush TLBs and trigger invalidation callbacks on mmu-notification which interacts with KVM MMU, that dramatically reduce the performance of VM So, we maintain the memory by ourselves and reuse it for each compression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-3-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:06 +01:00
Xiao Guangrong	263a289ae6	migration: stop compressing page in migration thread As compression is a heavy work, do not do it in migration thread, instead, we post it out as a normal page Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-2-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:05 +01:00
Peter Maydell	ed627b2ad3	virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJasR1rAAoJECgfDbjSjVRpOocH/R9A3g/TkpGjmLzJBrrX1NGO I/iq0ttHjqg4OBIChA4BHHjXwYUMs7XQn26B3efrk1otLAJhuqntZIIo3uU0WraA 5J+4DT46ogs5rZWNzDCZ0zAkSaATDA6h9Nfh7TvPc9Q2WpcIT0cTa/jOtrxRc9Vq 32hbUKtJSpNxRjwbZvk6YV21HtWo3Tktdaj9IeTQTN0/gfMyOMdgxta3+bymicbJ FuF9ybHcpXvrEctHhXHIL4/YVGEH/4shagZ4JVzv1dVdLeHLZtPomdf7+oc0+07m Qs+yV0HeRS5Zxt7w5blGLC4zDXczT/bUx8oln0Tz5MV7RR/+C2HwMOHC69gfpSc= =vomK -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 20 Mar 2018 14:40:43 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (51 commits) postcopy shared docs libvhost-user: Claim support for postcopy postcopy: Allow shared memory vhost: Huge page align and merge vhost+postcopy: Wire up POSTCOPY_END notify vhost-user: Add VHOST_USER_POSTCOPY_END message libvhost-user: mprotect & madvises for postcopy vhost+postcopy: Call wakeups vhost+postcopy: Add vhost waker postcopy: postcopy_notify_shared_wake postcopy: helper for waking shared vhost+postcopy: Resolve client address postcopy-ram: add a stub for postcopy_request_shared_page vhost+postcopy: Helper to send requests to source for shared pages vhost+postcopy: Stash RAMBlock and offset vhost+postcopy: Send address back to qemu libvhost-user+postcopy: Register new regions with the ufd migration/ram: ramblock_recv_bitmap_test_byte_offset postcopy+vhost-user: Split set_mem_table for postcopy vhost+postcopy: Transmit 'listen' to slave ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # scripts/update-linux-headers.sh	2018-03-20 15:48:34 +00:00
Dr. David Alan Gilbert	1cba9f6e66	migration/ram: ramblock_recv_bitmap_test_byte_offset Utility for testing the map when you already know the offset in the RAMBlock. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 05:03:28 +02:00
Vladimir Sementsov-Ogievskiy	4799502640	migration: introduce postcopy-only pending There would be savevm states (dirty-bitmap) which can migrate only in postcopy stage. The corresponding pending is introduced here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20180313180320.339796-6-vsementsov@virtuozzo.com	2018-03-13 17:05:41 -04:00
Peter Lieven	b255734531	migration: do not transfer ram during bulk storage migration this patch makes the bulk phase of a block migration to take place before we start transferring ram. As the bulk block migration can take a long time its pointless to transfer ram during that phase. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <1520507908-16743-2-git-send-email-pl@kamp.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-09 17:39:25 +00:00
Markus Armbruster	9af2398977	Include less of the generated modular QAPI headers In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:45:50 -06:00
Peter Xu	7a9ddfbfae	migration: better error handling with QEMUFile If the postcopy down due to some reason, we can always see this on dst: qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000 However in most cases that's not the real issue. The problem is that qemu_get_be16() has no way to show whether the returned data is valid or not, and we are _always_ assuming it is valid. That's possibly not wise. The best approach to solve this would be: refactoring QEMUFile interface to allow the APIs to return error if there is. However it needs quite a bit of work and testing. For now, let's explicitly check the validity first before using the data in all places for qemu_get_*(). This patch tries to fix most of the cases I can see. Only if we are with this, can we make sure we are processing the valid data, and also can we make sure we can capture the channel down events correctly. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180208103132.28452-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-14 10:34:56 +00:00
Dr. David Alan Gilbert	b9ccaf6d74	migration: Fix early failure cleanup Avoid crash in cleanup after a very early migration failure (possibly due to my `688a3dcba9` 'Route errors down ...') Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180212160340.15333-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2018-02-14 10:31:01 +00:00
Markus Armbruster	e688df6bc4	Include qapi/error.h exactly where needed This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-5-armbru@redhat.com> [Semantic conflict with commit `34e304e975` resolved, OSX breakage fixed]	2018-02-09 13:50:17 +01:00
Juan Quintela	7faccdc3e7	migration: Drop current address parameter from save_zero_page() It already has RAMBlock and offset, it can calculate it itself. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 10:55:13 +00:00
Dr. David Alan Gilbert	bae416e5ba	migration: Guard ram_bytes_remaining against early call Calling ram_bytes_remaining during the early part of setup is unsafe because the ram_state isn't yet initialised. This can happen in the sequence: migrate migrate_cancel info migrate if the migrate sticks trying to connect (e.g. to an unresponsive destination due to the connect timeout). Here 'info migrate' sees a state of CANCELLING and so assumes the migrate has partially happened. partial fix for: RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1525899 Reported-by: Xianxian Wang <xianwang@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:04 +01:00
Daniel Henrique Barboza	acab30b85d	migration/ram.c: do not set 'postcopy_running' in POSTCOPY_INCOMING_END When migrating a VM with 'migrate_set_capability postcopy-ram on' a postcopy_state is set during the process, ending up with the state POSTCOPY_INCOMING_END when the migration is over. This postcopy_state is taken into account inside ram_load to check how it will load the memory pages. This same ram_load is called when in a loadvm command. Inside ram_load, the logic to see if we're at postcopy_running state is: postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING postcopy_state_get() returns this enum type: typedef enum { POSTCOPY_INCOMING_NONE = 0, POSTCOPY_INCOMING_ADVISE, POSTCOPY_INCOMING_DISCARD, POSTCOPY_INCOMING_LISTENING, POSTCOPY_INCOMING_RUNNING, POSTCOPY_INCOMING_END } PostcopyState; In the case where ram_load is executed and postcopy_state is POSTCOPY_INCOMING_END, postcopy_running will be set to 'true' and ram_load will behave like a postcopy is in progress. This scenario isn't achievable in a migration but it is reproducible when executing savevm/loadvm after migrating with 'postcopy-ram on', causing loadvm to fail with Error -22: Source: (qemu) migrate_set_capability postcopy-ram on (qemu) migrate tcp:127.0.0.1:4444 Dest: (qemu) migrate_set_capability postcopy-ram on (qemu) ubuntu1704-intel login: Ubuntu 17.04 ubuntu1704-intel ttyS0 ubuntu1704-intel login: (qemu) (qemu) savevm test1 (qemu) loadvm test1 Unknown combination of migration flags: 0x4 (postcopy mode) error while loading state for instance 0x0 of device 'ram' Error -22 while loading VM state (qemu) This patch fixes this problem by changing the existing logic for postcopy_advised and postcopy_running in ram_load, making them 'false' if we're at POSTCOPY_INCOMING_END state. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> CC: Juan Quintela <quintela@redhat.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-11-22 08:50:37 +01:00
Juan Quintela	73af8dd8d7	migration: Make xbzrle_cache_size a migration parameter Right now it is a variable in MigrationState instead of a MigrationParameter. The change allows to set it as the rest of the Migration parameters, from the command line, with query_migration_paramters, set_migrate_parameters, etc. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-10-29 14:06:15 +01:00
Juan Quintela	c9dede2d48	migration: No need to return the size of the cache After the previous commits, we make sure that the value passed is right, or we just drop an error. So now we return if there is one error or we have setup correctly the value passed. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Improve error messasge Return 0 always for success	2017-10-29 14:06:15 +01:00
Juan Quintela	2a313e5cf6	migration: Don't play games with the requested cache size Now that we check that the value passed is a power of 2, we don't need to play games when comparing what is the size that is going to take the cache. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-10-29 14:06:15 +01:00
Alexey Perevalov	f949461489	migration: add bitmap for received page This patch adds ability to track down already received pages, it's necessary for calculation vCPU block time in postcopy migration feature, and for recovery after postcopy migration failure. Also it's necessary to solve shared memory issue in postcopy livemigration. Information about received pages will be transferred to the software virtual bridge (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for already received pages. fallocate syscall is required for remmaped shared memory, due to remmaping itself blocks ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT error (struct page is exists after remmap). Bitmap is placed into RAMBlock as another postcopy/precopy related bitmaps. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:41 +02:00
Alexey Perevalov	8be4620be2	migration: postcopy_place_page factoring out Need to mark copied pages as closer as possible to the place where it tracks down. That will be necessary in futher patch. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:39 +02:00
Peter Xu	d6eff5d75d	migration: new ram_init_bitmaps() Rearrange the bitmap initialization and the first sync. Since at it, make sure the locks are taken/released in correct order (I moved RCU unlock upper - though it may not affect much). Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:38 +02:00
Peter Xu	84593a0807	migration: clean up xbzrle cache init/destroy Let's further simplify ram_init_all() and ram_save_cleanup() by abstract all the XBZRLE related codes into their own functions. When allocating xbzrle cache, we are always very careful on -ENOMEM; which makes sense. Replacing the last g_malloc0() with g_try_malloc0(), then refactor the logic a bit. This patch should be fixing some memory leaks when some memory allocation failed for XBZRLE in the past. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:37 +02:00
Peter Xu	7d7c96be7b	migration: provide ram_state_cleanup There are two Mutexes that are created but not yet destroyed for RAMState. Fix that. Since we are at it, provide helper function to clean up RAMState. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:36 +02:00
Peter Xu	7d00ee6ad6	migration: provide ram_state_init() The old ram_state_init() is not really initializing the RAMState only, but including lots of other stuff that is RAM-related. Renaming it to ram_init_all(). Instead, provide a real ram_state_init(). Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:34 +02:00
Juan Quintela	80f8dfde97	migration: Make cache_init() take an error parameter Once there, take a total size instead of the size of the pages. We move the check that the new_size is bigger than one page from xbzrle_cache_resize(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Fix typo spotted by Peter Xu	2017-10-23 18:03:25 +02:00
Juan Quintela	8acabf69ea	migration: Move xbzrle cache resize error handling to xbzrle_cache_resize Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-10-23 18:03:24 +02:00
Peter Lieven	9ac78b6171	migration: disable auto-converge during bulk block migration auto-converge and block migration currently do not play well together. During block migration the auto-converge logic detects that ram migration makes no progress and thus throttles down the vm until it nearly stalls completely. Avoid this by disabling the throttling logic during the bulk phase of the block migration. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1506421996-12513-1-git-send-email-pl@kamp.de> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-27 11:27:14 +01:00
Vladimir Sementsov-Ogievskiy	86e1167e9a	migration: fix ram_save_pending Fill postcopy-able pending only if ram postcopy is enabled. It is necessary because of there will be other postcopy-able states and when ram postcopy is disabled, it should not spoil common postcopy related pending. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-09-22 14:11:26 +02:00
Vladimir Sementsov-Ogievskiy	c646762736	migration: add has_postcopy savevm handler Now postcopy-able states are recognized by not NULL save_live_complete_postcopy handler. But when we have several different postcopy-able states, it is not convenient. Ram postcopy may be disabled, while some other postcopy enabled, in this case Ram state should behave as it is not postcopy-able. This patch add separate has_postcopy handler to specify behaviour of savevm state. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-09-22 14:11:25 +02:00
Juan Quintela	f986c3d256	migration: Create multifd migration threads Creation of the threads, nothing inside yet. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Use pointers instead of long array names Move to use semaphores instead of conditions as paolo suggestion Put all the state inside one struct. Use a counter for the number of threads created. Needed during cancellation. Add error return to thread creation Add id field Rename functions to multifd_save/load_setup/cleanup Change recv parameters to a pointer to struct Change back to a struct Use Error * for _cleanup	2017-09-22 14:11:22 +02:00
Peter Xu	2dfaf12ebb	migration: fix comment disorder in RAMState Comments for "migration_dirty_pages" and "bitmap_mutex" are switched. Fix it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1501666880-10159-2-git-send-email-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-08-02 11:27:28 +01:00
Juan Quintela	f0afa331ce	migration: Make compression_threads use save/load_setup/cleanup() Once there, be consistent and use compress_thread_{save,load}_{setup,cleanup}. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170628095228.4661-6-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	f265e0e437	migration: Convert ram to use new load_setup()/load_cleanup() Once there, I rename ram_migration_cleanup() to ram_save_cleanup(). Notice that this is the first pass, and I only passed XBZRLE to the new scheme. Moved decoded_buf to inside XBZRLE struct. As a bonus, I don't have to export xbzrle functions from ram.c. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- loaded_data pointer was needed because called can change it (dave) spell loaded correctly in comment (dave) Message-Id: <20170628095228.4661-5-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	70f794fcfa	migration: Rename cleanup() to save_cleanup() We need a cleanup for loads, so we rename here to be consistent. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Rename htab_cleanup to htap_save_cleanup as dave suggestion Message-Id: <20170628095228.4661-3-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	9907e842d7	migration: Rename save_live_setup() to save_setup() We are going to use it now for more than save live regions. Once there rename qemu_savevm_state_begin() to qemu_savevm_state_setup(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170628095228.4661-2-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	3416ab5bb4	migration: Don't create decompression threads if not enabled Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- I removed the [HACK] part because previous patch just check that compression pages are not received.	2017-06-14 11:11:06 +02:00
Juan Quintela	edc60127e4	migration: Test for disabled features on reception Right now, if we receive a compressed page while this features are disabled, Bad Things (TM) can happen. Just add a test for them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- I had XBZRLE here also, but it don't need extra resources on destination, only on source. Additionally libvirt don't enable it on destination, so don't put it here. - initialize invalid_flags at declaration time. - remove extra space (peter)	2017-06-14 11:11:06 +02:00
Juan Quintela	1adc1ceef7	migration: Remove unneeded includes Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-14 11:10:19 +02:00
Juan Quintela	6666c96aac	migration: Move migration.h to migration/ Nothing uses it outside of migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	f2a8f0a631	migration: Split registration functions from vmstate.h They are indpendent, and nowadays almost every device register things with qdev->vmsd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-13 11:00:44 +02:00
Juan Quintela	53518d9448	ram: Make RAMState dynamic We create the variable while we are at migration and we remove it after migration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-07 10:20:55 +02:00
Juan Quintela	9360447d34	ram: Use MigrationStats for statistics RAM Statistics need to survive migration to make info migrate work, so we need to store them outside of RAMState. As we already have an struct with those fields, just used them. (MigrationStats and XBZRLECacheStats). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-07 10:20:54 +02:00
Juan Quintela	c00e092832	ram: Move ZERO_TARGET_PAGE inside XBZRLE It was only used by XBZRLE anyways. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-07 10:20:54 +02:00
Juan Quintela	83c13382e4	ram: Call migration_page_queue_free() at ram_migration_cleanup() We shouldn't be using memory later than that. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-07 10:20:53 +02:00
Juan Quintela	7b1e1a2202	migration: Export ram.c functions in its own file All functions are internal except for ram_mig_init(). Create migration/misc.h for this kind of functions. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:23 +02:00
Juan Quintela	08a0aee15c	migration: Split qemu-file.h Split the file into public and internal interfaces. I have to rename the external one because we can't have two include files with the same name in the same directory. Build system gets confused. The only exported functions are the ones that handle basic types. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:22 +02:00
Felipe Franciosi	b4a3c64b16	migration: use dirty_rate_high_cnt more aggressively The commit message from `070afca25` suggests that dirty_rate_high_cnt should be used more aggressively to start throttling after two iterations instead of four. The code, however, only changes the auto convergence behaviour to throttle after three iterations. This makes the behaviour more aggressive by kicking off throttling after two iterations as originally intended. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-31 09:39:20 +02:00
Felipe Franciosi	d2a4d85a8a	migration: set bytes_xfer_* outside of autoconverge logic The bytes_xfer_now/prev counters are only used by the auto convergence logic. However, they are used alongside the dirty_pages_rate counter, which is calculated (and required) outside of this logic. The problem with this approach is that if the auto convergence capability is changed while a migration is ongoing, the relationship of the counters will be broken. This moves the management of bytes_xfer_now/prev counters outside of the auto convergence logic to address this issue. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-31 09:39:20 +02:00
Felipe Franciosi	d693c6f10f	migration: set dirty_pages_rate before autoconverge logic Currently, a "period" in the RAM migration logic is at least a second long and accounts for what happened since the last period (or the beginning of the migration). The dirty_pages_rate counter is calculated at the end this logic. If the auto convergence capability is enabled from the start of the migration, it won't be able to use this counter the first time around. This calculates dirty_pages_rate as soon as a period is deemed over, which allows for it to be used immediately. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-31 09:39:20 +02:00
Felipe Franciosi	9884db2814	migration: keep bytes_xfer_prev init'd to zero The first time migration_bitmap_sync() is called, bytes_xfer_prev is set to ram_state.bytes_transferred which is, at this point, zero. The next time migration_bitmap_sync() is called, an iteration has happened and bytes_xfer_prev is set to 'x' bytes. Most likely, more than one second has passed, so the auto converge logic will be triggered and bytes_xfer_now will also be set to 'x' bytes. This condition is currently masked by dirty_rate_high_cnt, which will wait for a few iterations before throttling. It would otherwise always assume zero bytes have been copied and therefore throttle the guest (possibly) prematurely. Given bytes_xfer_prev is only used by the auto convergence logic, it makes sense to only set its value after a check has been made against bytes_xfer_now. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-31 09:39:20 +02:00
Juan Quintela	987772d9e7	migration: Remove vmstate.h from migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> --- Minor rearrangements due to rebase	2017-05-18 19:20:59 +02:00
Juan Quintela	82b9d0f06a	migration: Remove qemu-file.h from vmstate.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- minor rearangements due to the rebase	2017-05-18 19:20:59 +02:00
Juan Quintela	709e3fe825	migration: Create migration/xbzrle.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-18 18:04:54 +02:00
Dr. David Alan Gilbert	1eb3fc0a0b	migration: Fix non-multiple of page size migration Unfortunately it's legal to create a VM with a RAM size that's not a multiple of the underlying host page or huge page size. Recently I'd changed things to always send host sized pages, and that breaks if we have say a 1025MB guest on 2MB hugepages. Unfortunately we can't just make that illegal since it would break migration from/to existing oddly configured VMs. Symptom: qemu-system-x86_64: Illegal RAM offset 40100000 as it transmits the fraction of the hugepage after the end of the RAMBlock (may also cause a crash on the source - possibly due to clearing bits after the bitmap) Reported-by: Yumei Huang <yuhuang@redhat.com> Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1449037 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-18 18:04:53 +02:00
Stefan Hajnoczi	56821559f0	HMP pull -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZHJB7AAoJEAUWMx68W/3nSr8P/3uw04AxuCiEpwVYjtYaMGgs C/1zNXFPRIWDx5kNxi7reH0aknYSjHVijxuGUiAqP9gM+VpGO485gEWDrmSo6tqy 9s3HbudTzL/Qfp7ZwJY+v8gFxjwCZYhDh3WI+3yIjWDLHpGhTdRuYIC4DEA1EkGY zvG8fHISMqZUh1DYE7ttCX1zLlZLaAA1znjYbnXCYNjoRdeeY2+uJsaPhdjqi1la 5qhKWtKS68cfiKSkntBASKMs2+z1+4iyGVVzgd8E64O4JbdirB2JHnjSwm3fJ5yY dhJLY6nSn7eNdBmVHSe0ZBDHfwoVbOlPC9+z0Ob+UxiGcXd0KSYaE3IaD2Ref2LS Wju92rSLcADNYBQtNAxwKpDeixf+WbfetiKDPx71N2rcpdpZg1I3kwu4ippMwh6g jpgjSV2Nkcnv4PQBD9kPSt6jJBmcyN3Y98VvaOFxyOv1u1Xm7ZbhkmxOQKQvG4jc OlBbA20EzOC0axl9ktVF7GljiziCqDaMFljBxNlpdAqzjou7ztEEvXBwcl8I8eBH ThocuOf25CPLrTAwggbEWpKfxzSQ9pEIPHpQL3Qz9Ip7STlcxB/FJ/9oEQvpZeJg MHpgBLDzw1xqJw01n711qgGtEsCvTUoQMcnJxgeaI/WzEIaRd68breI2/qG22aBl 0XgH7THRl8WP+EC8ZYrB =wJvl -----END PGP SIGNATURE----- Merge remote-tracking branch 'dgilbert/tags/pull-hmp-20170517' into staging HMP pull # gpg: Signature made Wed 17 May 2017 07:03:39 PM BST # gpg: using RSA key 0x0516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * dgilbert/tags/pull-hmp-20170517: ramblock: add new hmp command "info ramblock" utils: provide size_to_str() ramblock: add RAMBLOCK_FOREACH() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-18 13:36:15 +01:00
Peter Xu	99e15582de	ramblock: add RAMBLOCK_FOREACH() So that it can simplifies the iterators. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1494562661-9063-2-git-send-email-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 17:30:37 +01:00
Juan Quintela	bb890ed551	ram: Rename RAM_SAVE_FLAG_COMPRESS to RAM_SAVE_FLAG_ZERO Reflects better what it does now, and avoid confussions with RAM_SAVE_FLAG_COMPRESS_PAGE. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-17 12:04:59 +02:00
Juan Quintela	2bf3aa85f0	migration: Fix regression with compression threads Compression threads got broken on commit commit `2479569466` Author: Juan Quintela <quintela@redhat.com> Date: Tue Mar 21 11:45:01 2017 +0100 ram: reorganize last_sent_block On do_compress_ram_page() we use a different QEMUFile than the migration one. We need to pass it there. The failure can be seen as: (qemu) qemu-system-x86_64: Unknown combination of migration flags: 0 qemu-system-x86_64: error while loading state section id 3(ram) qemu-system-x86_64: load of migration failed: Invalid argument Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com>	2017-05-17 12:04:59 +02:00
Dr. David Alan Gilbert	1db9d8e501	migration: Extra tracing A couple more traces that would have made fixing that postcopy bug a bit easier. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-04 10:41:23 +02:00
Juan Quintela	be07b0ace8	migration: Move postcopy-ram.h to migration/ It is internal to migration, not intended for other users. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-04 10:40:30 +02:00
Juan Quintela	6b6712efcc	ram: Split dirty bitmap by RAMBlock Both the ram bitmap and the unsent bitmap are split by RAMBlock. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Fix compilation when DEBUG_POSTCOPY is enabled (thanks Hailiang)	2017-05-04 10:00:38 +02:00
Juan Quintela	66103a5796	ram: Remove migration_bitmap_extend() We have disabled memory hotplug, so we don't need to handle migration_bitamp there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	352b0de982	ram: Use RAMBitmap type for coherence Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	b8c4899398	ram: rename last_ram_offset() last_ram_pages() We always use it as pages anyways. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	f20e286516	ram: Use ramblock and page offset instead of absolute offset This removes the needto pass also the absolute offset. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	a935e30fbb	ram: Change offset field in PageSearchStatus to page We are moving everything to work on pages, not addresses. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	269ace2951	ram: Remember last_page instead of last_offset Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> -- Improve comment Fix typo	2017-04-21 12:25:39 +02:00
Juan Quintela	06b106889a	ram: Use page number instead of an address for the bitmap operations We use an unsigned long for the page number. Notice that our bitmaps already got that for the index, so we have that limit. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- rename page to page_abs everywhere. fix trace types for pages	2017-04-21 12:25:39 +02:00
Juan Quintela	2479569466	ram: reorganize last_sent_block We were setting it far away of when we changed it. Now everything is done inside save_page_header. Once there, reorganize code to pass RAMState. We also set CONTINUE flag in a single place. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	aaa2064c2a	ram: ram_discard_range() don't use the mis parameter Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	15440dd5a0	ram: Pass RAMBlock to bitmap_sync We change the meaning of start to be the offset from the beggining of the block. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	a0a8aa147a	ram: We don't need MigrationState parameter anymore Remove it from callers and callees. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	5727309d25	migration: Remove MigrationState from migration_in_postcopy We need to call for the migrate_get_current() in more that half of the uses, so call that inside. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	6d358d9494	ram: Remove compression_switch and inline its logic We can calculate its value, so we don't create a variable for it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- After Peter and Dave review, I dropped the variable and just inlined the condition. Fix typo	2017-04-21 12:25:39 +02:00
Juan Quintela	ce25d33781	ram: Move QEMUFile into RAMState We receive the file from save_live operations and we don't use it until 3 or 4 levels of calls down. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:39 +02:00

1 2 3 4 5 ...

373 Commits