mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Mao Zhongyi	2ee30cf078	migration/migration: improve error reporting for migrate parameters use QERR_INVALID_PARAMETER_VALUE instead of "Parameter '%s' expects" for consistency. Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com> Message-Id: <4ce71da4a5f98ad6ead0806ec71043473dcb4c07.1585641083.git.maozhongyi@cmss.chinamobile.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-07 17:40:24 +01:00
Mao Zhongyi	ed8b2828cc	migration: fix bad indentation in error_report() bad indentation conflicts with CODING_STYLE doc. Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com> Message-Id: <09f7529c665cac0c6a5e032ac6fdb6ca701f7e37.1585329482.git.maozhongyi@cmss.chinamobile.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-07 17:40:24 +01:00
Peter Maydell	a2261b2754	trivial patches (20200504) Silent static analyzer warning Remove dead assignments Support -chardev serial on macOS Update MAINTAINERS Some cosmetic changes -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl6wOI4SHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748p7UQAIFSNN0FrDV+K7i8qqq0X+JrS+dNOHNm DSpOf8IaGm/BezzL6XirXBVpFxg9iB5DQVLsjP1kUggO7rbBO0blx5H5eOPhnXZj xg60kLN16ty7NZ/WPS1G9jF4nDsjz0ZUtCXb0OXsuGJIOrsmN2r/lxdJwcjHZaqJ RzbcCSFXlvL0g7mOakJinMJH5r/nWCiUoEYsikhP10DcvuSBoCnjr+LYV6Ef02G0 Y5lgKN2G0EAMgWTJaL3gIF27zS8QLDNll+eO+PIU5K4yo75/wRCKr4e3PpErZlf6 B+hCAAPnXCpDKw+8sK2z+9OZXUGe1hQ8LHNgNNM921C66f+vLLXpIDTAECihM4K4 0wThYlFDwT4j+PMHFNlzIobGMtb33ui8m40lepMt/YOVFqY4tr8u3MLhHkVDo2+8 sNuOOWLXAoFOYyRqgTeVJvZvMUFQqtDiftghw1BR55TyIpDWjvLYRqae5CI+MGXs 6YylZVHGzVjMVptxvivvIQ735Nq8LaKq7N8Cb7uvcbRaCki39BsxXVPZx4p6NdwN dMndUOz/y75dNlRMDjK8l/oRFPJa/p1Yz8mZhl0uVOO6JeJhBwYmk+WkQ7g/GHZb Rx15HnVWRu6C/Icbw4kqZYyqrgl5lykS8aAWURePdpjzKY77rY1H71FesMhjifRN ZGgfUdWI88M4 =ibgH -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.1-pull-request' into staging trivial patches (20200504) Silent static analyzer warning Remove dead assignments Support -chardev serial on macOS Update MAINTAINERS Some cosmetic changes # gpg: Signature made Mon 04 May 2020 16:45:18 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.1-pull-request: hw/timer/pxa2xx_timer: Add assertion to silent static analyzer warning hw/timer/stm32f2xx_timer: Remove dead assignment hw/gpio/aspeed_gpio: Remove dead assignment hw/isa/i82378: Remove dead assignment hw/ide/sii3112: Remove dead assignment hw/input/adb-kbd: Remove dead assignment hw/i2c/pm_smbus: Remove dead assignment blockdev: Remove dead assignment block: Avoid dead assignment Compress lines for immediate return chardev: Add macOS to list of OSes that support -chardev serial MAINTAINERS: Update Keith Busch's email address elf_ops: Don't try to g_mapped_file_unref(NULL) hw/mem/pc-dimm: Fix line over 80 characters warning hw/mem/pc-dimm: Print slot number on error at pc_dimm_pre_plug() MAINTAINERS: Mark the LatticeMico32 target as orphan timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write() display/blizzard: use extract16() for fix clang analyzer warning in blizzard_draw_line16_32() scsi/esp-pci: add g_assert() for fix clang analyzer warning in esp_pci_io_write() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-05-05 14:03:28 +01:00
Daniel Brodsky	6e8a355de6	lockable: replaced locks with lock guard macros where appropriate - ran regexp "qemu_mutex_lock$.$.\n.*if" to find targets - replaced result with QEMU_LOCK_GUARD if all unlocks at function end - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-05-04 16:07:43 +01:00
Simran Singhal	b3ac2b94cd	Compress lines for immediate return Compress two lines into a single line if immediate return statement is found. It also remove variables progress, val, data, ret and sock as they are no longer needed. Remove space between function "mixer_load" and '(' to fix the checkpatch.pl error:- ERROR: space prohibited between function name and open parenthesis '(' Done using following coccinelle script: @@ local idexpression ret; expression e; @@ -ret = +return e; -return ret; Signed-off-by: Simran Singhal <singhalsimran0@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200401165314.GA3213@simran-Inspiron-5558> [lv: in handle_aiocb_write_zeroes_unmap() move "int ret" inside the #ifdef] Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-05-04 14:43:22 +02:00
Markus Armbruster	735527e179	migration/colo: Fix qmp_xen_colo_do_checkpoint() error handling The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. qmp_xen_colo_do_checkpoint() passes @errp first to replication_do_checkpoint_all(), and then to colo_notify_filters_event(). If both fail, this will trip the assertion in error_setv(). Similar code in secondary_vm_do_failover() calls colo_notify_filters_event() only after replication_do_checkpoint_all() succeeded. Do the same here. Fixes: `0e8818f023` Cc: Zhang Chen <chen.zhang@intel.com> Cc: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Message-Id: <20200422130719.28225-12-armbru@redhat.com>	2020-04-29 08:01:52 +02:00
Marc-André Lureau	9cbc36497c	migration: fix cleanup_bh leak on resume Since commit `8c6b0356b5` ("util/async: make bh_aio_poll() O(1)"), migration-test reveals a leak: QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 tests/qtest/migration-test -p /x86_64/migration/postcopy/recovery tests/qtest/libqtest.c:140: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0) ================================================================= ==2082571==ERROR: LeakSanitizer: detected memory leaks Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f25971dfc58 in __interceptor_malloc (/lib64/libasan.so.5+0x10dc58) #1 0x7f2596d08358 in g_malloc (/lib64/libglib-2.0.so.0+0x57358) #2 0x560970d006f8 in qemu_bh_new /home/elmarco/src/qemu/util/main-loop.c:532 #3 0x5609704afa02 in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3407 #4 0x5609704b6b6f in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:92 #5 0x5609704b2bfb in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108 #6 0x560970b9bd6c in qio_task_complete /home/elmarco/src/qemu/io/task.c:196 #7 0x560970b9aa97 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:111 #8 0x7f2596cfee3a (/lib64/libglib-2.0.so.0+0x4de3a) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200325184723.2029630-2-marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-04-02 14:55:45 -04:00
Mao Zhongyi	7cd75cbdb8	migration: use "" instead of (null) for tls-authz run: (qemu) info migrate_parameters announce-initial: 50 ms ... announce-max: 550 ms multifd-compression: none xbzrle-cache-size: 4194304 max-postcopy-bandwidth: 0 tls-authz: '(null)' Migration parameter 'tls-authz' is used to provide the QOM ID of a QAuthZ subclass instance that provides the access control check, default is NULL. But the empty string is not a valid object ID, so use "" instead of the default. Although it will fail when lookup an object with ID "", it is harmless, just consistent with tls_creds. As a bonus, this patch also fixed the bad indentation on the last line and removed 'has_tls_authz' redundant check in 'hmp_info_migrate_parameters'. Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com> Message-Id: <119f539a9f4d198bc3bcced46b8280520d60bc51.1585100802.git.maozhongyi@cmss.chinamobile.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-25 12:31:38 +00:00
Vladimir Sementsov-Ogievskiy	b4a1733c5e	migration/ram: fix use after free of local_err local_err is used again in migration_bitmap_sync_precopy() after precopy_notify(), so we must zero it. Otherwise try to set non-NULL local_err will crash. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324153630.11882-6-vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-25 12:31:38 +00:00
Vladimir Sementsov-Ogievskiy	27d07fcfa7	migration/colo: fix use after free of local_err local_err is used again in secondary_vm_do_failover() after replication_stop_all(), so we must zero it. Otherwise try to set non-NULL local_err will crash. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200324153630.11882-5-vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-25 12:31:38 +00:00
Mao Zhongyi	06b1c6f8b7	xbzrle: update xbzrle doc Add new parameter description, also: 1. Remove unsociable space. 2. Nit picking: s/two/2 in report Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com> Message-Id: <20200320143216.423374-1-maozhongyi@cmss.chinamobile.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-25 12:31:38 +00:00
zhanghailiang	19dd408a47	migration: recognize COLO as part of activating process We will migrate parts of dirty pages backgroud lively during the gap time of two checkpoints, without this modification, it will not work because ram_save_iterate() will check it before send RAM_SAVE_FLAG_EOS at the end of it. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Message-Id: <20200224065414.36524-7-zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-13 09:36:30 +00:00
zhanghailiang	8af66371ed	ram/colo: only record bitmap of dirty pages in COLO stage It is only need to record bitmap of dirty pages while goes into COLO stage. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Message-Id: <20200224065414.36524-6-zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-13 09:36:30 +00:00
zhanghailiang	0393031a16	COLO: Optimize memory back-up process This patch will reduce the downtime of VM for the initial process, Previously, we copied all these memory in preparing stage of COLO while we need to stop VM, which is a time-consuming process. Here we optimize it by a trick, back-up every page while in migration process while COLO is enabled, though it affects the speed of the migration, but it obviously reduce the downtime of back-up all SVM'S memory in COLO preparing stage. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Message-Id: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> minor typo fixes	2020-03-13 09:36:30 +00:00
Keqian Zhu	dc14a47076	migration/throttle: Add throttle-trig-thres migration parameter Currently, if the bytes_dirty_period is more than the 50% of bytes_xfer_period, we start or increase throttling. If we make this percentage higher, then we can tolerate higher dirty rate during migration, which means less impact on guest. The side effect of higher percentage is longer migration time. We can make this parameter configurable to switch between mig- ration time first or guest performance first. The default value is 50 and valid range is 1 to 100. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Message-Id: <20200224023142.39360-1-zhukeqian1@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-03-13 09:36:30 +00:00
zhanghailiang	f51d0b4178	savevm: Don't call colo_init_ram_cache twice This helper has been called twice which is wrong. Left the one where called while get COLO enable message from source side. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-28 10:13:54 +01:00
zhanghailiang	6ad8ad38d0	migration/colo: wrap incoming checkpoint process into new helper Split checkpoint incoming process into a helper. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-28 10:13:54 +01:00
zhanghailiang	0306dae5ac	migration: fix COLO broken caused by a previous commit This commit "migration: Create migration_is_running()" broke COLO. Becuase there is a process broken by this commit. colo_process_checkpoint ->colo_do_checkpoint_transaction ->migrate_set_block_enabled ->qmp_migrate_set_capabilities It can be fixed by make COLO process as an exception, Maybe we need a better way to fix it. Cc: Juan Quintela <quintela@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-28 10:13:54 +01:00
Stefan Hajnoczi	a152bd0093	migration/block: rename BLOCK_SIZE macro Both <linux/fs.h> and <sys/mount.h> define BLOCK_SIZE macros. Avoiding using that name in block/migration.c. I noticed this when including <liburing.h> (Linux io_uring) from "block/aio.h" and compilation failed. Although patches adding that include haven't been sent yet, it makes sense to rename the macro now in case someone else stumbles on it in the meantime. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-28 09:25:49 +01:00
Pan Nengyuan	26daeba4d6	migration/savevm: release gslist after dump_vmstate_json 'list' forgot to free at the end of dump_vmstate_json_to_file(), although it's called only once, but seems like a clean code. Fix the leak as follow: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768) #1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445) #2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066) #3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139) #4 0x5585db591581 in object_class_get_list_tramp /mnt/sdb/qemu-new/qemu/qom/object.c:1084 #5 0x5585db590f66 in object_class_foreach_tramp /mnt/sdb/qemu-new/qemu/qom/object.c:1028 #6 0x7fb945eb35f7 in g_hash_table_foreach (/lib64/libglib-2.0.so.0+0x3b5f7) #7 0x5585db59110c in object_class_foreach /mnt/sdb/qemu-new/qemu/qom/object.c:1038 #8 0x5585db5916b6 in object_class_get_list /mnt/sdb/qemu-new/qemu/qom/object.c:1092 #9 0x5585db335ca0 in dump_vmstate_json_to_file /mnt/sdb/qemu-new/qemu/migration/savevm.c:638 #10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420 #11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308 #12 0x5585da29420d in _start (/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d) Indirect leak of 7472 byte(s) in 467 object(s) allocated from: #0 0x7fb946abd768 in __interceptor_malloc (/lib64/libasan.so.5+0xef768) #1 0x7fb945eca445 in g_malloc (/lib64/libglib-2.0.so.0+0x52445) #2 0x7fb945ee2066 in g_slice_alloc (/lib64/libglib-2.0.so.0+0x6a066) #3 0x7fb945ee3139 in g_slist_prepend (/lib64/libglib-2.0.so.0+0x6b139) #4 0x5585db591581 in object_class_get_list_tramp /mnt/sdb/qemu-new/qemu/qom/object.c:1084 #5 0x5585db590f66 in object_class_foreach_tramp /mnt/sdb/qemu-new/qemu/qom/object.c:1028 #6 0x7fb945eb35f7 in g_hash_table_foreach (/lib64/libglib-2.0.so.0+0x3b5f7) #7 0x5585db59110c in object_class_foreach /mnt/sdb/qemu-new/qemu/qom/object.c:1038 #8 0x5585db5916b6 in object_class_get_list /mnt/sdb/qemu-new/qemu/qom/object.c:1092 #9 0x5585db335ca0 in dump_vmstate_json_to_file /mnt/sdb/qemu-new/qemu/migration/savevm.c:638 #10 0x5585daa5bcbf in main /mnt/sdb/qemu-new/qemu/vl.c:4420 #11 0x7fb941204812 in __libc_start_main ../csu/libc-start.c:308 #12 0x5585da29420d in _start (/mnt/sdb/qemu-new/qemu/build/x86_64-softmmu/qemu-system-x86_64+0x27f020d) Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-28 09:25:49 +01:00
Chen Qun	600fe89d6e	migration/vmstate: Remove redundant statement in vmstate_save_state_v() The "ret" has been assigned in all branches. It didn't need to be assigned separately. Clang static code analyzer show warning: migration/vmstate.c:365:17: warning: Value stored to 'ret' is never read ret = 0; ^ ~ Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-28 09:25:49 +01:00
Juan Quintela	87dc6f5f66	multifd: Add zstd compression multifd support Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-28 09:25:49 +01:00
Juan Quintela	6a9ad15420	multifd: Add multifd-zstd-level parameter This parameter specifies the zstd compression level. The next patch will put it to use. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com>	2020-02-28 09:25:28 +01:00
Juan Quintela	7ec2c2b3c1	multifd: Add zlib compression multifd support Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-28 09:24:43 +01:00
Juan Quintela	9004db48c0	multifd: Add multifd-zlib-level parameter This parameter specifies the zlib compression level. The next patch will put it to use. Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-28 09:24:43 +01:00
Juan Quintela	ab7cbb0b9a	multifd: Make no compression operations into its own structure It will be used later. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- No comp value needs to be zero.	2020-02-28 09:24:43 +01:00
Juan Quintela	96eef04238	multifd: Add multifd-compression parameter This will store the compression method to use. We start with none. Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Rename multifd-method to multifd-compression	2020-02-28 09:24:43 +01:00
Dr. David Alan Gilbert	2a1bc8bde7	migration/rdma: rdma_accept_incoming_migration fix error handling rdma_accept_incoming_migration is called from an fd handler and can't return an Error * anywhere. Currently it's leaking Error's in errp/local_err - there's no point putting them in there unless we can report them. Turn most into fprintf's, and the last into an error_reportf_err where it's coming up from another function. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-13 10:55:55 +01:00
Keqian Zhu	d05de9e39a	migration: Optimization about wait-unplug migration state qemu_savevm_nr_failover_devices() is originally designed to get the number of failover devices, but it actually returns the number of "unplug-pending" failover devices now. Moreover, what drives migration state to wait-unplug should be the number of "unplug-pending" failover devices, not all failover devices. We can also notice that qemu_savevm_state_guest_unplug_pending() and qemu_savevm_nr_failover_devices() is equivalent almost (from the code view). So the latter is incorrect semantically and useless, just delete it. In the qemu_savevm_state_guest_unplug_pending(), once hit a unplug-pending failover device, then it can return true right now to save cpu time. Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Jens Freimann <jfreimann@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-13 10:53:10 +01:00
Zhimin Feng	8958338b10	migration: Maybe VM is paused when migration is cancelled If the migration is cancelled when it is in the completion phase, the migration state is set to MIGRATION_STATUS_CANCELLING. The VM maybe wait for the 'pause_sem' semaphore in migration_maybe_pause function, so that VM always is paused. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Zhimin Feng <fengzhimin1@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-02-13 10:52:58 +01:00
Wei Yang	42d24611af	migration/compress: compress QEMUFile is not writable We open a file with empty_ops for compress QEMUFile, which means this is not writable. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-29 11:28:59 +01:00
Eric Auger	a085664f21	migration: Simplify get_qlist Instead of inserting read elements at the head and then reversing the list, it is simpler to add each element after the previous one. Introduce QLIST_RAW_INSERT_AFTER helper and use it in get_qlist(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	d32ca5ad79	multifd: Split multifd code into its own file Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	b673eab4e2	multifd: Make multifd_load_setup() get an Error parameter We need to change the full chain to pass the Error parameter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	00f4b572e6	multifd: Make multifd_save_setup() get an Error parameter Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	857a4bbb86	migration: Make checkpatch happy with comments Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	a6703e4d33	multifd: Use qemu_target_page_size() We will make it cpu independent. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	99f2c6fb46	multifd: multifd_send_sync_main only needs the qemufile Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	67a4c8910c	multifd: multifd_queue_page only needs the qemufile Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	df94d32bb1	multifd: multifd_send_pages only needs the qemufile Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Zhimin Feng	9c4d333c09	migration/multifd: fix nullptr access in multifd_send_terminate_threads If the multifd_send_threads is not created when migration is failed, multifd_save_cleanup would be called twice. In this senario, the multifd_send_state is accessed after it has been released, the result is that the source VM is crashing down. Here is the coredump stack: Program received signal SIGSEGV, Segmentation fault. 0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012 1012 MultiFDSendParams *p = &multifd_send_state->params[i]; #0 0x00005629333a78ef in multifd_send_terminate_threads (err=err@entry=0x0) at migration/ram.c:1012 #1 0x00005629333ab8a9 in multifd_save_cleanup () at migration/ram.c:1028 #2 0x00005629333abaea in multifd_new_send_channel_async (task=0x562935450e70, opaque=<optimized out>) at migration/ram.c:1202 #3 0x000056293373a562 in qio_task_complete (task=task@entry=0x562935450e70) at io/task.c:196 #4 0x000056293373a6e0 in qio_task_thread_result (opaque=0x562935450e70) at io/task.c:111 #5 0x00007f475d4d75a7 in g_idle_dispatch () from /usr/lib64/libglib-2.0.so.0 #6 0x00007f475d4da9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 #7 0x0000562933785b33 in glib_pollfds_poll () at util/main-loop.c:219 #8 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 #9 main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:518 #10 0x00005629334c5acf in main_loop () at vl.c:1810 #11 0x000056293334d7bb in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4471 If the multifd_send_threads is not created when migration is failed. In this senario, we don't call multifd_save_cleanup in multifd_new_send_channel_async. Signed-off-by: Zhimin Feng <fengzhimin1@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	392d87e213	migration: Create migration_is_running() This function returns true if we are in the middle of a migration. It is like migration_is_setup_or_active() with CANCELLING and COLO. Adapt all callers that are needed. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	b69a0227a8	migration: Don't send data if we have stopped If we do a cancel, we got out without one error, but we can't do the rest of the output as in a normal situation. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	a555b8092a	qemu-file: Don't do IO after shutdown Be sure that we are not doing neither read/write after shutdown of the QEMUFile. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Juan Quintela	3d4095b222	multifd: Make sure that we don't do any IO after an error Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00
Marc-André Lureau	4f67d30b5e	qdev: set properties with device_class_set_props() The following patch will need to handle properties registration during class_init time. Let's use a device_class_set_props() setter. spatch --macro-file scripts/cocci-macro-file.h --sp-file ./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place --dir . @@ typedef DeviceClass; DeviceClass *d; expression val; @@ - d->props = val + device_class_set_props(d, val) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:15 +01:00
Juan Quintela	ddac5cb2d9	multifd: Be consistent about using uint64_t We transmit ram_addr_t always as uint64_t. Be consistent in its use (on 64bit system, it is always uint64_t problem is 32bits). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-20 09:17:07 +01:00
Eric Auger	4746dbf8a9	migration: Support QLIST migration Support QLIST migration using the same principle as QTAILQ: `94869d5c52` ("migration: migrate QTAILQ"). The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V. The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD and QLIST_RAW_REVERSE. Tests also are provided. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Peter Xu	93062e2361	migration: Change SaveStateEntry.instance_id into uint32_t It was always used as 32bit, so define it as used to be clear. Instead of using -1 as the auto-gen magic value, we switch to UINT32_MAX. We also make sure that we don't auto-gen this value to avoid overflowed instance IDs without being noticed. Suggested-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Peter Xu	1df2c9a26f	migration: Define VMSTATE_INSTANCE_ID_ANY Define the new macro VMSTATE_INSTANCE_ID_ANY for callers who wants to auto-generate the vmstate instance ID. Previously it was hard coded as -1 instead of this macro. It helps to change this default value in the follow up patches. No functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Alexey Romko	8bba004cca	Bug #1829242 correction. Added type conversions to ram_addr_t before all left shifts of page indexes to TARGET_PAGE_BITS, to correct overflows when the page address was 4Gb and more. Signed-off-by: Alexey Romko <nevilad@yahoo.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Jiahui Cen	9560a48ecc	migration/multifd: fix destroyed mutex access in terminating multifd threads One multifd will lock all the other multifds' IOChannel mutex to inform them to quit by setting p->quit or shutting down p->c. In this senario, if some multifds had already been terminated and multifd_load_cleanup/multifd_save_cleanup had destroyed their mutex, it could cause destroyed mutex access when trying lock their mutex. Here is the coredump stack: #0 0x00007f81a2794437 in raise () from /usr/lib64/libc.so.6 #1 0x00007f81a2795b28 in abort () from /usr/lib64/libc.so.6 #2 0x00007f81a278d1b6 in __assert_fail_base () from /usr/lib64/libc.so.6 #3 0x00007f81a278d262 in __assert_fail () from /usr/lib64/libc.so.6 #4 0x000055eb1bfadbd3 in qemu_mutex_lock_impl (mutex=0x55eb1e2d1988, file=<optimized out>, line=<optimized out>) at util/qemu-thread-posix.c:64 #5 0x000055eb1bb4564a in multifd_send_terminate_threads (err=<optimized out>) at migration/ram.c:1015 #6 0x000055eb1bb4bb7f in multifd_send_thread (opaque=0x55eb1e2d19f8) at migration/ram.c:1171 #7 0x000055eb1bfad628 in qemu_thread_start (args=0x55eb1e170450) at util/qemu-thread-posix.c:502 #8 0x00007f81a2b36df5 in start_thread () from /usr/lib64/libpthread.so.0 #9 0x00007f81a286048d in clone () from /usr/lib64/libc.so.6 To fix it up, let's destroy the mutex after all the other multifd threads had been terminated. Signed-off-by: Jiahui Cen <cenjiahui@huawei.com> Signed-off-by: Ying Fang <fangying1@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Jiahui Cen	f76e32eb05	migration/multifd: fix nullptr access in terminating multifd threads One multifd channel will shutdown all the other multifd's IOChannel when it fails to receive an IOChannel. In this senario, if some multifds had not received its IOChannel yet, it would try to shutdown its IOChannel which could cause nullptr access at qio_channel_shutdown. Here is the coredump stack: #0 object_get_class (obj=obj@entry=0x0) at qom/object.c:908 #1 0x00005563fdbb8f4a in qio_channel_shutdown (ioc=0x0, how=QIO_CHANNEL_SHUTDOWN_BOTH, errp=0x0) at io/channel.c:355 #2 0x00005563fd7b4c5f in multifd_recv_terminate_threads (err=<optimized out>) at migration/ram.c:1280 #3 0x00005563fd7bc019 in multifd_recv_new_channel (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce00) at migration/ram.c:1478 #4 0x00005563fda82177 in migration_ioc_process_incoming (ioc=ioc@entry=0x556400255610, errp=errp@entry=0x7ffec07dce30) at migration/migration.c:605 #5 0x00005563fda8567d in migration_channel_process_incoming (ioc=0x556400255610) at migration/channel.c:44 #6 0x00005563fda83ee0 in socket_accept_incoming_migration (listener=0x5563fff6b920, cioc=0x556400255610, opaque=<optimized out>) at migration/socket.c:166 #7 0x00005563fdbc25cd in qio_net_listener_channel_func (ioc=<optimized out>, condition=<optimized out>, opaque=<optimized out>) at io/net-listener.c:54 #8 0x00007f895b6fe9a9 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 #9 0x00005563fdc18136 in glib_pollfds_poll () at util/main-loop.c:218 #10 0x00005563fdc181b5 in os_host_main_loop_wait (timeout=1000000000) at util/main-loop.c:241 #11 0x00005563fdc183a2 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:517 #12 0x00005563fd8edb37 in main_loop () at vl.c:1791 #13 0x00005563fd74fd45 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4473 To fix it up, let's check p->c before calling qio_channel_shutdown. Signed-off-by: Jiahui Cen <cenjiahui@huawei.com> Signed-off-by: Ying Fang <fangying1@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Wei Yang	c6b3a2e0c4	migration/multifd: not use multifd during postcopy We don't support multifd during postcopy, but user still could enable both multifd and postcopy. This leads to migration failure. Skip multifd during postcopy. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Wei Yang	eab54aa78f	migration/multifd: clean pages after filling packet This is a preparation for the next patch: not use multifd during postcopy. Without enabling postcopy, everything looks good. While after enabling postcopy, migration may fail even not use multifd during postcopy. The reason is the pages is not properly cleared and old target page will continue to be transferred. After clean pages, migration succeeds. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Wei Yang	644acf99b8	migration/postcopy: enable compress during postcopy postcopy requires to place a whole host page, while migration thread migrate memory in target page size. This makes postcopy need to collect all target pages in one host page before placing via userfaultfd. To enable compress during postcopy, there are two problems to solve: 1. Random order for target page arrival 2. Target pages in one host page arrives without interrupt by target page from other host page The first one is handled by previous cleanup patch. This patch handles the second one by: 1. Flush compress thread for each host page 2. Wait for decompress thread for before placing host page Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Wei Yang	91ba442f5c	migration/postcopy: enable random order target page arrival After using number of target page received to track one host page, we could have the capability to handle random order target page arrival in one host page. This is a preparation for enabling compress during postcopy. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Wei Yang	e5e73b0f90	migration/postcopy: set all_zero to true on the first target page For the first target page, all_zero is set to true for this round check. After target_pages introduced, we could leverage this variable instead of checking the address offset. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Wei Yang	4cbb3c63c1	migration/postcopy: count target page number to decide the place_needed In postcopy, it requires to place whole host page instead of target page. Currently, it relies on the page offset to decide whether this is the last target page. We also can count the target page number during the iteration. When the number of target page equals (host page size / target page size), this means it is the last target page in the host page. This is a preparation for non-ordered target page transmission. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Wei Yang	ca1a6b708b	migration/postcopy: wait for decompress thread in precopy Compress is not supported with postcopy, it is safe to wait for decompress thread just in precopy. This is a preparation for later patch. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Wei Yang	2e36bc1b88	migration/postcopy: reduce memset when it is zero page and matches_target_page_size In this case, page_buffer content would not be used. Skip this to save some time. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Yury Kotov	e65cec5e5d	migration/ram: Yield periodically to the main loop Usually, incoming migration coroutine yields to the main loop while its IO-channel is waiting for data to receive. But there is a case when RAM migration and data receive have the same speed: VM with huge zeroed RAM. In this case, IO-channel won't read and thus the main loop is stuck and for instance, it doesn't respond to QMP commands. For this case, yield periodically, but not too often, so as not to affect the speed of migration. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:22 +01:00
Scott Cheloha	174723ffe5	migration: savevm_state_handler_insert: constant-time element insertion savevm_state's SaveStateEntry TAILQ is a priority queue. Priority sorting is maintained by searching from head to tail for a suitable insertion spot. Insertion is thus an O(n) operation. If we instead keep track of the head of each priority's subqueue within that larger queue we can reduce this operation to O(1) time. savevm_state_handler_remove() becomes slightly more complex to accomodate these gains: we need to replace the head of a priority's subqueue when removing it. With O(1) insertion, booting VMs with many SaveStateEntry objects is more plausible. For example, a ppc64 VM with maxmem=8T has 40000 such objects to insert. Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:22 +01:00
Scott Cheloha	bd5de61e7b	migration: add savevm_state_handler_remove() Create a function to abstract common logic needed when removing a SaveStateEntry element from the savevm_state.handlers queue. For now we just remove the element. Soon it will involve additional cleanup. Signed-off-by: Scott Cheloha <cheloha@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:22 +01:00
Yury Kotov	603d5a42d3	migration: Fix the re-run check of the migrate-incoming command The current check sets an error but doesn't fail the command. This may cause a problem if new connection attempt by the same URI affects the first connection. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:22 +01:00
Fangrui Song	2667c98722	migration: Fix incorrect integer->float conversion caught by clang Clang does not like qmp_migrate_set_downtime()'s code to clamp double @value to 0..INT64_MAX: qemu/migration/migration.c:2038:24: error: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Werror,-Wimplicit-int-float-conversion] The warning will be enabled by default in clang 10. It is not available for clang <= 9. The clamp is actually useless; @value is checked to be within 0..MAX_MIGRATE_DOWNTIME_SECONDS immediately before. Delete it. While there, make the conversion from double to int64_t explicit. Signed-off-by: Fangrui Song <i@maskray.me> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [Patch split, commit message improved] Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:22 +01:00
Dr. David Alan Gilbert	97e1e06780	migration: Rate limit inside host pages When using hugepages, rate limiting is necessary within each huge page, since a 1G huge page can take a significant time to send, so you end up with bursty behaviour. Fixes: `4c011c37ec` ("postcopy: Send whole huge pages") Reported-by: Lin Ma <LMa@suse.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:22 +01:00
Daniel Henrique Barboza	03acb4e94d	ram.c: remove unneeded labels ram_save_queue_pages() has an 'err' label that can be replaced by 'return -1' instead. Same thing with ram_discard_range(), and in this case we can also get rid of the 'ret' variable and return either '-1' on error or the result of ram_block_discard_range(). CC: Juan Quintela <quintela@redhat.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:22 +01:00
Juan Quintela	4d65a6216b	migration: Make sure that we don't call write() in case of error If we are exiting due to an error/finish/.... Just don't try to even touch the channel with one IO operation. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:22 +01:00
Juan Quintela	d069bcca6c	multifd: Initialize local variable Fill everything with zero, so the padding fields are also initialized. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2020-01-20 09:08:53 +01:00
Marc-André Lureau	3cad405bab	vmstate: replace DeviceState with VMStateIf Replace DeviceState dependency with VMStateIf on vmstate API. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Halil Pasic <pasic@linux.ibm.com>	2020-01-06 18:41:32 +04:00
Paolo Bonzini	44901b5aff	colo: fix return without releasing RCU Use WITH_RCU_READ_LOCK_GUARD to avoid exiting colo_init_ram_cache without releasing RCU. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-17 19:33:52 +01:00
Marc-André Lureau	e4f1bea2a8	migration: fix maybe-uninitialized warning ../migration/ram.c: In function ‘multifd_recv_thread’: /home/elmarco/src/qq/include/qapi/error.h:165:5: error: ‘block’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 165 \| error_setg_internal((errp), __FILE__, __LINE__, __func__, \ \| ^~~~~~~~~~~~~~~~~~~ ../migration/ram.c:818:15: note: ‘block’ was declared here 818 \| RAMBlock *block; \| ^~~~~ Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-17 19:32:47 +01:00
Beata Michalska	bd108a44bc	migration: ram: Switch to ram block writeback Switch to ram block writeback for pmem migration. Signed-off-by: Beata Michalska <beata.michalska@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20191121000843.24844-4-beata.michalska@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-12-16 10:46:35 +00:00
Jens Freimann	284f42a520	net/virtio: fix dev_unplug_pending .dev_unplug_pending is set up by virtio-net code indepent of failover support was set for the device or not. This gives a wrong result when we check for existing primary devices in migration code. Fix this by actually calling dev_unplug_pending() instead of just checking if the function pointer was set. When the feature was not negotiated dev_unplug_pending() will always return false. This prevents us from going into the wait-unplug state when there's no primary device present. Fixes: `9711cd0dfc` ("net/virtio: add failover support") Signed-off-by: Jens Freimann <jfreimann@redhat.com> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2019-11-25 23:30:28 +08:00
Jens Freimann	c7e0acd5a3	migration: add new migration state wait-unplug This patch adds a new migration state called wait-unplug. It is entered after the SETUP state if failover devices are present. It will transition into ACTIVE once all devices were succesfully unplugged from the guest. So if a guest doesn't respond or takes long to honor the unplug request the user will see the migration state 'wait-unplug'. In the migration thread we query failover devices if they're are still pending the guest unplug. When all are unplugged the migration continues. If one device won't unplug migration will stay in wait_unplug state. Signed-off-by: Jens Freimann <jfreimann@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20191029114905.6856-9-jfreimann@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-10-29 18:55:26 -04:00
Wei Yang	038adc2f58	core: replace getpagesize() with qemu_real_host_page_size There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-26 15:38:06 +02:00
Vladimir Sementsov-Ogievskiy	ef9041a7b8	block/dirty-bitmap: refactor bdrv_dirty_bitmap_next bdrv_dirty_bitmap_next is always used in same pattern. So, split it into _next and _first, instead of combining two functions into one and add FOR_EACH_DIRTY_BITMAP macro. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-5-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Vladimir Sementsov-Ogievskiy	5deb6cbd1f	block/dirty-bitmap: add bs link Add bs field to BdrvDirtyBitmap structure. Drop BlockDriverState parameter from bitmap APIs where possible. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20190916141911.5255-3-vsementsov@virtuozzo.com [Rebased on top of block-copy. --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-10-17 17:02:32 -04:00
Eric Auger	9a85e4b8f6	migration: Support gtree migration Introduce support for GTree migration. A custom save/restore is implemented. Each item is made of a key and a data. If the key is a pointer to an object, 2 VMSDs are passed into the GTree VMStateField. When putting the items, the tree is traversed in sorted order by g_tree_foreach. On the get() path, gtrees must be allocated using the proper key compare, key destroy and value destroy. This must be handled beforehand, for example in a pre_load method. Tests are added to test save/dump of structs containing gtrees including the virtio-iommu domain/mappings scenario. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20191011121724.433-1-eric.auger@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> uintptr_t fixup for test on 32bit	2019-10-11 17:52:31 +01:00
Wei Yang	aff66d2ef0	migration/multifd: pages->used would be cleared when attach to multifd_send_state When we found an available channel in multifd_send_pages(), its pages->used is cleared and then attached to multifd_send_state. It is not necessary to do this twice. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191011085050.17622-5-richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 15:02:06 +01:00
Wei Yang	9985e1f48d	migration/multifd: initialize packet->magic/version once at setup stage MultiFDPacket_t's magic and version field never changes during migration, so move these two fields in setup stage. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191011085050.17622-4-richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 15:02:00 +01:00
Wei Yang	f2148c4c79	migration/multifd: use pages->allocated instead of the static max multifd_send_fill_packet() prepares meta data for following pages to transfer. It would be more proper to fill pages->allocated instead of static max value, especially we want to support flexible packet size. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191011085050.17622-3-richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 15:01:54 +01:00
Wei Yang	d884e77bfe	migration/multifd: fix a typo in comment of multifd_recv_unfill_packet() Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191011085050.17622-2-richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 15:01:44 +01:00
Wei Yang	0197d89025	migration/postcopy: check PostcopyState before setting to POSTCOPY_INCOMING_RUNNING Currently, we set PostcopyState blindly to RUNNING, even we found the previous state is not LISTENING. This will lead to a corner case. First let's look at the code flow: qemu_loadvm_state_main() ret = loadvm_process_command() loadvm_postcopy_handle_run() return -1; if (ret < 0) { if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING) ... } >From above snippet, the corner case is loadvm_postcopy_handle_run() always sets state to RUNNING. And then it checks the previous state. If the previous state is not LISTENING, it will return -1. But at this moment, PostcopyState is already been set to RUNNING. Then ret is checked in qemu_loadvm_state_main(), when it is -1 PostcopyState is checked. Current logic would pause postcopy and retry if PostcopyState is RUNNING. This is not what we expect, because postcopy is not active yet. This patch makes sure state is set to RUNNING only previous state is LISTENING by checking the state first. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Suggested by: Peter Xu <peterx@redhat.com> Message-Id: <20191010011316.31363-3-richardw.yang@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 15:00:16 +01:00
Wei Yang	2a7eb14844	migration/postcopy: rename postcopy_ram_enable_notify to postcopy_ram_incoming_setup Function postcopy_ram_incoming_setup and postcopy_ram_incoming_cleanup is a pair. Rename to make it clear for audience. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20191010011316.31363-2-richardw.yang@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:59:58 +01:00
Wei Yang	2d49bacda0	migration/postcopy: postpone setting PostcopyState to END There are two places to call function postcopy_ram_incoming_cleanup() postcopy_ram_listen_thread on migration success loadvm_postcopy_handle_listen one setup failure On success, the vm will never accept another migration. On failure, PostcopyState is transited from LISTENING to END and would be checked in qemu_loadvm_state_main(). If PostcopyState is RUNNING, migration would be paused and retried. Currently PostcopyState is set to END in function postcopy_ram_incoming_cleanup(). With above analysis, we can take this step out and postpone this till the end of listen thread to indicate the listen thread is done. This is a preparation patch for later cleanup. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191006000249.29926-3-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed up in merge to the 1 parameter postcopy_state_set	2019-10-11 14:57:22 +01:00
Wei Yang	2a461c2467	migration/postcopy: mis->have_listen_thread check will never be touched If mis->have_listen_thread is true, this means current PostcopyState must be LISTENING or RUNNING. While the check at the beginning of the function makes sure the state transaction happens when its previous PostcopyState is ADVISE or DISCARD. This means we would never touch this check. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191006000249.29926-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:53:30 +01:00
Wei Yang	4991f3091e	migration: report SaveStateEntry id and name on failure This provides helpful information on which entry failed. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191005220517.24029-5-richardw.yang@linux.intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:31:39 +01:00
Wei Yang	17d9351bf2	migration: pass in_postcopy instead of check state again Not necessary to do the check again. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191005220517.24029-4-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:31:27 +01:00
Wei Yang	da1725d3f9	migration/postcopy: fix typo in mark_postcopy_blocktime_begin's comment Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191005220517.24029-3-richardw.yang@linux.intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:31:08 +01:00
Wei Yang	6629890d55	migration/postcopy: map large zero page in postcopy_ram_incoming_setup() postcopy_ram_incoming_setup() and postcopy_ram_incoming_cleanup() are counterpart. It is reasonable to map/unmap large zero page in these two functions respectively. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191005135021.21721-3-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:28:49 +01:00
Wei Yang	3414322a83	migration/postcopy: allocate tmp_page in setup stage During migration, a tmp page is allocated so that we could place a whole host page during postcopy. Currently the page is allocated during load stage, this is a little bit late. And more important, if we failed to allocate it, the error is not checked properly. Even it is NULL, we would still use it. This patch moves the allocation to setup stage and if failed error message would be printed and caller would notice it. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:28:19 +01:00
Dr. David Alan Gilbert	fb14a42ade	migration: Don't try and recover return path in non-postcopy In normal precopy we can't do reconnection recovery - but we also don't need to, since you can just rerun migration. At the moment if the 'return-path' capability is on, we use the return path in precopy to give a positive 'OK' to the end of migration; however if migration fails then we fall into the postcopy recovery path and hang. This fixes it by only running the return path in the postcopy case. Reported-by: Greg Kurz <groug@kaod.org> Tested-by: Greg Kurz <groug@kaod.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:25:26 +01:00
Dr. David Alan Gilbert	987ab2a549	migration: Use automatic rcu_read unlock in rdma.c Use the automatic read unlocker in migration/rdma.c. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20191007143642.301445-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:20:01 +01:00
Dr. David Alan Gilbert	89ac5a1d2a	migration: Use automatic rcu_read unlock in ram.c Use the automatic read unlocker in migration/ram.c Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20191007143642.301445-4-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:20:00 +01:00
Dr. David Alan Gilbert	0e6ebd4877	migration: Fix missing rcu_read_unlock Use the automatic rcu_read unlocker to fix a missing unlock. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20191007143642.301445-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:19:59 +01:00
Wei Yang	8f8d528e73	migration: use migration_is_active to represent active state Wrap the check into a function to make it easy to read. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190717005341.14140-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-10-11 14:18:13 +01:00
Dr. David Alan Gilbert	3748fef9b9	migration/postcopy: Recognise the recovery states as 'in_postcopy' Various parts of the migration code do different things when they're in postcopy mode; prior to this patch this has been 'postcopy-active'. This patch extends 'in_postcopy' to include 'postcopy-paused' and 'postcopy-recover'. In particular, when you set the max-postcopy-bandwidth parameter, this only affects the current migration fd if we're 'in_postcopy'; this leads to a race in the postcopy recovery test where it increases the speed from 4k/sec to unlimited, but that increase can get ignored if the change is made between the point at which the reconnection happens and it transitions back to active. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190923174942.12182-1-dgilbert@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> Tested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-25 15:51:19 +01:00
Dr. David Alan Gilbert	d46a4847ca	migration/rdma.c: Swap synchronize_rcu for call_rcu This fixes a deadlock that can occur on the migration source after a failed RDMA migration; as the source tries to cleanup it clears a pair of pointers and uses synchronize_rcu to wait; this is happening on the main thread. With the CPUs running a CPU thread can be an rcu reader and attempt to grab the main lock (kvm_handle_io->address_space_write->flatview_write->flatview_write_continue-> prepare_mmio_access->qemu_mutex_lock_iothread_impl) Replace the synchronize_rcu with a call_rcu to postpone the freeing. Fixes: `74637e6f08` ("migration: implement bi-directional RDMA QIOChannel") ( https://bugzilla.redhat.com/show_bug.cgi?id=1746787 ) Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190913163507.1403-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-25 15:51:19 +01:00
Dr. David Alan Gilbert	de8434a35a	migration/rdma: Don't moan about disconnects at the end If we've already finished the migration or something has already gone wrong, don't moan about the migration stream disconnecting. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190913163507.1403-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-25 15:51:19 +01:00
Wei Yang	64737606e8	migration: remove sent parameter in get_queued_page_not_dirty This is a cleanup for previous removal of unsentmap. The sent parameter is not necessary now. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190819061843.28642-4-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-25 15:51:19 +01:00
Wei Yang	1e7cf8c323	migration/postcopy: unsentmap is not necessary for postcopy Commit `f3f491fcd6` ('Postcopy: Maintain unsentmap') introduced unsentmap to track not yet sent pages. This is not necessary since: * unsentmap is a sub-set of bmap before postcopy start * unsentmap is the summation of bmap and unsentmap after canonicalizing This patch just removes it. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190819061843.28642-3-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-25 15:51:19 +01:00
Wei Yang	8324ef86f0	migration/postcopy: not necessary to do discard when canonicalizing bitmap All pages, either partially sent or partially dirty, will be discarded in postcopy_send_discard_bm_ram(), since we update the unsentmap to be unsentmap = unsentmap \| dirty in ram_postcopy_send_discard_bitmap(). This is not necessary to do discard when canonicalizing bitmap. And by doing so, we separate the page discard into two individual steps: * canonicalize bitmap * discard page Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190819061843.28642-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-25 15:51:19 +01:00
Marc-André Lureau	91490583f3	migration: fix vmdesc leak on vmstate_save() error Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20190912122514.22504-2-marcandre.lureau@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-25 15:51:19 +01:00
Nir Soffer	8972571509	block: Remove unused masks Replace confusing usage: ~BDRV_SECTOR_MASK With more clear: (BDRV_SECTOR_SIZE - 1) Remove BDRV_SECTOR_MASK and the unused BDRV_BLOCK_OFFSET_MASK which was it's last user. Signed-off-by: Nir Soffer <nsoffer@redhat.com> Message-id: 20190827185913.27427-3-nsoffer@redhat.com Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2019-09-16 14:48:30 +02:00
Wei Yang	268dcd46ae	migration: fix one typo in comment of function migration_total_bytes() Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190912024957.11780-1-richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 11:25:06 +01:00
Wei Yang	1bf57fb3df	migration/qemu-file: fix potential buf waste for extra buf_index adjustment In add_to_iovec(), qemu_fflush() will be called if iovec is full. If this happens, buf_index is reset. Currently, this is not checked and buf_index would always been adjust with buf size. This is not harmful, but will waste some space in file buffer. This patch make add_to_iovec() return 1 when it has flushed the file. Then the caller could check the return value to see whether it is necessary to adjust the buf_index any more. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190911132839.23336-3-richard.weiyang@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 11:23:32 +01:00
Wei Yang	89fe04b458	migration/qemu-file: remove check on writev_buffer in qemu_put_compression_data The check of writev_buffer is in qemu_fflush, which means it is not harmful if it is NULL. And removing it will make the code consistent since all other add_to_iovec() is called without the check. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190911132839.23336-2-richard.weiyang@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 11:23:26 +01:00
Peter Xu	8504ddeca0	migration: Fix postcopy bw for recovery We've got max-postcopy-bandwidth parameter but it's not applied correctly after a postcopy recovery so the recovered migration stream will still eat the whole net bandwidth. Fix that up. Reported-by: Xiaohui Li <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190906130103.20961-1-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 11:21:25 +01:00
Yury Kotov	b9d68df62a	migration: Add validate-uuid capability This capability realizes simple source validation by UUID. It's useful for live migration between hosts. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Message-Id: <20190903162246.18524-2-yury-kotov@yandex-team.ru> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 11:19:23 +01:00
Dr. David Alan Gilbert	3b34870672	qemu-file: Rework old qemu_fflush comment Commit `11808bb` removed the non-iovec based write support, the comment hung on. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190823103946.7388-1-dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 11:16:10 +01:00
Dr. David Alan Gilbert	ce62df5378	migration: register_savevm_live doesn't need dev Commit `78dd48df3` removed the last caller of register_savevm_live for an instantiable device (rather than a single system wide device); so trim out the parameter. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190822115433.12070-1-dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 11:15:03 +01:00
Wei Yang	cea3b4c083	migration: cleanup check on ops in savevm.handlers iterations During migration, there are several places to iterate on savevm.handlers. And on each iteration, we need to check its ops and related callbacks before invoke it. Generally, ops is the first element to check, and it is only necessary to check it once. This patch clean all the related part in savevm.c to check ops only once in those iterations. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190819032804.8579-1-richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 10:55:23 +01:00
Ivan Ren	2f4aefd320	migration: multifd_send_thread always post p->sem_sync when error happen When encounter error, multifd_send_thread should always notify who pay attention to it before exit. Otherwise it may block migration_thread at multifd_send_sync_main forever. Error as follow: ------------------------------------------------------------------------------- (gdb) bt #0 0x00007f4d669dfa0b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0 #1 0x00007f4d669dfa9f in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0 #2 0x00007f4d669dfb3b in sem_wait@@GLIBC_2.2.5 () from /lib64/libpthread.so.0 #3 0x0000562ccf0a5614 in qemu_sem_wait (sem=sem@entry=0x562cd1b698e8) at util/qemu-thread-posix.c:319 #4 0x0000562ccecb4752 in multifd_send_sync_main (rs=<optimized out>) at /qemu/migration/ram.c:1099 #5 0x0000562ccecb95f4 in ram_save_iterate (f=0x562cd0ecc000, opaque=<optimized out>) at /qemu/migration/ram.c:3550 #6 0x0000562ccef43c23 in qemu_savevm_state_iterate (f=0x562cd0ecc000, postcopy=false) at migration/savevm.c:1189 #7 0x0000562ccef3dcf3 in migration_iteration_run (s=0x562cd09fabf0) at migration/migration.c:3131 #8 migration_thread (opaque=opaque@entry=0x562cd09fabf0) at migration/migration.c:3258 #9 0x0000562ccf0a4c26 in qemu_thread_start (args=<optimized out>) at util/qemu-thread-posix.c:502 #10 0x00007f4d669d9e25 in start_thread () from /lib64/libpthread.so.0 #11 0x00007f4d6670635d in clone () from /lib64/libc.so.6 (gdb) f 4 #4 0x0000562ccecb4752 in multifd_send_sync_main (rs=<optimized out>) at /qemu/migration/ram.c:1099 1099 qemu_sem_wait(&p->sem_sync); (gdb) list 1094 } 1095 for (i = 0; i < migrate_multifd_channels(); i++) { 1096 MultiFDSendParams *p = &multifd_send_state->params[i]; 1097 1098 trace_multifd_send_sync_main_wait(p->id); 1099 qemu_sem_wait(&p->sem_sync); 1100 } 1101 trace_multifd_send_sync_main(multifd_send_state->packet_num); 1102 } 1103 (gdb) p i $1 = 0 (gdb) p multifd_send_state->params[0].pending_job $2 = 2 //It means the job before MULTIFD_FLAG_SYNC has already fail (gdb) p multifd_send_state->params[0].quit $3 = true Signed-off-by: Ivan Ren <ivanren@tencent.com> Message-Id: <1567044996-2362-1-git-send-email-ivanren@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-09-12 10:53:33 +01:00
Juan Quintela	0705e56496	multifd: Use number of channels as listen backlog Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-09-03 23:24:42 +02:00
Juan Quintela	fc8135c630	socket: Add num connections to qio_net_listener_open_sync() Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-09-03 23:24:42 +02:00
Paolo Bonzini	9458a9a1df	memory: fix race between TCG and accesses to dirty bitmap There is a race between TCG and accesses to the dirty log: vCPU thread reader thread ----------------------- ----------------------- TLB check -> slow path notdirty_mem_write write to RAM set dirty flag clear dirty flag TLB check -> fast path read memory write to RAM Fortunately, in order to fix it, no change is required to the vCPU thread. However, the reader thread must delay the read after the vCPU thread has finished the write. This can be approximated conservatively by run_on_cpu, which waits for the end of the current translation block. A similar technique is used by KVM, which has to do a synchronous TLB flush after doing a test-and-clear of the dirty-page flags. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-08-20 17:26:20 +02:00
John Snow	c4e4b0fa59	qapi: implement block-dirty-bitmap-remove transaction action It is used to do transactional movement of the bitmap (which is possible in conjunction with merge command). Transactional bitmap movement is needed in scenarios with external snapshot, when we don't want to leave copy of the bitmap in the base image. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190708220502.12977-3-jsnow@redhat.com [Edited "since" version to 4.2 --js] Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:03 -04:00
John Snow	28636b8211	block/dirty-bitmap: add bdrv_dirty_bitmap_get Add a public interface for get. While we're at it, rename "bdrv_get_dirty_bitmap_locked" to "bdrv_dirty_bitmap_get_locked". (There are more functions to rename to the bdrv_dirty_bitmap_VERB form, but they will wait until the conclusion of this series.) Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20190709232550.10724-11-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-08-16 16:28:02 -04:00
Peter Maydell	95a9457fd4	Header cleanup patches for 2019-08-13 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAl1WleASHGFybWJydUBy ZWRoYXQuY29tAAoJEDhwtADrkYZTBBYQALQLzIYb2Zux95bAxoJdhqNuEOGLfxeu gx0i0roPe6SBleHozUK+gf7kVYyw7he58n2dZURGqrpqktgZOFcea2a6Dq1rnVw6 JMJ2Oy7V326bHwJT0Np9rW4n+FHsMQZoAUEHjl9EeGCZfO/zy2aSWPsD8mbcbm0g hUW5Jr4+cpm28BCL8I+2HhWFazB6G2IPAF9oEXmNsOM6J1Ho8WGrTAjASe0Il5Yi m2B4QWG+4uz77WYnkttnssm41K1S95HYyaKluIVyNwTnsPTN303V/sUj+wdRaooL k1O6WqaavGhal7QeRqy+vCpF8m6qLq7NaYCzSCOrrkkuC8TAnpVn7Xmi9qI+vb6O kGBpDWhq5wOnphsEhnFvhPZgD+WZo3mwTgW4h0d3UhB6orOTPTMvWKEwFJ1j/O6/ gntV61o542c9gpZjS133221HRmNjteHF/5/TFzmX/G50sgivJn+WOP87naM2aBAz 8MW5HatTox+qQqYD4VMUIVnVkguxHDVhFRBunYu0HvZZ1Rud+Lc6Xzi6H4jDlZ81 vtOmAlMU3dbp97gNvJrAVqV4JIL3puOWbu0MMaQWoG53Kcdfu46LIr57TTg3dw61 R9e7HSOQjYILChoodwELlyeAsVeZo3IzX9vPX8aw7MoHvneyTUNqtha/rHsLEwsb 97G19dydGEC6 =eSUz -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-include-2019-08-13-v2' into staging Header cleanup patches for 2019-08-13 # gpg: Signature made Fri 16 Aug 2019 12:39:12 BST # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-include-2019-08-13-v2: (29 commits) sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu: Move the VMChangeStateEntry typedef to qemu/typedefs.h Include sysemu/sysemu.h a lot less Clean up inclusion of sysemu/sysemu.h numa: Move remaining NUMA declarations from sysemu.h to numa.h Include sysemu/hostmem.h less numa: Don't include hw/boards.h into sysemu/numa.h Include hw/boards.h a bit less Include hw/qdev-properties.h less Include qemu/main-loop.h less Include qemu/queue.h slightly less Include hw/hw.h exactly where needed Include qom/object.h slightly less Include exec/memory.h slightly less Include migration/vmstate.h less migration: Move the VMStateDescription typedef to typedefs.h Clean up inclusion of exec/cpu-common.h Include hw/irq.h a lot less typedefs: Separate incomplete types and function types ide: Include hw/ide/internal a bit less outside hw/ide/ ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-08-16 14:53:43 +01:00
Markus Armbruster	54d31236b9	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]	2019-08-16 13:37:36 +02:00
Markus Armbruster	46517dd497	Include sysemu/sysemu.h a lot less In my "build everything" tree, changing sysemu/sysemu.h triggers a recompile of some 5400 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/qdev-core.h includes sysemu/sysemu.h since recent commit `e965ffa70a` "qdev: add qdev_add_vm_change_state_handler()". This is a bad idea: hw/qdev-core.h is widely included. Move the declaration of qdev_add_vm_change_state_handler() to sysemu/sysemu.h, and drop the problematic include from hw/qdev-core.h. Touching sysemu/sysemu.h now recompiles some 1800 objects. qemu/uuid.h also drops from 5400 to 1800. A few more headers show smaller improvement: qemu/notify.h drops from 5600 to 5200, qemu/timer.h from 5600 to 4500, and qapi/qapi-types-run-state.h from 5500 to 5000. Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190812052359.30071-28-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>	2019-08-16 13:31:53 +02:00
Markus Armbruster	a27bd6c779	Include hw/qdev-properties.h less In my "build everything" tree, changing hw/qdev-properties.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Many places including hw/qdev-properties.h (directly or via hw/qdev.h) actually need only hw/qdev-core.h. Include hw/qdev-core.h there instead. hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h and hw/qdev-properties.h, which in turn includes hw/qdev-core.h. Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h. While there, delete a few superfluous inclusions of hw/qdev-core.h. Touching hw/qdev-properties.h now recompiles some 1200 objects. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-22-armbru@redhat.com>	2019-08-16 13:31:53 +02:00
Markus Armbruster	db72581598	Include qemu/main-loop.h less In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	d484205210	Include exec/memory.h slightly less Drop unnecessary inclusions from headers. Downgrade a few more to exec/hwaddr.h. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-17-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	d645427057	Include migration/vmstate.h less In my "build everything" tree, changing migration/vmstate.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/hw.h supposedly includes it for convenience. Several other headers include it just to get VMStateDescription. The previous commit made that unnecessary. Include migration/vmstate.h only where it's still needed. Touching it now recompiles only some 1600 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-16-armbru@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-08-16 13:31:52 +02:00
Markus Armbruster	6a0acfff99	Clean up inclusion of exec/cpu-common.h migration/qemu-file.h neglects to include it even though it needs ram_addr_t. Fix that. Drop a few superfluous inclusions elsewhere. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190812052359.30071-14-armbru@redhat.com>	2019-08-16 13:31:52 +02:00
Juan Quintela	7dd59d01dd	migration: add some multifd traces Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190814020218.1868-6-quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Juan Quintela	18cdcea371	migration: Make global sem_sync semaphore by channel This makes easy to debug things because when you want for all threads to arrive at that semaphore, you know which one your are waiting for. Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190814020218.1868-3-quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Juan Quintela	5558c91ae8	migration: Add traces for multifd terminate threads Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190814020218.1868-2-quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Marc-André Lureau	3170a6453b	qemu-file: move qemu_{get,put}_counted_string() declarations Move migration helpers for strings under include/, so they can be used outside of migration/ Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190808150325.21939-2-marcandre.lureau@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	1ce542620a	migration/postcopy: use mis->bh instead of allocating a QEMUBH For migration incoming side, it either quit in precopy or postcopy. It is safe to use the mis->bh for both instead of allocating a dedicated QEMUBH for postcopy. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190805053146.32326-1-richardw.yang@linux.intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	7a3e957177	migration: rename migration_bitmap_sync_range to ramblock_sync_dirty_bitmap Rename for better understanding of the code. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190808033155.30162-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Ivan Ren	81507f6b7e	migration: update ram_counters for multifd sync packet Multifd sync will send MULTIFD_FLAG_SYNC flag info to destination, add these bytes to ram_counters record. Signed-off-by: Ivan Ren <ivanren@tencent.com> Suggested-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <1564464816-21804-4-git-send-email-ivanren@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Ivan Ren	1b81c974cc	migration: add speed limit for multifd migration Limit the speed of multifd migration through common speed limitation qemu file. Signed-off-by: Ivan Ren <ivanren@tencent.com> Message-Id: <1564464816-21804-3-git-send-email-ivanren@tencent.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Ivan Ren	5d7d255863	migration: add qemu_file_update_transfer interface Add qemu_file_update_transfer for just update bytes_xfer for speed limitation. This will be used for further migration feature such as multifd migration. Signed-off-by: Ivan Ren <ivanren@tencent.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1564464816-21804-2-git-send-email-ivanren@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Ivan Ren	87f3bd8717	migration: always initialise ram_counters for a new migration This patch fix a multifd migration bug in migration speed calculation, this problem can be reproduced as follows: 1. start a vm and give a heavy memory write stress to prevent the vm be successfully migrated to destination 2. begin a migration with multifd 3. migrate for a long time [actually, this can be measured by transferred bytes] 4. migrate cancel 5. begin a new migration with multifd, the migration will directly run into migration_completion phase Reason as follows: Migration update bandwidth and s->threshold_size in function migration_update_counters after BUFFER_DELAY time: current_bytes = migration_total_bytes(s); transferred = current_bytes - s->iteration_initial_bytes; time_spent = current_time - s->iteration_start_time; bandwidth = (double)transferred / time_spent; s->threshold_size = bandwidth * s->parameters.downtime_limit; In multifd migration, migration_total_bytes function return qemu_ftell(s->to_dst_file) + ram_counters.multifd_bytes. s->iteration_initial_bytes will be initialized to 0 at every new migration, but ram_counters is a global variable, and history migration data will be accumulated. So if the ram_counters.multifd_bytes is big enough, it may lead pending_size >= s->threshold_size become false in migration_iteration_run after the first migration_update_counters. Signed-off-by: Ivan Ren <ivanren@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Suggested-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <1564741121-1840-1-git-send-email-ivanren@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	14adf288d3	migration: remove unused field bytes_xfer MigrationState->bytes_xfer is only set to 0 in migrate_init(). Remove this unnecessary field. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190402003106.17614-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	9dec3cc3f4	migration/postcopy: use QEMU_IS_ALIGNED to replace host_offset Use QEMU_IS_ALIGNED for the check, it would be more consistent with other align calculations. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190806004648.8659-3-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	dad45ab2be	migration/postcopy: simplify calculation of run_start and fixup_start_addr The purpose of the calculation is to find a HostPage which is partially dirty. * fixup_start_addr points to the start of the HostPage to discard * run_start points to the next HostPage to check While in the middle stage, there would two cases for run_start: * aligned with HostPage means this is not partially dirty * not aligned means this is partially dirty When it is aligned, no work and calculation is necessary. run_start already points to the start of next HostPage and is ready to continue. When it is not aligned, the calculation could be simplified with: * fixup_start_addr = QEMU_ALIGN_DOWN(run_start, host_ratio) * run_start = QEMU_ALIGN_UP(run_start, host_ratio) By doing so, run_start always points to the next HostPage to check. fixup_start_addr always points to the HostPage to discard. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190806004648.8659-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	810cf2bbd4	migration/postcopy: make PostcopyDiscardState a static variable In postcopy-ram.c, we provide three functions to discard certain RAMBlock range: * postcopy_discard_send_init() * postcopy_discard_send_range() * postcopy_discard_send_finish() Currently, we allocate/deallocate PostcopyDiscardState for each RAMBlock on sending discard information to destination. This is not necessary and the same data area could be reused for each RAMBlock. This patch defines PostcopyDiscardState a static variable. By doing so: 1) avoid memory allocation and deallocation to the system 2) avoid potential failure of memory allocation 3) hide some details for their users Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190724010721.2146-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	10da4a3689	migration: extract ram_load_precopy After cleanup, it would be clear to audience there are two cases ram_load: * precopy * postcopy And it is not necessary to check postcopy_running on each iteration for precopy. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190725002023.2335-3-richardw.yang@linux.intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	be4a1a1b6f	migration: return -EINVAL directly when version_id mismatch It is not reasonable to continue when version_id mismatch. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190722075339.25121-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	4695ce3fdc	migration: equation is more proper than and to check LOADVM_QUIT LOADVM_QUIT allows a command to quit all layers of nested loadvm loops, while current return value check is not that proper even it works now. Current return value check "ret & LOADVM_QUIT" would return true if bit[0] is 1. This would be true when ret is -1 which is used to indicate an error of handling a command. Since there is only one place return LOADVM_QUIT and no other combination of return value, use "ret == LOADVM_QUIT" would be more proper. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190718064257.29218-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	5d0980a459	migration: just pass RAMBlock is enough RAMBlock->used_length is always passed to migration_bitmap_sync_range(), which could be retrieved from RAMBlock. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190718012547.16373-1-richardw.yang@linux.intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	6a88eb2b08	migration: use migration_in_postcopy() to check POSTCOPY_ACTIVE Use common helper function to check the state. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190719071129.11880-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	52aec70923	migration/postcopy: start_postcopy could be true only when migrate_postcopy() return true There is only one place to set start_postcopy to true, qmp_migrate_start_postcopy(), which make sure start_postcopy could be set to true when migrate_postcopy() return true. So start_postcopy is true implies the other one. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190718083747.5859-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	305b6f8431	migration/postcopy: PostcopyState is already set in loadvm_postcopy_handle_advise() PostcopyState is already set to ADVISE at the beginning of loadvm_postcopy_handle_advise(). Remove the redundant set. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190711080816.6405-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	e326767b45	migration/savevm: move non SaveStateEntry condition check out of iteration in_postcopy and iterable_only are not SaveStateEntry specific, it would be more proper to check them out of iteration. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190709140924.13291-4-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	622a80c955	migration/savevm: split qemu_savevm_state_complete_precopy() into two parts This is a preparation patch for further cleanup. No functional change, just wrap two major part of qemu_savevm_state_complete_precopy() into function. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190709140924.13291-3-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	4e455d51ef	migration/savevm: flush file for iterable_only case It would be proper to flush file even for iterable_only case. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190709140924.13291-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	8996604fe6	migration/postcopy: do_fixup is true when host_offset is non-zero This means it is not necessary to spare an extra variable to hold this condition. Use host_offset directly is fine. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190710050814.31344-3-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	e927a03317	migration/postcopy: reduce one operation to calculate fixup_start_addr Use the same way for run_end to calculate run_start, which saves one operation. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190710050814.31344-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	a162b572e9	migration/postcopy: discard_length must not be 0 Since we break the loop when there is no more page to discard, we are sure the following process would find some page to discard. It is not necessary to check it again. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190627020822.15485-4-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	33a5cb6202	migration/postcopy: break the loop when there is no more page to discard When one is equal or bigger then end, it means there is no page to discard. Just break the loop in this case instead of processing it. No functional change, just refactor it a little. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190627020822.15485-3-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	0abfff9ea7	migration/postcopy: the valid condition is one less then end If one equals end, it means we have gone through the whole bitmap. Use a more restrict check to skip a unnecessary condition. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190627020822.15485-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Wei Yang	640dfb14db	migration: consolidate time info into populate_time_info Consolidate time information fill up into its function for better readability. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190716005411.4156-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Yury Kotov	3d661c8ab1	migration: Add error_desc for file channel errors Currently, there is no information about error if outgoing migration was failed because of file channel errors. Example (QMP session): -> { "execute": "migrate", "arguments": { "uri": "exec:head -c 1" }} <- { "return": {} } ... -> { "execute": "query-migrate" } <- { "return": { "status": "failed" }} // There is not error's description And even in the QEMU's output there is nothing. This patch 1) Adds errp for the most of QEMUFileOps 2) Adds qemu_file_get_error_obj/qemu_file_set_error_obj 3) And finally using of qemu_file_get_error_obj in migration.c And now, the status for the mentioned fail will be: -> { "execute": "query-migrate" } <- { "return": { "status": "failed", "error-desc": "Unable to write to command: Broken pipe" }} Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Message-Id: <20190422103420.15686-1-yury-kotov@yandex-team.ru> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-08-14 17:33:14 +01:00
Ivan Ren	f193bc0c53	migration: fix migrate_cancel multifd migration leads destination hung forever When migrate_cancel a multifd migration, if run sequence like this: [source] [destination] multifd_send_sync_main[finish] multifd_recv_thread wait &p->sem_sync shutdown to_dst_file detect error from_src_file send RAM_SAVE_FLAG_EOS[fail] [no chance to run multifd_recv_sync_main] multifd_load_cleanup join multifd receive thread forever will lead destination qemu hung at following stack: pthread_join qemu_thread_join multifd_load_cleanup process_incoming_migration_co coroutine_trampoline Signed-off-by: Ivan Ren <ivanren@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1561468699-9819-4-git-send-email-ivanren@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-24 14:47:21 +02:00
Juan Quintela	3c3ca25d1f	migration: Make explicit that we are quitting multifd We add a bool to indicate that. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-24 14:47:12 +02:00
Ivan Ren	a3ec6b7d23	migration: fix migrate_cancel leads live_migration thread hung forever When we 'migrate_cancel' a multifd migration, live_migration thread may hung forever at some points, because of multifd_send_thread has already exit for socket error: 1. multifd_send_pages may hung at qemu_sem_wait(&multifd_send_state-> channels_ready) 2. multifd_send_sync_main my hung at qemu_sem_wait(&multifd_send_state-> sem_sync) Signed-off-by: Ivan Ren <ivanren@tencent.com> Message-Id: <1561468699-9819-3-git-send-email-ivanren@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- Remove spurious not needed bits	2019-07-24 14:47:02 +02:00
Ivan Ren	713f762a31	migration: fix migrate_cancel leads live_migration thread endless loop When we 'migrate_cancel' a multifd migration, live_migration thread may go into endless loop in multifd_send_pages functions. Reproduce steps: (qemu) migrate_set_capability multifd on (qemu) migrate -d url (qemu) [wait a while] (qemu) migrate_cancel Then may get live_migration 100% cpu usage in following stack: pthread_mutex_lock qemu_mutex_lock_impl multifd_send_pages multifd_queue_page ram_save_multifd_page ram_save_target_page ram_save_host_page ram_find_and_save_block ram_find_and_save_block ram_save_iterate qemu_savevm_state_iterate migration_iteration_run migration_thread qemu_thread_start start_thread clone Signed-off-by: Ivan Ren <ivanren@tencent.com> Message-Id: <1561468699-9819-2-git-send-email-ivanren@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-24 14:46:51 +02:00
Ivan Ren	40c4d4a835	migration: always initial RAMBlock.bmap to 1 for new migration Reproduce the problem: migrate migrate_cancel migrate Error happen for memory migration The reason as follows: 1. qemu start, ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] all set to 1 by a series of cpu_physical_memory_set_dirty_range 2. migration start:ram_init_bitmaps - memory_global_dirty_log_start: begin log diry - memory_global_dirty_log_sync: sync dirty bitmap to ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] - migration_bitmap_sync_range: sync ram_list. dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero 3. migration data... 4. migrate_cancel, will stop log dirty 5. migration start:ram_init_bitmaps - memory_global_dirty_log_start: begin log diry - memory_global_dirty_log_sync: sync dirty bitmap to ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] - migration_bitmap_sync_range: sync ram_list. dirty_memory[DIRTY_MEMORY_MIGRATION] to RAMBlock.bmap and ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] is set to zero Here RAMBlock.bmap only have new logged dirty pages, don't contain the whole guest pages. Signed-off-by: Ivan Ren <ivanren@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1563115879-2715-1-git-send-email-ivanren@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:47:47 +02:00
Wei Yang	40277ca807	migration/postcopy: remove redundant cpu_synchronize_all_post_init cpu_synchronize_all_post_init() is called twice in loadvm_postcopy_handle_run_bh(), so remove one redundant call. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190715080751.24304-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:45:59 +02:00
Wei Yang	89dab31b27	migration/postcopy: fix document of postcopy_send_discard_bm_ram() Commit `6b6712efcc` ('ram: Split dirty bitmap by RAMBlock') changes the parameter of postcopy_send_discard_bm_ram(), while left the document part untouched. This patch correct the document and fix two typo by hand. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190715020549.15018-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:45:22 +02:00
Peng Tao	b17fbbe55c	migration: allow private destination ram with x-ignore-shared By removing the share ram check, qemu is able to migrate to private destination ram when x-ignore-shared capability is on. Then we can create multiple destination VMs based on the same source VM. This changes the x-ignore-shared migration capability to work similar to Lai's original bypass-shared-memory work(https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg00003.html) which enables kata containers (https://katacontainers.io) to implement the VM templating feature. An example usage in kata containers(https://katacontainers.io): 1. Start the source VM: qemu-system-x86 -m 2G \ -object memory-backend-file,id=mem0,size=2G,share=on,mem-path=/tmpfs/template-memory \ -numa node,memdev=mem0 2. Stop the template VM, set migration x-ignore-shared capability, migrate "exec:cat>/tmpfs/state", quit it 3. Start target VM: qemu-system-x86 -m 2G \ -object memory-backend-file,id=mem0,size=2G,share=off,mem-path=/tmpfs/template-memory \ -numa node,memdev=mem0 \ -incoming defer 4. connect to target VM qmp, set migration x-ignore-shared capability, migrate_incoming "exec:cat /tmpfs/state" 5. create more target VMs repeating 3 and 4 Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Yury Kotov <yury-kotov@yandex-team.ru> Cc: Jiangshan Lai <laijs@hyper.sh> Cc: Xu Wang <xu@hyper.sh> Signed-off-by: Peng Tao <tao.peng@linux.alibaba.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1560494113-1141-1-git-send-email-tao.peng@linux.alibaba.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:39:03 +02:00
Peter Xu	002cad6b16	migration: Split log_clear() into smaller chunks Currently we are doing log_clear() right after log_sync() which mostly keeps the old behavior when log_clear() was still part of log_sync(). This patch tries to further optimize the migration log_clear() code path to split huge log_clear()s into smaller chunks. We do this by spliting the whole guest memory region into memory chunks, whose size is decided by MigrationState.clear_bitmap_shift (an example will be given below). With that, we don't do the dirty bitmap clear operation on the remote node (e.g., KVM) when we fetch the dirty bitmap, instead we explicitly clear the dirty bitmap for the memory chunk for each of the first time we send a page in that chunk. Here comes an example. Assuming the guest has 64G memory, then before this patch the KVM ioctl KVM_CLEAR_DIRTY_LOG will be a single one covering 64G memory. If after the patch, let's assume when the clear bitmap shift is 18, then the memory chunk size on x86_64 will be 1UL<<18 * 4K = 1GB. Then instead of sending a big 64G ioctl, we'll send 64 small ioctls, each of the ioctl will cover 1G of the guest memory. For each of the 64 small ioctls, we'll only send if any of the page in that small chunk was going to be sent right away. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190603065056.25211-12-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:39:03 +02:00
Peter Xu	267691b65c	migration: No need to take rcu during sync_dirty_bitmap cpu_physical_memory_sync_dirty_bitmap() has one RAMBlock* as parameter, which means that it must be with RCU read lock held already. Taking it again inside seems redundant. Removing it. Instead comment on the functions about the RCU read lock. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20190603065056.25211-2-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:39:02 +02:00
Wei Yang	422314e751	migration/ram.c: reset complete_round when we gets a queued page In case we gets a queued page, the order of block is interrupted. We may not rely on the complete_round flag to say we have already searched the whole blocks on the list. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190605010828.6969-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:39:02 +02:00
Wei Yang	77568ea7f8	migration/multifd: sync packet_num after all thread are done Notification from recv thread is not ordered, which means we may be notified by one MultiFDRecvParams but adjust packet_num for another. Move the adjustment after we are sure each recv thread are sync-ed. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20190604023540.26532-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:39:02 +02:00
Wei Yang	ca35380390	migration/xbzrle: update cache and current_data in one place When we are not in the last_stage, we need to update the cache if page is not the same. Currently this procedure is scattered in two places and mixed with encoding status check. This patch extract this general step out to make the code a little bit easy to read. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190610004159.20966-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:39:02 +02:00
Wei Yang	b6526c4b21	migration/multifd: call multifd_send_sync_main when sending RAM_SAVE_FLAG_EOS On receiving RAM_SAVE_FLAG_EOS, multifd_recv_sync_main() is called to synchronize receive threads. Current synchronization mechanism is to wait for each channel's sem_sync semaphore. This semaphore is triggered by a packet with MULTIFD_FLAG_SYNC flag. While in current implementation, we don't do multifd_send_sync_main() to send such packet when blk_mig_bulk_active() is true. This will leads to the receive threads won't notify multifd_recv_sync_main() by sem_sync. And multifd_recv_sync_main() will always wait there. [Note]: normal migration test works, while didn't test the blk_mig_bulk_active() case. Since not sure how to produce this situation. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190612014337.11255-1-richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:39:02 +02:00
Juan Quintela	8ebad0f7a7	migration: fix multifd_recv event typo It uses num in multifd_send(). Make it coherent. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-07-15 15:39:01 +02:00
Like Xu	5cc8767d05	general: Replace global smp variables with smp machine properties Basically, the context could get the MachineState reference via call chains or unrecommended qdev_get_machine() in !CONFIG_USER_ONLY mode. A local variable of the same name would be introduced in the declaration phase out of less effort OR replace it on the spot if it's only used once in the context. No semantic changes. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190518205428.90532-4-like.xu@linux.intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-07-05 17:07:36 -03:00
Alex Bennée	1f4abd81f7	migration: move port_attr inside CONFIG_LINUX Otherwise the FreeBSD compiler complains about an unused variable. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>	2019-07-04 19:23:07 +01:00
Zhang Chen	0e8818f023	migration/colo.c: Add missed filter notify for Xen COLO. We need to notify net filter to do checkpoint for Xen COLO, like KVM side. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2019-07-02 10:21:07 +08:00
Markus Armbruster	a8d2532645	Include qemu-common.h exactly where needed No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]	2019-06-12 13:20:20 +02:00
Markus Armbruster	0b8fa32f55	Include qemu/module.h where needed, drop it from qemu-common.h Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]	2019-06-12 13:18:33 +02:00
Peter Maydell	0d74f3b427	Trivial fixes 06/06/2019 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAlz4844SHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748FtwQALCoNnMrEY4mnmHy0dnEQPRFcPMKa9pp 3lqpmxLHAkSsWFKmmLKPteZhUroBmzXPa91984hhQiglcMMMsIPy+A+x1QBj7Yt2 KeEKpIdSS6Qi4T72zVOtO4MR1pCeKUYHY8ICn/rqAkpkA/lt5DuX2xJSepgrSdAI /JgpawJ4Rz95x5rCLuy/t5egtKVYVhauv4EbQ9PeaFhSlwoKNYbc6qAZSvs8pr9n H4W8DgtI35wPj4zE3i9bbmnUUxCUMj6MjkIm/jTB5qewY/I+llb27CN2Uq1yvRKW ANGbGW3rVwQe8p6kbVcM7CDbawm4J0c59w/4mUTa3BRRuAj4KtHTeghXALHLn/gv aO90oZKGd2xGxpSMAapzgebNezUQxFFoRWhyI4o8N+SWEpoRbHkxDwrk2WlKXsCR xRYOensU17NOKMJ32AbUReC2/m7D71EH3723aVzd2O5nuIHlsEG2CYjlzjXFB4X8 wPbaigcqpDEMwLTt3kYy4TrghFdcSaAYepmqXJ9D9UONMOrnRhhR9GkvEmCB4Eus BJanLE0xp59KTVDZ5c/v6+44P/RQ04aD2oFh0bMKlNb4+cfbQlq2odoiCPcUKimh XCCbYbwJFRRkgGTh9pFMgMqH9zX8HTgG/2Zp2VGFnYXnJzz/AuvgH9k5Pc4P3C/A lOCSBWpxR6bk =wmCY -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-pull-request' into staging Trivial fixes 06/06/2019 # gpg: Signature made Thu 06 Jun 2019 12:05:50 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-pull-request: hw/watchdog/wdt_i6300esb: Use DEVICE() macro to access DeviceState.qdev hw/scsi: Use the QOM BUS() macro to access BusState.qbus hw/sd: Use the QOM BUS() macro to access BusState.qbus hw/audio/ac97: Use the QOM DEVICE() macro to access DeviceState.qdev hw/vfio/pci: Use the QOM DEVICE() macro to access DeviceState.qdev hw/usb-storage: Use the QOM DEVICE() macro to access DeviceState.qdev hw/isa: Use the QOM DEVICE() macro to access DeviceState.qdev hw/s390x/event-facility: Use the QOM BUS() macro to access BusState.qbus hw/pci-bridge: Use the QOM BUS() macro to access BusState.qbus hw/scsi/vmw_pvscsi: Use qbus_reset_all() directly docs/devel/build-system: Update an example test: Fix make target check-report.tap util: Adjust qemu_guest_getrandom_nofail for Coverity vhost: fix incorrect print type migration: fix a typo hw/rdma: Delete unused headers inclusion Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-06-06 14:09:14 +01:00
Li Qiang	ff1543af22	migration: fix a typo 'postocpy' should be 'postcopy'. CC: qemu-trivial@nongnu.org Signed-off-by: Li Qiang <liq3ea@163.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190525062832.18009-1-liq3ea@163.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-06-06 11:17:32 +02:00
Wei Yang	0315851938	migratioin/ram: leave RAMBlock->bmap blank on allocating During migration, we would sync bitmap from ram_list.dirty_memory to RAMBlock.bmap in cpu_physical_memory_sync_dirty_bitmap(). Since we set RAMBlock.bmap and ram_list.dirty_memory both to all 1, this means at the first round this sync is meaningless and is a duplicated work. Leaving RAMBlock->bmap blank on allocating would have a side effect on migration_dirty_pages, since it is calculated from the result of cpu_physical_memory_sync_dirty_bitmap(). To keep it right, we need to set migration_dirty_pages to 0 in ram_state_init(). Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-06-05 12:44:03 +02:00
Yury Kotov	61053d4826	migration: Fix fd protocol for incoming defer Currently, incoming migration through fd supports only command-line case: E.g. fork(); fd = open(); exec("qemu ... -incoming fd:%d", fd); It's possible to use add-fd commands to pass fd for migration, but it's invalid case. add-fd works with fdset but not with particular fds. To work with getfd in incoming defer it's enough to use monitor_fd_param instead of strtol. monitor_fd_param supports both cases: * fd:123 * fd:fd_name (added by getfd). And also the use of monitor_fd_param improves error messages. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-06-05 12:43:55 +02:00
Wei Yang	f38d7fbc01	migration/ram.c: multifd_send_state->count is not really used Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-06-05 12:42:54 +02:00
Wei Yang	7d4eaace46	migration/ram.c: MultiFDSendParams.sem_sync is not really used Besides init and destroy, MultiFDSendParams.sem_sync is not really used. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-06-05 12:42:39 +02:00
Kevin Wolf	d861ab3acf	block: Add BlockBackend.ctx This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2019-06-04 15:22:22 +02:00
John Snow	592203e7cf	migration/dirty-bitmaps: change bitmap enumeration method Shift from looking at every root BDS to every BDS. This will migrate bitmaps that are attached to blockdev created nodes instead of just ones attached to emulated storage devices. Note that this will not migrate anonymous or internal-use bitmaps, as those are defined as having no name. This will also fix the Coverity issues Peter Maydell has been asking about for the past several releases, as well as fixing a real bug. Reported-by: Peter Maydell <peter.maydell@linaro.org> Reported-by: Coverity 😅 Reported-by: aihua liang <aliang@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com> Message-id: 20190514201926.10407-1-jsnow@redhat.com Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1652490 Fixes: Coverity CID 1390625 CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>	2019-05-28 19:33:31 -04:00
Greg Kurz	b6eca81e1b	migration: Fix typo in migrate_add_blocker() error message Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <155800428514.543845.17558475870097990036.stgit@bahia.lan> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-05-22 17:35:27 +02:00
Wei Yang	a5f7b1a63c	migration/ram.c: fix typos in comments Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190510233729.15554-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 19:00:04 +01:00
Yury Kotov	fd392cfa8e	migration: Fix use-after-free during process exit It fixes heap-use-after-free which was found by clang's ASAN. Control flow of this use-after-free: main_thread: * Got SIGTERM and completes main loop * Calls migration_shutdown - migrate_fd_cancel (so, migration_thread begins to complete) - object_unref(OBJECT(current_migration)); migration_thread: * migration_iteration_finish -> schedule cleanup bh * object_unref(OBJECT(s)); (Now, current_migration is freed) * exits main_thread: * Calls vm_shutdown -> drain bdrvs -> main loop -> cleanup_bh -> use after free If you want to reproduce, these couple of sleeps will help: vl.c:4613: migration_shutdown(); + sleep(2); migration.c:3269: + sleep(1); trace_migration_thread_after_loop(); migration_iteration_finish(s); Original output: qemu-system-x86_64: terminating on signal 15 from pid 31980 (<unknown process>) ================================================================= ==31958==ERROR: AddressSanitizer: heap-use-after-free on address 0x61900001d210 at pc 0x555558a535ca bp 0x7fffffffb190 sp 0x7fffffffb188 READ of size 8 at 0x61900001d210 thread T0 (qemu-vm-0) #0 0x555558a535c9 in migrate_fd_cleanup migration/migration.c:1502:23 #1 0x5555594fde0a in aio_bh_call util/async.c:90:5 #2 0x5555594fe522 in aio_bh_poll util/async.c:118:13 #3 0x555559524783 in aio_poll util/aio-posix.c:725:17 #4 0x555559504fb3 in aio_wait_bh_oneshot util/aio-wait.c:71:5 #5 0x5555573bddf6 in virtio_blk_data_plane_stop hw/block/dataplane/virtio-blk.c:282:5 #6 0x5555589d5c09 in virtio_bus_stop_ioeventfd hw/virtio/virtio-bus.c:246:9 #7 0x5555589e9917 in virtio_pci_stop_ioeventfd hw/virtio/virtio-pci.c:287:5 #8 0x5555589e22bf in virtio_pci_vmstate_change hw/virtio/virtio-pci.c:1072:9 #9 0x555557628931 in virtio_vmstate_change hw/virtio/virtio.c:2257:9 #10 0x555557c36713 in vm_state_notify vl.c:1605:9 #11 0x55555716ef53 in do_vm_stop cpus.c:1074:9 #12 0x55555716eeff in vm_shutdown cpus.c:1092:12 #13 0x555557c4283e in main vl.c:4617:5 #14 0x7fffdfdb482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f) #15 0x555556ecb118 in _start (x86_64-softmmu/qemu-system-x86_64+0x1977118) 0x61900001d210 is located 144 bytes inside of 952-byte region [0x61900001d180,0x61900001d538) freed by thread T6 (live_migration) here: #0 0x555556f76782 in __interceptor_free /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:124:3 #1 0x555558d5fa94 in object_finalize qom/object.c:618:9 #2 0x555558d57651 in object_unref qom/object.c:1068:9 #3 0x555558a55588 in migration_thread migration/migration.c:3272:5 #4 0x5555595393f2 in qemu_thread_start util/qemu-thread-posix.c:502:9 #5 0x7fffe057f6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9) previously allocated by thread T0 (qemu-vm-0) here: #0 0x555556f76b03 in __interceptor_malloc /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:146:3 #1 0x7ffff6ee37b8 in g_malloc (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4f7b8) #2 0x555558d58031 in object_new qom/object.c:640:12 #3 0x555558a31f21 in migration_object_init migration/migration.c:139:25 #4 0x555557c41398 in main vl.c:4320:5 #5 0x7fffdfdb482f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f) Thread T6 (live_migration) created by T0 (qemu-vm-0) here: #0 0x555556f5f0dd in pthread_create /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_interceptors.cc:210:3 #1 0x555559538cf9 in qemu_thread_create util/qemu-thread-posix.c:539:11 #2 0x555558a53304 in migrate_fd_connect migration/migration.c:3332:5 #3 0x555558a72bd8 in migration_channel_connect migration/channel.c:92:5 #4 0x555558a6ef87 in exec_start_outgoing_migration migration/exec.c:42:5 #5 0x555558a4f3c2 in qmp_migrate migration/migration.c:1922:9 #6 0x555558bb4f6a in qmp_marshal_migrate qapi/qapi-commands-migration.c:607:5 #7 0x555559363738 in do_qmp_dispatch qapi/qmp-dispatch.c:131:5 #8 0x555559362a15 in qmp_dispatch qapi/qmp-dispatch.c:174:11 #9 0x5555571bac15 in monitor_qmp_dispatch monitor.c:4124:11 #10 0x55555719a22d in monitor_qmp_bh_dispatcher monitor.c:4207:9 #11 0x5555594fde0a in aio_bh_call util/async.c:90:5 #12 0x5555594fe522 in aio_bh_poll util/async.c:118:13 #13 0x5555595201e0 in aio_dispatch util/aio-posix.c:460:5 #14 0x555559503553 in aio_ctx_dispatch util/async.c:261:5 #15 0x7ffff6ede196 in g_main_context_dispatch (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x4a196) SUMMARY: AddressSanitizer: heap-use-after-free migration/migration.c:1502:23 in migrate_fd_cleanup Shadow bytes around the buggy address: 0x0c327fffb9f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c327fffba00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c327fffba10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c327fffba20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa 0x0c327fffba30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd =>0x0c327fffba40: fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c327fffba50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c327fffba60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c327fffba70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c327fffba80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd 0x0c327fffba90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap left redzone: fa Freed heap region: fd Stack left redzone: f1 Stack mid redzone: f2 Stack right redzone: f3 Stack after return: f5 Stack use after scope: f8 Global redzone: f9 Global init order: f6 Poisoned by user: f7 Container overflow: fc Array cookie: ac Intra object redzone: bb ASan internal: fe Left alloca redzone: ca Right alloca redzone: cb Shadow gap: cc ==31958==ABORTING Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Message-Id: <20190408113343.2370-1-yury-kotov@yandex-team.ru> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed up comment formatting	2019-05-14 18:59:54 +01:00
Wei Yang	16015d32e4	migration/savevm: wrap into qemu_loadvm_state_header() On source side, we have qemu_savevm_state_header() to send related data, while on the receiving side those steps are scattered in qemu_loadvm_state(). This patch wrap those related steps into qemu_loadvm_state_header() to make it friendly to read. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190424004700.12766-5-richardw.yang@linux.intel.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Wei Yang	9e14b84908	migration/savevm: load_header before load_setup In migration_thread() and qemu_savevm_state(), we savevm_state in following sequence: qemu_savevm_state_header(f); qemu_savevm_state_setup(f); Then it would be more proper to loadvm_state in the save sequence. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190424004700.12766-4-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Wei Yang	5351e69af8	migration/savevm: remove duplicate check of migration_is_blocked Current call flow of save_snapshot is: save_snapshot migration_is_blocked qemu_savevm_state migration_is_blocked Since qemu_savevm_state is only called in save_snapshot, this means migration_is_blocked has been already checked. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190424004700.12766-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Yi Wang	4633456ced	migration: update comments of migration bitmap Since the ram bitmap and the unsent bitmap are split by RAMBlock in commit `6b6712e`, it's better to update the comments about them. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Message-Id: <1555311089-18610-1-git-send-email-wang.yi59@zte.com.cn> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Wei Yang	bf21297923	migration/ram.c: start of migration_bitmap_sync_range is always 0 We can eliminate to pass 0. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190430034412.12935-2-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Zhang Chen	c0913d1dfd	migration/colo.c: Remove redundant input parameter The colo_do_failover no need the input parameter. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Message-Id: <20190426090730.2691-2-chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Cole Robinson	aded9dfa74	migration: savevm: fix error code with migration blockers The only caller that checks the error code is looking for != 0, so returning false is incorrect. Fixes: `5aaac46793` "migration: savevm: consult migration blockers" Signed-off-by: Cole Robinson <crobinso@redhat.com> Message-Id: <b991a4d0e6c4253bc08b2794c6084be55fc72e1d.1554851834.git.crobinso@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Wei Yang	f2dd7eddf2	vmstate: check subsection_found is enough subsection_found is true implies vmdesc is not NULL. This patch remove the additional check on vmdesc and rename subsection_found to vmdesc_has_subsections to make it more self-explain. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190403011016.12549-1-richardw.yang@linux.intel.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Wei Yang	15d2d64cf5	migration: remove not used field xfer_limit MigrationState->xfer_limit is only set to 0 in migrate_init(). Remove this unnecessary field. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190326055726.10539-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Wei Yang	a94cd7b8ab	migration: not necessary to check ops again During each iteration, se->ops is checked before each loop. So it is not necessary to check it again and simplify the following check a little. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20190327013130.26259-1-richardw.yang@linux.intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-05-14 17:33:35 +01:00
Peter Maydell	f151f8aca5	migration/ram.c: Fix use-after-free in multifd_recv_unfill_packet() Coverity points out (CID 1400442) that in this code: if (packet->pages_alloc > p->pages->allocated) { multifd_pages_clear(p->pages); multifd_pages_init(packet->pages_alloc); } we free p->pages in multifd_pages_clear() but continue to use it in the following code. We also leak memory, because multifd_pages_init() returns the pointer to a new MultiFDPages_t struct but we are ignoring its return value. Fix both of these bugs by adding the missing assignment of the newly created struct to p->pages. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20190409151830.6024-1-peter.maydell@linaro.org Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-04-09 20:46:34 +01:00
Zhang Chen	c6e5bafb6f	migration/ram.c: Fix codes conflict about bitmap_mutex I found upstream codes conflict with COLO and lead to crash, and I located to this patch: commit `386a907b37` Author: Wei Wang <wei.w.wang@intel.com> Date: Tue Dec 11 16:24:49 2018 +0800 migration: use bitmap_mutex in migration_bitmap_clear_dirty My colleague Wei's patch add bitmap_mutex in migration_bitmap_clear_dirty, but COLO didn't initialize the bitmap_mutex. So we always get an error when COLO start up. like that: qemu-system-x86_64: util/qemu-thread-posix.c:64: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed. This patch add the bitmap_mutex initialize and destroy in COLO lifecycle. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Message-Id: <20190329222951.28945-1-chen.zhang@intel.com> Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-04-05 15:29:48 +01:00
Markus Armbruster	daff7f0bbe	migration: Support adding migration blockers earlier migrate_add_blocker() asserts we have a current_migration object, in migrate_get_current(). We do only after migration_object_init(). This contributes to the following dependency cycle: * configure_blockdev() must run before machine_set_property() so machine properties can refer to block backends * machine_set_property() before configure_accelerator() so machine properties like kvm-irqchip get applied * configure_accelerator() before migration_object_init() so that Xen's accelerator compat properties get applied. * migration_object_init() before configure_blockdev() so configure_blockdev() can add migration blockers The cycle was closed when recent commit `cda4aa9a5a` "Create block backends before setting machine properties" added the first dependency, and satisfied it by violating the last one. Broke block backends that add migration blockers, as demonstrated by qemu-iotests 055. To fix it, break the last dependency: make migrate_add_blocker() usable before migration_object_init(). The previous commit already removed the use of migrate_get_current() from migrate_add_blocker() itself. Didn't quite do the trick, as there's another one hiding in migration_is_idle(). The use there isn't actually necessary: when no migration object has been created yet, migration is surely idle. Make migration_is_idle() return true then. Fixes: `cda4aa9a5a` Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190401090827.20793-4-armbru@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>	2019-04-02 13:49:36 +02:00
Markus Armbruster	811f865271	Revert "migration: move only_migratable to MigrationState" This reverts commit `3df663e575`. This reverts commit `b605c47b57`. Command line option --only-migratable is for disallowing any configuration that can block migration. Initially, --only-migratable set global variable @only_migratable. Commit `3df663e575` "migration: move only_migratable to MigrationState" replaced it by MigrationState member @only_migratable. That was a mistake. First, it doesn't make sense on the design level. MigrationState captures the state of an individual migration, but --only-migratable isn't a property of an individual migration, it's a restriction on QEMU configuration. With fault tolerance, we could have several migrations at once. --only-migratable would certainly protect all of them. Storing it in MigrationState feels inappropriate. Second, it contributes to a dependency cycle that manifests itself as a bug now. Putting @only_migratable into MigrationState means its available only after migration_object_init(). We can't set it before migration_object_init(), so we delay setting it with a global property (this is fixup commit `b605c47b57` "migration: fix handling for --only-migratable"). We can't get it before migration_object_init(), so anything that uses it can only run afterwards. Since migrate_add_blocker() needs to obey --only-migratable, any code adding migration blockers can run only afterwards. This contributes to the following dependency cycle: * configure_blockdev() must run before machine_set_property() so machine properties can refer to block backends * machine_set_property() before configure_accelerator() so machine properties like kvm-irqchip get applied * configure_accelerator() before migration_object_init() so that Xen's accelerator compat properties get applied. * migration_object_init() before configure_blockdev() so configure_blockdev() can add migration blockers The cycle was closed when recent commit `cda4aa9a5a` "Create block backends before setting machine properties" added the first dependency, and satisfied it by violating the last one. Broke block backends that add migration blockers. Moving @only_migratable into MigrationState was a mistake. Revert it. This doesn't quite break the "migration_object_init() before configure_blockdev() dependency, since migrate_add_blocker() still has another dependency on migration_object_init(). To be addressed the next commit. Note that the reverted commit made -only-migratable sugar for -global migration.only-migratable=on below the hood. Documentation has only ever mentioned -only-migratable. This commit removes the arcane & undocumented alternative to -only-migratable again. Nobody should be using it. Conflicts: include/migration/misc.h migration/migration.c migration/migration.h vl.c Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190401090827.20793-3-armbru@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>	2019-04-02 13:38:05 +02:00
Peter Maydell	7e9a2137ce	Pull request - Rebase last pull request - Drop multifd - several other minor fixesLaLaLa -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJcmRP1AAoJEPSH7xhYctcjVDcP/iZoCgMDn0BVzYFamRAIvtlh 1h1ElV+Jx49bHRvDRs0RaTSIlowqnbMY5yiTfn0L7aSbOr8KLEbs+i+jo5moF3+Q 50TNxGTDF/WWvl+z8X3WljwDPYBnG7mYeDBNBk+8V2RI/DvV2uAdm29VPmPN/Kc8 hW8S6kXRAQekkkt0BOkXHXWQlmvzHS9RqQoZ0dETP9GqcT7cJ6HDZJu8akiz6Oz3 r0Hek41EVQirjfKL+Sm5BluiiuvNcdFGsYK/TqLiCpnHolNUboMnIhXiTX2BJRf7 TEK8UGrbgXa3SarszCBxjsjMFYRJlq6Vi7ZQ54Ly7+wFr09jhIDgt9AlEr0YjOj4 8AgGF6nKYmFahQuKvJ1xMrgY3EccBDWXJKBwcnnd5zMJyVGlNtUUs7f7pSA3V/oG wEDMzmxcpKxK3A9jpPBgEN4ev0oKaR+rxAdy5NPTU7kMZV651JXt2pOirGm5AL2V soKiiSklUZ7VpJ998PnGj7pO4LL8xWW3Pi4mzlH6dv+Aw9T2L9vY8rPFEktOJ4V5 8qB9PERlAG/KbpVH2lrkUFFk4sfxBmVTG+SppwCk4I6/eSaDuO3pjXcuwiFaIyqT kHLsBVT0kLEYeE6zty2YHvjIEmAyaJxr2HezWquQ9xQOezDl1s3wjVGRFJ6xZDKn uMHI4j2i5UWA8B73inh0 =kjyq -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration-pull-request' into staging Pull request - Rebase last pull request - Drop multifd - several other minor fixesLaLaLa # gpg: Signature made Mon 25 Mar 2019 17:46:29 GMT # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full] # gpg: aka "Juan Quintela <quintela@trasno.org>" [full] # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration-pull-request: migration/postcopy: Update the bandwidth during postcopy Migration/colo.c: Make user obtain the last COLO mode info after failover Migration/colo.c: Add the necessary checks for colo_do_failover Migration/colo.c: Add new COLOExitReason to handle all failover state Migration/colo.c: Fix COLO failover status error migration/rdma: Check qemu_rdma_init_one_block migration: add support for a "tls-authz" migration parameter multifd: Drop x- multifd: Add some padding multifd: Change default packet size multifd: Be flexible about packet size multifd: Drop x-multifd-page-count parameter multifd: Create new next_packet_size field multifd: Rename "size" member to pages_alloc multifd: Only send pages when packet are not empty Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-03-25 18:15:43 +00:00
Dr. David Alan Gilbert	c38c1c142e	migration/postcopy: Update the bandwidth during postcopy The recently added max-postcopy-bandwidth parameter is only read at the transition from precopy->postcopy where as the older max-bandwidth parameter updates the migration bandwidth when changed even if the migration is already running. Fix this discrepency so that: a) You can change the bandwidth during postcopy by setting max-postcopy-bandwidth b) Changing max-bandwidth during postcopy has no effect (it currently changes the postcopy bandwidth which isn't expected). Fixes: `7e555c6c` bz: https://bugzilla.redhat.com/show_bug.cgi?id=1686321 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:46:03 +01:00
Zhang Chen	5ed0deca41	Migration/colo.c: Make user obtain the last COLO mode info after failover Add the last_colo_mode to save the status after failover. This patch can solve the issue that user want to get last colo mode use query_colo_status after failover. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:45:46 +01:00
Zhang Chen	82cd368ccd	Migration/colo.c: Add the necessary checks for colo_do_failover Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:45:38 +01:00
Zhang Chen	3a43ac4757	Migration/colo.c: Add new COLOExitReason to handle all failover state In this patch we add the processing state for COLOExitReason, because we have to identify COLO in the failover processing state or failover error state. In the way, we can handle all the failover state. We have improved the description of the COLOExitReason by the way. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:45:30 +01:00
Zhang Chen	1fe6ab267f	Migration/colo.c: Fix COLO failover status error When finished COLO failover, the status is FAILOVER_STATUS_COMPLETED. The origin codes misunderstand the FAILOVER_STATUS_REQUIRE. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:45:23 +01:00
Dr. David Alan Gilbert	281496bb8a	migration/rdma: Check qemu_rdma_init_one_block Actually it can't fail at the moment, but Coverity moans that it's the only place it's not checked, and it's an easy check. Reported-by: Coverity (CID 1399413) Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:45:10 +01:00
Daniel P. Berrange	d2f1d29b95	migration: add support for a "tls-authz" migration parameter The QEMU instance that runs as the server for the migration data transport (ie the target QEMU) needs to be able to configure access control so it can prevent unauthorized clients initiating an incoming migration. This adds a new 'tls-authz' migration parameter that is used to provide the QOM ID of a QAuthZ subclass instance that provides the access control check. This is checked against the x509 certificate obtained during the TLS handshake. For example, when starting a QEMU for incoming migration, it is possible to give an example identity of the source QEMU that is intended to be connecting later: $QEMU \ -monitor stdio \ -incoming defer \ ...other args... (qemu) object_add tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\ endpoint=server,verify-peer=yes \ (qemu) object_add authz-simple,id=auth0,identity=CN=laptop.example.com,,\ O=Example Org,,L=London,,ST=London,,C=GB \ (qemu) migrate_incoming tcp:localhost:9000 Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:47 +01:00
Juan Quintela	cbfd6c957a	multifd: Drop x- We make it supported from now on. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:45 +01:00
Juan Quintela	5fbd8b4bbb	multifd: Add some padding Add some padding. MultifdInit_t is padded to 64 bytes. MultiFDPacket_t is padded to 320bytes (64 * 5). Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:44 +01:00
Juan Quintela	4b0c72645c	multifd: Change default packet size We moved from 64KB to 512KB, as it makes less locking contention without any downside in testing. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:43 +01:00
Juan Quintela	7ed379b286	multifd: Be flexible about packet size This way we can change the packet size in the future and everything will work. We choose an arbitrary big number (100 times configured size) as a limit about how big we will reallocate. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:42 +01:00
Juan Quintela	efd1a1d640	multifd: Drop x-multifd-page-count parameter Libvirt don't want to expose (and explain it). From now on we measure the number of packages in bytes instead of pages, so it is the same independently of architecture. We choose the page size of x86. Notice that in the following patch we make this variable. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:41 +01:00
Juan Quintela	2a34ee593b	multifd: Create new next_packet_size field We need to send this field when we add compression support. As we are still on x- stage, we can do this kind of changes. Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:39 +01:00
Juan Quintela	6f86269295	multifd: Rename "size" member to pages_alloc It really indicates what is the number of allocated pages for one packet. Once there rename "used" to "pages_used". Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:38 +01:00
Juan Quintela	ad24c7cb59	multifd: Only send pages when packet are not empty We send packages without pages sometimes for sysnchronizanion. The iov functions do the right thing, but we will be changing this code in future patches. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2019-03-25 18:13:37 +01:00
Markus Armbruster	dec9776049	trace-events: Fix attribution of trace points to source Some trace points are attributed to the wrong source file. Happens when we neglect to update trace-events for code motion, or add events in the wrong place, or misspell the file name. Clean up with help of cleanup-trace-events.pl. Same funnies as in the previous commit, of course. Manually shorten its change to linux-user/trace-events to */signal.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-id: 20190314180929.27722-6-armbru@redhat.com Message-Id: <20190314180929.27722-6-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-22 16:18:07 +00:00
Markus Armbruster	500016e5db	trace-events: Shorten file names in comments We spell out sub/dir/ in sub/dir/trace-events' comments pointing to source files. That's because when trace-events got split up, the comments were moved verbatim. Delete the sub/dir/ part from these comments. Gets rid of several misspellings. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20190314180929.27722-3-armbru@redhat.com Message-Id: <20190314180929.27722-3-armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-03-22 16:18:07 +00:00
Peter Maydell	f6c63c0dbf	* ASAN fixes -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlyHw88UHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroNL4Qf/UPunPKY/OK47evFGPG0ZMGF3IxOp OgM0MMBOPdSMaLuI+cgmI+U1+hOqw9Vf/eyyfRFZCTQXjr1BQL0exAG+KvBeLOSC h1hJmpecc0IS2D3DaXDI2SvlLr7AFAVIY2JR9lCdJW99mC6HROSeaWnjQ0XflxTM 2BSl1FDzO6bHz3OgUHM2NAPYzjpwTOq7ZnaTd20a7zE+7ef7iEJ3edRHEg+RmHtN gMwOkZw1Ip5Zn5hCjJbURZG+OMOKY4/mSqV6a9IByQ5Kws8rhb38f9wpA09C7y3S Q7Tv1XIT84sVg7B0eToQObzmkagA6NGJuNy+TleOeTemntEmzQGQ4fk6Zw== =ybUj -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * ASAN fixes # gpg: Signature made Tue 12 Mar 2019 14:35:59 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: test-migration: fix memory leak migration: fix memory leak test-bdrv-graph-mod: fix Error leak test-char: fix undefined behavior Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-03-14 12:02:12 +00:00
Eric Blake	796a3798ab	bitmaps: Fix typo in function name Commit `a88b179f` introduced the ability to set and query bitmap persistence, but with an atypical spelling. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 20190308205845.25734-1-eblake@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	3ae96d6684	block/dirty-bitmaps: add block_dirty_bitmap_check function Instead of checking against busy, inconsistent, or read only directly, use a check function with permissions bits that let us streamline the checks without reproducing them in many places. Included in this patch are permissions changes that simply add the inconsistent check to existing permissions call spots, without addressing existing bugs. In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT, which checks against all three conditions. busy-only checks become BDRV_BITMAP_ALLOW_RO. Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:49 -04:00
John Snow	27a1b301a4	block/dirty-bitmaps: unify qmp_locked and user_locked calls These mean the same thing now. Unify them and rename the merged call bdrv_dirty_bitmap_busy to indicate semantically what we are describing, as well as help disambiguate from the various _locked and _unlocked versions of bitmap helpers that refer to mutex locks. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
John Snow	50a47257f8	block/dirty-bitmaps: rename frozen predicate helper "Frozen" was a good description a long time ago, but it isn't adequate now. Rename the frozen predicate to has_successor to make the semantics of the predicate more clear to outside callers. In the process, remove some calls to frozen() that no longer semantically make sense. For bdrv_enable_dirty_bitmap_locked and bdrv_disable_dirty_bitmap_locked, it doesn't make sense to prohibit QEMU internals from performing this action when we only wished to prohibit QMP users from issuing these commands. All of the QMP API commands for bitmap manipulation already check against user_locked() to prohibit these actions. Several other assertions really want to check that the bitmap isn't in-use by another operation -- use the bitmap_user_locked function for this instead, which presently also checks for has_successor. This leaves some redundant checks of has_successor through different helpers that are addressed in forthcoming patches. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-3-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>	2019-03-12 12:05:48 -04:00
Paolo Bonzini	5e78bc6a47	migration: fix memory leak Reported by ASAN. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-03-12 15:18:40 +01:00
Marc-André Lureau	d890344166	slirp: use libslirp migration code slirp migration code uses QEMU vmstate so far, when building WITH_QEMU. Introduce slirp_state_{load,save,version}() functions to move the state saving handling to libslirp side. So far, the bitstream compatibility should remain equal with current QEMU, as this is effectively using the same code, with the same format etc. When libslirp is made standalone, we will need some mechanism to ensure bitstream compatibility regardless of the libslirp version installed. See the FIXME note in the code. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20190212162524.31504-3-marcandre.lureau@redhat.com> Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>	2019-03-07 12:46:31 +01:00
Zhang Chen	db00972922	Migration/colo.c: Make COLO node running after failover Delay to close COLO for auto start VM after failover. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190303145021.2962-4-chen.zhang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:18 +00:00
Zhang Chen	b8b5734b09	Migration/colo.c: Fix double close bug when occur COLO failover In migration_incoming_state_destroy(void) will check the mis->to_src_file to double close the mis->to_src_file when occur COLO failover. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190303145021.2962-2-chen.zhang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:18 +00:00
Wei Wang	6eeb63f740	migration/ram.c: add the free page optimization enable flag This patch adds the free page optimization enable flag, and a function to set this flag. When the free page optimization is enabled, not all the pages are needed to be sent in the bulk stage. Why using a new flag, instead of directly disabling ram_bulk_stage when the optimization is running? Thanks for Peter Xu's reminder that disabling ram_bulk_stage will affect the use of compression. Please see save_page_use_compression. When xbzrle and compression are used, if free page optimizaion causes the ram_bulk_stage to be disabled, save_page_use_compression will return false, which disables the use of compression. That is, if free page optimization avoids the sending of half of the guest pages, the other half of pages loses the benefits of compression in the meantime. Using a new flag to let migration_bitmap_find_dirty skip the free pages in the bulk stage will avoid the above issue. Signed-off-by: Wei Wang <wei.w.wang@intel.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Peter Xu <peterx@redhat.com> Message-Id: <1544516693-5395-7-git-send-email-wei.w.wang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:18 +00:00
Wei Wang	bd2270608f	migration/ram.c: add a notifier chain for precopy This patch adds a notifier chain for the memory precopy. This enables various precopy optimizations to be invoked at specific places. Signed-off-by: Wei Wang <wei.w.wang@intel.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1544516693-5395-6-git-send-email-wei.w.wang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:18 +00:00
Wei Wang	6bcb05fc42	migration: API to clear bits of guest free pages from the dirty bitmap This patch adds an API to clear bits corresponding to guest free pages from the dirty bitmap. Spilt the free page block if it crosses the QEMU RAMBlock boundary. Signed-off-by: Wei Wang <wei.w.wang@intel.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1544516693-5395-5-git-send-email-wei.w.wang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:18 +00:00
Wei Wang	386a907b37	migration: use bitmap_mutex in migration_bitmap_clear_dirty The bitmap mutex is used to synchronize threads to update the dirty bitmap and the migration_dirty_pages counter. For example, the free page optimization clears bits of free pages from the bitmap in an iothread context. This patch makes migration_bitmap_clear_dirty update the bitmap and counter under the mutex. Signed-off-by: Wei Wang <wei.w.wang@intel.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <1544516693-5395-4-git-send-email-wei.w.wang@intel.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:18 +00:00
Juan Quintela	9aca82ba31	migration: Create socket-address parameter It will be used to store the uri parameters. We want this only for tcp, so we don't set it for other uris. We need it to know what port is migration running. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Removed DummyStruct as suggested by Eric & Markus --	2019-03-06 10:49:17 +00:00
Yury Kotov	6cafc8e4dd	migration: Add capabilities validation Currently we don't check which capabilities set in the source QEMU. We just expect that the target QEMU has the same enabled capabilities. Add explicit validation for capabilities to make sure that the target VM has them too. This is enabled for only new capabilities to keep compatibily. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Message-Id: <20190215174548.2630-6-yury-kotov@yandex-team.ru> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Manual merge	2019-03-06 10:49:17 +00:00
Yury Kotov	fbd162e629	migration: Add an ability to ignore shared RAM blocks If ignore-shared capability is set then skip shared RAMBlocks during the RAM migration. Also, move qemu_ram_foreach_migratable_block (and rename) to the migration code, because it requires access to the migration capabilities. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Message-Id: <20190215174548.2630-4-yury-kotov@yandex-team.ru> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:17 +00:00
Yury Kotov	18269069c3	migration: Introduce ignore-shared capability We want to use local migration to update QEMU for running guests. In this case we don't need to migrate shared (file backed) RAM. So, add a capability to ignore such blocks during live migration. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Message-Id: <20190215174548.2630-3-yury-kotov@yandex-team.ru> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:17 +00:00
Yury Kotov	754cb9c0eb	exec: Change RAMBlockIterFunc definition Currently, qemu_ram_foreach_* calls RAMBlockIterFunc with many block-specific arguments. But often iter func needs RAMBlock. This refactoring is needed for fast access to RAMBlock flags from qemu_ram_foreach_block's callback. The only way to achieve this now is to call qemu_ram_block_from_host (which also enumerates blocks). So, this patch reduces complexity of qemu_ram_foreach_block() -> cb() -> qemu_ram_block_from_host() from O(n^2) to O(n). Fix RAMBlockIterFunc definition and add some functions to read RAMBlock fields witch were passed. Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Message-Id: <20190215174548.2630-2-yury-kotov@yandex-team.ru> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:17 +00:00
Marcel Apfelbaum	9589e76301	migration/rdma: clang compilation fix Configuring QEMU with: ../configure --cc=clang --enable-rdma Leads to compilation error: CC migration/rdma.o CC migration/block.o qemu/migration/rdma.c:3615:58: error: taking address of packed member 'rkey' of class or structure 'RDMARegisterResult' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member] (uintptr_t)host_addr, NULL, &reg_result->rkey, ^~~~~~~~~~~~~~~~ Fix it by using a temp local variable. Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Message-Id: <20190304184923.24215-1-marcel.apfelbaum@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert	892ae715b6	migration: Cleanup during exit Currently we cleanup the migration object as we exit main after the main_loop finishes; however if there's a migration running things get messy and we can end up with the migration thread still trying to access freed structures. We now take a ref to the object around the migration thread itself, so the act of dropping the ref during exit doesn't cause us to lose the state until the thread quits. Cancelling the migration during migration also tries to get the thread to quit. We do this a bit earlier; so hopefully migration gets out of the way before all the devices etc are freed. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20190227164900.16378-1-dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert	cf75e26849	migration/rdma: Fix qemu_rdma_cleanup null check If the migration fails before the channel is open (e.g. a bad address) we end up in the cleanup with rdma->channel==NULL. Spotted by Coverity: CID 1398634 Fixes: `fbbaacab27` Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190214185351.5927-1-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert	c3c5eae6ac	migration: Fix cancel state During a cancelled migration there's a race where the fd can go into an error state before we get back around the migration loop and migration_detect_error transitions from cancelling->failed. Check for cancelled/cancelling and don't change the state. Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1608649 Fixes: `b23c2ade25` Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190219195928.12289-1-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>	2019-03-06 10:49:17 +00:00
Dr. David Alan Gilbert	7659505c16	migration: Switch to using announce timer Switch the announcements to using the new announce timer. Move the code that does it to announce.c rather than savevm because it really has nothing to do with the actual migration. Migration starts the announce from bh's and so they're all in the main thread/bql, and so there's never any racing with the timers themselves. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2019-03-05 11:27:41 +08:00
Dr. David Alan Gilbert	ee3d96baf3	migration: Add announce parameters Add migration parameters that control RARP/GARP announcement timeouts. Based on earlier patches by myself and Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2019-03-05 11:27:41 +08:00
Dr. David Alan Gilbert	50510ea2c2	net: Introduce announce timer The 'announce timer' will be used by migration, and explicit requests for qemu to perform network announces. Based on the work by Germano Veit Michel <germano@redhat.com> and Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2019-03-05 11:27:41 +08:00
Vladimir Sementsov-Ogievskiy	f556f37b11	migration/block: use qemu_iovec_init_buf Use new qemu_iovec_init_buf() instead of qemu_iovec_init_external( ... , 1), which simplifies the code. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20190218140926.333779-14-vsementsov@virtuozzo.com Message-Id: <20190218140926.333779-14-vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-02-22 09:42:13 +00:00
Xiao Guangrong	aecbfe9c64	migration: introduce pages-per-second It introduces a new statistic, pages-per-second, as bandwidth or mbps is not enough to measure the performance of posting pages out as we have compression, xbzrle, which can significantly reduce the amount of the data size, instead, pages-per-second is the one we want Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20190111063732.10484-2-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> With typo's Eric spotted fixed	2019-01-23 15:51:47 +00:00
Marc-André Lureau	de22ded044	vmstate: constify SaveVMHandlers Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20181114133139.27346-1-marcandre.lureau@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-01-23 15:51:47 +00:00
Dr. David Alan Gilbert	fbbaacab27	migration/rdma: unregister fd handler Unregister the fd handler before we destroy the channel, otherwise we've got a race where we might land in the fd handler just as we're closing the device. (The race is quite data dependent, you just have to have the right set of devices for it to trigger). Corresponds to RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1666601 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190122173111.29821-1-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-01-23 15:51:32 +00:00
Fei Li	6d99c2d41c	migration: unify error handling for process_incoming_migration_co In the current code, if process_incoming_migration_co() fails we do the same error handing: set the error state, close the source file, do the cleanup for multifd, and then exit(EXIT_FAILURE). To make the code clearer, add a "goto fail" to unify the error handling. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Fei Li <fli@suse.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190113140849.38339-6-lifei1214@126.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-01-23 15:02:07 +00:00
Fei Li	91b02dc750	migration: add more error handling for postcopy_ram_enable_notify Call postcopy_ram_incoming_cleanup() to do the cleanup when postcopy_ram_enable_notify fails. Besides, report the error message when qemu_ram_foreach_migratable_block() fails. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Fei Li <fli@suse.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20190113140849.38339-5-lifei1214@126.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-01-23 15:02:07 +00:00
Fei Li	1398b2e3fe	migration: multifd_save_cleanup() can't fail, simplify multifd_save_cleanup() takes an Error ** argument and returns an error code even though it can't actually fail. Its callers dutifully check for failure. Remove the useless argument and return value, and simplify the callers. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fei Li <fli@suse.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20190113140849.38339-4-lifei1214@126.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-01-23 15:02:07 +00:00
Fei Li	49ed0d24a4	migration: fix the multifd code when receiving less channels In our current code, when multifd is used during migration, if there is an error before the destination receives all new channels, the source keeps running, however the destination does not exit but keeps waiting until the source is killed deliberately. Fix this by dumping the specific error and let users decide whether to quit from the destination side when failing to receive packet via some channel. And update the comment for multifd_recv_new_channel(). Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Fei Li <fli@suse.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20190113140849.38339-3-lifei1214@126.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2019-01-23 15:02:07 +00:00
Aaron Lindsay	8c07559fc7	migration: Add post_save function to VMStateDescription In some cases it may be helpful to modify state before saving it for migration, and then modify the state back after it has been saved. The existing pre_save function provides half of this functionality. This patch adds a post_save function to provide the second half. Signed-off-by: Aaron Lindsay <aclindsa@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20181211151945.29137-2-aaron@os.amperecomputing.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-01-21 10:38:55 +00:00
Philippe Mathieu-Daudé	a346af9c88	migration: Use strnlen() for fixed-size string GCC 8 introduced the -Wstringop-overflow, which detect buffer overflow by string-modifying functions declared in <string.h>, such strncpy(), used in global_state_store_running(). GCC indeed found an incorrect use of strlen(), because this array is loaded by VMSTATE_BUFFER(runstate, GlobalState) then parsed using qapi_enum_parse which does not get the buffer length. Use strnlen() which returns sizeof(s->runstate) if the array is not NUL-terminated, assert the size is within range, and enforce the array to be NUL-terminated to avoid an overflow in qapi_enum_parse(). This fixes: CC migration/global_state.o qemu/migration/global_state.c: In function 'global_state_pre_save': qemu/migration/global_state.c:109:15: error: 'strlen' argument 1 declared attribute 'nonstring' [-Werror=stringop-overflow=] s->size = strlen((char )s->runstate) + 1; ^~~~~~~~~~~~~~~~~~~~~~~~~~~ qemu/migration/global_state.c:24:13: note: argument 'runstate' declared here uint8_t runstate[100] QEMU_NONSTRING; ^~~~~~~~ cc1: all warnings being treated as errors make: ** [qemu/rules.mak:69: migration/global_state.o] Error 1 Suggested-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-01-17 21:10:57 -05:00
Marc-André Lureau	0a5526a18b	migration: Fix stringop-truncation warning GCC 8 added a -Wstringop-truncation warning: The -Wstringop-truncation warning added in GCC 8.0 via r254630 for bug 81117 is specifically intended to highlight likely unintended uses of the strncpy function that truncate the terminating NUL character from the source string. This new warning leads to compilation failures: CC migration/global_state.o qemu/migration/global_state.c: In function 'global_state_store_running': qemu/migration/global_state.c:45:5: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy((char )global_state.runstate, state, sizeof(global_state.runstate)); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ make: ** [qemu/rules.mak:69: migration/global_state.o] Error 1 Adding an assert is enough to silence GCC. (alternatively, we could hard-code "running") Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [PMD: More verbose commit message] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2019-01-17 21:10:57 -05:00
Paolo Bonzini	b58deb344d	qemu/queue.h: leave head structs anonymous unless necessary Most list head structs need not be given a name. In most cases the name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV or reverse iteration, but this does not apply to lists of other kinds, and even for QTAILQ in practice this is only rarely needed. In addition, we will soon reimplement those macros completely so that they do not need a name for the head struct. So clean up everything, not giving a name except in the rare case where it is necessary. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-01-11 15:46:55 +01:00
Daniel Henrique Barboza	fb06411210	qmp hmp: Make system_wakeup check wake-up support and run state The qmp/hmp command 'system_wakeup' is simply a direct call to 'qemu_system_wakeup_request' from vl.c. This function verifies if runstate is SUSPENDED and if the wake up reason is valid before proceeding. However, no error or warning is thrown if any of those pre-requirements isn't met. There is no way for the caller to differentiate between a successful wakeup or an error state caused when trying to wake up a guest that wasn't suspended. This means that system_wakeup is silently failing, which can be considered a bug. Adding error handling isn't an API break in this case - applications that didn't check the result will remain broken, the ones that check it will have a chance to deal with it. Adding to that, the commit before previous created a new QMP API called query-current-machine, with a new flag called wakeup-suspend-support, that indicates if the guest has the capability of waking up from suspended state. Although such guest will never reach SUSPENDED state and erroring it out in this scenario would suffice, it is more informative for the user to differentiate between a failure because the guest isn't suspended versus a failure because the guest does not have support for wake up at all. All this considered, this patch changes qmp_system_wakeup to check if the guest is capable of waking up from suspend, and if it is suspended. After this patch, this is the output of system_wakeup in a guest that does not have wake-up from suspend support (ppc64): (qemu) system_wakeup wake-up from suspend is not supported by this guest (qemu) And this is the output of system_wakeup in a x86 guest that has the support but isn't suspended: (qemu) system_wakeup Unable to wake up: guest is not in suspended state (qemu) Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20181205194701.17836-4-danielhb413@gmail.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-12-18 07:55:47 +01:00
Marc-André Lureau	335d10cd8e	qapi: add conditions to REPLICATION type/commands on the schema Add #if defined(CONFIG_REPLICATION) in generated code, and adjust the code accordingly. Made conditional: * xen-set-replication, query-xen-replication-status, xen-colo-do-checkpoint Before the patch, we first register the commands unconditionally in generated code (requires a stub), then conditionally unregister in qmp_unregister_commands_hack(). Afterwards, we register only when CONFIG_REPLICATION. The command fails exactly the same, with CommandNotFound. Improvement, because now query-qmp-schema is accurate, and we're one step closer to killing qmp_unregister_commands_hack(). * enum BlockdevDriver value "replication" in command blockdev-add * BlockdevOptions variant @replication and related structures. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20181213123724.4866-23-marcandre.lureau@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-12-14 06:52:48 +01:00
Marc-André Lureau	03fee66fde	vmstate: constify VMStateField Because they are supposed to remain const. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20181114132931.22624-1-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 15:35:15 +01:00
Paolo Bonzini	5aaac46793	migration: savevm: consult migration blockers There is really no difference between live migration and savevm, except that savevm does not require bdrv_invalidate_cache to be implemented by all disks. However, it is unlikely that savevm is used with anything except qcow2 disks, so the penalty is small and worth the improvement in catching bad usage of savevm. Only one place was taking care of savevm when adding a migration blocker, and it can be removed. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-11-27 15:06:14 +01:00
Zhang Chen	7e934f5b27	migration/migration.c: Add COLO dependency checks Current COLO mode(independent disk mode) need replication module work together. Suggested by Dr. David Alan Gilbert <dgilbert@redhat.com>. Signed-off-by: Zhang Chen <chen.zhang@intel.com> Message-Id: <20181114190912.7242-1-chen.zhang@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-11-21 11:38:12 +00:00
Zhang Chen	3ebb9c4f52	migration/colo.c: Fix compilation issue when disable replication This compilation issue will occur when user use --disable-replication to config Qemu. Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Message-Id: <20181101021226.6353-1-zhangckid@gmail.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-11-21 11:20:14 +00:00
Jia Lina	3d63da16fb	migration: avoid segmentfault when take a snapshot of a VM which being migrated During an active background migration, snapshot will trigger a segmentfault. As snapshot clears the "current_migration" struct and updates "to_dst_file" before it finds out that there is a migration task, Migration accesses the null pointer in "current_migration" struct and qemu crashes eventually. Signed-off-by: Jia Lina <jialina01@baidu.com> Signed-off-by: Chai Wen <chaiwen@baidu.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Message-Id: <20181026083620.10172-1-jialina01@baidu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-10-31 09:38:59 +00:00
Vladimir Sementsov-Ogievskiy	9c98f145df	dirty-bitmaps: clean-up bitmaps loading and migration logic This patch aims to bring the following behavior: 1. We don't load bitmaps, when started in inactive mode. It's the case of incoming migration. In this case we wait for bitmaps migration through migration channel (if 'dirty-bitmaps' capability is enabled) or for invalidation (to load bitmaps from the image). 2. We don't remove persistent bitmaps on inactivation. Instead, we only remove bitmaps after storing. This is the only way to restore bitmaps, if we decided to resume source after [failed] migration with 'dirty-bitmaps' capability enabled (which means, that bitmaps were not stored). 3. We load bitmaps on open and any invalidation, it's ok for all cases: - normal open - migration target invalidation with dirty-bitmaps capability (bitmaps are migrating through migration channel, the are not stored, so they should have IN_USE flag set and will be skipped when loading. However, it would fail if bitmaps are read-only[1]) - migration target invalidation without dirty-bitmaps capability (normal load of the bitmaps, if migrated with shared storage) - source invalidation with dirty-bitmaps capability (skip because IN_USE) - source invalidation without dirty-bitmaps capability (bitmaps were dropped, reload them) [1]: to accurately handle this, migration of read-only bitmaps is explicitly forbidden in this patch. New mechanism for not storing bitmaps when migrate with dirty-bitmaps capability is introduced: migration filed in BdrvDirtyBitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: John Snow <jsnow@redhat.com>	2018-10-29 16:23:17 -04:00
John Snow	993edc0ce0	block/dirty-bitmaps: add user_locked status checker Instead of both frozen and qmp_locked checks, wrap it into one check. frozen implies the bitmap is split in two (for backup), and shouldn't be modified. qmp_locked implies it's being used by another operation, like being exported over NBD. In both cases it means we shouldn't allow the user to modify it in any meaningful way. Replace any usages where we check both frozen and qmp_locked with the new check. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20181002230218.13949-2-jsnow@redhat.com [w/edits Suggested-By: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>] Signed-off-by: John Snow <jsnow@redhat.com>	2018-10-29 16:23:16 -04:00
Peter Maydell	13399aad4f	Error reporting patches for 2018-10-22 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJbzcCHAAoJEDhwtADrkYZT3YsP/2qE4HNY/htj3IP6vNJuSaqw CLPRTz7zWmUBTE6FqSkvLsq3X2BMFFLeaIPA9EFcbyn2km6qPqBYgg9ElXXvPZBm 6hDeRIoC8FdRD0Apozd5MGC94/lE47PheDRV8V+4KrGLaaMXEPxMZ0wP4AfdS5pS 6Pt2xuF7nPu1+OWVxMk0fXadGjGLEuOQQmTh3B21J5RaynQ3gtd6h7XFC/LJyOGG LC/6GyPc0h7KU83VnvrRjH/EOpu1wENgrsvWsS0sem8op35Z+i9jU5BfCp4qFkDy gCHHUEyEeyexS+W+Tj87eBtK2gfrqQx9ovo8CIsWcUwpKbdD6AMK4FKGsDNMNHab Kg5u/M+O8nHCB7DuursF+3mqEbZHb05cfKe6JEtiq49EuORMV5hp4Ap966noSwTw UEU0NJNA1p8EdmXVudyyyYR7wpoSSmZpoenA+bJ3nthK8K0KcU4RUGk6ZEbxfJy+ 7ENl+3R2IxmxzgXv/x0tz0uFisaVW1rltTXtMte+ElQsO0qy74iHdfR7JHsmLxj9 CO/ABMVoYsWq2OJv8pWLrdKpT4v3HQLJdHhknyu0ZcJGDyICqX29ULLEhPrNEZvW rxVxAkiemlaqxlUjbrM46CDQQm+w03OCnk7aCYcV4oK+u5+o3mCag705gMPErapZ 6uOE3fAjiWw43sA31mek =kPZX -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2018-10-22' into staging Error reporting patches for 2018-10-22 # gpg: Signature made Mon 22 Oct 2018 13:20:23 BST # gpg: using RSA key 3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-error-2018-10-22: (40 commits) error: Drop bogus "use error_setg() instead" admonitions vpc: Fail open on bad header checksum block: Clean up bdrv_img_create()'s error reporting vl: Simplify call of parse_name() vl: Fix exit status for -drive format=help blockdev: Convert drive_new() to Error vl: Assert drive_new() does not fail in default_drive() fsdev: Clean up error reporting in qemu_fsdev_add() spice: Clean up error reporting in add_channel() tpm: Clean up error reporting in tpm_init_tpmdev() numa: Clean up error reporting in parse_numa() vnc: Clean up error reporting in vnc_init_func() ui: Convert vnc_display_init(), init_keyboard_layout() to Error ui/keymaps: Fix handling of erroneous include files vl: Clean up error reporting in device_init_func() vl: Clean up error reporting in parse_fw_cfg() vl: Clean up error reporting in mon_init_func() vl: Clean up error reporting in machine_set_property() vl: Clean up error reporting in chardev_init_func() qom: Clean up error reporting in user_creatable_add_opts_foreach() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-10-23 17:20:23 +01:00
Markus Armbruster	4dd32b3dda	migration: Fix !replay_can_snapshot() error handling Calling error_report() in a function that takes an Error ** argument is suspicious. save_snapshot() and load_snapshot() do that, and then fail without setting an error. Wrong. The HMP commands survive this unscathed, since hmp_handle_error() does nothing when no error has been set. Callers main() (on behalf of -loadvm) and replay_vmstate_init() crash, but I'm not sure the error is possible there. Screwed up when commit `377b21ccea` (v2.12.0) added incorrect error handling right next to correct examples. Fix by calling error_setg() instead of error_report(). Fixes: `377b21ccea` Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20181017082702.5581-13-armbru@redhat.com>	2018-10-19 14:51:34 +02:00
Markus Armbruster	4b5766488f	error: Fix use of error_prepend() with &error_fatal, &error_abort From include/qapi/error.h: * Pass an existing error to the caller with the message modified: * error_propagate(errp, err); * error_prepend(errp, "Could not frobnicate '%s': ", name); Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached. Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly. Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere. Cc: Fei Li <fli@suse.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-2-armbru@redhat.com>	2018-10-19 14:51:34 +02:00
zhanghailiang	2518aec192	COLO: quick failover process by kick COLO thread COLO thread may sleep at qemu_sem_wait(&s->colo_checkpoint_sem), while failover works begin, It's better to wakeup it to quick the process. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
zhanghailiang	7b3435309d	COLO: notify net filters about checkpoint/failover event Notify all net filters about the checkpoint and failover event. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
zhanghailiang	d1955d2219	COLO: flush host dirty ram from cache Don't need to flush all VM's ram from cache, only flush the dirty pages since last checkpoint Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	3f6df99d9d	savevm: split the process of different stages for loadvm/savevm There are several stages during loadvm/savevm process. In different stage, migration incoming processes different types of sections. We want to control these stages more accuracy, it will benefit COLO performance, we don't have to save type of QEMU_VM_SECTION_START sections everytime while do checkpoint, besides, we want to separate the process of saving/loading memory and devices state. So we add three new helper functions: qemu_load_device_state() and qemu_savevm_live_state() to achieve different process during migration. Besides, we make qemu_loadvm_state_main() and qemu_save_device_state() public, and simplify the codes of qemu_save_device_state() by calling the wrapper qemu_savevm_state_header(). Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	f56c0065b8	qapi: Add new command to query colo status Libvirt or other high level software can use this command query colo status. You can test this command like that: {'execute':'query-colo-status'} Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	41b6b77921	qapi/migration.json: Rename COLO unknown mode to none mode. Suggested by Markus Armbruster rename COLO unknown mode to none mode. Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
zhanghailiang	9ecff6d66e	qmp event: Add COLO_EXIT event to notify users while exited COLO If some errors happen during VM's COLO FT stage, it's important to notify the users of this event. Together with 'x-colo-lost-heartbeat', Users can intervene in COLO's failover work immediately. If users don't want to get involved in COLO's failover verdict, it is still necessary to notify users that we exited COLO mode. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	e6f4aa188c	COLO: Flush memory data from ram cache During the time of VM's running, PVM may dirty some pages, we will transfer PVM's dirty pages to SVM and store them into SVM's RAM cache at next checkpoint time. So, the content of SVM's RAM cache will always be same with PVM's memory after checkpoint. Instead of flushing all content of PVM's RAM cache into SVM's MEMORY, we do this in a more efficient way: Only flush any page that dirtied by PVM since last checkpoint. In this way, we can ensure SVM's memory same with PVM's. Besides, we must ensure flush RAM cache before load device state. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	7d9acafa2c	ram/COLO: Record the dirty pages that SVM received We record the address of the dirty pages that received, it will help flushing pages that cached into SVM. Here, it is a trick, we record dirty pages by re-using migration dirty bitmap. In the later patch, we will start the dirty log for SVM, just like migration, in this way, we can record both the dirty pages caused by PVM and SVM, we only flush those dirty pages from RAM cache while do checkpoint. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	13af18f222	COLO: Load dirty pages into SVM's RAM cache firstly We should not load PVM's state directly into SVM, because there maybe some errors happen when SVM is receving data, which will break SVM. We need to ensure receving all data before load the state into SVM. We use an extra memory to cache these data (PVM's ram). The ram cache in secondary side is initially the same as SVM/PVM's memory. And in the process of checkpoint, we cache the dirty pages of PVM into this ram cache firstly, so this ram cache always the same as PVM's memory at every checkpoint, then we flush this cached ram to SVM after we receive all PVM's state. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	aad555c229	COLO: Remove colo_state migration struct We need to know if migration is going into COLO state for incoming side before start normal migration. Instead by using the VMStateDescription to send colo_state from source side to destination side, we use MIG_CMD_ENABLE_COLO to indicate whether COLO is enabled or not. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	8e48ac9586	COLO: Add block replication into colo process Make sure master start block replication after slave's block replication started. Besides, we need to activate VM's blocks before goes into COLO state. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Zhang Chen	131b2153fc	COLO: integrate colo compare with colo frame For COLO FT, both the PVM and SVM run at the same time, only sync the state while it needs. So here, let SVM runs while not doing checkpoint, change DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100. Besides, we forgot to release colo_checkpoint_semd and colo_delay_timer, fix them here. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Zhang Chen <zhangckid@gmail.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>	2018-10-19 11:15:03 +08:00
Ilya Maximets	55d0fe8254	migration: Stop postcopy fault thread before notifying POSTCOPY_NOTIFY_INBOUND_END handlers will remove userfault fds from the postcopy_remote_fds array which could be still in use by the fault thread. Let's stop the thread before notification to avoid possible accessing wrong memory. Fixes: `46343570c0` ("vhost+postcopy: Wire up POSTCOPY_END notify") Cc: qemu-stable@nongnu.org Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Message-Id: <20181008160536.6332-2-i.maximets@samsung.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-10-11 19:58:26 +01:00
Peter Maydell	341ba0df4c	migration/ram.c: Avoid taking address of fields in packed MultiFDInit_t struct Taking the address of a field in a packed struct is a bad idea, because it might not be actually aligned enough for that pointer type (and thus cause a crash on dereference on some host architectures). Newer versions of clang warn about this: migration/ram.c:651:19: warning: taking address of packed member 'magic' of class or structure 'MultiFDInit_t' may result in an unaligned pointer value [-Waddress-of-packed-member] migration/ram.c:652:19: warning: taking address of packed member 'version' of class or structure 'MultiFDInit_t' may result in an unaligned pointer value [-Waddress-of-packed-member] migration/ram.c:737:19: warning: taking address of packed member 'magic' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member] migration/ram.c:745:19: warning: taking address of packed member 'version' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member] migration/ram.c:755:19: warning: taking address of packed member 'size' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member] Avoid the bug by not using the "modify in place" byteswapping functions. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180925161924.7832-1-peter.maydell@linaro.org> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 17:29:01 +01:00
Fei Li	05306935b1	migration: fix the compression code Add judgement in compress_threads_save_cleanup() to check whether the static CompressParam *comp_param has been allocated. If not, just return; or else segmentation fault will occur when using the NULL comp_param's parameters. One test case can reproduce this is: set the compression on and migrate to a wrong nonexistent host IP address. Our current code does not judge before handling comp_param[idx]'s quit and cond that whether they have been initialized. If not initialized, "qemu_mutex_lock_impl: Assertion `mutex->initialized' failed." will occur. Fix this by squashing the terminate_compression_threads() into compress_threads_save_cleanup() and employing the existing judgement condition. One test case can reproduce this error is: set the compression on and fail to fully setup the default eight compression thread in compress_threads_save_setup(). Signed-off-by: Fei Li <fli@suse.com> Message-Id: <20180925091440.18910-1-fli@suse.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 17:29:01 +01:00
Marc-André Lureau	0284a2a81c	migration: fix QEMUFile leak Spotted by ASAN while running: $ tests/migration-test -p /x86_64/migration/postcopy/recovery ================================================================= ==18034==ERROR: LeakSanitizer: detected memory leaks Direct leak of 33864 byte(s) in 1 object(s) allocated from: #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50) #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d) #2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183 #3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159 #4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78 #5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295 #6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111 #7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91 #8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108 #9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158 #10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89 #11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180925092245.29565-1-marcandre.lureau@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 17:29:01 +01:00
Dr. David Alan Gilbert	096c83b721	migration: cleanup in error paths in loadvm There's a couple of error paths in qemu_loadvm_state which happen early on but after we've initialised the load state; that needs to be cleaned up otherwise we can hit asserts if the state gets reinitialised later. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180914170430.54271-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 17:29:01 +01:00
Dr. David Alan Gilbert	9cf4bb8730	migration/postcopy: Clear have_listen_thread Clear have_listen_thread when we exit the thread. The fallout from this was that various things thought there was an ongoing postcopy after the postcopy had finished. The case that failed was postcopy->savevm->loadvm. This corresponds to RH bug https://bugzilla.redhat.com/show_bug.cgi?id=1608765 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180914170430.54271-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 17:29:01 +01:00
Xiao Guangrong	32b054954f	migration: use save_page_use_compression in flush_compressed_data It avoids to touch compression locks if xbzrle and compression are both enabled Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180906070101.27280-4-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:27:43 +01:00
Xiao Guangrong	76e030004f	migration: show the statistics of compression Currently, it includes: pages: amount of pages compressed and transferred to the target VM busy: amount of count that no free thread to compress data busy-rate: rate of thread busy compressed-size: amount of bytes after compression compression-rate: rate of compressed size Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180906070101.27280-3-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:27:27 +01:00
Xiao Guangrong	48df9d8002	migration: do not flush_compressed_data at the end of iteration flush_compressed_data() needs to wait all compression threads to finish their work, after that all threads are free until the migration feeds new request to them, reducing its call can improve the throughput and use CPU resource more effectively We do not need to flush all threads at the end of iteration, the data can be kept locally until the memory block is changed or memory migration starts over in that case we will meet a dirtied page which may still exists in compression threads's ring Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180906070101.27280-2-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:26:58 +01:00
Jose Ricardo Ziviani	827beacb47	Add a hint message to loadvm and exits on failure This patch adds a small hint for the failure case of the load snapshot process. It may be useful for users to remember that the VM configuration has changed between the save and load processes. (qemu) loadvm vm-20180903083641 Unknown savevm section or instance 'cpu_common' 4. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices Error -22 while loading VM state (qemu) device_add host-spapr-cpu-core,core-id=4 (qemu) loadvm vm-20180903083641 (qemu) c (qemu) info status VM status: running It also exits Qemu if the snapshot cannot be loaded before reaching the main loop (-loadvm in the command line). $ qemu-system-ppc64 ... -loadvm vm-20180903083641 qemu-system-ppc64: Unknown savevm section or instance 'cpu_common' 4. Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices qemu-system-ppc64: Error -22 while loading VM state $ Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com> Message-Id: <20180903162613.15877-1-joserz@linux.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:26:38 +01:00
Xiao Guangrong	e8f3735fa3	migration: handle the error condition properly ram_find_and_save_block() can return negative if any error hanppens, however, it is completely ignored in current code Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180903092644.25812-5-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:22:21 +01:00
Xiao Guangrong	be8b02edae	migration: fix calculating xbzrle_counters.cache_miss_rate As Peter pointed out: \| - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's \| per-guest-page granularity \| \| - RAMState.iterations is done for each ram_find_and_save_block(), so \| it's per-host-page granularity \| \| An example is that when we migrate a 2M huge page in the guest, we \| will only increase the RAMState.iterations by 1 (since \| ram_find_and_save_block() will be called once), but we might increase \| xbzrle_counters.cache_miss for 2M/4K=512 times (we'll call \| save_xbzrle_page() that many times) if all the pages got cache miss. \| Then IMHO the cache miss rate will be 512/1=51200% (while it should \| actually be just 100% cache miss). And he also suggested as xbzrle_counters.cache_miss_rate is the only user of rs->iterations we can adapt it to count target guest page numbers After that, rename 'iterations' to 'target_page_count' to better reflect its meaning Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180903092644.25812-3-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:21:56 +01:00
Dr. David Alan Gilbert	449f91b2c8	migration/rdma: Fix uninitialised rdma_return_path Clang correctly errors out moaning that rdma_return_path is used uninitialised in the earlier error paths. Make it NULL so that the error path ignores it. Fixes: `55cc1b5937` Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reported-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20180830173657.22939-1-dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-09-26 12:21:33 +01:00
Peter Xu	3ab72385b2	qapi: Drop qapi_event_send_FOO()'s Error argument The generated qapi_event_send_FOO() take an Error argument. They can't actually fail, because all they do with the argument is passing it to functions that can't fail: the QObject output visitor, and the @qmp_emit callback, which is either monitor_qapi_event_queue() or event_test_emit(). Drop the argument, and pass &error_abort to the QObject output visitor and @qmp_emit instead. Suggested-by: Eric Blake <eblake@redhat.com> Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180815133747.25032-4-peterx@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message rewritten, update to qapi-code-gen.txt corrected] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-08-28 18:21:38 +02:00
Peter Maydell	17182bb47f	VFIO fixes 2018-08-23 - Fix coverity reported issue with use of realpath (Alex Williamson) - Cleanup file descriptor in error path (Alex Williamson) - Fix postcopy use of new balloon inhibitor (Alex Williamson) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJbfuTxAAoJECObm247sIsiO0sP/RJKVP52D5ZXChdv/Fxwo2N/ 5UulKRDVu1RAGKPFxww2JHN3hoo4Jz3keGSnK4l300pMBbxRLZsB8189IIb95ejH EzRFM73icK2//2zunFnv7V8CjcBS9Ol2ST6OdGmHk+kp5XFEb0wsysPn7lL+b0Uv +Ijext7xgPqqc6Lz7WiKcjcducD6Rm8ReIItvYw0S7ePVGviK8b/lM2BtfQOZY0W hlkp89za5l4fZigpHFy3XM9v1LEbtAUwnIWh+iHMelRU91og07cvnMuLINdaXyy9 lF52REiPL1jOTRQZMOuUUcuXBAHgeHtoVd837TRuZTI2kcresSC3b3SVgu7zG0To Z48fVIaFxmwjAcwJmzckash2Jhk3CfSE5HJvX1CODtWbSLh+o28MekxGGMPWpYM/ XJK8kTY7atace72j86f05INE4jPwQ3okak1jLb7FXz2LfRjpplPKeLEzSnhjm0dg 63e0c88D1eNTEad6H1WYrq9WZ7yjgEgu3jDBTtah+ZKKHPltUbOkc9j+ZkHWFsFK VIHNAD1X8TJCqj5vxSEAgiqT+XxIn8LfTrfCoeORJsoEfGJaRPTKMkdEGq7M1/k4 XMVJY7uK5C1p7loaUcqKs2J3RkIKuE4HvDWLXkeeAFHc1s9wtOzEBoKucREGt0UR V7L6J9paWJA4YQMy5SLw =aRcH -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/awilliam/tags/vfio-fixes-20180823.1' into staging VFIO fixes 2018-08-23 - Fix coverity reported issue with use of realpath (Alex Williamson) - Cleanup file descriptor in error path (Alex Williamson) - Fix postcopy use of new balloon inhibitor (Alex Williamson) # gpg: Signature made Thu 23 Aug 2018 17:46:41 BST # gpg: using RSA key 239B9B6E3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" # gpg: aka "Alex Williamson <alex@shazbot.org>" # gpg: aka "Alex Williamson <alwillia@redhat.com>" # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22 * remotes/awilliam/tags/vfio-fixes-20180823.1: postcopy: Synchronize usage of the balloon inhibitor vfio/pci: Fix failure to close file descriptor on error vfio/pci: Handle subsystem realpath() returning NULL Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-08-25 10:59:06 +01:00
Alex Williamson	154304cd6e	postcopy: Synchronize usage of the balloon inhibitor While the qemu_balloon_inhibit() interface appears rather general purpose, postcopy uses it in a last-caller-wins approach with no guarantee of balanced inhibits and de-inhibits. Wrap postcopy's usage of the inhibitor to give it one vote overall, using the same last-caller-wins approach as previously implemented at the balloon level. Fixes: `01ccbec7bd` ("balloon: Allow multiple inhibit users") Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>	2018-08-23 10:45:58 -06:00
Xiao Guangrong	ae526e32bd	migration: hold the lock only if it is really needed Try to hold src_page_req_mutex only if the queue is not empty Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	5e5fdcff28	migration: move handle of zero page to the thread Detecting zero page is not a light work, moving it to the thread to speed the main thread up, btw, handling ram_release_pages() for the zero page is moved to the thread as well Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	6ef3771c0d	migration: drop the return value of do_compress_ram_page It is not used and cleans the code up a little Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:18 +02:00
Xiao Guangrong	6c97ec5f5a	migration: introduce save_zero_page_to_file It will be used by the compression threads Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:36:10 +02:00
Xiao Guangrong	980a19a929	migration: fix counting normal page for compression The compressed page is not normal page Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:34:21 +02:00
Xiao Guangrong	1d58872a91	migration: do not wait for free thread Instead of putting the main thread to sleep state to wait for free compression thread, we can directly post it out as normal page that reduces the latency and uses CPUs more efficiently A parameter, compress-wait-thread, is introduced, it can be enabled if the user really wants the old behavior Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:34:11 +02:00
Lidong Chen	923709896b	migration: poll the cm event for destination qemu The destination qemu only poll the comp_channel->fd in qemu_rdma_wait_comp_channel. But when source qemu disconnnect the rdma connection, the destination qemu should be notified. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:17:43 +02:00
Lidong Chen	54db882f07	migration: implement the shutdown for RDMA QIOChannel Because RDMA QIOChannel not implement shutdown function, If the to_dst_file was set error, the return path thread will wait forever. and the migration thread will wait return path thread exit. the backtrace of return path thread is: (gdb) bt #0 0x00007f372a76bb0f in ppoll () from /lib64/libc.so.6 #1 0x000000000071dc24 in qemu_poll_ns (fds=0x7ef7091d0580, nfds=2, timeout=100000000) at qemu-timer.c:325 #2 0x00000000006b2fba in qemu_rdma_wait_comp_channel (rdma=0xd424000) at migration/rdma.c:1501 #3 0x00000000006b3191 in qemu_rdma_block_for_wrid (rdma=0xd424000, wrid_requested=4000, byte_len=0x7ef7091d0640) at migration/rdma.c:1580 #4 0x00000000006b3638 in qemu_rdma_exchange_get_response (rdma=0xd424000, head=0x7ef7091d0720, expecting=3, idx=0) at migration/rdma.c:1726 #5 0x00000000006b3ad6 in qemu_rdma_exchange_recv (rdma=0xd424000, head=0x7ef7091d0720, expecting=3) at migration/rdma.c:1903 #6 0x00000000006b5d03 in qemu_rdma_get_buffer (opaque=0x6a57dc0, buf=0x5c80030 "", pos=8, size=32768) at migration/rdma.c:2714 #7 0x00000000006a9635 in qemu_fill_buffer (f=0x5c80000) at migration/qemu-file.c:232 #8 0x00000000006a9ecd in qemu_peek_byte (f=0x5c80000, offset=0) at migration/qemu-file.c:502 #9 0x00000000006a9f1f in qemu_get_byte (f=0x5c80000) at migration/qemu-file.c:515 #10 0x00000000006aa162 in qemu_get_be16 (f=0x5c80000) at migration/qemu-file.c:591 #11 0x00000000006a46d3 in source_return_path_thread ( opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1331 #12 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 #13 0x00007f372a77635d in clone () from /lib64/libc.so.6 the backtrace of migration thread is: (gdb) bt #0 0x00007f372aa4af57 in pthread_join () from /lib64/libpthread.so.0 #1 0x00000000007d5711 in qemu_thread_join (thread=0xd826f8 <current_migration.37100+88>) at util/qemu-thread-posix.c:504 #2 0x00000000006a4bc5 in await_return_path_close_on_source ( ms=0xd826a0 <current_migration.37100>) at migration/migration.c:1460 #3 0x00000000006a53e4 in migration_completion (s=0xd826a0 <current_migration.37100>, current_active_state=4, old_vm_running=0x7ef7089cf976, start_time=0x7ef7089cf980) at migration/migration.c:1695 #4 0x00000000006a5c54 in migration_thread (opaque=0xd826a0 <current_migration.37100>) at migration/migration.c:1837 #5 0x00007f372aa49e25 in start_thread () from /lib64/libpthread.so.0 #6 0x00007f372a77635d in clone () from /lib64/libc.so.6 Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:14:45 +02:00
Lidong Chen	d5882995a1	migration: poll the cm event while wait RDMA work request completion If the peer qemu is crashed, the qemu_rdma_wait_comp_channel function maybe loop forever. so we should also poll the cm event fd, and when receive RDMA_CM_EVENT_DISCONNECTED and RDMA_CM_EVENT_DEVICE_REMOVAL, we consider some error happened. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Signed-off-by: Gal Shachaf <galsha@mellanox.com> Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:14:19 +02:00
Lidong Chen	5d5f4d8436	migration: invoke qio_channel_yield only when qemu_in_coroutine() when qio_channel_read return QIO_CHANNEL_ERR_BLOCK, the source qemu crash. The backtrace is: (gdb) bt #0 0x00007fb20aba91d7 in raise () from /lib64/libc.so.6 #1 0x00007fb20abaa8c8 in abort () from /lib64/libc.so.6 #2 0x00007fb20aba2146 in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007fb20aba21f2 in __assert_fail () from /lib64/libc.so.6 #4 0x00000000008dba2d in qio_channel_yield (ioc=0x22f9e20, condition=G_IO_IN) at io/channel.c:460 #5 0x00000000007a870b in channel_get_buffer (opaque=0x22f9e20, buf=0x3d54038 "", pos=0, size=32768) at migration/qemu-file-channel.c:83 #6 0x00000000007a70f6 in qemu_fill_buffer (f=0x3d54000) at migration/qemu-file.c:299 #7 0x00000000007a79d0 in qemu_peek_byte (f=0x3d54000, offset=0) at migration/qemu-file.c:562 #8 0x00000000007a7a22 in qemu_get_byte (f=0x3d54000) at migration/qemu-file.c:575 #9 0x00000000007a7c46 in qemu_get_be16 (f=0x3d54000) at migration/qemu-file.c:647 #10 0x0000000000796db7 in source_return_path_thread (opaque=0x2242280) at migration/migration.c:1794 #11 0x00000000009428fa in qemu_thread_start (args=0x3e58420) at util/qemu-thread-posix.c:504 #12 0x00007fb20af3ddc5 in start_thread () from /lib64/libpthread.so.0 #13 0x00007fb20ac6b74d in clone () from /lib64/libc.so.6 This patch fixed by invoke qio_channel_yield only when qemu_in_coroutine(). Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:13:59 +02:00
Lidong Chen	4d9f675bcb	migration: implement io_set_aio_fd_handler function for RDMA QIOChannel if qio_channel_rdma_readv return QIO_CHANNEL_ERR_BLOCK, the destination qemu crash. The backtrace is: (gdb) bt #0 0x0000000000000000 in ?? () #1 0x00000000008db50e in qio_channel_set_aio_fd_handler (ioc=0x38111e0, ctx=0x3726080, io_read=0x8db841 <qio_channel_restart_read>, io_write=0x0, opaque=0x38111e0) at io/channel.c: #2 0x00000000008db952 in qio_channel_set_aio_fd_handlers (ioc=0x38111e0) at io/channel.c:438 #3 0x00000000008dbab4 in qio_channel_yield (ioc=0x38111e0, condition=G_IO_IN) at io/channel.c:47 #4 0x00000000007a870b in channel_get_buffer (opaque=0x38111e0, buf=0x440c038 "", pos=0, size=327 at migration/qemu-file-channel.c:83 #5 0x00000000007a70f6 in qemu_fill_buffer (f=0x440c000) at migration/qemu-file.c:299 #6 0x00000000007a79d0 in qemu_peek_byte (f=0x440c000, offset=0) at migration/qemu-file.c:562 #7 0x00000000007a7a22 in qemu_get_byte (f=0x440c000) at migration/qemu-file.c:575 #8 0x00000000007a7c78 in qemu_get_be32 (f=0x440c000) at migration/qemu-file.c:655 #9 0x00000000007a0508 in qemu_loadvm_state (f=0x440c000) at migration/savevm.c:2126 #10 0x0000000000794141 in process_incoming_migration_co (opaque=0x0) at migration/migration.c:366 #11 0x000000000095c598 in coroutine_trampoline (i0=84033984, i1=0) at util/coroutine-ucontext.c:1 #12 0x00007f9c0db56d40 in ?? () from /lib64/libc.so.6 #13 0x00007f96fe858760 in ?? () #14 0x0000000000000000 in ?? () RDMA QIOChannel not implement io_set_aio_fd_handler. so qio_channel_set_aio_fd_handler will access NULL pointer. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:13:11 +02:00
Lidong Chen	f5627c2af9	migration: Stop rdma yielding during incoming postcopy During incoming postcopy, the destination qemu will invoke qemu_rdma_wait_comp_channel in a seprate thread. So does not use rdma yield, and poll the completion channel fd instead. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:13:02 +02:00
Lidong Chen	74637e6f08	migration: implement bi-directional RDMA QIOChannel This patch implements bi-directional RDMA QIOChannel. Because different threads may access RDMAQIOChannel currently, this patch use RCU to protect it. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:12:26 +02:00
Lidong Chen	55cc1b5937	migration: create a dedicated connection for rdma return path If start a RDMA migration with postcopy enabled, the source qemu establish a dedicated connection for return path. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:12:16 +02:00
Lidong Chen	ccb7e1b5a6	migration: disable RDMA WRITE after postcopy started RDMA WRITE operations are performed with no notification to the destination qemu, then the destination qemu can not wakeup. This patch disable RDMA WRITE after postcopy started. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 12:12:07 +02:00
Li Qiang	4cbc9c7ffd	migrate/cpu-throttle: Add max-cpu-throttle migration parameter Currently, the default maximum CPU throttle for migration is 99(CPU_THROTTLE_PCT_MAX). This is too big and can make a remarkable performance effect for the guest. We see a lot of packets latency exceed 500ms when the CPU_THROTTLE_PCT_MAX reached. This patch set adds a new max-cpu-throttle parameter to limit the CPU throttle. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 11:42:34 +02:00
Peter Maydell	6f4923fcad	migration: Correctly handle subsections with no 'needed' function Currently the vmstate subsection handling code treats a subsection with no 'needed' function pointer as if it were the subsection list terminator, so the subsection is never transferred and nor is any subsection following it in the list. Handle NULL 'needed' function pointers in subsections in the same way that we do for top level VMStateDescription structures: treat the subsection as always being needed. This doesn't change behaviour for the current set of devices in the tree, because all subsections declare a 'needed' function. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-08-22 11:40:47 +02:00
Junyan He	56eb90af39	migration/ram: ensure write persistence on loading all data to PMEM. Because we need to make sure the pmem kind memory data is synced after migration, we choose to call pmem_persist() when the migration finish. This will make sure the data of pmem is safe and will not lose if power is off. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-08-10 13:29:39 +03:00
Junyan He	469dd51bc6	migration/ram: Add check and info message to nvdimm post copy. The nvdimm kind memory does not support post copy now. We disable post copy if we have nvdimm memory and print some log hint to user. Signed-off-by: Junyan He <junyan.he@intel.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-08-10 13:29:39 +03:00
Lidong Chen	4b3fb65db9	migration: fix duplicate initialization for expected_downtime and cleanup_bh migrate_fd_connect duplicate initialize expected_downtime and cleanup_bh. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Message-Id: <1532434585-14732-2-git-send-email-lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:28:57 +01:00
Peter Xu	97ca211c62	migration: disallow recovery for release-ram Postcopy recovery won't work well with release-ram capability since release-ram will drop the page buffer as long as the page is put into the send buffer. So if there is a network failure happened, any page buffers that have not yet reached the destination VM but have already been sent from the source VM will be lost forever. Let's refuse the client from resuming such a postcopy migration. Luckily release-ram was designed to only be used when src and destination VMs are on the same host, so it should be fine. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180723123305.24792-3-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:10:59 +01:00
Peter Xu	814bb08f17	migration: update recv bitmap only on dest vm We shouldn't update the received bitmap if we're the source VM. This fixes a breakage when release-ram is enabled on postcopy. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180723123305.24792-2-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:10:41 +01:00
Dr. David Alan Gilbert	57225e5f32	migrate: Fix cancelling state warning We've been getting the warning: migration_iteration_finish: Unknown ending state 2 on a cancel. I think that's originally due to 39b9e17905c; although I've only seen the warning, I think that in some cases that we could find the VM stays paused after a cancel where it should restart. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180719092257.12703-1-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 17:00:45 +01:00
Peter Xu	4fcefd44a0	migration: fix potential overflow in multifd send I would guess it won't happen normally, but this should ease Coverity. >>> CID 1394385: Integer handling issues (OVERFLOW_BEFORE_WIDEN) >>> Potentially overflowing expression "pages->used * 8192U" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned). 854 transferred = pages->used * TARGET_PAGE_SIZE + p->packet_len; Fixes: CID 1394385 CC: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180720034713.11711-1-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-24 16:58:51 +01:00
Peter Xu	858b6d6224	migration: reorder MIG_CMD_POSTCOPY_RESUME It was accidently added before MIG_CMD_PACKAGED so it might break command compatibility when we run postcopy migration between old/new QEMUs. Fix that up quickly before the QEMU 3.0 release. Reported-by: Lukáš Doktor <ldoktor@redhat.com> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710094424.30754-1-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 15:23:23 +01:00
Peter Xu	3c9928d9f9	migration: show pause/recover state on dst host These two states will be missing when doing "query-migrate" on destination VM. Add these states so that we can get the query results as expected. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-5-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:56:37 +01:00
Peter Xu	a725ef9fe3	migration: fix incorrect bitmap size calculation The calculation on size of received bitmap is incorrect for postcopy recovery. Here we wanted to let the size to cover all the valid bits in the bitmap, we should use DIV_ROUND_UP() instead of a division. For example, a RAMBlock with size=4K (which contains only one single 4K page) will have nbits=1, then nbits/8=0, then the real bitmap won't be sent to source at all. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-4-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:56:18 +01:00
Peter Xu	fd037a656a	migration: loosen recovery check when load vm We were checking against -EIO, assuming that it will cover all IO failures. But actually it is not. One example is that in qemu_loadvm_section_start_full() we can have tons of places that will return -EINVAL even if the error is caused by IO failures on the network. Let's loosen the recovery check logic here to cover all the error cases happened by removing the explicit check against -EIO. After all we won't lose anything here if any other failure happened. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:56:07 +01:00
Peter Xu	1aa8367861	migration: simplify check to use qemu file buffer Firstly, renaming the old matching_page_sizes variable to matches_target_page_size, which suites more to what it did (it only checks against target page size rather than multiple page sizes). Meanwhile, simplify the check logic a bit, and enhance the comments. Should have no functional change. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:55:59 +01:00
Peter Xu	a429e7f488	migration: unify incoming processing This is the 2nd patch to unbreak postcopy recovery. Let's unify the migration_incoming_process() call at a single place rather than calling it in connection setup codes. This fixes a problem that we will go into incoming migration procedure even if we are trying to recovery from a paused postcopy migration. Fixes: `36c2f8be2c` ("migration: Delay start of migration main routines") Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-5-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
Peter Xu	884835fa1e	migration: unbreak postcopy recovery The whole postcopy recovery logic was accidentally broken. We need to fix it in two steps. This is the first step that we should do the recovery when needed. It was bypassed before after commit `36c2f8be2c`. Introduce postcopy_try_recovery() helper for the postcopy recovery logic. Call it both in migration_fd_process_incoming() and migration_ioc_process_incoming(). Fixes: `36c2f8be2c` ("migration: Delay start of migration main routines") Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-4-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
Peter Xu	81e620531f	migration: move income process out of multifd Move the call to migration_incoming_process() out of multifd code. It's a bit strange that we can migration generic calls in multifd code. Instead, let multifd_recv_new_channel() return a boolean showing whether it's ready to continue the incoming migration. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-3-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
Peter Xu	eed1cc7866	migration: delay postcopy paused state Before this patch we firstly setup the postcopy-paused state then we clean up the QEMUFile handles. That can be racy if there is a very fast "migrate-recover" command running in parallel. Fix that up. Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-07-10 12:48:53 +01:00
Vladimir Sementsov-Ogievskiy	58f72b965e	dirty-bitmap: fix double lock on bitmap enabling Bitmap lock/unlock were added to bdrv_enable_dirty_bitmap in `8b1402ce80`, but some places were not updated correspondingly, which leads to trying to take this lock twice, which is dead-lock. Fix this. Actually, iotest 199 (about dirty bitmap postcopy migration) is broken now, and this fixes it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180625165745.25259-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>	2018-07-04 02:12:49 -04:00
Peter Maydell	4a83bf2f33	migration/next for 20180627 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJbM4jhAAoJEPSH7xhYctcjFkoP/RkE3RhO/3JV+DKYU5KQwcrV n6FcXwsMIJ9EsCWLaZl4R+7nlbPYb4xPVG0UJnG1ntqIhJ5gN6RHzfi4l1wGCKUT EBRq7C5mEUbUVup7RPO1bEDixydvsvNrSHCL8AzALaag4t5HWrjneLcnJhrSykHx WovFn3Zyi/3LzUKHTzSQWLUoEKSBqNVJ1Ar3kIKwNDfS+jOhiZ6/Bm29hLn9UH0X b/iFxvCcalL2y/ulacocaS2dMe6Dx3/TvhKde8sZuyez2JeGCFFicrty7TOFhpoV INHjOQ3P4KLCWvJw8vs0jQmw2kikFQhqBXbJPVi3/0hV0MH/Uj4cC78/aGTn+Dt3 bNa63eBi6//2KUzgdA+tlrTsVrkifjIhd69a0keTy1oH0zQHRjKvJrgSu0Mcy1x3 YLbPnsjdIUrzuhCQwZcBAuzJP73OW0Q/Y+RSulFUjqkO2IBtJSUkx+AeLews5uhz +q9xom4FjHZMaRLwz6tWYUXxyfRlRyDTiDxZtL/9I+6XZWRHKqA8+/QZf0l2whTm NfJz43J+uL7Dlq9u8aMX3e+qHGwGj6b7o3zB5+xBtiHYFZMg7qPuO42Lstubdexp Zngb+EZl1H4SmYnOLhSVSY+Gus3f2xI6JXpuZuy1wo/bgTwFXM2o6z16t9sobKYl RPusi0XEBu2/XkomJxYB =kc0K -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180627' into staging migration/next for 20180627 # gpg: Signature made Wed 27 Jun 2018 13:53:53 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180627: migration: fix crash in when incoming client channel setup fails postcopy: drop ram_pages parameter from postcopy_ram_incoming_init() migration: Stop sending whole pages through main channel migration: Remove not needed semaphore and quit migration: Wait for blocking IO migration: Start sending messages migration: Create ram_save_multifd_page migration: Create multifd_bytes ram_counter migration: Synchronize multifd threads with main thread migration: Add block where to send/receive packets migration: Multifd channels always wait on the sem migration: Add multifd traces for start/end thread migration: Abstract the number of bytes sent migration: Calculate mbps only during transfer time migration: Create multifd packet migration: Create multipage support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-28 15:31:42 +01:00
Daniel P. Berrangé	ca273df301	migration: fix crash in when incoming client channel setup fails The way we determine if we can start the incoming migration was changed to use migration_has_all_channels() in: commit `428d89084c` Author: Juan Quintela <quintela@redhat.com> Date: Mon Jul 24 13:06:25 2017 +0200 migration: Create migration_has_all_channels This method in turn calls multifd_recv_all_channels_created() which is hardcoded to always return 'true' when multifd is not in use. This is a latent bug... ...activated in a following commit where that return result ends up acting as the flag to indicate whether it is possible to start processing the migration: commit `36c2f8be2c` Author: Juan Quintela <quintela@redhat.com> Date: Wed Mar 7 08:40:52 2018 +0100 migration: Delay start of migration main routines This means that if channel initialization fails with normal migration, it'll never notice and attempt to start the incoming migration regardless and crash on a NULL pointer. This can be seen, for example, if a client connects to a server requiring TLS, but has an invalid x509 certificate: qemu-system-x86_64: The certificate hasn't got a known issuer qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed. #0 0x00007fffebd24f2b in raise () at /lib64/libc.so.6 #1 0x00007fffebd0f561 in abort () at /lib64/libc.so.6 #2 0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 #3 0x00007fffebd1d692 in () at /lib64/libc.so.6 #4 0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386 #5 0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116 #6 0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6 #7 0x0000000000000000 in () To handle the non-multifd case, we check whether mis->from_src_file is non-NULL. With this in place, the migration server drops the rejected client and stays around waiting for another, hopefully valid, client to arrive. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20180619163552.18206-1-berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-27 13:29:35 +02:00
David Hildenbrand	c136180c90	postcopy: drop ram_pages parameter from postcopy_ram_incoming_init() Not needed. Don't expose last_ram_page(). Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180620202736.21399-1-david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-27 13:28:31 +02:00
Juan Quintela	35374cbdff	migration: Stop sending whole pages through main channel We have to flush() the QEMUFile because now we sent really few data through that channel. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:30 +02:00
Juan Quintela	7a5cc33c48	migration: Remove not needed semaphore and quit We know quit with shutdwon in the QIO. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Add comment Use shutdown() instead of unref()	2018-06-27 13:28:21 +02:00
Juan Quintela	4d22c148c9	migration: Wait for blocking IO We have three conditions here: - channel fails -> error - we have to quit: we close the channel and reads fails - normal read that success, we are in bussiness So forget the complications of waiting in a semaphore. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	8b2db7f5fd	migration: Start sending messages Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	b9ee2f7d70	migration: Create ram_save_multifd_page The function still don't use multifd, but we have simplified ram_save_page, xbzrle and RDMA stuff is gone. We have added a new counter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Add last_page parameter Add commets for done and address Remove multifd field, it is the same than normal pages Merge next patch, now we send multiple pages at a time Remove counter for multifd pages, it is identical to normal pages Use iovec's instead of creating the equivalent. Clear memory used by pages (dave) Use g_new0(danp) define MULTIFD_CONTINUE now pages member is a pointer Fix off-by-one in number of pages in one packet Remove RAM_SAVE_FLAG_MULTIFD_PAGE s/multifd_pages_t/MultiFDPages_t/ add comment explaining what it means	2018-06-27 13:28:11 +02:00
Juan Quintela	a61c45bd22	migration: Create multifd_bytes ram_counter This will include how many bytes they are sent through multifd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	6df264ac5a	migration: Synchronize multifd threads with main thread We synchronize all threads each RAM_SAVE_FLAG_EOS. Bitmap synchronizations don't happen inside a ram section, so we are safe about two channels trying to overwrite the same memory. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- seq needs to be atomic now, will also be accessed from main thread. Fix the if (true \|\| ...) leftover We are back to non-atomics	2018-06-27 13:28:11 +02:00
Juan Quintela	0beb5ed327	migration: Add block where to send/receive packets Once there add tracepoints. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	d82628e4bd	migration: Multifd channels always wait on the sem Either for quit, sync or packet, we first wake them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	408ea6ae4c	migration: Add multifd traces for start/end thread We want to know how many pages/packets each channel has sent. Add counters for those. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- sort trace-events (dave)	2018-06-27 13:28:11 +02:00
Juan Quintela	0c8f0efdd4	migration: Abstract the number of bytes sent Right now we use the "position" inside the QEMUFile, but things like RDMA already do weird things to be able to maintain that counter right, and multifd will have some similar problems. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	6cde6fbe2b	migration: Calculate mbps only during transfer time We used to include in this calculation the setup time, but that can be quite big in rdma or multifd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Juan Quintela	2a26c979b1	migration: Create multifd packet We still don't put anything there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- fix magic (dave) check offset/ramblock (dave) s/seq/packet_num/ and make it 64bit	2018-06-27 13:28:11 +02:00
Juan Quintela	34c55a94b1	migration: Create multipage support We only create/destry the page list here. We will use it later. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-27 13:28:11 +02:00
Stefan Hajnoczi	ec09f87753	trace: forbid floating point types Only one existing trace event uses a floating point type. Unfortunately float and double cannot be supported since SystemTap does not have floating point types. Remove float and double from the whitelist and document this limitation. Update the migrate_transferred trace event to use uint64_t instead of double. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20180621150254.4922-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2018-06-27 11:09:29 +01:00
Balamuruhan S	650af8907b	migration: calculate expected_downtime with ram_bytes_remaining() expected_downtime value is not accurate with dirty_pages_rate * page_size, using ram_bytes_remaining() would yeild it resonable. consider to read the remaining ram just after having updated the dirty pages count later migration_bitmap_sync_range() in migration_bitmap_sync() and reuse the `remaining` field in ram_counters to hold ram_bytes_remaining() for calculating expected_downtime. Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180612085009.17594-2-bala24@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	e03a34f8f3	migration/postcopy: Wake rate limit sleep on postcopy request Use the 'urgent request' mechanism added in the previous patch for entries added to the postcopy request queue for RAM. Ignore the rate limiting while we have requests. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-4-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	ad767bed5a	migration: Wake rate limiting for urgent requests Rate limiting sleeps the migration thread for a while when it runs out of bandwidth; but sometimes we want to wake up to get on with something more urgent (like a postcopy request). Here we use a semaphore with a timedwait instead of a simple sleep; Incrementing the sempahore will wake it up sooner. Anything that consumes these urgent events must decrement the sempahore. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	7e555c6c58	migration/postcopy: Add max-postcopy-bandwidth parameter Limit the background transfer bandwidth during the postcopy phase to the value set on this new parameter. The default, 0, corresponds to the existing behaviour which is unlimited bandwidth. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Xiao Guangrong	b734035b61	migration: introduce migration_update_rates It is used to slightly clean the code up, no logic is changed Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-5-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Xiao Guangrong	e0e7a45d7f	migration: fix counting xbzrle cache_miss_rate Sync up xbzrle_cache_miss_prev only after migration iteration goes forward Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-4-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Vladimir Sementsov-Ogievskiy	a36f6ff46f	migration/block-dirty-bitmap: fix dirty_bitmap_load dirty_bitmap_load_header return code is obtained but not handled. Fix this. Bug was introduced in `b35ebdf076` "migration: add postcopy migration of dirty bitmaps" with the whole function. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180530112424.204835-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	343f632c70	migration: Poison ramblock loops in migration The migration code should be using the RAMBLOCK_FOREACH_MIGRATABLE and qemu_ram_foreach_block_migratable not the all-block versions; poison them so that we can't accidentally use them. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Dr. David Alan Gilbert	ff0769a4ad	migration: Fixes for non-migratable RAMBlocks There are still a few cases where migration code is using the macros and functions that do all RAMBlocks rather than just the migratable blocks; fix those up. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Greg Kurz	ea134caa08	typedefs: add QJSON Since commit `83ee768d62`, we now have two places that define the QJSON type: $ git grep 'typedef struct QJSON QJSON' include/migration/vmstate.h:typedef struct QJSON QJSON; migration/qjson.h:typedef struct QJSON QJSON; This breaks docker-test-build@centos6: In file included from /tmp/qemu-test/src/migration/savevm.c:59: /tmp/qemu-test/src/migration/qjson.h:16: error: redefinition of typedef 'QJSON' /tmp/qemu-test/src/include/migration/vmstate.h:30: note: previous declaration of 'QJSON' was here make: *** [migration/savevm.o] Error 1 This happens because CentOS 6 has an old GCC 4.4.7. Even if redefining a typedef with the same type is permitted since GCC 4.6, unless -pedantic is passed, we don't really need to do that on purpose. Let's have a single definition in <qemu/typedefs.h> instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <152844714981.11789.3657734445739553287.stgit@bahia.lan> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-06-15 14:40:56 +01:00
Peter Maydell	b74588a493	migration/next for 20180604 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJbFLygAAoJEPSH7xhYctcjszUP/j7/21sHXwPyB5y4CQHsivLL 18MnXTkjgSlue3fuPKI9ZzD2VZq19AtwarLuW3IFf15adKfeV0hhZsIMUmu0XTAz jgol8YYgoYlk3/on8Ux8PWV9RWKHh7vWAF7hjwJOnI5CFF7gXz6kVjUx03A7RSnB 1a92lvgl5P8AbK5HZ/i08nOaAxCUWIVIoQeZf3BlZ5WdJE0j+d0Mvp/S8NEq5VPD P6oYJZy64QFGotmV1vtYD4G/Hqe8XBw2LoSVgjPSmQpd1m4CXiq0iyCxtC8HT+iq 0P0eHATRglc5oGhRV6Mi5ixH2K2MvpshdyMKVAbkJchKBHn4k3m6jefhnUXuucwC fq9x7VIYzQ80S+44wJri683yUWP3Qmbb6mWYBb1L0jgbTkY1wx431IYMtDVv5q4R oXtbe77G6lWKHH0SFrU8TY/fqJvBwpat+QwOoNsluHVjzAzpOpcVokB8pLj592/1 bHkDwvv7i+B37muOF3PFy++BoliwpRiNbVC5Btw4mnbOoY+AeJHMRBuhJHU6EyJv GOFiQ5w5XEXEu98giNqHcc0AC8w7iTRGI472UzpS0AG20wpc1/2jF0yM99stauwD oIu3MU4dQlz6QRy8KFnJShqjYNYuOs59USLiYM79ABa9J+JxJxxTAaqRbUFojQR6 KN2i0QQJ+6MO4Skjbu/3 =K0eJ -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180604' into staging migration/next for 20180604 # gpg: Signature made Mon 04 Jun 2018 05:14:24 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180604: migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect migration: remove unnecessary variables len in QIOChannelRDMA migration: Don't activate block devices if using -S migration: discard non-migratable RAMBlocks migration: introduce decompress-error-check Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 12:54:00 +01:00
Peter Maydell	f67c9b693a	acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJbEXNvAAoJECgfDbjSjVRpc8gH/R8xrcFrV+k9wwbgYcOcGb6Y LWjseE31pqJcxRV80vLOdzYEuLStZQKQQY7xBDMlA5vdyvZxIA6FLO2IsiJSbFAk EK8pclwhpwQAahr8BfzenabohBv2UO7zu5+dqSvuJCiMWF3jGtPAIMxInfjXaOZY odc1zY2D2EgsC7wZZ1hfraRbISBOiRaez9BoGDKPOyBY9G1ASEgxJgleFgoBLfsK a1XU+fDM6hAVdxftfkTm0nibyf7PWPDyzqghLqjR9WXLvZP3Cqud4p8N29mY51pR KSTjA4FYk6Z9EVMltyBHfdJs6RQzglKjxcNGdlrvacDfyFi79fGdiosVllrjfJM= =3+V0 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 01 Jun 2018 17:25:19 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) vhost-blk: turn on pre-defined RO feature bit ACPI testing: test NFIT platform capabilities nvdimm, acpi: support NFIT platform capabilities tests/.gitignore: add entry for generated file arch_init: sort architectures ui: use local path for local headers qga: use local path for local headers colo: use local path for local headers migration: use local path for local headers usb: use local path for local headers sd: fix up include vhost-scsi: drop an unused include ppc: use local path for local headers rocker: drop an unused include e1000e: use local path for local headers ioapic: fix up includes ide: use local path for local headers display: use local path for local headers trace: use local path for local headers migration: drop an unused include ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-04 10:15:16 +01:00
Lidong Chen	c5e76115cc	migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect When cancel migration during RDMA precopy, the source qemu main thread hangs sometime. The backtrace is: (gdb) bt #0 0x00007f249eabd43d in write () from /lib64/libpthread.so.0 #1 0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189 #2 0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296 #3 0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999 #4 0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273 #5 0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98 #6 0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334 #7 0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162 #8 0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90 #9 0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118 #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436 #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0) at util/async.c:261 #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215 #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263 #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522 #16 0x00000000005bc6a5 in main_loop () at vl.c:1944 #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752 It does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect sometime. According to IB Spec once active side send DREQ message, it should wait for DREP message and only once it arrived it should trigger a DISCONNECT event. DREP message can be dropped due to network issues. For that case the spec defines a DREP_timeout state in the CM state machine, if the DREP is dropped we should get a timeout and a TIMEWAIT_EXIT event will be trigger. Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state and in above scenario we will not get DISCONNECT or TIMEWAIT_EXIT events. So it should not invoke rdma_get_cm_event which may hang forever, and the event channel is also destroyed in qemu_rdma_cleanup. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Lidong Chen	f38f6d4155	migration: remove unnecessary variables len in QIOChannelRDMA Because qio_channel_rdma_writev and qio_channel_rdma_readv maybe invoked by different threads concurrently, this patch removes unnecessary variables len in QIOChannelRDMA and use local variable instead. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Lidong Chen <jemmy858585@gmail.com>	2018-06-04 05:46:15 +02:00
Dr. David Alan Gilbert	0f073f44df	migration: Don't activate block devices if using -S Activating the block devices causes the locks to be taken on the backing file. If we're running with -S and the destination libvirt hasn't started the destination with 'cont', it's expecting the locks are still untaken. Don't activate the block devices if we're not going to autostart the VM; 'cont' already will do that anyway. This change is tied to the new migration capability 'late-block-activate' that defaults to off, keeping the old behaviour by default. bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Cédric Le Goater	b895de5027	migration: discard non-migratable RAMBlocks On the POWER9 processor, the XIVE interrupt controller can control interrupt sources using MMIO to trigger events, to EOI or to turn off the sources. Priority management and interrupt acknowledgment is also controlled by MMIO in the presenter sub-engine. These MMIO regions are exposed to guests in QEMU with a set of 'ram device' memory mappings, similarly to VFIO, and the VMAs are populated dynamically with the appropriate pages using a fault handler. But, these regions are an issue for migration. We need to discard the associated RAMBlocks from the RAM state on the source VM and let the destination VM rebuild the memory mappings on the new host in the post_load() operation just before resuming the system. To achieve this goal, the following introduces a new RAMBlock flag RAM_MIGRATABLE which is updated in the vmstate_register_ram() and vmstate_unregister_ram() routines. This flag is then used by the migration to identify RAMBlocks to discard on the source. Some checks are also performed on the destination to make sure nothing invalid was sent. This change impacts the boston, malta and jazz mips boards for which migration compatibility is broken. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Xiao Guangrong	f548222c24	migration: introduce decompress-error-check QEMU 3.0 enables strict check for compression & decompression to make the migration more robust, that depends on the source to fix the internal design which triggers the unexpected error conditions To make it work for migrating old version QEMU to 2.13 QEMU, we introduce this parameter to disable the error check on the destination which is the default behavior of the machine type which is older than 2.13, alternately, the strict check can be enabled explicitly as followings: -M pc-q35-2.11 -global migration.decompress-error-check=true Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-06-04 05:46:15 +02:00
Peter Maydell	afd76ffba9	* Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAlsRd2UUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPG8Qf+M85E8xAQ/bhs90tAymuXkUUsTIFF uI76K8eM0K3b2B+vGckxh1gyN5O3GQaMEDL7vITfqbX+EOH5U2lv8V9JRzf2YvbG Zahjd4pOCYzR0b9JENA1r5U/J8RntNrBNXlKmGTaXOaw9VCXlZyvgVd9CE3z/e2M 0jSXMBdF4LB3UzECI24Va8ejJxdSiJcqXA2j3J+pJFxI698i+Z5eBBKnRdo5TVe5 jl0TYEsbS6CLwhmbLXmt3Qhq+ocZn7YH9X3HjkHEdqDUeYWyT9jwUpa7OHFrIEKC ikWm9er4YDzG/vOC0dqwKbShFzuTpTJuMz5Mj4v8JjM/iQQFrp4afjcW2g== =RS/B -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * Linux header upgrade (Peter) * firmware.json definition (Laszlo) * IPMI migration fix (Corey) * QOM improvements (Alexey, Philippe, me) * Memory API cleanups (Jay, me, Tristan, Peter) * WHPX fixes and improvements (Lucian) * Chardev fixes (Marc-André) * IOMMU documentation improvements (Peter) * Coverity fixes (Peter, Philippe) * Include cleanup (Philippe) * -clock deprecation (Thomas) * Disable -sandbox unless CONFIG_SECCOMP (Yi Min Zhao) * Configurability improvements (me) # gpg: Signature made Fri 01 Jun 2018 17:42:13 BST # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (56 commits) hw: make virtio devices configurable via default-configs/ hw: allow compiling out SCSI memory: Make operations using MemoryRegionIoeventfd struct pass by pointer. char: Remove unwanted crlf conversion qdev: Remove DeviceClass::init() and ::exit() qdev: Simplify the SysBusDeviceClass::init path hw/i2c: Use DeviceClass::realize instead of I2CSlaveClass::init hw/i2c/smbus: Use DeviceClass::realize instead of SMBusDeviceClass::init target/i386/kvm.c: Remove compatibility shim for KVM_HINTS_REALTIME Update Linux headers to 4.17-rc6 target/i386/kvm.c: Handle renaming of KVM_HINTS_DEDICATED scripts/update-linux-headers: Handle kernel license no longer being one file scripts/update-linux-headers: Handle __aligned_u64 virtio-gpu-3d: Define VIRTIO_GPU_CAPSET_VIRGL2 elsewhere gdbstub: Prevent fd leakage docs/interop: add "firmware.json" ipmi: Use proper struct reference for KCS vmstate vmstate: Add a VSTRUCT type tcg: remove softfloat from --disable-tcg builds qemu-options: Mark the non-functional -clock option as deprecated ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-06-01 18:24:16 +01:00
Michael S. Tsirkin	53d37d36ca	migration: use local path for local headers When pulling in headers that are in the same directory as the C file (as opposed to one in include/), we should use its relative path, without a directory. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com>	2018-06-01 19:20:38 +03:00
Michael S. Tsirkin	83ee768d62	migration: drop an unused include In the vmstate.h file, we just need a struct name. Use a forward declaration instead of an include, then adjust the one affected .c file to include the file that is no longer implicit from the header. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-06-01 19:20:37 +03:00
Corey Minyard	2dc6660bd8	vmstate: Add a VSTRUCT type The VMS_STRUCT has no way to specify which version of a structure to use. Add a type and a new field to allow the specific version of a structure to be used. Signed-off-by: Corey Minyard <cminyard@mvista.com> Message-Id: <1524670052-28373-2-git-send-email-minyard@acm.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-06-01 15:13:46 +02:00
Peter Xu	bf269906f5	migration: use g_free for ram load bitmap Buffers allocated with bitmap_new() should be freed with g_free(). Both reported by Coverity: *** CID 1391300: API usage errors (ALLOC_FREE_MISMATCH) /migration/ram.c: 3517 in ram_dirty_bitmap_reload() 3511 * the last one to sync, we need to notify the main send thread. 3512 / 3513 ram_dirty_bitmap_reload_notify(s); 3514 3515 ret = 0; 3516 out: >>> CID 1391300: API usage errors (ALLOC_FREE_MISMATCH) >>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free". 3517 free(le_bitmap); 3518 return ret; 3519 } 3520 3521 static int ram_resume_prepare(MigrationState s, void opaque) 3522 { ** CID 1391292: API usage errors (ALLOC_FREE_MISMATCH) /migration/ram.c: 249 in ramblock_recv_bitmap_send() 243 * Mark as an end, in case the middle part is screwed up due to 244 * some "misterious" reason. 245 */ 246 qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING); 247 qemu_fflush(file); 248 >>> CID 1391292: API usage errors (ALLOC_FREE_MISMATCH) >>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free". 249 free(le_bitmap); 250 251 if (qemu_file_get_error(file)) { 252 return qemu_file_get_error(file); 253 } 254 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20180525015042.31778-1-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-25 15:29:48 +02:00
Juan Quintela	0efc91423a	migration: fix exec/fd migrations Commit: commit `36c2f8be2c` Author: Juan Quintela <quintela@redhat.com> Date: Wed Mar 7 08:40:52 2018 +0100 migration: Delay start of migration main routines Missed tcp and fd transports. This fix its. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Tested-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20180523091411.1073-1-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-25 15:29:47 +02:00
Dr. David Alan Gilbert	8b7bf2bada	Migration+TLS: Fix crash due to double cleanup During a TLS connect we see: migration_channel_connect calls migration_tls_channel_connect (calls after TLS setup) migration_channel_connect My previous error handling fix made migration_channel_connect call migrate_fd_connect in all cases; unfortunately the above means it gets called twice and crashes doing double cleanup. Fixes: `688a3dcba9` Reported-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20180430185943.35714-1-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 22:13:08 +02:00
Lidong Chen	71cd73061c	migration: update index field when delete or qsort RDMALocalBlock rdma_delete_block function deletes RDMALocalBlock base on index field, but not update the index field. So when next time invoke rdma_delete_block, it will not work correctly. If start and cancel migration repeatedly, some RDMALocalBlock not invoke ibv_dereg_mr to decrease kernel mm_struct vmpin. When vmpin is large than max locked memory limitation, ibv_reg_mr will failed, and migration can not start successfully again. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1525618499-1560-1-git-send-email-lidongchen@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Lidong Chen <jemmy858585@gmail.com>	2018-05-15 22:13:08 +02:00
Peter Xu	bfbf89c2b5	migration/qmp: add command migrate-pause It pauses an ongoing migration. Currently it only supports postcopy. Note that this command will work on either side of the migration. Basically when we trigger this on one side, it'll interrupt the other side as well since the other side will get notified on the disconnect event. However, it's still possible that the other side is not notified, for example, when the network is totally broken, or due to some firewall configuration changes. In that case, we will also need to run the same command on the other side so both sides will go into the paused state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-24-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- s/2.12/2.13/	2018-05-15 22:12:57 +02:00
Peter Xu	62df066fff	migration: introduce lock for to_dst_file Let's introduce a lock for that QEMUFile since we are going to operate on it in multiple threads. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-23-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 22:12:41 +02:00
Peter Xu	02affd41b1	qmp/migration: new command migrate-recover The first allow-oob=true command. It's used on destination side when the postcopy migration is paused and ready for a recovery. After execution, a new migration channel will be established for postcopy to continue. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-21-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- s/2.12/2.13/	2018-05-15 22:11:45 +02:00
Peter Xu	e1b1b1bc36	migration: init dst in migration_object_init too Though we may not need it, now we init both the src/dst migration objects in migration_object_init() so that even incoming migration object would be thread safe (it was not). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-20-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:57:01 +02:00
Peter Xu	9419069695	migration: final handshake for the resume Finish the last step to do the final handshake for the recovery. First source sends one MIG_CMD_RESUME to dst, telling that source is ready to resume. Then, dest replies with MIG_RP_MSG_RESUME_ACK to source, telling that dest is ready to resume (after switch to postcopy-active state). When source received the RESUME_ACK, it switches its state to postcopy-active, and finally the recovery is completed. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-19-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:57:00 +02:00
Peter Xu	08614f3497	migration: setup ramstate for resume After we updated the dirty bitmaps of ramblocks, we also need to update the critical fields in RAMState to make sure it is ready for a resume. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-18-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:59 +02:00
Peter Xu	edd090c728	migration: synchronize dirty bitmap for resume This patch implements the first part of core RAM resume logic for postcopy. ram_resume_prepare() is provided for the work. When the migration is interrupted by network failure, the dirty bitmap on the source side will be meaningless, because even the dirty bit is cleared, it is still possible that the sent page was lost along the way to destination. Here instead of continue the migration with the old dirty bitmap on source, we ask the destination side to send back its received bitmap, then invert it to be our initial dirty bitmap. The source side send thread will issue the MIG_CMD_RECV_BITMAP requests, once per ramblock, to ask for the received bitmap. On destination side, MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap. Data will be received on the return-path thread of source, and the main migration thread will be notified when all the ramblock bitmaps are synchronized. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-17-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:57 +02:00
Peter Xu	d1b8eadbc4	migration: introduce SaveVMHandlers.resume_prepare This is hook function to be called when a postcopy migration wants to resume from a failure. For each module, it should provide its own recovery logic before we switch to the postcopy-active state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-16-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:55 +02:00
Peter Xu	13955b89ce	migration: new message MIG_RP_MSG_RESUME_ACK Creating new message to reply for MIG_CMD_POSTCOPY_RESUME. One uint32_t is used as payload to let the source know whether destination is ready to continue the migration. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-15-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:53 +02:00
Peter Xu	3f5875eca5	migration: new cmd MIG_CMD_POSTCOPY_RESUME Introducing this new command to be sent when the source VM is ready to resume the paused migration. What the destination does here is basically release the fault thread to continue service page faults. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-14-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:52 +02:00
Peter Xu	a335debb35	migration: new message MIG_RP_MSG_RECV_BITMAP Introducing new return path message MIG_RP_MSG_RECV_BITMAP to send received bitmap of ramblock back to source. This is the reply message of MIG_CMD_RECV_BITMAP, it contains not only the header (including the ramblock name), and it was appended with the whole ramblock received bitmap on the destination side. When the source receives such a reply message (MIG_RP_MSG_RECV_BITMAP), it parses it, convert it to the dirty bitmap by inverting the bits. One thing to mention is that, when we send the recv bitmap, we are doing these things in extra: - converting the bitmap to little endian, to support when hosts are using different endianess on src/dst. - do proper alignment for 8 bytes, to support when hosts are using different word size (32/64 bits) on src/dst. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-13-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:51 +02:00
Peter Xu	f25d42253c	migration: new cmd MIG_CMD_RECV_BITMAP Add a new vm command MIG_CMD_RECV_BITMAP to request received bitmap for one ramblock. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-12-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:49 +02:00
Peter Xu	d96c9e8d78	migration: wakeup dst ram-load-thread for recover On the destination side, we cannot wake up all the threads when we got reconnected. The first thing to do is to wake up the main load thread, so that we can continue to receive valid messages from source again and reply when needed. At this point, we switch the destination VM state from postcopy-paused back to postcopy-recover. Now we are finally ready to do the resume logic. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-11-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:56:48 +02:00
Peter Xu	135b87b4f0	migration: new state "postcopy-recover" Introducing new migration state "postcopy-recover". If a migration procedure is paused and the connection is rebuilt afterward successfully, we'll switch the source VM state from "postcopy-paused" to the new state "postcopy-recover", then we'll do the resume logic in the migration thread (along with the return path thread). This patch only do the state switch on source side. Another following up patch will handle the state switching on destination side using the same status bit. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-10-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- s/2.11/2.13/	2018-05-15 20:56:30 +02:00
Peter Xu	d3e35b8f62	migration: rebuild channel on source This patch detects the "resume" flag of migration command, rebuild the channels only if the flag is set. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-9-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:55:19 +02:00
Peter Xu	7a4da28b26	qmp: hmp: add migrate "resume" option It will be used when we want to resume one paused migration. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-8-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> --- s/2.12/2.13/	2018-05-15 20:54:49 +02:00
Peter Xu	3a7804c306	migration: allow fault thread to pause Allows the fault thread to stop handling page faults temporarily. When network failure happened (and if we expect a recovery afterwards), we should not allow the fault thread to continue sending things to source, instead, it should halt for a while until the connection is rebuilt. When the dest main thread noticed the failure, it kicks the fault thread to switch to pause state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-7-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	14b1742eaa	migration: allow src return path to pause Let the thread pause for network issues. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-6-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	b411b844fb	migration: allow dst vm pause on postcopy When there is IO error on the incoming channel (e.g., network down), instead of bailing out immediately, we allow the dst vm to switch to the new POSTCOPY_PAUSE state. Currently it is still simple - it waits the new semaphore, until someone poke it for another attempt. One note is that here on ram loading thread we cannot detect the POSTCOPY_ACTIVE state, but we need to detect the more specific POSTCOPY_INCOMING_RUNNING state, to make sure we have already loaded all the device states. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-5-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	b23c2ade25	migration: implement "postcopy-pause" src logic Now when network down for postcopy, the source side will not fail the migration. Instead we convert the status into this new paused state, and we will try to wait for a rescue in the future. If a recovery is detected, migration_thread() will reset its local variables to prepare for that. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-4-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	a688d2c1ab	migration: new postcopy-pause state Introducing a new state "postcopy-paused", which can be used when the postcopy migration is paused. It is targeted for postcopy network failure recovery. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-3-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Peter Xu	e89f5ff2c3	migration: let incoming side use thread context The old incoming migration is running in main thread and default gcontext. With the new qio_channel_add_watch_full() we can now let it run in the thread's own gcontext (if there is one). Currently this patch does nothing alone. But when any of the incoming migration is run in another iothread (e.g., the upcoming migrate-recover command), this patch will bind the incoming logic to the iothread instead of the main thread (which may already get page faulted and hanged). RDMA is not considered for now since it's not even using the QIO watch framework at all. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-2-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	8c4598f2b1	migration: Define MultifdRecvParams sooner Once there, we don't need the struct names anywhere, just the typedefs. And now also document all fields. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	af8b7d2b09	migration: Transmit initial package through the multifd channels Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> -- Be network agnostic. Add error checking for all values.	2018-05-15 20:24:27 +02:00
Juan Quintela	36c2f8be2c	migration: Delay start of migration main routines We need to make sure that we have started all the multifd threads. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	60df2d4ae5	migration: Create multifd channels In both sides. We still don't transmit anything through them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	3854956ad7	migration: Export functions to create send channels Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	62c1e0ca73	migration: Be sure all recv channels are created We need them before we start migration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	667707078d	migration: terminate_* can be called for other threads Once there, make count field to always be accessed with atomic operations. To make blocking operations, we need to know that the thread is running, so create a bool to indicate that. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> -- Once here, s/terminate_multifd_-threads/multifd__terminate_threads/ This is consistente with every other function	2018-05-15 20:24:27 +02:00
Juan Quintela	71bb07dbfc	migration: Introduce multifd_recv_new_channel() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>	2018-05-15 20:24:27 +02:00
Juan Quintela	7a169d745c	migration: Set error state in case of error Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:27 +02:00
Xiao Guangrong	701b1876c0	migration: fix saving normal page even if it's been compressed Fix the bug introduced by `da3f56cb2e` (migration: remove ram_save_compressed_page()), It should be 'return' rather than 'res' Sorry for this stupid mistake :( Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180428081045.8878-1-xiaoguangrong@tencent.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-05-15 20:24:00 +02:00
Peter Maydell	c8b7e627b4	nbd patches for 2018-05-04 - Vladimir Sementsov-Ogievskiy: 0/2 fix coverity bugs - Eric Blake: nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE - Eric Blake: nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply -----BEGIN PGP SIGNATURE----- Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAABCAAGBQJa7F9jAAoJEKeha0olJ0NqwZIH/3LbaF7Q0CcuB6d+nQo3jYm2 fGIb8pV4pZLgC4D4qrelXP2Ttn0RNMLNdq2UR6F66/MAFj2/sF4gRE/p82exZRK3 be2hDQpzKOTgrF7SQN9ccI1df7nrMhvXgo1Y4rhSFZKtMTBPZirDFdAP/2xklSzI jMHE/Iq9Ng16303FR5KEiVtmAWxAaagapcvEKIrD0SpoiAX9jk6dAT5EwOHLi4Lf 2CCe/5iBFC96zLE2xJ8n+esZ1chJJp/2gubmYON/lwLx5fXqYVowywDVNzv+uc8A sg2VMjb/UySOLJ6IxdxgGdln0w46RB7u55nRnyH6LcU2IdBeladhkI7Oh/reOTI= =wnw9 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-05-04' into staging nbd patches for 2018-05-04 - Vladimir Sementsov-Ogievskiy: 0/2 fix coverity bugs - Eric Blake: nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE - Eric Blake: nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply # gpg: Signature made Fri 04 May 2018 14:25:55 BST # gpg: using RSA key A7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" # gpg: aka "[jpeg image of size 6874]" # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2018-05-04: nbd/client: Relax handling of large NBD_CMD_BLOCK_STATUS reply nbd/client: Fix error messages during NBD_INFO_BLOCK_SIZE migration/block-dirty-bitmap: fix memory leak in dirty_bitmap_load_bits nbd/client: fix nbd_negotiate_simple_meta_context Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-05-04 14:42:46 +01:00
Vladimir Sementsov-Ogievskiy	16a2227893	migration/block-dirty-bitmap: fix memory leak in dirty_bitmap_load_bits Release buf on error path too. Bug was introduced in `b35ebdf076` "migration: add postcopy migration of dirty bitmaps" with the whole function. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180427142002.21930-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com>	2018-05-04 08:23:26 -05:00
Marc-André Lureau	cb3e7f08ae	qobject: Replace qobject_incref/QINCREF qobject_decref/QDECREF Now that we can safely call QOBJECT() on QObject * as well as its subtypes, we can have macros qobject_ref() / qobject_unref() that work everywhere instead of having to use QINCREF() / QDECREF() for QObject and qobject_incref() / qobject_decref() for its subtypes. The replacement is mechanical, except I broke a long line, and added a cast in monitor_qmp_cleanup_req_queue_locked(). Unlike qobject_decref(), qobject_unref() doesn't accept void *. Note that the new macros evaluate their argument exactly once, thus no need to shout them. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180419150145.24795-4-marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Rebased, semantic conflict resolved, commit message improved] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2018-05-04 08:27:53 +02:00
Xiao Guangrong	da3f56cb2e	migration: remove ram_save_compressed_page() Now, we can reuse the path in ram_save_page() to post the page out as normal, then the only thing remained in ram_save_compressed_page() is compression that we can move it out to the caller Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-11-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:14 +01:00
Xiao Guangrong	65dacaa04f	migration: introduce save_normal_page() It directly sends the page to the stream neither checking zero nor using xbzrle or compression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-10-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:12 +01:00
Xiao Guangrong	d7400a3409	migration: move calling save_zero_page to the common place save_zero_page() is always our first approach to try, move it to the common place before calling ram_save_compressed_page and ram_save_page Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-9-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:11 +01:00
Xiao Guangrong	a8ec91f941	migration: move calling control_save_page to the common place The function is called by both ram_save_page and ram_save_target_page, so move it to the common caller to cleanup the code Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-8-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:10 +01:00
Xiao Guangrong	1faa5665c0	migration: move some code to ram_save_host_page Move some code from ram_save_target_page() to ram_save_host_page() to make it be more readable for latter patches that dramatically clean ram_save_target_page() up Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-7-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:09 +01:00
Xiao Guangrong	059ff0fb29	migration: introduce control_save_page() Abstract the common function control_save_page() to cleanup the code, no logic is changed Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-6-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:09 +01:00
Xiao Guangrong	34ab9e9743	migration: detect compression and decompression errors Currently the page being compressed is allowed to be updated by the VM on the source QEMU, correspondingly the destination QEMU just ignores the decompression error. However, we completely miss the chance to catch real errors, then the VM is corrupted silently To make the migration more robuster, we copy the page to a buffer first to avoid it being written by VM, then detect and handle the errors of both compression and decompression errors properly Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-5-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:08 +01:00
Xiao Guangrong	797ca154b4	migration: stop decompression to allocate and free memory frequently Current code uses uncompress() to decompress memory which manages memory internally, that causes huge memory is allocated and freed very frequently, more worse, frequently returning memory to kernel will flush TLBs So, we maintain the memory by ourselves and reuse it for each decompression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-4-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:07 +01:00
Xiao Guangrong	dcaf446ebd	migration: stop compression to allocate and free memory frequently Current code uses compress2() to compress memory which manages memory internally, that causes huge memory is allocated and freed very frequently More worse, frequently returning memory to kernel will flush TLBs and trigger invalidation callbacks on mmu-notification which interacts with KVM MMU, that dramatically reduce the performance of VM So, we maintain the memory by ourselves and reuse it for each compression Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-3-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:06 +01:00
Xiao Guangrong	263a289ae6	migration: stop compressing page in migration thread As compression is a heavy work, do not do it in migration thread, instead, we post it out as a normal page Reviewed-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180330075128.26919-2-xiaoguangrong@tencent.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:04:05 +01:00
Alexey Perevalov	65ace06045	migration: add postcopy total blocktime into query-migrate Postcopy total blocktime is available on destination side only. But query-migrate was possible only for source. This patch adds ability to call query-migrate on destination. To be able to see postcopy blocktime, need to request postcopy-blocktime capability. The query-migrate command will show following sample result: {"return": "postcopy-vcpu-blocktime": [115, 100], "status": "completed", "postcopy-blocktime": 100 }} postcopy_vcpu_blocktime contains list, where the first item is the first vCPU in QEMU. This patch has a drawback, it combines states of incoming and outgoing migration. Ongoing migration state will overwrite incoming state. Looks like better to separate query-migrate for incoming and outgoing migration or add parameter to indicate type of migration. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <1521742647-25550-7-git-send-email-a.perevalov@samsung.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:02:17 +01:00
Alexey Perevalov	575b0b332e	migration: calculate vCPU blocktime on dst side This patch provides blocktime calculation per vCPU, as a summary and as a overlapped value for all vCPUs. This approach was suggested by Peter Xu, as an improvements of previous approch where QEMU kept tree with faulted page address and cpus bitmask in it. Now QEMU is keeping array with faulted page address as value and vCPU as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps list for blocktime per vCPU (could be traced with page_fault_addr) Blocktime will not calculated if postcopy_blocktime field of MigrationIncomingState wasn't initialized. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <1521742647-25550-4-git-send-email-a.perevalov@samsung.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:02:13 +01:00
Alexey Perevalov	2a4c42f18c	migration: add postcopy blocktime ctx into MigrationIncomingState This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID, in case this feature is provided by kernel. PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c, due to it being a postcopy-only feature. Also it defines PostcopyBlocktimeContext's instance live time. Information from PostcopyBlocktimeContext instance will be provided much after postcopy migration end, instance of PostcopyBlocktimeContext will live till QEMU exit, but part of it (vcpu_addr, page_fault_vcpu_time) used only during calculation, will be released when postcopy ended or failed. To enable postcopy blocktime calculation on destination, need to request proper compatibility (Patch for documentation will be at the tail of the patch set). As an example following command enable that capability, assume QEMU was started with -chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock option to control it [root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \ {\"execute\": \"migrate-set-capabilities\" , \"arguments\": { \"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\": true } ] } }" \| nc -U /var/lib/migrate-vm-monitor.sock Or just with HMP (qemu) migrate_set_capability postcopy-blocktime on Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <1521742647-25550-3-git-send-email-a.perevalov@samsung.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:02:12 +01:00
Alexey Perevalov	f22f928ec9	migration: introduce postcopy-blocktime capability Right now it could be used on destination side to enable vCPU blocktime calculation for postcopy live migration. vCPU blocktime - it's time since vCPU thread was put into interruptible sleep, till memory page was copied and thread awake. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <1521742647-25550-2-git-send-email-a.perevalov@samsung.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-25 18:02:12 +01:00
Dr. David Alan Gilbert	a18a73d747	Revert "migration: Don't activate block devices if using -S" This reverts commit `0746a92612`. Discussion with kwolf suggests this is actually an API change that we need to gate on a capability. Push to 2.13. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-04-10 15:28:42 +01:00
Dr. David Alan Gilbert	0746a92612	migration: Don't activate block devices if using -S Activating the block devices causes the locks to be taken on the backing file. If we're running with -S and the destination libvirt hasn't started the destination with 'cont', it's expecting the locks are still untaken. Don't activate the block devices if we're not going to autostart the VM; 'cont' already will do that anyway. bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180328170207.49512-1-dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-29 14:53:16 +01:00
Marc-André Lureau	fc6008f37a	migration: fix pfd leak Fix leak spotted by ASAN: Direct leak of 16 byte(s) in 1 object(s) allocated from: #0 0x7fe1abb80a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38) #1 0x7fe1aaf1bf75 in g_malloc0 ../glib/gmem.c:124 #2 0x7fe1aaf1c249 in g_malloc0_n ../glib/gmem.c:355 #3 0x55f4841cfaa9 in postcopy_ram_fault_thread /home/elmarco/src/qemu/migration/postcopy-ram.c:596 #4 0x55f48479447b in qemu_thread_start /home/elmarco/src/qemu/util/qemu-thread-posix.c:504 #5 0x7fe1a043550a in start_thread (/lib64/libpthread.so.0+0x750a) Regression introduced with commit `00fa4fc85b`. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180321113644.21899-1-marcandre.lureau@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-29 14:53:16 +01:00
Dr. David Alan Gilbert	09576e74db	migration: Fix block migration flag case Fix the case where when a migration with a bad protocol is tried, we leave the block migration capability set. (This is a cut down version of my 'migration: Fix block failure cases' where it's other case was fixed by Peter's `dd0ee30cae` ) Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180316202114.32345-1-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-23 18:24:11 +00:00
Peter Lieven	b47d1e9fe0	migration/block: compare only read blocks against the rate limiter only read_done blocks are in the queued to be flushed to the migration stream. submitted blocks are still in flight. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1520507908-16743-6-git-send-email-pl@kamp.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-23 16:45:18 +00:00
Peter Lieven	44815334e1	migration/block: limit the number of parallel I/O requests the current implementation submits up to 512 I/O requests in parallel which is much to high especially for a background task. This patch adds a maximum limit of 16 I/O requests that can be submitted in parallel to avoid monopolizing the I/O device. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1520507908-16743-5-git-send-email-pl@kamp.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-23 16:45:03 +00:00
Lidong Chen	e8a0f2f9a1	migration: Fix rate limiting issue on RDMA migration RDMA migration implement save_page function for QEMUFile, but ram_control_save_page do not increase bytes_xfer. So when doing RDMA migration, it will use whole bandwidth. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Message-Id: <1520692378-1835-1-git-send-email-lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-23 16:37:15 +00:00
Daniel P. Berrange	bdd847a026	migration: convert socket server to QIONetListener Instead of creating a QIOChannelSocket directly for the migration server socket, use a QIONetListener. This provides the ability to listen on multiple sockets at the same time, so enables full support for IPv4/IPv6 dual stack. For example, '$QEMU -incoming tcp::9000' now correctly listens on both 0.0.0.0 and :: at the same time, instead of only on 0.0.0.0. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <20180312141714.7223-1-berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-23 16:27:24 +00:00
Peter Maydell	ed627b2ad3	virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJasR1rAAoJECgfDbjSjVRpOocH/R9A3g/TkpGjmLzJBrrX1NGO I/iq0ttHjqg4OBIChA4BHHjXwYUMs7XQn26B3efrk1otLAJhuqntZIIo3uU0WraA 5J+4DT46ogs5rZWNzDCZ0zAkSaATDA6h9Nfh7TvPc9Q2WpcIT0cTa/jOtrxRc9Vq 32hbUKtJSpNxRjwbZvk6YV21HtWo3Tktdaj9IeTQTN0/gfMyOMdgxta3+bymicbJ FuF9ybHcpXvrEctHhXHIL4/YVGEH/4shagZ4JVzv1dVdLeHLZtPomdf7+oc0+07m Qs+yV0HeRS5Zxt7w5blGLC4zDXczT/bUx8oln0Tz5MV7RR/+C2HwMOHC69gfpSc= =vomK -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging virtio,vhost,pci,pc: features, cleanups SRAT tables for DIMM devices new virtio net flags for speed/duplex post-copy migration support in vhost cleanups in pci Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 20 Mar 2018 14:40:43 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (51 commits) postcopy shared docs libvhost-user: Claim support for postcopy postcopy: Allow shared memory vhost: Huge page align and merge vhost+postcopy: Wire up POSTCOPY_END notify vhost-user: Add VHOST_USER_POSTCOPY_END message libvhost-user: mprotect & madvises for postcopy vhost+postcopy: Call wakeups vhost+postcopy: Add vhost waker postcopy: postcopy_notify_shared_wake postcopy: helper for waking shared vhost+postcopy: Resolve client address postcopy-ram: add a stub for postcopy_request_shared_page vhost+postcopy: Helper to send requests to source for shared pages vhost+postcopy: Stash RAMBlock and offset vhost+postcopy: Send address back to qemu libvhost-user+postcopy: Register new regions with the ufd migration/ram: ramblock_recv_bitmap_test_byte_offset postcopy+vhost-user: Split set_mem_table for postcopy vhost+postcopy: Transmit 'listen' to slave ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # scripts/update-linux-headers.sh	2018-03-20 15:48:34 +00:00
Dr. David Alan Gilbert	29d8fa7f73	postcopy: Allow shared memory Now that we have the mechanisms in here, allow shared memory in a postcopy. Note that QEMU can't tell who all the users of shared regions are and thus can't tell whether all the users of the shared regions have appropriate support for postcopy. Those devices that explicitly support shared memory (e.g. vhost-user) must check, but it doesn't stop weirder configurations causing problems. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 16:40:37 +02:00
Dr. David Alan Gilbert	46343570c0	vhost+postcopy: Wire up POSTCOPY_END notify Wire up a call to VHOST_USER_POSTCOPY_END message to the vhost clients right before we ask the listener thread to shutdown. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 16:40:37 +02:00
Dr. David Alan Gilbert	dedfb4b21a	vhost+postcopy: Call wakeups Cause the vhost-user client to be woken up whenever: a) We place a page in postcopy mode b) We get a fault and the page has already been received Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 16:40:37 +02:00
Dr. David Alan Gilbert	d488b349a3	postcopy: postcopy_notify_shared_wake Add a hook to allow a client userfaultfd to be 'woken' when a page arrives, and a walker that calls that hook for relevant clients given a RAMBlock and offset. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 16:40:37 +02:00
Dr. David Alan Gilbert	5efc356403	postcopy: helper for waking shared Provide a helper to send a 'wake' request on a userfaultfd for a shared process. The address in the clients address space is specified together with the RAMBlock it was resolved to. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 16:40:35 +02:00
Michael S. Tsirkin	c188c53927	postcopy-ram: add a stub for postcopy_request_shared_page This fixes the build on systems without userfaultfd. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 16:40:10 +02:00
Dr. David Alan Gilbert	096bf4c852	vhost+postcopy: Helper to send requests to source for shared pages Provide a helper to be used by shared waker functions to request shared pages from the source. The last_rb pointer is moved into the incoming state since this helper can update it as well as the main fault thread function. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 05:03:29 +02:00
Dr. David Alan Gilbert	1cba9f6e66	migration/ram: ramblock_recv_bitmap_test_byte_offset Utility for testing the map when you already know the offset in the RAMBlock. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 05:03:28 +02:00
Dr. David Alan Gilbert	6864a7b5ac	vhost+postcopy: Transmit 'listen' to slave Notify the vhost-user slave on reception of the 'postcopy-listen' event from the source. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 05:03:28 +02:00
Dr. David Alan Gilbert	00fa4fc85b	postcopy: Allow registering of fd handler Allow other userfaultfd's to be registered into the fault thread so that handlers for shared memory can get responses. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 05:03:28 +02:00
Dr. David Alan Gilbert	d3dff7a5a1	vhost-user: Add 'VHOST_USER_POSTCOPY_ADVISE' message Wire up a notifier to send a VHOST_USER_POSTCOPY_ADVISE message on an incoming advise. Later patches will fill in the behaviour/contents of the message. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 05:03:27 +02:00
Dr. David Alan Gilbert	1693c64c27	postcopy: Add notifier chain Add a notifier chain for postcopy with a 'reason' flag and an opportunity for a notifier member to return an error. Call it when enabling postcopy. This will initially used to enable devices to declare they're unable to postcopy and later to notify of devices of stages within postcopy. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 05:03:27 +02:00
Dr. David Alan Gilbert	2ce16640b4	postcopy: use UFFDIO_ZEROPAGE only when available Use a flag on the RAMBlock to state whether it has the UFFDIO_ZEROPAGE capability, use it when it's available. This allows the use of postcopy on tmpfs as well as hugepage backed files. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2018-03-20 05:03:27 +02:00
Peter Maydell	9cc7d0cf6a	-----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaqD6PAAoJEH3vgQaq/DkO9FIP/3pAW3xJUDGYsONiebX1IbhA VpoQCcjks3cHD18AUoVHufayJBUVfed1LhYPP8xoDuSRmKs1xU1O9FknxMQaL+Dw kbliBY7GjN8A2EcCjW+ZwyNT/KpjyXXwuZ2PSnOSSiN3JK6wrLCzeZyKyOYewLCS u9fKscnqWkg+awbCfDlVs92AaBAKoOP9loOq6e2J/jVY8HSDGb2owRnsxaWg8gJ8 J9BlnXENQ14jEwickD3sluPfWkhu9xh7cCocH8cfgXL5veGUELz0Ugx4RHcsAF9Q SVDg/EhRRN11cvOkLnlggETaLbGtEE64AL4HhjxzCLraHsnEazPDwFgetB9mOhhF Nqu8HuGcVvRgn89au89mxAvTSWX9KFq4oF8Vi+FZZHkLilRx6NJnMpUpd9zkSJDq yjR2/BV0A9Ep1gvWX/rhpPrN5dALYHcaxoiSB497Yj4SI2ZSyzfrneteYdPv4EEc 3CSJ3l6NCGAE2dNXuVZTVqHyXOSl7mJQQmT53dtsSNipCMEsVr0mOx3DPNY26LIc DUdnX6JOyZPU0wzOj8xjFNV72/gBEkqVZ5p9UJ+lrIYwOsTobpzfDtYquu4asda8 IN44mcbRCZRFIiZZOGEdnwf34vIpQKMiZAtszAaan9KXwTXV9LbipaomBEN88vUD IgI5XsZTfiD2uIjnREWv =ISfR -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging # gpg: Signature made Tue 13 Mar 2018 21:11:43 GMT # gpg: using RSA key 7DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/bitmaps-pull-request: iotests: add dirty bitmap postcopy test iotests: add dirty bitmap migration test migration: add postcopy migration of dirty bitmaps migration: allow qmp command migrate-start-postcopy for any postcopy migration: add is_active_iterate handler migration/qemu-file: add qemu_put_counted_string() migration: include migrate_dirty_bitmaps in migrate_postcopy qapi: add dirty-bitmaps migration capability migration: introduce postcopy-only pending dirty-bitmap: add locked state block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap block/dirty-bitmap: fix locking in bdrv_reclaim_dirty_bitmap block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-03-16 14:15:18 +00:00
Vladimir Sementsov-Ogievskiy	b35ebdf076	migration: add postcopy migration of dirty bitmaps Postcopy migration of dirty bitmaps. Only named dirty bitmaps are migrated. If destination qemu is already containing a dirty bitmap with the same name as a migrated bitmap (for the same node), then, if their granularities are the same the migration will be done, otherwise the error will be generated. If destination qemu doesn't contain such bitmap it will be created. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20180313180320.339796-12-vsementsov@virtuozzo.com [Changed '+' to '*' as per list discussion. --js] Signed-off-by: John Snow <jsnow@redhat.com>	2018-03-13 17:06:09 -04:00
Vladimir Sementsov-Ogievskiy	16b0fd3252	migration: allow qmp command migrate-start-postcopy for any postcopy Allow migrate-start-postcopy for any postcopy type Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20180313180320.339796-11-vsementsov@virtuozzo.com	2018-03-13 17:06:03 -04:00
Vladimir Sementsov-Ogievskiy	c865d84872	migration: add is_active_iterate handler Only-postcopy savevm states (dirty-bitmap) don't need live iteration, so to disable them and stop transporting empty sections there is a new savevm handler. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180313180320.339796-10-vsementsov@virtuozzo.com	2018-03-13 17:05:58 -04:00
Vladimir Sementsov-Ogievskiy	f0d64cb729	migration/qemu-file: add qemu_put_counted_string() Add function opposite to qemu_get_counted_string. qemu_put_counted_string puts one-byte length of the string (string should not be longer than 255 characters), and then it puts the string, without last zero byte. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180313180320.339796-9-vsementsov@virtuozzo.com	2018-03-13 17:05:55 -04:00
Vladimir Sementsov-Ogievskiy	dd6bb91450	migration: include migrate_dirty_bitmaps in migrate_postcopy Enable postcopy if dirty bitmap migration is enabled. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180313180320.339796-8-vsementsov@virtuozzo.com	2018-03-13 17:05:51 -04:00
Vladimir Sementsov-Ogievskiy	55efc8c2ff	qapi: add dirty-bitmaps migration capability Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180313180320.339796-7-vsementsov@virtuozzo.com	2018-03-13 17:05:45 -04:00
Vladimir Sementsov-Ogievskiy	4799502640	migration: introduce postcopy-only pending There would be savevm states (dirty-bitmap) which can migrate only in postcopy stage. The corresponding pending is introduced here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 20180313180320.339796-6-vsementsov@virtuozzo.com	2018-03-13 17:05:41 -04:00
Pavel Dovgalyuk	377b21ccea	replay: fix save/load vm for non-empty queue This patch does not allows saving/loading vmstate when replay events queue is not empty. There is no reliable way to save events queue, because it describes internal coroutine state. Therefore saving and loading operations should be deferred to another record/replay step. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20180227095214.1060.32939.stgit@pasha-VirtualBox> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>	2018-03-12 16:12:50 +01:00
Peter Xu	dd0ee30cae	migration: fix applying wrong capabilities When setting migration capabilities via QMP/HMP, we'll apply them even if the capability check failed. Fix it. Fixes: `4a84214ebe` ("migration: provide migrate_caps_check()", 2017-07-18) Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180305094938.31374-1-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-09 17:39:25 +00:00
Peter Lieven	ef9c5160a1	migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS this actually limits (as the original commit mesage suggests) the number of I/O buffers that can be allocated and not the number of parallel (inflight) I/O requests. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1520507908-16743-4-git-send-email-pl@kamp.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-09 17:39:25 +00:00
Peter Lieven	86b124bc76	migration/block: reset dirty bitmap before read in bulk phase Reset the dirty bitmap before reading to make sure we don't miss any new data. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1520507908-16743-3-git-send-email-pl@kamp.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-09 17:39:25 +00:00
Peter Lieven	b255734531	migration: do not transfer ram during bulk storage migration this patch makes the bulk phase of a block migration to take place before we start transferring ram. As the bulk block migration can take a long time its pointless to transfer ram during that phase. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <1520507908-16743-2-git-send-email-pl@kamp.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-09 17:39:25 +00:00
Marc-André Lureau	ab105cc138	migration: fix minor finalize leak Spotted thanks to ASAN: QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 tests/migration-test -p /x86_64/migration/bad_dest ==30302==ERROR: LeakSanitizer: detected memory leaks Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f60efba1a38 in __interceptor_calloc (/lib64/libasan.so.4+0xdea38) #1 0x7f60eef3cf75 in g_malloc0 ../glib/gmem.c:124 #2 0x55ca9094702c in error_copy /home/elmarco/src/qemu/util/error.c:203 #3 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139 #4 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150 #5 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411 #6 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81 #7 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85 #8 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142 #9 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88 #10 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552 #11 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182 #12 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847 #13 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214 #14 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261 #15 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515 #16 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942 #17 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724 #18 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009) Indirect leak of 45 byte(s) in 1 object(s) allocated from: #0 0x7f60efba1850 in malloc (/lib64/libasan.so.4+0xde850) #1 0x7f60eef3cf0c in g_malloc ../glib/gmem.c:94 #2 0x7f60eef3d1cf in g_malloc_n ../glib/gmem.c:331 #3 0x7f60eef596eb in g_strdup ../glib/gstrfuncs.c:363 #4 0x55ca90947085 in error_copy /home/elmarco/src/qemu/util/error.c:204 #5 0x55ca9037a30f in migrate_set_error /home/elmarco/src/qemu/migration/migration.c:1139 #6 0x55ca9037a462 in migrate_fd_error /home/elmarco/src/qemu/migration/migration.c:1150 #7 0x55ca9038162b in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:2411 #8 0x55ca90386e41 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:81 #9 0x55ca9038335e in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:85 #10 0x55ca9083dd3a in qio_task_complete /home/elmarco/src/qemu/io/task.c:142 #11 0x55ca9083d6cc in gio_task_thread_result /home/elmarco/src/qemu/io/task.c:88 #12 0x7f60eef37317 in g_idle_dispatch ../glib/gmain.c:5552 #13 0x7f60eef3490b in g_main_dispatch ../glib/gmain.c:3182 #14 0x7f60eef357ac in g_main_context_dispatch ../glib/gmain.c:3847 #15 0x55ca90927231 in glib_pollfds_poll /home/elmarco/src/qemu/util/main-loop.c:214 #16 0x55ca90927420 in os_host_main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:261 #17 0x55ca909275fa in main_loop_wait /home/elmarco/src/qemu/util/main-loop.c:515 #18 0x55ca8fc1c2a4 in main_loop /home/elmarco/src/qemu/vl.c:1942 #19 0x55ca8fc2eb3a in main /home/elmarco/src/qemu/vl.c:4724 #20 0x7f60e4082009 in __libc_start_main (/lib64/libc.so.6+0x21009) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180306170959.3921-1-marcandre.lureau@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-03-09 17:39:25 +00:00
Peter Xu	1939ccdaa6	qio: non-default context for TLS handshake A new parameter "context" is added to qio_channel_tls_handshake() is to allow the TLS to be run on a non-default context. Still, no functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2018-03-06 10:19:07 +00:00
Peter Xu	8005fdd8fa	qio: non-default context for async conn We have worked on qio_task_run_in_thread() already. Further, let all the qio channel APIs use that context. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>	2018-03-06 10:19:06 +00:00
Markus Armbruster	112ed241f5	qapi: Empty out qapi-schema.json The previous commit improved compile time by including less of the generated QAPI headers. This is impossible for stuff defined directly in qapi-schema.json, because that ends up in headers that that pull in everything. Move everything but include directives from qapi-schema.json to new sub-module qapi/misc.json, then include just the "misc" shard where possible. It's possible everywhere, except: * monitor.c needs qmp-command.h to get qmp_init_marshal() * monitor.c, ui/vnc.c and the generated qapi-event-FOO.c need qapi-event.h to get enum QAPIEvent Perhaps we'll get rid of those some other day. Adding a type to qapi/migration.json now recompiles some 120 instead of 2300 out of 5100 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-25-armbru@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:45:50 -06:00
Markus Armbruster	9af2398977	Include less of the generated modular QAPI headers In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects. The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need. To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180211093607.27351-24-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>	2018-03-02 13:45:50 -06:00
Peter Xu	3e0c8050eb	migration: pass MigrationState to migrate_init() Let the callers take the object, then pass it to migrate_init(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180208103132.28452-12-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-14 10:37:09 +00:00
Peter Xu	d6208e35e4	migration: allow send_rq to fail We will not allow failures to happen when sending data from destination to source via the return path. However it is possible that there can be errors along the way. This patch allows the migrate_send_rp_message() to return error when it happens, and further extended it to migrate_send_rp_req_pages(). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180208103132.28452-9-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-14 10:36:52 +00:00
Peter Xu	9ab7ef9b66	migration: provide postcopy_fault_thread_notify() A general helper to notify the fault thread. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180208103132.28452-4-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-14 10:36:02 +00:00
Peter Xu	64f615fe34	migration: reuse mis->userfault_quit_fd It was only used for quitting the page fault thread before. Let it be something more useful - now we can use it to notify a "wake" for the page fault thread (for any reason), and it only means "quit" if the fault_thread_quit is set. Since we changed what it does, renaming it to userfault_event_fd. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180208103132.28452-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-14 10:35:45 +00:00
Peter Xu	7a9ddfbfae	migration: better error handling with QEMUFile If the postcopy down due to some reason, we can always see this on dst: qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000 However in most cases that's not the real issue. The problem is that qemu_get_be16() has no way to show whether the returned data is valid or not, and we are _always_ assuming it is valid. That's possibly not wise. The best approach to solve this would be: refactoring QEMUFile interface to allow the APIs to return error if there is. However it needs quite a bit of work and testing. For now, let's explicitly check the validity first before using the data in all places for qemu_get_*(). This patch tries to fix most of the cases I can see. Only if we are with this, can we make sure we are processing the valid data, and also can we make sure we can capture the channel down events correctly. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180208103132.28452-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-14 10:34:56 +00:00
Dr. David Alan Gilbert	b9ccaf6d74	migration: Fix early failure cleanup Avoid crash in cleanup after a very early migration failure (possibly due to my `688a3dcba9` 'Route errors down ...') Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180212160340.15333-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2018-02-14 10:31:01 +00:00
Ross Lagerwall	96994fd1e4	migration/xen: Check return value of qemu_fclose QEMUFile uses buffered IO so when writing small amounts (such as the Xen device state file), the actual write call and any errors that may occur only happen as part of qemu_fclose(). Therefore, report IO errors when saving the device state under Xen by checking the return value of qemu_fclose(). Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Message-Id: <20180206163039.23661-1-ross.lagerwall@citrix.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-14 10:18:42 +00:00
Markus Armbruster	452fcdbc49	Include qapi/qmp/qdict.h exactly where needed This cleanup makes the number of objects depending on qapi/qmp/qdict.h drop from 4550 (out of 4743) to 368 in my "build everything" tree. For qapi/qmp/qobject.h, the number drops from 4552 to 390. While there, separate #include from file comment with a blank line. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-13-armbru@redhat.com>	2018-02-09 13:52:15 +01:00
Markus Armbruster	15280c360e	qdict qlist: Make most helper macros functions The macro expansions of qdict_put_TYPE() and qlist_append_TYPE() need qbool.h, qnull.h, qnum.h and qstring.h to compile. We include qnull.h and qnum.h in the headers, but not qbool.h and qstring.h. Works, because we include those wherever the macros get used. Open-coding these helpers is of dubious value. Turn them into functions and drop the includes from the headers. This cleanup makes the number of objects depending on qapi/qmp/qnum.h from 4551 (out of 4743) to 46 in my "build everything" tree. For qapi/qmp/qnull.h, the number drops from 4552 to 21. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-10-armbru@redhat.com>	2018-02-09 13:52:15 +01:00
Markus Armbruster	e688df6bc4	Include qapi/error.h exactly where needed This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree. While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-5-armbru@redhat.com> [Semantic conflict with commit `34e304e975` resolved, OSX breakage fixed]	2018-02-09 13:50:17 +01:00
Markus Armbruster	522ece32d2	Drop superfluous includes of qapi-types.h and test-qapi-types.h Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-4-armbru@redhat.com>	2018-02-09 05:05:11 +01:00
Greg Kurz	875fcd013a	migration: incoming postcopy advise sanity checks If postcopy-ram was set on the source but not on the destination, migration doesn't occur, the destination prints an error and boots the guest: qemu-system-ppc64: Expected vmdescription section, but got 0 We end up with two running instances. This behaviour was introduced in 2.11 by commit `58110f0acb` "migration: split common postcopy out of ram postcopy" to prepare ground for the upcoming dirty bitmap postcopy support. It adds a new case where the source may send an empty postcopy advise because dirty bitmap doesn't need to check page sizes like RAM postcopy does. If the source has enabled postcopy-ram, then it sends an advise with the page size values. If the destination hasn't enabled postcopy-ram, then loadvm_postcopy_handle_advise() leaves the page size values on the stream and returns. This confuses qemu_loadvm_state() later on and causes the destination to start execution. As discussed several times, postcopy-ram should be enabled both sides to be functional. This patch changes the destination to perform some extra checks on the advise length to ensure this is the case. Otherwise an error is returned and migration is aborted. Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <151791621042.19120.3103118434734245776.stgit@bahia> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 14:53:02 +00:00
Ross Lagerwall	032b79f717	migration: Don't leak IO channels Since qemu_fopen_channel_{in,out}put take references on the underlying IO channels, make sure to release our references to them. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Message-Id: <20171101142526.1006-2-ross.lagerwall@citrix.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 14:53:02 +00:00
Dr. David Alan Gilbert	6039dd5b1c	migration: Recover block devices if failure in device state In `e91d895` I added the new pause-before-switchover mechanism to allow migration completion to be delayed; this changes the last state prior to completion to MIGRATE_STATUS_DEVICE rather than MIGRATE_STATUS_ACTIVE. Fix the failure path in migration_completion to recover the block devices if it fails in MIGRATE_STATUS_DEVICE, not just the MIGRATE_STATUS_ACTIVE that it previously had. This corresponds to rh bz: https://bugzilla.redhat.com/show_bug.cgi?id=1538494 whose symptom is an occasional source crash on a failed migration. Fixes: `e91d8951d5` Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 14:53:02 +00:00
Juan Quintela	7faccdc3e7	migration: Drop current address parameter from save_zero_page() It already has RAMBlock and offset, it can calculate it itself. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 10:55:13 +00:00
Wei Wang	0781c1ed1c	migration: use s->threshold_size inside migration_update_counters Fixes: `b15df1ae50` ("migration: cleanup stats update into function") The threshold size is changed to be recorded in s->threshold_size. Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 10:55:13 +00:00
Daniel Henrique Barboza	ee555cdf4d	migration/savevm.c: set MAX_VM_CMD_PACKAGED_SIZE to 1ul << 32 MAX_VM_CMD_PACKAGED_SIZE is a constant used in qemu_savevm_send_packaged and loadvm_handle_cmd_packaged to determine whether a package is too big to be sent or received. qemu_savevm_send_packaged is called inside postcopy_start (migration/migration.c) to send the MigrationState in a single blob to the destination, using the MIG_CMD_PACKAGED subcommand, which will read it up using loadvm_handle_cmd_packaged. If the blob is larger than MAX_VM_CMD_PACKAGED_SIZE, an error is thrown and the postcopy migration is aborted. Both MAX_VM_CMD_PACKAGED_SIZE and MIG_CMD_PACKAGED were introduced by commit `11cf1d984b` ("MIG_CMD_PACKAGED: Send a packaged chunk ..."). The constant has its original value of 1ul << 24 (16MB). The current MAX_VM_CMD_PACKAGED_SIZE value is not enough to support postcopy migration of bigger pseries guests. The blob size for a postcopy migration of a pseries guest with the following setup: qemu-system-ppc64 --nographic -vga none -machine pseries,accel=kvm -m 64G \ -smp 1,maxcpus=32 -device virtio-blk-pci,drive=rootdisk \ -drive file=f27.qcow2,if=none,cache=none,format=qcow2,id=rootdisk \ -netdev user,id=u1 -net nic,netdev=u1 Goes around 12MB. Bumping the RAM to 128G makes the blob sizes goes to 20MB. With 256G the blob goes to 37MB - more than twice the current maximum size. At this moment the pseries machine can handle guests with up to 1TB of RAM, making this postcopy blob goes to 128MB of size approximately. Following the discussions made in [1], there is a need to understand what devices are aggressively consuming the blob in that manner and see if that can be mitigated. Until then, we can set MAX_VM_CMD_PACKAGED_SIZE to the maximum value allowed. Since the size is a 32 bit int variable, we can set it as 1ul << 32, giving a maximum blob size of 4G that is enough to support postcopy migration of 32TB RAM guests given the above constraints. [1] https://lists.nongnu.org/archive/html/qemu-devel/2018-01/msg06313.html Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 10:55:13 +00:00
Dr. David Alan Gilbert	688a3dcba9	migration: Route errors down through migration_channel_connect Route async errors (especially from sockets) down through migration_channel_connect and on to migrate_fd_connect where they can be cleaned up. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 10:55:12 +00:00
Dr. David Alan Gilbert	cce8040bb0	migration: Allow migrate_fd_connect to take an Error * Allow whatever is performing the connection to pass migrate_fd_connect an error to indicate there was a problem during connection, an allow us to clean up. The caller must free the error. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2018-02-06 10:55:12 +00:00
Peter Maydell	52483b067c	Pull request for various patches that have been reviewed and laying on the mailing list for a while, but apparently no maintainer feels really responsible for picking up. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJaZcaYAAoJEC7Z13T+cC21ab0P/3fE52pp0BEWRfM3MkTyJCgs c3ZDR2raLsrwl5MMoeV6TCZ8ILp3RR5ipdnpsxAlUKi0e953hduFXJ/A9meElogu i1BdCl7SBYUWXg9WKqH5cX9LGiGiRLQ53KehCB6wa4nXBjkL1bGtbprbp5kCb+Sn tavBoIqkwxC2VvqJHL23uS7n/3bPkr4XA1bA/VWezm7be6f5bEqBzORdxPabRC3f M7b1ntl2Xj9PXpwKZkHgET8Wg1Ne5kCUvvx9o22iMuHhBHsxAmMc06Q96wihDUI3 /CwzaErGrykGRX95y++yaBMUEYMSk90dv9cXHTMryDw/0id0OMnpcGm6SeQHlcQT ATrhnH1VzEcgGJPYpqKxNvb0pJZ7t7gYUSi0HMC83PG2S9wD/kBvtB8rqumTolKB cvI6l7PFfCZIr3FsTyGHX1KVRHX8PWljnKIvAbyUEuK4XDnSX7hT+7jQvuxj4HSZ /ZlksnSIHcUf5gx6zG2StVKo5TEnY6JUhf8CuQeILW0ZGj6V/aUFbR5aMmgVinyj p/2OOUsze6rYCcpVE2kI7hSMrSXk4QfvPyIvjGf86EVxTM4SK03bP4QVFwtZtFzA HXXFIgKNcE7I7BhghC6PLcmHaYBiK8A5EIvXpgi4tD7L3PFLos6wtR6T2ze3mBn2 JyKxGw9rExlAJMi5/WE9 =JCAM -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/huth/tags/pull-request-2018-01-22' into staging Pull request for various patches that have been reviewed and laying on the mailing list for a while, but apparently no maintainer feels really responsible for picking up. # gpg: Signature made Mon 22 Jan 2018 11:10:16 GMT # gpg: using RSA key 0x2ED9D774FE702DB5 # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" # gpg: aka "Thomas Huth <thuth@redhat.com>" # gpg: aka "Thomas Huth <huth@tuxfamily.org>" # gpg: aka "Thomas Huth <th.huth@posteo.de>" # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/huth/tags/pull-request-2018-01-22: hw/isa: Replace fprintf(stderr, "\n" with error_report() hw/ipmi: Replace fprintf(stderr, "\n" with error_report() hw/bt: Replace fprintf(stderr, "*\n" with error_report() Fixes after renaming __FUNCTION__ to __func__ Replace all occurances of __FUNCTION__ with __func__ tests/cpu-plug-test: Test CPU hot-plugging on s390x tests/cpu-plug-test: Check CPU hot-plugging on ppc64, too tests/cpu-plug-test: Check the CPU hot-plugging with device_add, too tests: Rename pc-cpu-test.c to cpu-plug-test.c Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-01-23 10:15:09 +00:00
Peter Maydell	ee86981bda	migration: Revert postcopy-blocktime commit set This reverts commits `ca6011c` migration: add postcopy total blocktime into query-migrate `5f32dc8` migration: add blocktime calculation into migration-test `2f7dae9` migration: postcopy_blocktime documentation `3be98be` migration: calculate vCPU blocktime on dst side `01a87f0` migration: add postcopy blocktime ctx into MigrationIncomingState `31bf06a` migration: introduce postcopy-blocktime capability as they don't build on ppc32 due to trying to do atomic accesses on types that are larger than the host pointer type. Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-01-23 10:08:05 +00:00
Alistair Francis	a89f364ae8	Replace all occurances of __FUNCTION__ with __func__ Replace all occurs of __FUNCTION__ except for the check in checkpatch with the non GCC specific __func__. One line in hcd-musb.c was manually tweaked to pass checkpatch. Signed-off-by: Alistair Francis <alistair.francis@xilinx.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> [THH: Removed hunks related to pxa2xx_mmci.c (fixed already)] Signed-off-by: Thomas Huth <thuth@redhat.com>	2018-01-22 09:46:18 +01:00
Peter Maydell	c1d5b9add7	* QemuMutex tracing improvements (Alex) * ram_addr_t optimization (David) * SCSI fixes (Fam, Stefan, me) * do {} while (0) fixes (Eric) * KVM fix for PMU (Jan) * memory leak fixes from ASAN (Marc-André) * migration fix for HPET, icount, loadvm (Maria, Pavel) * hflags fixes (me, Tao) * block/iscsi uninitialized variable (Peter L.) * full support for GMainContexts in character devices (Peter Xu) * more boot-serial-test (Thomas) * Memory leak fix (Zhecheng) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJaXgkRAAoJEL/70l94x66DA3EIAI8z8Y+1NAmbLqiHhrrN9Ji/ b8EHQ8wf0pwwrHuRVKYZvKUU8yvp/CRIoVWZwfeGjRbZC+l7l+BAwdOx42Bj/dUW VopNzcJMu3s5SNwoYLvs01OjhciBYNXWTXBkIiErwurF0Ow7oYR7trkLwOw0veSO L4qFAGoIBI/7b6BZ3YRQXshhzdSQ6dvHrDness2V1c0crLG+yhvjKJ8PJ2tJyNZO DbsrCd7hS6e6liSUqdLj9XgRySFj9R5kgjaLjckjg1SC6kmhLN9hyke8iXgH7uvz WGnRPmKjKexFHVYgR0rRFlazcQclAczHuIi/OZe0HLi6trg2YKBkolMaQLQdgfk= =HTyS -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * QemuMutex tracing improvements (Alex) * ram_addr_t optimization (David) * SCSI fixes (Fam, Stefan, me) * do {} while (0) fixes (Eric) * KVM fix for PMU (Jan) * memory leak fixes from ASAN (Marc-André) * migration fix for HPET, icount, loadvm (Maria, Pavel) * hflags fixes (me, Tao) * block/iscsi uninitialized variable (Peter L.) * full support for GMainContexts in character devices (Peter Xu) * more boot-serial-test (Thomas) * Memory leak fix (Zhecheng) # gpg: Signature made Tue 16 Jan 2018 14:15:45 GMT # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (51 commits) scripts/analyse-locks-simpletrace.py: script to analyse lock times util/qemu-thread-: add qemu_lock, locked and unlock trace events cpu: flush TB cache when loading VMState block/iscsi: fix initialization of iTask in iscsi_co_get_block_status find_ram_offset: Align ram_addr_t allocation on long boundaries find_ram_offset: Add comments and tracing cpu_physical_memory_sync_dirty_bitmap: Another alignment fix checkpatch: Enforce proper do/while (0) style maint: Fix macros with broken 'do/while(0); ' usage tests: Avoid 'do/while(false); ' in vhost-user-bridge chardev: Clean up previous patch indentation chardev: Use goto/label instead of do/break/while(0) mips: Tweak location of ';' in macros net: Drop unusual use of do { } while (0); irq: fix memory leak cpus: unify qemu__wait_io_event icount: fixed saving/restoring of icount warp timers scripts/qemu-gdb/timers.py: new helper to dump timer state scripts/qemu-gdb: add simple tcg lock status helper target-i386: update hflags on Hypervisor.framework ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-01-16 15:45:15 +00:00
Eric Blake	2562755ee7	maint: Fix macros with broken 'do/while(0); ' usage The point of writing a macro embedded in a 'do { ... } while (0)' loop (particularly if the macro has multiple statements or would otherwise end with an 'if' statement) is so that the macro can be used as a drop-in statement with the caller supplying the trailing ';'. Although our coding style frowns on brace-less 'if': if (cond) statement; else something else; that is the classic case where failure to use do/while(0) wrapping would cause the 'else' to pair with any embedded 'if' in the macro rather than the intended outer 'if'. But conversely, if the macro includes an embedded ';', then the same brace-less coding style would now have two statements, making the 'else' a syntax error rather than pairing with the outer 'if'. Thus, even though our coding style with required braces is not impacted, ending a macro with ';' makes our code harder to port to projects that use brace-less styles. The change should have no semantic impact. I was not able to fully compile-test all of the changes (as some of them are examples of the ugly bit-rotting debug print statements that are completely elided by default, and I didn't want to recompile with the necessary -D witnesses - cleaning those up is left as a bite-sized task for another day); I did, however, audit that for all files touched, all callers of the changed macros DID supply a trailing ';' at the callsite, and did not appear to be used as part of a brace-less conditional. Found mechanically via: $ git grep -B1 'while (0);' \| grep -A1 \\\\ Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20171201232433.25193-7-eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-01-16 14:54:52 +01:00
Peter Xu	816306826a	migration: remove notify in fd_error It is already called in migrate_fd_cleanup. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:13 +01:00
Peter Xu	26978faf2f	migration: remove some block_cleanup_parameters() Keep the one in migrate_fd_cleanup() would be enough. Removing the other two. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:12 +01:00
Peter Xu	199aa6d4e4	migration: put the finish part into a new function This patch only moved the last part of migration_thread() into a new function migration_iteration_finish() to make it much shorter. With previous works to remove some local variables, now it's fairly easy to do that. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:11 +01:00
Peter Xu	2ad873057e	migration: major cleanup for migrate iterations The major work for migration iterations are to move RAM/block/... data via qemu_savevm_state_iterate(). Generalize those part into a single function. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:11 +01:00
Peter Xu	b15df1ae50	migration: cleanup stats update into function We have quite a few lines in migration_thread() that calculates some statistics for the migration interations. Isolate it into a single function to improve readability. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:10 +01:00
Peter Xu	39b9e17905	migration: use switch at the end of migration It converts the old if clauses into switch, explicitly mentions the possible migration states. The old nested "if"s are not clear on what we do on different states. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:10 +01:00
Peter Xu	cf011f082d	migration: introduce migrate_calculate_complete Generalize the calculation part when migration complete into a function to simplify migration_thread(). Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:09 +01:00
Peter Xu	64909f9740	migration: introduce downtime_start Introduce MigrationState.downtime_start to replace the local variable "start_time" in migration_thread to avoid passing things around. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:09 +01:00
Peter Xu	7287cbd46e	migration: move vm_old_running into global state Firstly, it was passed around. Let's just move it into MigrationState just like many other variables as state of migration, renaming it to vm_was_running. One thing to mention is that for postcopy, we actually don't need this knowledge at all since postcopy can't resume a VM even if it fails (we can see that from the old code too: when we try to resume we also check against "entered_postcopy" variable). So further we do this: - in postcopy_start(), we don't update vm_old_running since useless - in migration_thread(), we don't need to check entered_postcopy when resume, since it's only used for precopy. Comment this out too for that variable definition. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:08 +01:00
Peter Xu	4af246a34e	migration: split use of MigrationState.total_time It was used either to: 1. store initial timestamp of migration start, and 2. store total time used by last migration Let's provide two parameters for each of them. Mix use of the two is slightly misleading. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:08 +01:00
Peter Xu	deb74fb670	migration: remove "enable_colo" var It's only used once, clean it up a bit. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:07 +01:00
Peter Xu	0ceccd858a	migration: qemu_savevm_state_cleanup() in cleanup Moving existing callers all into migrate_fd_cleanup(). It simplifies migration_thread() a bit. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:06 +01:00
Peter Xu	0d649a0e95	migration: assert colo instead of check When reaching here if we are still "active" it means we must be in colo state. After a quick discussion offlist, we decided to use the safer error_report(). Finally I want to use "switch" here rather than lots of complicated if clauses. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:06 +01:00
Vladimir Sementsov-Ogievskiy	1f8956041a	migration: finalize current_migration object current_migration has .instance_finalize callback, but it is not called, because nobody unrefs current_migration. Fix that. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:05 +01:00
Dr. David Alan Gilbert	bae416e5ba	migration: Guard ram_bytes_remaining against early call Calling ram_bytes_remaining during the early part of setup is unsafe because the ram_state isn't yet initialised. This can happen in the sequence: migrate migrate_cancel info migrate if the migrate sticks trying to connect (e.g. to an unresponsive destination due to the connect timeout). Here 'info migrate' sees a state of CANCELLING and so assumes the migrate has partially happened. partial fix for: RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1525899 Reported-by: Xianxian Wang <xianwang@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:04 +01:00
Alexey Perevalov	ca6011c232	migration: add postcopy total blocktime into query-migrate Postcopy total blocktime is available on destination side only. But query-migrate was possible only for source. This patch adds ability to call query-migrate on destination. To be able to see postcopy blocktime, need to request postcopy-blocktime capability. The query-migrate command will show following sample result: {"return": "postcopy-vcpu-blocktime": [115, 100], "status": "completed", "postcopy-blocktime": 100 }} postcopy_vcpu_blocktime contains list, where the first item is the first vCPU in QEMU. This patch has a drawback, it combines states of incoming and outgoing migration. Ongoing migration state will overwrite incoming state. Looks like better to separate query-migrate for incoming and outgoing migration or add parameter to indicate type of migration. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:04 +01:00
Alexey Perevalov	3be98be4e9	migration: calculate vCPU blocktime on dst side This patch provides blocktime calculation per vCPU, as a summary and as a overlapped value for all vCPUs. This approach was suggested by Peter Xu, as an improvements of previous approch where QEMU kept tree with faulted page address and cpus bitmask in it. Now QEMU is keeping array with faulted page address as value and vCPU as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps list for blocktime per vCPU (could be traced with page_fault_addr) Blocktime will not calculated if postcopy_blocktime field of MigrationIncomingState wasn't initialized. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:01 +01:00
Alexey Perevalov	01a87f0bd3	migration: add postcopy blocktime ctx into MigrationIncomingState This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID, in case this feature is provided by kernel. PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c, due to it being a postcopy-only feature. Also it defines PostcopyBlocktimeContext's instance live time. Information from PostcopyBlocktimeContext instance will be provided much after postcopy migration end, instance of PostcopyBlocktimeContext will live till QEMU exit, but part of it (vcpu_addr, page_fault_vcpu_time) used only during calculation, will be released when postcopy ended or failed. To enable postcopy blocktime calculation on destination, need to request proper compatibility (Patch for documentation will be at the tail of the patch set). As an example following command enable that capability, assume QEMU was started with -chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock option to control it [root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \ {\"execute\": \"migrate-set-capabilities\" , \"arguments\": { \"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\": true } ] } }" \| nc -U /var/lib/migrate-vm-monitor.sock Or just with HMP (qemu) migrate_set_capability postcopy-blocktime on Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:48:00 +01:00
Alexey Perevalov	31bf06a9d6	migration: introduce postcopy-blocktime capability Right now it could be used on destination side to enable vCPU blocktime calculation for postcopy live migration. vCPU blocktime - it's time since vCPU thread was put into interruptible sleep, till memory page was copied and thread awake. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2018-01-15 12:47:59 +01:00
Juan Quintela	9102d27e33	migration: free addr in the same function that we created it Otherwise, we can't use it after calling socket_start_incoming_migration Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2018-01-15 12:47:55 +01:00
Juan Quintela	6f0f642835	migration: print features as on off Once there, do one thing for line Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2018-01-15 12:47:54 +01:00
Juan Quintela	741d4086c8	migration: Use proper types in json We use int for everything (int64_t), and then we check that value is between 0 and 255. Change it to the valid types. This change only happens for HMP. QMP always use bytes and similar. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2018-01-15 12:47:53 +01:00
Ladi Prosek	3c254ab8d7	Remove empty statements Thanks to Laszlo Ersek for spotting the double semicolon in target/i386/kvm.c I have trivially grepped the tree for ';;' in C files. Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-12-18 17:07:02 +03:00
Daniel Henrique Barboza	acab30b85d	migration/ram.c: do not set 'postcopy_running' in POSTCOPY_INCOMING_END When migrating a VM with 'migrate_set_capability postcopy-ram on' a postcopy_state is set during the process, ending up with the state POSTCOPY_INCOMING_END when the migration is over. This postcopy_state is taken into account inside ram_load to check how it will load the memory pages. This same ram_load is called when in a loadvm command. Inside ram_load, the logic to see if we're at postcopy_running state is: postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING postcopy_state_get() returns this enum type: typedef enum { POSTCOPY_INCOMING_NONE = 0, POSTCOPY_INCOMING_ADVISE, POSTCOPY_INCOMING_DISCARD, POSTCOPY_INCOMING_LISTENING, POSTCOPY_INCOMING_RUNNING, POSTCOPY_INCOMING_END } PostcopyState; In the case where ram_load is executed and postcopy_state is POSTCOPY_INCOMING_END, postcopy_running will be set to 'true' and ram_load will behave like a postcopy is in progress. This scenario isn't achievable in a migration but it is reproducible when executing savevm/loadvm after migrating with 'postcopy-ram on', causing loadvm to fail with Error -22: Source: (qemu) migrate_set_capability postcopy-ram on (qemu) migrate tcp:127.0.0.1:4444 Dest: (qemu) migrate_set_capability postcopy-ram on (qemu) ubuntu1704-intel login: Ubuntu 17.04 ubuntu1704-intel ttyS0 ubuntu1704-intel login: (qemu) (qemu) savevm test1 (qemu) loadvm test1 Unknown combination of migration flags: 0x4 (postcopy mode) error while loading state for instance 0x0 of device 'ram' Error -22 while loading VM state (qemu) This patch fixes this problem by changing the existing logic for postcopy_advised and postcopy_running in ram_load, making them 'false' if we're at POSTCOPY_INCOMING_END state. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> CC: Juan Quintela <quintela@redhat.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-11-22 08:50:37 +01:00
Anthony PERARD	5d6c599fe1	migration, xen: Fix block image lock issue on live migration When doing a live migration of a Xen guest with libxl, the images for block devices are locked by the original QEMU process, and this prevent the QEMU at the destination to take the lock and the migration fail. >From QEMU point of view, once the RAM of a domain is migrated, there is two QMP commands, "stop" then "xen-save-devices-state", at which point a new QEMU is spawned at the destination. Release locks in "xen-save-devices-state" so the destination can takes them, if it's a live migration. This patch add the "live" parameter to "xen-save-devices-state" which default to true so older version of libxenlight can work with newer version of QEMU. Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-11-21 19:42:26 +01:00
Kevin Wolf	2b624fe079	block: Add errp to bdrv_all_goto_snapshot() Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: John Snow <jsnow@redhat.com>	2017-11-21 14:48:22 +01:00
Max Reitz	5e003f17ec	block: Make bdrv_next() keep strong references On one hand, it is a good idea for bdrv_next() to return a strong reference because ideally nearly every pointer should be refcounted. This fixes intermittent failure of iotest 194. On the other, it is absolutely necessary for bdrv_next() itself to keep a strong reference to both the BB (in its first phase) and the BDS (at least in the second phase) because when called the next time, it will dereference those objects to get a link to the next one. Therefore, it needs these objects to stay around until then. Just storing the pointer to the next in the iterator is not really viable because that pointer might become invalid as well. Both arguments taken together means we should probably just invoke bdrv_ref() and blk_ref() in bdrv_next(). This means we have to assert that bdrv_next() is always called from the main loop, but that was probably necessary already before this patch and judging from the callers, it also looks to actually be the case. Keeping these strong references means however that callers need to give them up if they decide to abort the iteration early. They can do so through the new bdrv_next_cleanup() function. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20171110172545.32609-1-mreitz@redhat.com Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2017-11-17 18:21:31 +01:00
Juan Quintela	73af8dd8d7	migration: Make xbzrle_cache_size a migration parameter Right now it is a variable in MigrationState instead of a MigrationParameter. The change allows to set it as the rest of the Migration parameters, from the command line, with query_migration_paramters, set_migrate_parameters, etc. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-10-29 14:06:15 +01:00
Juan Quintela	c9dede2d48	migration: No need to return the size of the cache After the previous commits, we make sure that the value passed is right, or we just drop an error. So now we return if there is one error or we have setup correctly the value passed. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Improve error messasge Return 0 always for success	2017-10-29 14:06:15 +01:00
Juan Quintela	2a313e5cf6	migration: Don't play games with the requested cache size Now that we check that the value passed is a power of 2, we don't need to play games when comparing what is the size that is going to take the cache. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-10-29 14:06:15 +01:00
Juan Quintela	bab01ed4e8	migration: Make sure that we pass the right cache size Instead of passing silently round down the number of pages, make it an error that the cache size is not a power of 2. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-10-29 14:06:15 +01:00
Juan Quintela	87db1a7d89	migration: Improve migration thread error handling We now report errors also when we finish migration, not only on info migrate. We plan to use this error from several places, and we want the first error to happen to win, so we add an mutex to order it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-10-23 18:03:43 +02:00
Alexey Perevalov	f949461489	migration: add bitmap for received page This patch adds ability to track down already received pages, it's necessary for calculation vCPU block time in postcopy migration feature, and for recovery after postcopy migration failure. Also it's necessary to solve shared memory issue in postcopy livemigration. Information about received pages will be transferred to the software virtual bridge (e.g. OVS-VSWITCHD), to avoid fallocate (unmap) for already received pages. fallocate syscall is required for remmaped shared memory, due to remmaping itself blocks ioctl(UFFDIO_COPY, ioctl in this case will end with EEXIT error (struct page is exists after remmap). Bitmap is placed into RAMBlock as another postcopy/precopy related bitmaps. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:41 +02:00
Alexey Perevalov	727b9d7e49	migration: introduce qemu_ufd_copy_ioctl helper Just for placing auxilary operations inside helper, auxilary operations like: track received pages, notify about copying operation in futher patches. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:40 +02:00
Alexey Perevalov	8be4620be2	migration: postcopy_place_page factoring out Need to mark copied pages as closer as possible to the place where it tracks down. That will be necessary in futher patch. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:39 +02:00
Peter Xu	d6eff5d75d	migration: new ram_init_bitmaps() Rearrange the bitmap initialization and the first sync. Since at it, make sure the locks are taken/released in correct order (I moved RCU unlock upper - though it may not affect much). Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:38 +02:00
Peter Xu	84593a0807	migration: clean up xbzrle cache init/destroy Let's further simplify ram_init_all() and ram_save_cleanup() by abstract all the XBZRLE related codes into their own functions. When allocating xbzrle cache, we are always very careful on -ENOMEM; which makes sense. Replacing the last g_malloc0() with g_try_malloc0(), then refactor the logic a bit. This patch should be fixing some memory leaks when some memory allocation failed for XBZRLE in the past. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:37 +02:00
Peter Xu	7d7c96be7b	migration: provide ram_state_cleanup There are two Mutexes that are created but not yet destroyed for RAMState. Fix that. Since we are at it, provide helper function to clean up RAMState. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:36 +02:00
Peter Xu	7d00ee6ad6	migration: provide ram_state_init() The old ram_state_init() is not really initializing the RAMState only, but including lots of other stuff that is RAM-related. Renaming it to ram_init_all(). Instead, provide a real ram_state_init(). Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:34 +02:00
Dr. David Alan Gilbert	0331c8cabf	migration: pause-before-switchover for postcopy Add pause-before-switchover support for postcopy. After starting postcopy it will transition active->pre-switchover->postcopy_active Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:33 +02:00
Dr. David Alan Gilbert	a7b36b486d	migration: allow cancel to unpause If a migration_cancel is issued during the new paused state, kick the pause_sem to get to unpause so it can cancel. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:32 +02:00
Dr. David Alan Gilbert	89cfc02cb6	migration: migrate-continue A new qmp command allows the caller to continue from a given paused state. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:30 +02:00
Dr. David Alan Gilbert	e91d8951d5	migration: Wait for semaphore before completing migration Wait for a semaphore before completing the migration, if the previously added capability was enabled. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:29 +02:00
Dr. David Alan Gilbert	31e060774c	migration: Add 'pre-switchover' and 'device' statuses Add two statuses for use when the 'pause-before-switchover' capability is enabled. 'pre-switchover' is the state that we wait in for management to allow us to continue. 'device' is the state we enter while serialising the devices after management gives us the OK. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:28 +02:00
Dr. David Alan Gilbert	93fbd0314e	migration: Add 'pause-before-switchover' capability When 'pause-before-switchover' is enabled, the outgoing migration will pause before invalidating the block devices and serializing the device state. At this point the management layer gets the chance to clean up any device jobs or other device users before the migration completes. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-10-23 18:03:27 +02:00
Juan Quintela	80f8dfde97	migration: Make cache_init() take an error parameter Once there, take a total size instead of the size of the pages. We move the check that the new_size is bigger than one page from xbzrle_cache_resize(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Fix typo spotted by Peter Xu	2017-10-23 18:03:25 +02:00
Juan Quintela	8acabf69ea	migration: Move xbzrle cache resize error handling to xbzrle_cache_resize Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-10-23 18:03:24 +02:00
Juan Quintela	9ca3f96394	migration: Make cache size elements use the right types Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-10-23 18:03:23 +02:00
Juan Quintela	ceaaecb49f	migratiom: Remove max_item_age parameter It was not used at all since commit: `27af7d6ea5` which replaced its use by the dirty sync count. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-10-23 18:03:22 +02:00
Juan Quintela	5e7577a101	migration: Fix migrate_test_apply for multifd parameters They were missing when introduced on the tree Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-10-23 18:03:21 +02:00
Eric Blake	e0d7f73e63	dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes Some of the callers were already scaling bytes to sectors; others can be easily converted to pass byte offsets, all in our shift towards a consistent byte interface everywhere. Making the change will also make it easier to write the hold-out callers to use byte rather than sectors for their iterations; it also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Although all callers happen to pass sector-aligned values, make the internal scaling robust to any sub-sector requests. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	3b5d4df0c6	dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes Half the callers were already scaling bytes to sectors; the other half can eventually be simplified to use byte iteration. Both callers were already using the result as a bool, so make that explicit. Making the change also makes it easier for a future dirty-bitmap patch to offload scaling over to the internal hbitmap. Remember, asking whether a byte is dirty is effectively asking whether the entire granularity containing the byte is dirty, since we only track dirtiness by granularity. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Eric Blake	9a46dba7b7	dirty-bitmap: Change bdrv_get_dirty_count() to report bytes Thanks to recent cleanups, all callers were scaling a return value of sectors into bytes; do the scaling internally instead. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-10-06 16:28:58 +02:00
Dr. David Alan Gilbert	2f168d0708	migration: Route more error paths vmstate_save_state is called in lots of places. Route error returns from the easier cases back up; there are lots of more complex cases where their own error paths need fixing. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-7-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Commit message fix up as Peter's review	2017-09-27 11:44:18 +01:00
Dr. David Alan Gilbert	687433f611	migration: Route errors up through vmstate_save Route the errors from vsmtate_save_state back up through vmstate_save and out to the normal device state path. That's the normal error path done. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-6-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-27 11:41:03 +01:00
Dr. David Alan Gilbert	f3cadd39c4	migration: wire vmstate_save_state errors up to vmstate_subsection_save Route the errors from vmstate_save_state up through vmstate_subsection_save (and back down, all rather recursive). Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-5-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Commit message fixed up as per Peter's review	2017-09-27 11:38:21 +01:00
Dr. David Alan Gilbert	88b0faf185	migration: Check field save returns Check the return values from vmstate_save_state for fields and also the return values from 'put' for fields that use that. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-4-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-27 11:37:11 +01:00
Dr. David Alan Gilbert	551dbd0846	migration: check pre_save return in vmstate_save_state Check the return value of pre_save state and fail vmstate_save_state if the pre_save failed. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-27 11:36:31 +01:00
Dr. David Alan Gilbert	44b1ff319c	migration: pre_save return int Modify the pre_save method on VMStateDescription to return an int rather than void so that it potentially can fail. Changed zillions of devices to make them return 0; the only case I've made it return non-0 is hw/intc/s390_flic_kvm.c that already had an error_report/return case. Note: If you add an error exit in your pre_save you must emit an error_report to say why. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170925112917.21340-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-27 11:35:59 +01:00
Peter Lieven	9ac78b6171	migration: disable auto-converge during bulk block migration auto-converge and block migration currently do not play well together. During block migration the auto-converge logic detects that ram migration makes no progress and thus throttles down the vm until it nearly stalls completely. Avoid this by disabling the throttling logic during the bulk phase of the block migration. Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1506421996-12513-1-git-send-email-pl@kamp.de> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-27 11:27:14 +01:00
Alexey Perevalov	54ae0886b1	migration: split ufd_version_check onto receive/request features part This modification is necessary for userfault fd features which are required to be requested from userspace. UFFD_FEATURE_THREAD_ID is a one of such "on demand" feature, which will be introduced in the next patch. QEMU have to use separate userfault file descriptor, due to userfault context has internal state, and after first call of ioctl UFFD_API it changes its state to UFFD_STATE_RUNNING (in case of success), but kernel while handling ioctl UFFD_API expects UFFD_STATE_WAIT_API. So only one ioctl with UFFD_API is possible per ufd. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-09-22 14:11:29 +02:00
Alexey Perevalov	5553499f04	migration: fix hardcoded function name in error report Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-09-22 14:11:28 +02:00
Alexey Perevalov	d7651f150d	migration: pass MigrationIncomingState* into migration check functions That tiny refactoring is necessary to be able to set UFFD_FEATURE_THREAD_ID while requesting features, and then to create downtime context in case when kernel supports it. Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-09-22 14:11:27 +02:00
Vladimir Sementsov-Ogievskiy	58110f0acb	migration: split common postcopy out of ram postcopy Split common postcopy staff from ram postcopy staff. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-09-22 14:11:27 +02:00
Vladimir Sementsov-Ogievskiy	86e1167e9a	migration: fix ram_save_pending Fill postcopy-able pending only if ram postcopy is enabled. It is necessary because of there will be other postcopy-able states and when ram postcopy is disabled, it should not spoil common postcopy related pending. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-09-22 14:11:26 +02:00
Vladimir Sementsov-Ogievskiy	c646762736	migration: add has_postcopy savevm handler Now postcopy-able states are recognized by not NULL save_live_complete_postcopy handler. But when we have several different postcopy-able states, it is not convenient. Ram postcopy may be disabled, while some other postcopy enabled, in this case Ram state should behave as it is not postcopy-able. This patch add separate has_postcopy handler to specify behaviour of savevm state. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-09-22 14:11:25 +02:00
Juan Quintela	e595a01ab6	migration: Split migration_fd_process_incoming We need that on later patches. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>	2017-09-22 14:11:23 +02:00
Juan Quintela	f986c3d256	migration: Create multifd migration threads Creation of the threads, nothing inside yet. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Use pointers instead of long array names Move to use semaphores instead of conditions as paolo suggestion Put all the state inside one struct. Use a counter for the number of threads created. Needed during cancellation. Add error return to thread creation Add id field Rename functions to multifd_save/load_setup/cleanup Change recv parameters to a pointer to struct Change back to a struct Use Error * for _cleanup	2017-09-22 14:11:22 +02:00
Juan Quintela	0fb86605ea	migration: Create x-multifd-page-count parameter Indicates how many pages we are going to send in each batch to a multifd thread. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Be consistent with defaults and documentation Use new DEFINE_PROP_* Rename x-multifd-group to x-multifd-page-count	2017-09-22 14:11:21 +02:00
Juan Quintela	4075fb1ca4	migration: Create x-multifd-channels parameter Indicates the number of channels that we will create. By default we create 2 channels. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Catch inconsistent defaults (eric). Improve comment stating that number of threads is the same than number of sockets Use new DEFIN_PROP_* Rename x-multifd-threads to x-multifd-threads	2017-09-22 14:11:21 +02:00
Juan Quintela	30126bbf1f	migration: Add multifd capability Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> -- Use new DEFINE_PROP	2017-09-22 14:11:20 +02:00
Juan Quintela	428d89084c	migration: Create migration_has_all_channels This function allows us to decide when to close the listener socket. For now, we only need one connection. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>	2017-09-22 14:11:19 +02:00
Juan Quintela	8e1a1931ca	migration: Add comments to channel functions Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>	2017-09-22 14:11:18 +02:00
Juan Quintela	2a543bfdfa	migration: Teach it about G_SOURCE_REMOVE As this is defined on glib 2.32, add compatibility macros for older glibs. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-09-22 14:11:18 +02:00
Juan Quintela	4f0fae7f2b	migration: Create migration_ioc_process_incoming() We pass the ioc instead of the fd. This will allow us to have more than one channel open. We also make sure that we set the from_src_file sooner, so we don't need to pass it as a parameter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> -- Do not assing mis->from_src_file (peterxu)	2017-09-22 14:11:17 +02:00
Fam Zheng	392fb64351	buildsys: Move rdma libs to per object Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170907084230.26493-1-famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-09-22 10:20:34 +08:00
Peter Xu	a31fedeed7	migration: dump str in migrate_set_state trace Strings are more readable for debugging. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1504081950-2528-5-git-send-email-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed up merge with 977c73	2017-09-06 16:36:38 +01:00
Dr. David Alan Gilbert	5089e1862f	migration: Reset rather than destroy main_thread_load_event migration_incoming_state_destroy doesn't really destroy, it cleans up. After a loadvm it's called, but the loadvm command can be run twice, and so destroying an init-once mutex breaks on the second loadvm. Reported-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170825141940.20740-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-06 15:18:21 +01:00
Markus Armbruster	d0834eb539	xbzrle: Drop unused cache_resize() Unused since commit fd8cec XBZRLE: Fix qemu crash when resize the xbzrle cache. Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1501148776-16890-2-git-send-email-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-06 14:36:54 +01:00
Dr. David Alan Gilbert	c232bf58cb	migration: Report when bdrv_inactivate_all fails If the bdrv_inactivate_all fails near the end of the migration, the migration will fail and often the only diagnostics in the log are an I/O error which you can't distinguish from an error on the socket connection. Add an error so we know when it's actually a block problem. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170822170212.27347-1-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-09-06 14:25:26 +01:00
Marc-André Lureau	f7abe0ecd4	qapi: Change data type of the FOO_lookup generated for enum FOO Currently, a FOO_lookup is an array of strings terminated by a NULL sentinel. A future patch will generate enums with "holes". NULL-termination will cease to work then. To prepare for that, store the length in the FOO_lookup by wrapping it in a struct and adding a member for the length. The sentinel will be dropped next. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170822132255.23945-13-marcandre.lureau@redhat.com> [Basically redone] Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-16-git-send-email-armbru@redhat.com> [Rebased]	2017-09-04 13:09:13 +02:00
Markus Armbruster	977c736f80	qapi: Mechanically convert FOO_lookup[...] to FOO_str(...) Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-14-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2017-09-04 13:09:13 +02:00
Markus Armbruster	5b5f825d44	qapi: Generate FOO_str() macro for QAPI enum FOO The next commit will put it to use. May look pointless now, but we're going to change the FOO_lookup's type, and then it'll help. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-13-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>	2017-09-04 13:09:13 +02:00
Markus Armbruster	06c60b6c46	qapi: Drop superfluous qapi_enum_parse() parameter max The lookup tables have a sentinel, no need to make callers pass their size. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1503564371-26090-3-git-send-email-armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Rebased, commit message corrected]	2017-09-04 13:09:13 +02:00
Peter Xu	2dfaf12ebb	migration: fix comment disorder in RAMState Comments for "migration_dirty_pages" and "bitmap_mutex" are switched. Fix it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1501666880-10159-2-git-send-email-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-08-02 11:27:28 +01:00
Marc-André Lureau	b91bf5e488	migration: fix small leaks Spotted thanks to valgrind and tests/device-introspect-test: ==11711== 1 bytes in 1 blocks are definitely lost in loss record 6 of 14,537 ==11711== at 0x4C2EB6B: malloc (vg_replace_malloc.c:299) ==11711== by 0x1E0CDBD8: g_malloc (gmem.c:94) ==11711== by 0x1E0E696E: g_strdup (gstrfuncs.c:363) ==11711== by 0x695693: migration_instance_init (migration.c:2226) ==11711== by 0x717C4B: object_init_with_type (object.c:344) ==11711== by 0x717E80: object_initialize_with_type (object.c:375) ==11711== by 0x7182EB: object_new_with_type (object.c:483) ==11711== by 0x718328: object_new (object.c:493) ==11711== by 0x4B8A29: qmp_device_list_properties (qmp.c:542) ==11711== by 0x4A9561: qmp_marshal_device_list_properties (qmp-marshal.c:1425) ==11711== by 0x819D4A: do_qmp_dispatch (qmp-dispatch.c:104) ==11711== by 0x819E82: qmp_dispatch (qmp-dispatch.c:131) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170801160419.14180-1-marcandre.lureau@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-08-02 11:26:09 +01:00
Vladimir Sementsov-Ogievskiy	8908eb1a4a	trace-events: fix code style: print 0x before hex numbers The only exception are groups of numers separated by symbols '.', ' ', ':', '/', like 'ab.09.7d'. This patch is made by the following: > find . -name trace-events \| xargs python script.py where script.py is the following python script: ========================= #!/usr/bin/env python import sys import re import fileinput rhex = '%[-+ .0-9](?:[hljztL]\|ll\|hh)?(?:x\|X\|"\sPRI[xX][^"]"?)' rgroup = re.compile('((?:' + rhex + '[.:/ ])+' + rhex + ')') rbad = re.compile('(?<!0x)' + rhex) files = sys.argv[1:] for fname in files: for line in fileinput.input(fname, inplace=True): arr = re.split(rgroup, line) for i in range(0, len(arr), 2): arr[i] = re.sub(rbad, '0x\g<0>', arr[i]) sys.stdout.write(''.join(arr)) ========================= Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Message-id: 20170731160135.12101-5-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-08-01 12:13:07 +01:00
Philippe Mathieu-Daudé	87e0331c5a	docs: fix broken paths to docs/devel/tracing.txt With the move of some docs/ to docs/devel/ on `ac06724a71`, no references were updated. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2017-07-31 13:12:53 +03:00
Markus Armbruster	01fa559826	migration: Use JSON null instead of "" to reset parameter to default migrate-set-parameters sets migration parameters according to is arguments like this: * Present means "set the parameter to this value" * Absent means "leave the parameter unchanged" * Except for parameters tls_creds and tls_hostname, "" means "reset the parameter to its default value The first two are perfectly normal: presence of the parameter makes the command do something. The third one overloads the parameter with a second meaning. The overloading is implicit, i.e. it's not visible in the types. Works here, because "" is neither a valid TLS credentials ID, nor a valid host name. Pressing argument values the schema accepts, but are semantically invalid, into service to mean "reset to default" is not general, as suitable invalid values need not exist. I also find it ugly. To clean this up, we could add a separate flag argument to ask for "reset to default", or add a distinct value to @tls_creds and @tls_hostname. This commit implements the latter: add JSON null to the values of @tls_creds and @tls_hostname, deprecate "". Because we're so close to the 2.10 freeze, implement it in the stupidest way possible: have qmp_migrate_set_parameters() rewrite null to "" before anything else can see the null. The proper way to do it would be rewriting "" to null, but that requires fixing up code to work with null. Add TODO comments for that. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-07-24 13:35:11 +02:00
Markus Armbruster	1bda8b3c69	migration: Unshare MigrationParameters struct for now Commit `de63ab6` "migrate: Share common MigrationParameters struct" reused MigrationParameters for the arguments of migrate-set-parameters, with the following rationale: It is rather verbose, and slightly error-prone, to repeat the same set of parameters for input (migrate-set-parameters) as for output (query-migrate-parameters), where the only difference is whether the members are optional. We can just document that the optional members will always be present on output, and then share a common struct between both commands. The next patch can then reduce the amount of code needed on input. I need to unshare them to correct a design flaw in a stupid, but minimally invasive way, in the next commit. We can restore the sharing when we redo that patch in a less stupid way. Add a suitable TODO comment. Note that I revert only the sharing part of commit `de63ab6`, not the part that made the members of query-migrate-parameters' result optional. The schema (and thus introspection) remains inaccurate for query-migrate-parameters. If we decide not to restore the sharing, we should revert that part, too. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-07-24 13:35:11 +02:00
Markus Armbruster	e87fae4c48	migration: Add TODO comments on duplication of QAPI_CLONE() qmp_query_migrate_parameters() and qmp_migrate_set_parameters() effectively duplicate QAPI_CLONE() inline. Add suitable TODO comments. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-07-24 13:35:11 +02:00
Markus Armbruster	8cc99dcdc2	migration: Clean up around tls_creds, tls_hostname Optional MigrationParameters members tls_creds and tls_hostname can't actually be absent outside qmp_migrate_set_parameters() since commit `4af245d` (v2.9.0). Note that commit `4af245d` reverted the part of commit `de63ab6` (v2.8.0) that made tls_creds and tls_hostname absent instead of "" in the value of query-migrate-parameters, even though commit `de63ab6` called that a mistake. What a mess. Drop the redundant tests for presence, and update documentation. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-24 13:35:11 +02:00
Peter Xu	6b19a7d91c	migration: check global caps for validity Checks validity for all the capabilities that we enabled with command line. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1500349150-13240-11-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:26 +02:00
Peter Xu	4e4a3d3aa6	migration: provide migrate_cap_add() Abstracted from migrate_set_block_enabled() to allocate MigrationCapabilityStatusList properly. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1500349150-13240-10-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:26 +02:00
Peter Xu	4a84214ebe	migration: provide migrate_caps_check() Abstract helper function to check migration capabilities (from the old qmp_migrate_set_capabilities). Prepare to be used somewhere else. There is side effect on the change: when applying the capabilities, we were skipping the invalid ones, but still applying the valid ones (if they are provided in the same QMP request). After this refactoring, we'll ignore all the capabilities if we detected invalid setup along the way. However, I don't think it is a problem since general users should not provide anything invalid after all. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1500349150-13240-9-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:25 +02:00
Peter Xu	fd198f9002	migration: remove check against colo support Since commit `a15215f3` ("build: remove --enable-colo/--disable-colo"), colo is always supported. We don't need any colo_supported() now since it is always true. Removing any extra code that depends on it. CC: Paolo Bonzini <pbonzini@redhat.com> CC: Hailiang Zhang <zhang.zhanghailiang@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Zhang Chen<zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1500349150-13240-8-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:24 +02:00
Peter Xu	8b0b29dcec	migration: check global params for validity Adding validity check for the migration parameters passed in via global properties. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1500349150-13240-7-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:23 +02:00
Peter Xu	476c72aa91	migration: provide migrate_params_apply() Abstracted from qmp_migrate_set_parameters(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1500349150-13240-6-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:22 +02:00
Peter Xu	16d063bc2a	migration: introduce migrate_params_check() Helper to check the parameters. Abstracted from qmp_migrate_set_parameters(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1500349150-13240-5-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:21 +02:00
Peter Xu	2081475841	migration: export capabilities to props Do the same thing to migration capabilities, just like what we did in previous patch for migration parameters. Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1500349150-13240-4-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:21 +02:00
Peter Xu	89632fafdc	migration: export parameters to props Export migration parameters to qdev properties. Then we can use, for example: -global migration.x-cpu-throttle-initial=xxx To specify migration parameters during init. Prefix "x-" is appended for each parameter exported to show that this is not a stable interface, and only for debugging/testing purpose. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1500349150-13240-3-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:20 +02:00
Dr. David Alan Gilbert	32bce19634	migration/rdma: Send error during cancelling When we issue a cancel and clean up the RDMA channel send a CONTROL_ERROR to get the destination to quit. The rdma_cleanup code waits for the event to come back from the rdma_disconnect; but that wont happen until the destination quits and there's currently nothing to force it. Note this makes the case of a cancel work while the destination is alive, and it already works if the destination is truly dead. Note it doesn't fix the case where the destination is hung (we get stuck waiting for the rdma_disconnect event). Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20170717110936.23314-7-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:18 +02:00
Dr. David Alan Gilbert	482a33c53c	migration/rdma: Safely convert control types control_desc[] is an array of strings that correspond to a series of message types; they're used only for error messages, but if the message type is seriously broken then we could go off the end of the array. Convert the array to a function control_desc() that bound checks. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20170717110936.23314-6-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:17 +02:00
Dr. David Alan Gilbert	9c98cfbe72	migration/rdma: Allow cancelling while waiting for wrid When waiting for a WRID, if the other side dies we end up waiting for ever with no way to cancel the migration. Cure this by poll()ing the fd first with a timeout and checking error flags and migration state. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20170717110936.23314-5-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:17 +02:00
Dr. David Alan Gilbert	0b3c15f097	migration/rdma: fix qemu_rdma_block_for_wrid error paths The two places that 'goto err_block_for_wrid' weren't setting ret and so would end up returning 0 even though we've failed. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20170717110936.23314-4-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:16 +02:00
Dr. David Alan Gilbert	3a0f2ceaed	migration: Close file on failed migration load Closing the file before exit on a failure allows the source to cleanup better, especially with RDMA. Partial fix for https://bugs.launchpad.net/qemu/+bug/1545052 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20170717110936.23314-3-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:15 +02:00
Dr. David Alan Gilbert	9cf2bab2ed	migration/rdma: Fix race on source Fix a race where the destination might try and send the source a WRID_READY before the source has done a post-recv for it. rdma_post_recv has to happen after the qp exists, and we're OK since we've already called qemu_rdma_source_init that calls qemu_alloc_qp. This corresponds to: https://bugzilla.redhat.com/show_bug.cgi?id=1285044 The race can be triggered by adding a few ms wait before this post_recv_control (which was originally due to me turning on loads of debug). Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20170717110936.23314-2-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-07-18 17:36:14 +02:00
Juan Quintela	f0afa331ce	migration: Make compression_threads use save/load_setup/cleanup() Once there, be consistent and use compress_thread_{save,load}_{setup,cleanup}. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170628095228.4661-6-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	f265e0e437	migration: Convert ram to use new load_setup()/load_cleanup() Once there, I rename ram_migration_cleanup() to ram_save_cleanup(). Notice that this is the first pass, and I only passed XBZRLE to the new scheme. Moved decoded_buf to inside XBZRLE struct. As a bonus, I don't have to export xbzrle functions from ram.c. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- loaded_data pointer was needed because called can change it (dave) spell loaded correctly in comment (dave) Message-Id: <20170628095228.4661-5-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	acb5ea8697	migration: Create load_setup()/cleanup() methods We need to do things at load time and at cleanup time. Signed-off-by: Juan Quintela <quintela@redhat.com> -- Move the printing of the error message so we can print the device giving the error. Add call to postcopy stuff Message-Id: <20170628095228.4661-4-quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	70f794fcfa	migration: Rename cleanup() to save_cleanup() We need a cleanup for loads, so we rename here to be consistent. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Rename htab_cleanup to htap_save_cleanup as dave suggestion Message-Id: <20170628095228.4661-3-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Juan Quintela	9907e842d7	migration: Rename save_live_setup() to save_setup() We are going to use it now for more than save live regions. Once there rename qemu_savevm_state_begin() to qemu_savevm_state_setup(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170628095228.4661-2-quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Peter Xu	c8d3ff384f	doc: update TYPE_MIGRATION documents [Peter collected Eduardo's patch comment and formatted into patch] Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1499242883-2184-5-git-send-email-peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Peter Xu	b605c47b57	migration: fix handling for --only-migratable MigrateState object is not ready at that time, so we'll get an assertion. Use qemu_global_option() instead. Reported-by: Eduardo Habkost <ehabkost@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Fixes: `3df663e` ("migration: move only_migratable to MigrationState") Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1499242883-2184-2-git-send-email-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-07-10 17:52:21 +01:00
Eric Blake	d6a644bbfe	block: Make bdrv_is_allocated() byte-based We are gradually moving away from sector-based interfaces, towards byte-based. In the common case, allocation is unlikely to ever use values that are not naturally sector-aligned, but it is possible that byte-based values will let us be more precise about allocation at the end of an unaligned file that can do byte-based access. Changing the signature of the function to use int64_t pnum ensures that the compiler enforces that all callers are updated. For now, the io.c layer still assert()s that all callers are sector-aligned on input and that pnum is sector-aligned on return to the caller, but that can be relaxed when a later patch implements byte-based block status. Therefore, this code adds usages like DIV_ROUND_UP(,BDRV_SECTOR_SIZE) to callers that still want aligned values, where the call might reasonbly give non-aligned results in the future; on the other hand, no rounding is needed for callers that should just continue to work with byte alignment. For the most part this patch is just the addition of scaling at the callers followed by inverse scaling at bdrv_is_allocated(). But some code, particularly bdrv_commit(), gets a lot simpler because it no longer has to mess with sectors; also, it is now possible to pass NULL if the caller does not care how much of the image is allocated beyond the initial offset. Leave comments where we can further simplify once a later patch eliminates the need for sector-aligned requests through bdrv_is_allocated(). For ease of review, bdrv_is_allocated_above() will be tackled separately. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-07-10 13:18:07 +02:00
Peter Xu	c788ada816	migration: add "return-path" capability When this capability is enabled, QEMU will use the return path even for precopy migration. This is helpful at least in one case when destination failed to load the image while source quited without confirmation. With return path, source will wait for the last response from destination, and if destination fails, it'll fail the migration on source, then the guest can be run again on the source (rather than assuming to be good, then the guest will be lost after source quits). It needs to be enabled explicitly on source, otherwise disabled. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498472935-14461-1-git-send-email-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:51:10 +02:00
Halil Pasic	d2164ad35c	vmstate: error hint for failed equal checks In some cases a failing VMSTATE_*_EQUAL does not mean we detected a bug, but it's actually the best we can do. Especially in these cases a verbose error message is required. Let's introduce infrastructure for specifying a error hint to be used if equal check fails. Let's do this by adding a parameter to the _EQUAL macros called _err_hint. Also change all current users to pass NULL as last parameter so nothing changes for them. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Message-Id: <20170623144823.42936-1-pasic@linux.vnet.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:44 +02:00
Peter Xu	01f6e14c78	migration: add comment for TYPE_MIGRATE It'll be strange that the migration object inherits TYPE_DEVICE. Add some explanations to it. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498634144-26508-1-git-send-email-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:39 +02:00
Peter Xu	9d18af93b3	migration: hmp: dump globals Now we have some globals that can be configured for migration. Dump them in HMP info migration for better debugging. (we can also use this to monitor whether COMPAT fields are applied correctly on compatible machines) Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-11-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:39 +02:00
Peter Xu	4ffdb337e7	migration: merge enforce_config_section somewhat These two parameters: - MachineState::enforce_config_section - MigrationState::send_configuration are playing similar role here. This patch merges the first one into second, then we'll have a single place to reference whether we need to send the configuration section. I didn't remove the MachineState.enforce_config_section field since when applying that machine property (in machine_set_property()) we haven't yet initialized global properties and migration object. Then, it's still not easy to pass that boolean to MigrationState at such an early time. A natural benefit for current patch is that now we kept the meaning of "enforce-config-section" since it'll still have the highest priority (that's what "enforce" mean I guess). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-10-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:39 +02:00
Peter Xu	15c3850325	migration: move skip_section_footers Move it into MigrationState, revert its meaning and renaming it to send_section_footer, with a property bound to it. Same trick is played like previous patches. Removing savevm_skip_section_footers(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-9-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:39 +02:00
Peter Xu	71dd4c1a56	migration: move skip_configuration out It was in SaveState but now moved to MigrationState altogether, reverted its meaning, then renamed to "send_configuration". Again, using HW_COMPAT_2_3 for old PC/SPAPR machines, and accel_register_prop() for xen_init(). Removing savevm_skip_configuration(). Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-8-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:38 +02:00
Peter Xu	3df663e575	migration: move only_migratable to MigrationState One less global variable, and it does only matter with migration. We keep the old "--only-migratable" option, but also now we support: -global migration.only-migratable=true Currently still keep the old interface. Hmm, now vl.c has no way to access migrate_get_current(). Export a function for it to setup only_migratable. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-7-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:38 +02:00
Peter Xu	5272298c48	migration: move global_state.optional out Put it into MigrationState then we can use the properties to specify whether to enable storing global state. Removing global_state_set_optional() since now we can use HW_COMPAT_2_3 for x86/power, and AccelClass.global_props for Xen. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-6-git-send-email-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:38 +02:00
Peter Xu	e5cb7e7677	migration: let MigrationState be a qdev Let the old man "MigrationState" join the object family. Direct benefit is that we can start to use all the property features derived from current QDev, like: HW_COMPAT_* bits, command line setup for migration parameters (so will never need to set them up each time using HMP/QMP, this is really, really attractive for test writters), etc. I see no reason to disallow this happen yet. So let's start from this one, to see whether it would be anything good. Now we init the MigrationState struct statically in main() to make sure it's initialized after global properties are applied, since we'll use them during creation of the object. No functional change at all. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1498536619-14548-5-git-send-email-peterx@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-28 11:18:38 +02:00
Stefan Hajnoczi	1575829d2a	migration: hold AioContext lock for loadvm qemu_fclose() migration_incoming_state_destroy() uses qemu_fclose() on the vmstate file. Make sure to call it inside an AioContext acquire/release region. This fixes an 'qemu: qemu_mutex_unlock: Operation not permitted' abort in loadvm. This patch closes the vmstate file before ending the drained region. Previously we closed the vmstate file after ending the drained region. The order does not matter. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Stefan Hajnoczi	8649f2f9b2	migration: use bdrv_drain_all_begin/end() instead bdrv_drain_all() blk/bdrv_drain_all() only takes effect for a single instant and then resumes block jobs, guest devices, and other external clients like the NBD server. This can be handy when performing a synchronous drain before terminating the program, for example. Monitor commands usually need to quiesce I/O across an entire code region so blk/bdrv_drain_all() is not suitable. They must use bdrv_drain_all_begin/end() to mark the region. This prevents new I/O requests from slipping in or worse - block jobs completing and modifying the graph. I audited other blk/bdrv_drain_all() callers but did not find anything that needs a similar fix. This patch fixes the savevm/loadvm commands. Although I haven't encountered a read world issue this makes the code safer. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Stefan Hajnoczi	17e2a4a47d	migration: avoid recursive AioContext locking in save_vmstate() AioContext was designed to allow nested acquire/release calls. It uses a recursive mutex so callers don't need to worry about nesting...or so we thought. BDRV_POLL_WHILE() is used to wait for block I/O requests. It releases the AioContext temporarily around aio_poll(). This gives IOThreads a chance to acquire the AioContext to process I/O completions. It turns out that recursive locking and BDRV_POLL_WHILE() don't mix. BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread will not be able to acquire the AioContext if it was acquired multiple times. Instead of trying to release AioContext n times in BDRV_POLL_WHILE(), this patch simply avoids nested locking in save_vmstate(). It's the simplest fix and we should step back to consider the big picture with all the recent changes to block layer threading. This patch is the final fix to solve 'savevm' hanging with -object iothread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-06-26 14:51:13 +02:00
Peter Maydell	65a0e3e842	-----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEtBAABCAAXBQJZQyPmEBxmYW16QHJlZGhhdC5jb20ACgkQyjViTGqRccaWrgf/ SCAHpi4gzWbr7AN03jP16Qy/kqNik6F7LTNSqrRbvBPb3TNchDd4z44SAghK5m/r +IlYQc20sBZ60tRHIHAUSF2WNcea2pj1v3ZVgjrI7hiJ3DXPiqqt/dAR/W/BLIDO tAHAVF6Pnrjm9DC4d2zATLDHvcHMzWOsnePh7XcOm44REbwUr3GDg6bf2+j+5yfS 9ewmXfh8z4w1IvSn+f5B+IeCvGvJNA1D55dqcGo8Ivlg9PnElziXFaXO2s7UiLIM mF3eTSIbJQNNN+E+0lpRpnqQiq+Txxggu61Q4f8bOTBhEOPa3etj1ydnXMVbvX25 6SUuBfGh51tyOIZOJz3GtA== =9b+J -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/famz/tags/docker-and-block-pull-request' into staging # gpg: Signature made Fri 16 Jun 2017 01:18:46 BST # gpg: using RSA key 0xCA35624C6A9171C6 # gpg: Good signature from "Fam Zheng <famz@redhat.com>" # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6 * remotes/famz/tags/docker-and-block-pull-request: (23 commits) block: make accounting thread-safe block: split BlockAcctStats creation and setup block: introduce block_account_one_io block: protect modification of dirty bitmaps with a mutex migration/block: reset dirty bitmap before reading block: introduce dirty_bitmap_mutex block: protect tracked_requests and flush_queue with reqs_lock block: access write_gen with atomics block: use Stat64 for wr_highest_offset util: add stats64 module throttle-groups: protect throttled requests with a CoMutex throttle-groups: do not use qemu_co_enter_next throttle-groups: only start one coroutine from drained_begin block: access io_plugged with atomic ops block: access wakeup with atomic ops block: access serialising_in_flight with atomic ops block: access io_limits_disabled with atomic ops block: access quiesce_counter with atomic ops block: access copy_on_read with atomic ops docker: Add flex and bison to centos6 image ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-20 16:01:15 +01:00
Fam Zheng	a1fbe750fd	migration: Fix race of image locking between src and dst Previously, dst side will immediately try to lock the write byte upon receiving QEMU_VM_EOF, but at src side, bdrv_inactivate_all() is only done after sending it. If the src host is under load, dst may fail to acquire the lock due to racing with the src unlocking it. Fix this by hoisting the bdrv_inactivate_all() operation before QEMU_VM_EOF. N.B. A further improvement could possibly be done to cleanly handover locks between src and dst, so that there is no window where a third QEMU could steal the locks and prevent src and dst from running. N.B. This commit includes a minor improvement to the error handling by using qemu_file_set_error(). Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 20170616160658.32290-1-famz@redhat.com Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> [PMM: noted qemu_file_set_error() use in commit as suggested by Daniel] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-06-19 17:53:33 +01:00
Paolo Bonzini	b64bd51efa	block: protect modification of dirty bitmaps with a mutex Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-17-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	c0bad49946	migration/block: reset dirty bitmap before reading Any data that is returned by read may be stale already, the bitmap has to be cleared before issuing the read. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-16-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Paolo Bonzini	2119882c7e	block: introduce dirty_bitmap_mutex It protects only the list of dirty bitmaps; in the next patch we will also protect their content. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20170605123908.18777-15-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>	2017-06-16 07:55:00 +08:00
Juan Quintela	3416ab5bb4	migration: Don't create decompression threads if not enabled Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- I removed the [HACK] part because previous patch just check that compression pages are not received.	2017-06-14 11:11:06 +02:00
Juan Quintela	edc60127e4	migration: Test for disabled features on reception Right now, if we receive a compressed page while this features are disabled, Bad Things (TM) can happen. Just add a test for them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- I had XBZRLE here also, but it don't need extra resources on destination, only on source. Additionally libvirt don't enable it on destination, so don't put it here. - initialize invalid_flags at declaration time. - remove extra space (peter)	2017-06-14 11:11:06 +02:00
Juan Quintela	1adc1ceef7	migration: Remove unneeded includes Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-14 11:10:19 +02:00
Peter Xu	62a0265852	migration: fix incorrect enable return path `0425dc9` is actually v1 of that patch, but it was accidentally merged (while there was a v2). That will cause problem when we try to migrate to some old QEMUs when return path is not really there. Let's fix it, then squashing this patch with `0425dc9` will be exactly patch content of v2. Fixes: `0425dc9` ("migration: isolate return path on src") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-14 11:09:38 +02:00
Juan Quintela	6666c96aac	migration: Move migration.h to migration/ Nothing uses it outside of migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	c4b63b7cc5	migration: Move remaining exported functions to migration/misc.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	84a899de8c	migration: create global_state.c It don't belong anywhere else, just the global state where everybody can stick other things. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	2ce3bf1aa9	migration: ram_control_* are implemented in qemu_file Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	da6f17903f	migration: Commands are only used inside migration.c So, move them there. Notice that we export functions that send commands, not the command themselves. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	c3d2e2e76c	migration: Move constants to savevm.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:45 +02:00
Juan Quintela	f2a8f0a631	migration: Split registration functions from vmstate.h They are indpendent, and nowadays almost every device register things with qdev->vmsd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-13 11:00:44 +02:00
Juan Quintela	f8d806c992	migration: Move self_announce_delay() to misc.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:44 +02:00
Juan Quintela	543147116e	migration: Remove MigrationState from migration_channel_incomming() All callers were calling migrate_get_current(), so do it inside the function. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:44 +02:00
Juan Quintela	c8f9f4f402	ram: Now POSTCOPY_ACTIVE is the same that STATUS_ACTIVE Merge them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-06-13 11:00:44 +02:00
Juan Quintela	930ac04c22	ram: Print block stats also in the complete case Once there, create populate_disk_info. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> -- - create populate_disk_info instead of "abusing" populate_ram_info	2017-06-13 11:00:44 +02:00
Eduardo Habkost	250561e1ae	migration: Don't try to set errp directly Assigning directly to errp is not valid, as errp may be NULL, &error_fatal, or &error_abort. Use error_propagate() instead. Cc: Juan Quintela <quintela@redhat.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-13 11:00:44 +02:00
Peter Xu	0425dc9762	migration: isolate return path on src There are some places that binded "return path" with postcopy. Let's be prepared for its usage even without postcopy. This patch mainly did this on source side. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-13 11:00:44 +02:00
Kevin Wolf	362fdf170c	migration/block: Clean up BBs in block_save_complete() We need to release any block migrations BlockBackends on the source before successfully completing the migration because otherwise inactivating the images will fail (inactivation only tolerates device BBs). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2017-06-09 11:45:03 +02:00
Kevin Wolf	f07fa4cbf0	migration: Inactivate images after .save_live_complete_precopy() Block migration may still access the image during its .save_live_complete_precopy() implementation, so we should only inactivate the image afterwards. Another reason for the change is that inactivating an image fails when there is still a non-device BlockBackend using it, which includes the BBs used by block migration. We want to give block migration a chance to release the BBs before trying to inactivate the image (this will be done in another patch). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2017-06-09 11:45:03 +02:00
QingFeng Hao	eefff991d0	qemu/migration: fix the double free problem on from_src_file In load_snapshot, mis->from_src_file is freed twice, the first free is by qemu_fclose, the second is by migration_incoming_state_destroy and it causes Illegal instruction exception. The fix is just to remove the first free. This problem is found by qemu-iotests case 068 since commit "660819b migration: shut src return path unconditionally". The error is: 068 1s ... - output mismatch (see 068.out.bad) --- tests/qemu-iotests/068.out 2017-05-06 01:00:26.417270437 +0200 +++ 068.out.bad 2017-06-03 13:59:55.360274640 +0200 @@ -6,6 +6,8 @@ QEMU X.Y.Z monitor - type 'help' for more information (qemu) savevm 0 (qemu) quit +./common.config: line 107: 242472 Illegal instruction (core dumped) ( if [ -n "${QEMU_NEED_PID}" ]; then + echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid"; +fi; exec "$QEMU_PROG" $QEMU_OPTIONS "$@" ) QEMU X.Y.Z monitor - type 'help' for more information -(qemu) quit -* done +(qemu) * done Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-07 10:20:56 +02:00
Juan Quintela	53518d9448	ram: Make RAMState dynamic We create the variable while we are at migration and we remove it after migration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-07 10:20:55 +02:00
Juan Quintela	9360447d34	ram: Use MigrationStats for statistics RAM Statistics need to survive migration to make info migrate work, so we need to store them outside of RAMState. As we already have an struct with those fields, just used them. (MigrationStats and XBZRLECacheStats). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-07 10:20:54 +02:00
Juan Quintela	c00e092832	ram: Move ZERO_TARGET_PAGE inside XBZRLE It was only used by XBZRLE anyways. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-07 10:20:54 +02:00
Juan Quintela	83c13382e4	ram: Call migration_page_queue_free() at ram_migration_cleanup() We shouldn't be using memory later than that. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-06-07 10:20:53 +02:00
Juan Quintela	338182c83c	ram: We only print throttling information sometimes Change it to be consistent with everything else. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-06-07 10:20:52 +02:00
Juan Quintela	114f5aee02	ram: Unfold get_xbzrle_cache_stats() into populate_ram_info() They were called consecutively always. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-06-07 10:20:52 +02:00
David Gibson	75e972dab5	migration: Mark CPU states dirty before incoming migration/loadvm As a rule, CPU internal state should never be updated when !cpu->kvm_vcpu_dirty (or the HAX equivalent). If that is done, then subsequent calls to cpu_synchronize_state() - usually safe and idempotent - will clobber state. However, we routinely do this during a loadvm or incoming migration. Usually this is called shortly after a reset, which will clear all the cpu dirty flags with cpu_synchronize_all_post_reset(). Nothing is expected to set the dirty flags again before the cpu state is loaded from the incoming stream. This means that it isn't safe to call cpu_synchronize_state() from a post_load handler, which is non-obvious and potentially inconvenient. We could cpu_synchronize_all_state() before the loadvm, but that would be overkill since a) we expect the state to already be synchronized from the reset and b) we expect to completely rewrite the state with a call to cpu_synchronize_all_post_init() at the end of qemu_loadvm_state(). To clear this up, this patch introduces cpu_synchronize_pre_loadvm() and associated helpers, which simply marks the cpu state as dirty without actually changing anything. i.e. it says we want to discard any existing KVM (or HAX) state and replace it with what we're going to load. Cc: Juan Quintela <quintela@redhat.com> Cc: Dave Gilbert <dgilbert@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Juan Quintela <quintela@redhat.com>	2017-06-06 08:53:24 +10:00
Laurent Vivier	1b6e748246	migration: remove register_savevm() We can replace the four remaining calls of register_savevm() by calls to register_savevm_live(). So we can remove the function and as we don't allocate anymore the ops pointer with g_new0() we don't have to free it then. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2017-06-06 08:53:24 +10:00
Juan Quintela	2c9e6fec89	migration: Move include/migration/block.h into migration/ All functions were internal, except blk_mig_init() that is exported in misc.h now. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:24 +02:00
Juan Quintela	7b1e1a2202	migration: Export ram.c functions in its own file All functions are internal except for ram_mig_init(). Create migration/misc.h for this kind of functions. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:23 +02:00
Juan Quintela	5e22479ae2	migration: Create include for migration snapshots Start removing migration code from sysemu/sysemu.h. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:23 +02:00
Juan Quintela	e1a3ecee3b	migration: Export rdma.c functions in its own file Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:23 +02:00
Juan Quintela	41d64227ed	migration: Export tls.c functions in its own file Just for the functions exported from tls.c. Notice that we can't remove the migration/migration.h include from tls.c because it access directly MigrationState for the tls params. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:23 +02:00
Juan Quintela	61e8b14880	migration: Export socket.c functions in its own file Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:23 +02:00
Juan Quintela	7fcac4a2cc	migration: Export fd.c functions in its own file Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:22 +02:00
Juan Quintela	f4dbe1bf34	migration: Export exec.c functions in its own file Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:22 +02:00
Juan Quintela	08a0aee15c	migration: Split qemu-file.h Split the file into public and internal interfaces. I have to rename the external one because we can't have two include files with the same name in the same directory. Build system gets confused. The only exported functions are the ones that handle basic types. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:49:22 +02:00
Peter Xu	660819b1df	migration: shut src return path unconditionally We were do the shutting off only for postcopy. Now we do this as long as the source return path is there. Moving the cleanup of from_src_file there too. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-01 18:49:12 +02:00
Peter Xu	3482655bbc	migration: fix leak of src file on dst The return path channel is possibly leaked. Fix it. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-06-01 18:48:58 +02:00
Juan Quintela	3a011c26bc	migration: Remove section_id parameter from vmstate_load Everything else assumes that we always load a device from its own savevm handler. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:31:13 +02:00
Juan Quintela	c2355ad47d	migration: loadvm handlers are not used So we remove all traces of them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-06-01 18:31:13 +02:00
Juan Quintela	0f42f65781	migration: Use savevm_handlers instead of loadvm copy There is no reason for having the loadvm_handlers at all. There is only one use, and we can use the savevm handlers. We will remove the loadvm handlers on a following patch. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- - Added load_version_id: version_id read from the stream (laurent) - Added load_section_id: section_id read from the stream (dave)	2017-06-01 18:31:13 +02:00
Felipe Franciosi	b4a3c64b16	migration: use dirty_rate_high_cnt more aggressively The commit message from `070afca25` suggests that dirty_rate_high_cnt should be used more aggressively to start throttling after two iterations instead of four. The code, however, only changes the auto convergence behaviour to throttle after three iterations. This makes the behaviour more aggressive by kicking off throttling after two iterations as originally intended. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-31 09:39:20 +02:00
Felipe Franciosi	d2a4d85a8a	migration: set bytes_xfer_* outside of autoconverge logic The bytes_xfer_now/prev counters are only used by the auto convergence logic. However, they are used alongside the dirty_pages_rate counter, which is calculated (and required) outside of this logic. The problem with this approach is that if the auto convergence capability is changed while a migration is ongoing, the relationship of the counters will be broken. This moves the management of bytes_xfer_now/prev counters outside of the auto convergence logic to address this issue. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-31 09:39:20 +02:00
Felipe Franciosi	d693c6f10f	migration: set dirty_pages_rate before autoconverge logic Currently, a "period" in the RAM migration logic is at least a second long and accounts for what happened since the last period (or the beginning of the migration). The dirty_pages_rate counter is calculated at the end this logic. If the auto convergence capability is enabled from the start of the migration, it won't be able to use this counter the first time around. This calculates dirty_pages_rate as soon as a period is deemed over, which allows for it to be used immediately. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-31 09:39:20 +02:00
Felipe Franciosi	9884db2814	migration: keep bytes_xfer_prev init'd to zero The first time migration_bitmap_sync() is called, bytes_xfer_prev is set to ram_state.bytes_transferred which is, at this point, zero. The next time migration_bitmap_sync() is called, an iteration has happened and bytes_xfer_prev is set to 'x' bytes. Most likely, more than one second has passed, so the auto converge logic will be triggered and bytes_xfer_now will also be set to 'x' bytes. This condition is currently masked by dirty_rate_high_cnt, which will wait for a few iterations before throttling. It would otherwise always assume zero bytes have been copied and therefore throttle the guest (possibly) prematurely. Given bytes_xfer_prev is only used by the auto convergence logic, it makes sense to only set its value after a check has been made against bytes_xfer_now. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-31 09:39:20 +02:00
Juan Quintela	20a519a05a	migration: Create savevm.h for functions exported from savevm.c This removes last trace of migration functions from sysemu/sysemu.h. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-05-31 09:39:19 +02:00
Stefan Hajnoczi	d0eda02938	QAPI patches for 2017-05-23 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZJB4MAAoJEDhwtADrkYZTjhkP+wRaiZj9h4IJvcWoNEzfyuA1 kd7+Kx6QgfCmZE9vL2/mlOFddWL0fPtPffL/ZRu5UNgIILaCSPFsGkOGvXLZhaUW he5sqLCqMc2mxgB98HpbT0dzt0cOSCjdM5BxkFXeq/yPoDa0IiZiD8cpvj+FVwKi D0qGdrKKGCR3RteL4gr/kaXY/LXAZfuEjbAtylQx1aMHJ6CKmdSIVVVU2JJVIYhQ +dT/Xst0PSkJYk90wgmwpzPCqKR/N5zHFe8CyUoE67FxBhegdw19O3wlzU9DJ3N5 8Az+fbEjifWoMytTZR4H3snPJGwl6wxsh2UVj9SMCvebc0y278UPlGqiszvWBepa 1iZHHULH+yygHyUmX6CxjHOUW498ES2KGHx7qJJe8ebeJ4XuU7JcE+Sf4GQEAm8Q p6P5s3qXpuVjekCjmerUAtybr+hxEQC9fbAGqPq+r489jwjvUiETrMLbmEHyy/Xa fSUaW+f5kGI0GJS9FYcbcMy9w2130lTK2k4bZM0mSVlSsHA7W0GBDnzxUDtxo6uH oqMQgKIFWOBU5GkRUiL43vpiTIpiLCuG6PbQlgefQRPWdoODVxykuu2bq5hVaax8 8XMkkq7isG/J5esFc55L1qEUyrUDtVYx/LiHj0XXJikkGirXtp7b7l/TmFLZGsex UWWzFRbZnCVf2CKwdV6h =DNqn -----END PGP SIGNATURE----- Merge remote-tracking branch 'armbru/tags/pull-qapi-2017-05-23' into staging QAPI patches for 2017-05-23 # gpg: Signature made Tue 23 May 2017 12:33:32 PM BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * armbru/tags/pull-qapi-2017-05-23: qapi-schema: Remove obsolete note from ObjectTypeInfo block: Use QDict helpers for --force-share shutdown: Expose bool cause in SHUTDOWN and RESET events shutdown: Add source information to SHUTDOWN and RESET shutdown: Preserve shutdown cause through replay shutdown: Prepare for use of an enum in reset/shutdown_request shutdown: Simplify shutdown_signal sockets: Plug memory leak in socket_address_flatten() scripts/qmp/qom-set: fix the value argument passed to srv.command() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-30 09:33:40 +01:00
Eric Blake	aedbe19297	shutdown: Prepare for use of an enum in reset/shutdown_request We want to track why a guest was shutdown; in particular, being able to tell the difference between a guest request (such as ACPI request) and host request (such as SIGINT) will prove useful to libvirt. Since all requests eventually end up changing shutdown_requested in vl.c, the logical change is to make that value track the reason, rather than its current 0/1 contents. Since command-line options control whether a reset request is turned into a shutdown request instead, the same treatment is given to reset_requested. This patch adds an internal enum ShutdownCause that describes reasons that a shutdown can be requested, and changes qemu_system_reset() to pass the reason through, although for now nothing is actually changed with regards to what gets reported. The enum could be exported via QAPI at a later date, if deemed necessary, but for now, there has not been a request to expose that much detail to end clients. For the most part, we turn 0 into SHUTDOWN_CAUSE_NONE, and 1 into SHUTDOWN_CAUSE_HOST_ERROR; the only specific case where we have enough information right now to use a different value is when we are reacting to a host signal. It will take a further patch to edit all call-sites that can trigger a reset or shutdown request to properly pass in any other reasons; this patch includes TODOs to point such places out. qemu_system_reset() trades its 'bool report' parameter for a 'ShutdownCause reason', with all non-zero values having the same effect; this lets us get rid of the weird #defines for VMRESET_* as synonyms for bools. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20170515214114.15442-3-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-23 13:28:17 +02:00
Juan Quintela	46d702b106	migration: Make savevm.c target independent It only needed TARGET_PAGE_SIZE/BITS/BITS_MIN values, so just export them from exec.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-18 19:21:00 +02:00
Juan Quintela	51180423a2	exec: Create include for target_page_size() That is the only function that we need from exec.c, and having to include the whole sysemu.h for this. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- /me leans to be less sloppy with copyright notices thanks Dave	2017-05-18 19:20:59 +02:00
Juan Quintela	987772d9e7	migration: Remove vmstate.h from migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> --- Minor rearrangements due to rebase	2017-05-18 19:20:59 +02:00
Juan Quintela	82b9d0f06a	migration: Remove qemu-file.h from vmstate.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- minor rearangements due to the rebase	2017-05-18 19:20:59 +02:00
Juan Quintela	576d1abc20	migration: Split vmstate-types.c from vmstate.c Now one just has the interperter, and the other has the basic types. Once there, add copyright boilerplate. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Use GPL v2 or later. Detected by David.	2017-05-18 19:20:59 +02:00
Juan Quintela	05b98c22f8	migration: Move qjson.h to migration/ It is only used for migration code. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-18 19:20:59 +02:00
Juan Quintela	c59be019e9	migration: Remove migration.h from colo.h migration.h is not included in any includes now. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-18 19:20:59 +02:00
Juan Quintela	40014d81f2	migration: Export qemu-file-channel.c functions in its own file Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-18 19:20:50 +02:00
Juan Quintela	dd4339c540	migration: Split migration/channel.c for channel operations Create an include for its exported functions. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Add proper header	2017-05-18 19:20:24 +02:00
Juan Quintela	709e3fe825	migration: Create migration/xbzrle.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-18 18:04:54 +02:00
Dr. David Alan Gilbert	ed1701c6a5	block migration: Allow compile time disable Many users now prefer to use drive_mirror over NBD as an alternative to the older migrate -b option; drive_mirror is more complex to setup but gives you more options (e.g. only migrating some of the disks if some of them are shared). Allow the large chunk of block migration code to be compiled out for those who don't use it. Based on a downstream-patch we've had for a while by Jeff Cody. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> -- - When compiled out, allow seting block only with false value (eric)	2017-05-18 18:04:54 +02:00
Juan Quintela	a0762d9e34	migration: Remove old MigrationParams Not used anymore after moving block migration to use capabilities. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-18 18:04:54 +02:00
Juan Quintela	ce7c817c85	migration: Remove use of old MigrationParams We have change in the previous patch to use migration capabilities for it. Notice that we continue using the old command line flags from migrate command from the time being. Remove the set_params method as now it is empty. For savevm, one can't do a: savevm -b/-i foo but now one can do: migrate_set_capability block on savevm foo And we can't use block migration. We could disable block capability unconditionally, but it would not be much better. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> --- - Maintain shared/enabled dependency (Xu suggestion) - Now we maintain the dependency on the setter functions - improve error messages	2017-05-18 18:04:54 +02:00
Juan Quintela	2833c59b94	migration: Create block capability Create one capability for block migration and one parameter for incremental block migration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> --- - address all Markus comments - use Markus and Eric text descriptions - change logic another time - improve text messages	2017-05-18 18:04:54 +02:00
Dr. David Alan Gilbert	5d214a92ac	postcopy: Require RAMBlocks that are whole pages It turns out that it's legal to create a VM with RAMBlocks that aren't a multiple of the pagesize in use; e.g. a 1025M main memory using 2M host pages. That breaks postcopy's atomic placement of pages, so disallow it. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-18 18:04:53 +02:00
Dr. David Alan Gilbert	1eb3fc0a0b	migration: Fix non-multiple of page size migration Unfortunately it's legal to create a VM with a RAM size that's not a multiple of the underlying host page or huge page size. Recently I'd changed things to always send host sized pages, and that breaks if we have say a 1025MB guest on 2MB hugepages. Unfortunately we can't just make that illegal since it would break migration from/to existing oddly configured VMs. Symptom: qemu-system-x86_64: Illegal RAM offset 40100000 as it transmits the fraction of the hugepage after the end of the RAMBlock (may also cause a crash on the source - possibly due to clearing bits after the bitmap) Reported-by: Yumei Huang <yuhuang@redhat.com> Red Hat bug: https://bugzilla.redhat.com/show_bug.cgi?id=1449037 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-18 18:04:53 +02:00
Stefan Hajnoczi	56821559f0	HMP pull -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZHJB7AAoJEAUWMx68W/3nSr8P/3uw04AxuCiEpwVYjtYaMGgs C/1zNXFPRIWDx5kNxi7reH0aknYSjHVijxuGUiAqP9gM+VpGO485gEWDrmSo6tqy 9s3HbudTzL/Qfp7ZwJY+v8gFxjwCZYhDh3WI+3yIjWDLHpGhTdRuYIC4DEA1EkGY zvG8fHISMqZUh1DYE7ttCX1zLlZLaAA1znjYbnXCYNjoRdeeY2+uJsaPhdjqi1la 5qhKWtKS68cfiKSkntBASKMs2+z1+4iyGVVzgd8E64O4JbdirB2JHnjSwm3fJ5yY dhJLY6nSn7eNdBmVHSe0ZBDHfwoVbOlPC9+z0Ob+UxiGcXd0KSYaE3IaD2Ref2LS Wju92rSLcADNYBQtNAxwKpDeixf+WbfetiKDPx71N2rcpdpZg1I3kwu4ippMwh6g jpgjSV2Nkcnv4PQBD9kPSt6jJBmcyN3Y98VvaOFxyOv1u1Xm7ZbhkmxOQKQvG4jc OlBbA20EzOC0axl9ktVF7GljiziCqDaMFljBxNlpdAqzjou7ztEEvXBwcl8I8eBH ThocuOf25CPLrTAwggbEWpKfxzSQ9pEIPHpQL3Qz9Ip7STlcxB/FJ/9oEQvpZeJg MHpgBLDzw1xqJw01n711qgGtEsCvTUoQMcnJxgeaI/WzEIaRd68breI2/qG22aBl 0XgH7THRl8WP+EC8ZYrB =wJvl -----END PGP SIGNATURE----- Merge remote-tracking branch 'dgilbert/tags/pull-hmp-20170517' into staging HMP pull # gpg: Signature made Wed 17 May 2017 07:03:39 PM BST # gpg: using RSA key 0x0516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * dgilbert/tags/pull-hmp-20170517: ramblock: add new hmp command "info ramblock" utils: provide size_to_str() ramblock: add RAMBLOCK_FOREACH() Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-18 13:36:15 +01:00
Peter Xu	99e15582de	ramblock: add RAMBLOCK_FOREACH() So that it can simplifies the iterators. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1494562661-9063-2-git-send-email-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 17:30:37 +01:00
Juan Quintela	1bfe5f0586	migration: Move check_migratable() into qdev.c The function is only used once, and nothing else in migration knows about objects. Create the function vmstate_device_is_migratable() in savem.c that really do the bit that is related with migration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-17 12:04:59 +02:00
Juan Quintela	bac3b21218	migration: Move postcopy stuff to postcopy-ram.c Yes, we don't have a good place to put that stuff. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 12:04:59 +02:00
Juan Quintela	aa3544c371	migration: Move page_cache.c to migration/ It is only used by migration, so move it there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 12:04:59 +02:00
Juan Quintela	795c40b8bd	migration: Create migration/blocker.h This allows us to remove lots of includes of migration/migration.h Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-17 12:04:59 +02:00
Juan Quintela	bb890ed551	ram: Rename RAM_SAVE_FLAG_COMPRESS to RAM_SAVE_FLAG_ZERO Reflects better what it does now, and avoid confussions with RAM_SAVE_FLAG_COMPRESS_PAGE. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-17 12:04:59 +02:00
Juan Quintela	927d663819	migration: Pass Error ** argument to {save,load}_vmstate This way we use the "normal" way of printing errors for hmp commands. Signed-off-by: Juan Quintela <quintela@redhat.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com>	2017-05-17 12:04:59 +02:00
Juan Quintela	2bf3aa85f0	migration: Fix regression with compression threads Compression threads got broken on commit commit `2479569466` Author: Juan Quintela <quintela@redhat.com> Date: Tue Mar 21 11:45:01 2017 +0100 ram: reorganize last_sent_block On do_compress_ram_page() we use a different QEMUFile than the migration one. We need to pass it there. The failure can be seen as: (qemu) qemu-system-x86_64: Unknown combination of migration flags: 0 qemu-system-x86_64: error while loading state section id 3(ram) qemu-system-x86_64: load of migration failed: Invalid argument Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Peter Xu <peterx@redhat.com>	2017-05-17 12:04:59 +02:00
Kevin Wolf	4417ab7adf	block: New BdrvChildRole.activate() for blk_resume_after_migration() Instead of manually calling blk_resume_after_migration() in migration code after doing bdrv_invalidate_cache_all(), integrate the BlockBackend activation with cache invalidation into a single function. This is achieved with a new callback in BdrvChildRole that is called by bdrv_invalidate_cache_all(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Kevin Wolf	ace21a5875	migration: Unify block node activation error handling Migration code activates all block driver nodes on the destination when the migration completes. It does so by calling bdrv_invalidate_cache_all() and blk_resume_after_migration(). There is one code path for precopy and one for postcopy migration, resulting in four function calls, which used to have three different failure modes. This patch unifies the behaviour so that failure to activate all block nodes is non-fatal, but the error message is logged and the VM isn't automatically started. 'cont' will retry activating the block nodes. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-05-11 12:08:24 +02:00
Markus Armbruster	bd269ebc82	sockets: Limit SocketAddressLegacy to external interfaces SocketAddressLegacy is a simple union, and simple unions are awkward: they have their variant members wrapped in a "data" object on the wire, and require additional indirections in C. SocketAddress is the equivalent flat union. Convert all users of SocketAddressLegacy to SocketAddress, except for existing external interfaces. See also commit fce5d53..9445673 and 85a82e8..c5f1ae3. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1493192202-3184-7-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Minor editing accident fixed, commit message and a comment tweaked] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-09 09:14:40 +02:00
Markus Armbruster	dfd100f242	sockets: Rename SocketAddress to SocketAddressLegacy The next commit will rename SocketAddressFlat to SocketAddress, and the commit after that will replace most uses of SocketAddressLegacy by SocketAddress, replacing most of this commit's renames right back. Note that checkpatch emits a few "line over 80 characters" warnings. The long lines are all temporary; the SocketAddressLegacy replacement will shorten them again. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1493192202-3184-5-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-05-09 09:14:40 +02:00
Markus Armbruster	0785bd7a7c	sockets: Prepare inet_parse() for flattened SocketAddress I'm going to flatten SocketAddress: rename SocketAddress to SocketAddressLegacy, SocketAddressFlat to SocketAddress, eliminate SocketAddressLegacy except in external interfaces. inet_parse() returns a newly allocated InetSocketAddress. Lift the allocation from inet_parse() into its caller socket_parse() to prepare for flattening SocketAddress. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <1493192202-3184-3-git-send-email-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [Straightforward rebase]	2017-05-09 09:14:40 +02:00
Stefan Hajnoczi	4aee86c60a	migration/next for 20170504 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJZCvXtAAoJEPSH7xhYctcjXM4P/igRf7yFkgp1cipE2u3xnGkF OBzuucG/WOAlyoWfOSOsyeb2sFC8KGTnBm4DRsJ6dlTaXxIXGe/BLfJnmsRThnJ+ NvQfNraI523gHsLpZm47XIkwpr96OQbmqHtFy/jRl1qSUTJPYe0o2HVnoS1zA5hY 0meTcivVeBes7TptCN0k638OAaC6yaXAn937JzTIic5oY1banR5o5aG61tsFaFhF OZHMU4CotJMDF2iZv5y0Q4Ui+0mpQfP6hJ/GxjwFKnLffXmcb3YlrjpQMzsfbBGl NyxTjD6DYMYSLvV6yVH9iYmXN0So2/VD3l6kkru8IeEtKHi8s4FQgDSElZbRtMy3 dMlqKFD436nmYw2wD1w3uUqidINMEJJ/LvC5fqGSfcME3N04DoiLJ3pc9QFVcW6z WnpgybdtOjJzmYtMiN65tHwZ/lYaBotAOP2GLOhE5YJlBY5+Vz4swy8krpppv8iP vGAwhakERW4Me4zajVAiNvO5TTxaIDAQEm+u5llWGijc/PgjTARWU2zoO6WUX9/y KaAbXETXBDfJ0cHvvtEmplAB4wdE+2VprznptutR9ewSdgkJRx295j7OlIQ4Gxyg ngFigR7+z5hKCbZrJ1ZGF+CM+hn6JgNWiMjJXvbbTZehV2LUe//RwvXab1VDLYOl uZriR9ZP9T/w3IjTYaPZ =FAsE -----END PGP SIGNATURE----- Merge remote-tracking branch 'quintela/tags/migration/20170504' into staging migration/next for 20170504 # gpg: Signature made Thu 04 May 2017 10:35:41 AM BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * quintela/tags/migration/20170504: migration: Extra tracing migration: Move postcopy-ram.h to migration/ monitor: Move hmp_info_snapshots from savevm.c to hmp.c monitor: Move hmp_delvm from savevm.c to hmp.c monitor: Move hmp_savevm from savevm.c to hmp.c monitor: Move hmp_loadvm from monitor.c to hmp.c monitor: Remove monitor parameter from save_vmstate migration: to_dst_file at that point is NULL migration: setup bi-directional I/O channel for exec: protocol ram: Split dirty bitmap by RAMBlock Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-05 16:52:35 +01:00
Stefan Hajnoczi	12a95f320a	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJZA6QBAAoJEH8JsnLIjy/WLmIQALlu36MvoETPGgzWAFPkaOah awHSeyMOIIO9Lotc8SpLUzq97avXX8ZzMdQINu95npcThO7rhBCn9DaxHjKsxaGj /fu3j3WHDCgbV2wQeX13G/LiBQdGMXWphuO0ww5lH7rRauRAEsgAxHdq3/KZmbeG abV4s/6a8Z4mF+vPglrnODWA3jiOZ92U869xFuqSNYit2eWH/MJm2CbhOwj+UWXw JtdtGl/a4LOJqcCxIxupWYkbjS1ugSCX7ZUco0UMG5R7yptXoFVaLRXwdJvx7dUk RDUnvtjit4gAUViuDzXpTXC5pSAALpskMBB8OUsMQEjIDDq6Z77L/WGnF8guUdmX xgfw+IRu3vur39zfLibxqUdfgsSeDXOfvu9Z91xCY5jNPaTmRYxfKPdw0ua7Lu9+ zzRfbejgxOOFZQGZAS9CNMU52zILpbnDFRxK2YfOBKMj++wS9vGX+qwj1tXnIu4z 35+ijhLvRMspZ3BhtsXGfP559BQ7+vXENKSIznDOgOnW1c75OMpRMl/pLq8CFGXk UCMUE1S6qlNMnYUD6Ihm/TNjjVSO20RNrYM0DB/pakEV8MD9YhqrF65sXLr4i8Qk nO+9ut/HBcWjbe7Ur83lbmLCYOB3fPBTxPKYRWBXMZbUngpsA5WbBXtyLlOUqXdN z2Mc4FxzaPpvtFQSJNsP =B+5d -----END PGP SIGNATURE----- Merge remote-tracking branch 'kwolf/tags/for-upstream' into staging Block layer patches # gpg: Signature made Fri 28 Apr 2017 09:20:17 PM BST # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * kwolf/tags/for-upstream: (34 commits) progress: Show current progress on SIGINFO iotests: fix exclusion option iotests: clarify help text qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed qemu-img: improve convert_iteration_sectors() block: assert no image modification under BDRV_O_INACTIVE block: fix obvious coding style mistakes in block_int.h qcow2: Allow discard of final unaligned cluster block: Add .bdrv_truncate() error messages block: Add errp to BD.bdrv_truncate() block: Add errp to b{lk,drv}_truncate() block/vhdx: Make vhdx_create() always set errp qemu-img: Document backing options qemu-img/convert: Move bs_n > 1 && -B check down qemu-img/convert: Use @opts for one thing only block: fix alignment calculations in bdrv_co_do_zero_pwritev block: Do not unref bs->file on error in BD's open iotests: 109: Filter out "len" of failed jobs iotests: Fix typo in 026 Issue a deprecation warning if the user specifies the "-hdachs" option. ... Message-id: 1493411622-5343-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-05-04 13:44:32 +01:00
Dr. David Alan Gilbert	1db9d8e501	migration: Extra tracing A couple more traces that would have made fixing that postcopy bug a bit easier. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-04 10:41:23 +02:00
Juan Quintela	be07b0ace8	migration: Move postcopy-ram.h to migration/ It is internal to migration, not intended for other users. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-04 10:40:30 +02:00
Juan Quintela	6683061873	monitor: Move hmp_info_snapshots from savevm.c to hmp.c It only uses block/* functions, nothing from migration. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-04 10:34:15 +02:00
Juan Quintela	d905bb7b74	monitor: Move hmp_delvm from savevm.c to hmp.c It really uses block/* stuff, not migration one. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-04 10:33:58 +02:00
Juan Quintela	d9c7d137c8	monitor: Move hmp_savevm from savevm.c to hmp.c It is a monitor command, and has nothing migration specific in it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-04 10:33:40 +02:00
Juan Quintela	c34c2f3701	monitor: Remove monitor parameter from save_vmstate load_vmstate() already use error_report, so be consistent. There is an identical error message in load_vmstate() that ends in a period. Remove it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-05-04 10:32:58 +02:00
Juan Quintela	fd4144d413	migration: to_dst_file at that point is NULL We have just arrived as: migration.c: qemu_migrate() .... s = migrate_init() <- puts it to NULL .... {tcp,unix}_start_outgoing_migration -> socket_outgoing_migration migration_channel_connect() sets to_dst_file if tls is enabled, we do another round through migrate_channel_tls_connect(), but we only set it up if there is no error. So we don't need the assignation. I am removing it to remove in the follwing patches the knowledge about MigrationState in that two files. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-05-04 10:00:38 +02:00
Daniel P. Berrange	062d81f0e9	migration: setup bi-directional I/O channel for exec: protocol Historically the migration data channel has only needed to be unidirectional. Thus the 'exec:' protocol was requesting an I/O channel with O_RDONLY on incoming side, and O_WRONLY on the outgoing side. This is fine for classic migration, but if you then try to run TLS over it, this fails because the TLS handshake requires a bi-directional channel. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-05-04 10:00:38 +02:00
Juan Quintela	6b6712efcc	ram: Split dirty bitmap by RAMBlock Both the ram bitmap and the unsent bitmap are split by RAMBlock. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Fix compilation when DEBUG_POSTCOPY is enabled (thanks Hailiang)	2017-05-04 10:00:38 +02:00
Markus Armbruster	38bb54f323	replication: Make --disable-replication compile again Broken in commit `daa33c5`. Cc: qemu-stable@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Message-id: 1493298053-17140-1-git-send-email-armbru@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-04-28 16:50:16 +01:00
Kevin Wolf	0042fd3663	migration: Call blk_resume_after_migration() for postcopy Commit `d35ff5e6` ('block: Ignore guest dev permissions during incoming migration') added blk_resume_after_migration() to the precopy migration path, but neglected to add it to the duplicated code that is used for postcopy migration. This means that the guest device doesn't request the necessary permissions, which ultimately led to failing assertions. Add the missing blk_resume_after_migration() to the postcopy path. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2017-04-27 15:39:49 +02:00
Fam Zheng	bbfb89e38c	migration: Make errp the last parameter of local functions Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <20170421122710.15373-13-famz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2017-04-24 09:13:44 +02:00
Peter Maydell	32c7e0ab75	migration/next for 20170421 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJY+d69AAoJEPSH7xhYctcj/4oQAIFFEyWaqrL9ve5ySiJgdtcY zYtiIhZQ+nPuy2i1oDSX+vbMcmkJDDyfO5qLovxyHGkZHniR8HtxNHP+MkZQa07p DiSIvd51HvcixIouhbGcoUCU63AYxqNL3o5/TyNpUI72nvsgwl3yfOot7PtutE/F r384j8DrOJ9VwC5GGPg27mJvRPvyfDQWfxDCyMYVw153HTuwVYtgiu/layWojJDV D2L1KV45ezBuGckZTHt9y6K4J5qz8qHb/dJc+whBBjj4j9T9XOILU9NPDAEuvjFZ gHbrUyxj7EiApjHcDZoQm9Raez422ALU30yc9Kn7ik7vSqTxk2Ejq6Gz7y9MJrDn KdMj75OETJNjBL+0T9MmbtWts28+aalpTUXtBpmi3eWQV5Hcox2NF1RP42jtD9Pa lkrM6jv0nsdNfBPlQ+ZmBTJxysWECcMqy487nrzmPNC8vZfokjXL5be12puho9fh ziU4gx9C6/k82S+/H6WD/AUtRiXJM7j4oTU2mnjrsSXQC1JNWqODBOFUo9zsDufl vtcrxfPhSD1DwOInFSIBHf/RylcgTkPCL0rPoJ8npNDly6rHFYJ+oIbsn84Z4uYY RWvH8xB9wgRlK9L1WdRgOd2q7PaeHQoSSdPOiS9YVEVMVvSW8Es5CRlhcAsw/M/T 1Tl65cNrjETAuZKL3dLH =EsZ5 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20170421' into staging migration/next for 20170421 # gpg: Signature made Fri 21 Apr 2017 11:28:13 BST # gpg: using RSA key 0xF487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20170421: (65 commits) hmp: info migrate_parameters format tunes hmp: info migrate_capability format tunes migration: rename max_size to threshold_size migration: set current_active_state once virtio-rng: stop virtqueue while the CPU is stopped migration: don't close a file descriptor while it can be in use ram: Remove migration_bitmap_extend() migration: Disable hotplug/unplug during migration qdev: Move qdev_unplug() to qdev-monitor.c qdev: Export qdev_hot_removed qdev: qdev_hotplug is really a bool migration: Remove MigrationState parameter from migration_is_idle() ram: Use RAMBitmap type for coherence ram: rename last_ram_offset() last_ram_pages() ram: Use ramblock and page offset instead of absolute offset ram: Change offset field in PageSearchStatus to page ram: Remember last_page instead of last_offset ram: Use page number instead of an address for the bitmap operations ram: reorganize last_sent_block ram: ram_discard_range() don't use the mis parameter ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-04-21 15:59:27 +01:00
Peter Xu	faec066ab8	migration: rename max_size to threshold_size In migration codes (especially in migration_thread()), max_size is used in many place for the threshold value that we will start to do the final flush and jump to the next stage to dump the whole rest things to destination. However its name is confusing to first readers. Let's rename it to "threshold_size" when proper and add a comment for it. No functional change is made. CC: Juan Quintela <quintela@redhat.com> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-04-21 12:25:41 +02:00
Peter Xu	343001f68d	migration: set current_active_state once We set it right above this one. No need to set it twice. CC: Juan Quintela <quintela@redhat.com> CC: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-04-21 12:25:41 +02:00
Laurent Vivier	e8199e4895	migration: don't close a file descriptor while it can be in use If we close the QEMUFile descriptor in process_incoming_migration_co() while it has been stopped by an error, the postcopy_ram_listen_thread() can try to continue to use it. And as the memory has been freed it is working with an invalid pointer and crashes. Fix this by releasing the memory after having managed the error case (which, in fact, calls exit()) Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Amit Shah <amit@kernel.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	66103a5796	ram: Remove migration_bitmap_extend() We have disabled memory hotplug, so we don't need to handle migration_bitamp there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	fab3500526	migration: Remove MigrationState parameter from migration_is_idle() Only user don't have a MigrationState handly. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	352b0de982	ram: Use RAMBitmap type for coherence Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	b8c4899398	ram: rename last_ram_offset() last_ram_pages() We always use it as pages anyways. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	f20e286516	ram: Use ramblock and page offset instead of absolute offset This removes the needto pass also the absolute offset. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:40 +02:00
Juan Quintela	a935e30fbb	ram: Change offset field in PageSearchStatus to page We are moving everything to work on pages, not addresses. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	269ace2951	ram: Remember last_page instead of last_offset Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> -- Improve comment Fix typo	2017-04-21 12:25:39 +02:00
Juan Quintela	06b106889a	ram: Use page number instead of an address for the bitmap operations We use an unsigned long for the page number. Notice that our bitmaps already got that for the index, so we have that limit. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- rename page to page_abs everywhere. fix trace types for pages	2017-04-21 12:25:39 +02:00
Juan Quintela	2479569466	ram: reorganize last_sent_block We were setting it far away of when we changed it. Now everything is done inside save_page_header. Once there, reorganize code to pass RAMState. We also set CONTINUE flag in a single place. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	aaa2064c2a	ram: ram_discard_range() don't use the mis parameter Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	15440dd5a0	ram: Pass RAMBlock to bitmap_sync We change the meaning of start to be the offset from the beggining of the block. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:39 +02:00
Chao Fan	030ce1f861	ram: Add page-size to output in 'info migrate' The number of dirty pages is output in 'pages' in the command 'info migrate', so add page-size to calculate the number of dirty pages in bytes. Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	20afaed98b	ram: Rename qemu_target_page_bits() to qemu_target_page_size() It was used as a size in all cases except one. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	a0a8aa147a	ram: We don't need MigrationState parameter anymore Remove it from callers and callees. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	5727309d25	migration: Remove MigrationState from migration_in_postcopy We need to call for the migrate_get_current() in more that half of the uses, so call that inside. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	6d358d9494	ram: Remove compression_switch and inline its logic We can calculate its value, so we don't create a variable for it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- After Peter and Dave review, I dropped the variable and just inlined the condition. Fix typo	2017-04-21 12:25:39 +02:00
Juan Quintela	ce25d33781	ram: Move QEMUFile into RAMState We receive the file from save_live operations and we don't use it until 3 or 4 levels of calls down. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:39 +02:00
Juan Quintela	204b88b869	ram: Add QEMUFile to RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:38 +02:00
Juan Quintela	96506894a3	ram: Move postcopy_requests into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:38 +02:00
Juan Quintela	47ad861976	ram: Move dirty_pages_rate to RAMState Treat it like the rest of ram stats counters. Export its value the same way. As an added bonus, no more MigrationState used in migration_bitmap_sync(); Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Again, dave was the one reviewing it	2017-04-21 12:25:38 +02:00
Juan Quintela	abbf1d7f9b	ram: Remove dirty_bytes_rate It can be recalculated from dirty_pages_rate. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Dave was the one that reviewed it O:-)	2017-04-21 12:25:38 +02:00
Juan Quintela	42d219d3b0	ram: Create ram_dirty_sync_count() This is a ram field that was inside MigrationState. Move it to RAMState and make it the same that the other ram stats. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	ec481c6c57	ram: Move src_page_req* to RAMState This are the last postcopy fields still at MigrationState. Once there Move MigrationSrcPageRequest to ram.c and remove MigrationState parameters where appropiate. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	68a098f386	ram: Move last_req_rb to RAMState It was on MigrationState when it is only used inside ram.c for postcopy. Problem is that we need to access it without being able to pass it RAMState directly. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	9edabd4de6	ram: Remove ram_save_remaining Just unfold it. Move ram_bytes_remaining() with the rest of exported functions. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	072c251157	ram: Use the RAMState bytes_transferred parameter Somewhere it was passed by reference, just use it from RAMState. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	2f4fde9352	ram: Move bytes_transferred into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	eb859c53dd	ram: Move migration_bitmap_rcu into RAMState Once there, rename the type to be shorter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	108cfae019	ram: Move migration_bitmap_mutex into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	ceb4d16898	ram: Everything was init to zero, so use memset And then init only things that are not zero by default. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	0d8ec885ed	ram: Move migration_dirty_pages to RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	180f61f75a	ram: Move xbzrle_overflows into RAMState Once there, remove the now unused AccountingInfo struct and var. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	b07016b674	ram: Move xbzrle_cache_miss_rate into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	544c36f188	ram: Move xbzrle_cache_miss into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:37 +02:00
Juan Quintela	f36ada95de	ram: Move xbzrle_pages into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Comment why we need bytes and pages	2017-04-21 12:25:37 +02:00
Juan Quintela	07ed50a2bb	ram: Move xbzrle_bytes into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	23b28c3c62	ram: Move iterations into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	29cc3d8a9b	ram: Remove norm_mig_bytes_transferred Its value can be calculated by other exported. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	b4d1c6e722	ram: Move norm_pages to RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	bedf53c14c	ram: Remove unused pages_skipped variable For compatibility, we need to still send a value, but just specify it and comment the fact. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	5bb1272c38	ram: Remove unused dup_mig_bytes_transferred() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	f7ccd61b4c	ram: Move dup_pages into RAMState Once there rename it to its actual meaning, zero_pages. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	36040d9cb2	ram: Move iterations_prev into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2017-04-21 12:25:36 +02:00
Juan Quintela	b5833fde40	ram: Move xbzrle_cache_miss_prev into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>	2017-04-21 12:25:36 +02:00
Juan Quintela	68908ed665	ram: Change num_dirty_pages_period type to uint64_t Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	a66cd90c74	ram: Move num_dirty_pages_period into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	c4bdf0cf4b	ram: Change byte_xfer_{prev,now} type to uint64_t Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	eac7415958	ram: Move bytes_xfer_prev into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:36 +02:00
Juan Quintela	f664da80fc	ram: Move start time into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Renamed start_time to time_last_bitmap_sync(peterx suggestion)	2017-04-21 12:25:35 +02:00
Juan Quintela	5a98773896	ram: Move bitmap_sync_count into RAMState Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:35 +02:00
Juan Quintela	8d820d6f67	ram: Add dirty_rate_high_cnt to RAMState We need to add a parameter to several functions to make this work. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:35 +02:00
Juan Quintela	6f37bb8bf3	ram: Create RAMState We create a struct where to put all the ram state Start with the following fields: last_seen_block, last_sent_block, last_offset, last_version and ram_bulk_stage are globals that are really related together. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Fix typo and warnings	2017-04-21 12:25:35 +02:00
Juan Quintela	3644915726	ram: Rename block_name to rbname So all places are consistent on the naming of a block name parameter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-04-21 12:25:35 +02:00
Juan Quintela	5e58f968f4	ram: Rename flush_page_queue() to migration_page_queue_free() It reflects better what it does. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2017-04-21 12:25:35 +02:00
Juan Quintela	3d0684b2ad	ram: Update all functions comments Added doc comments for existing functions comment and rewrite them in a common style. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> -- Fix Peter Xu comments Improve postcopy comments as per reviews.	2017-04-21 12:25:35 +02:00
Lidong Chen	3928d50bf1	migration/block: use blk_pwrite_zeroes for each zero cluster BLOCK_SIZE is (1 << 20), qcow2 cluster size is 65536 by default, this may cause the qcow2 file size to be bigger after migration. This patch checks each cluster, using blk_pwrite_zeroes for each zero cluster. [Initialize cluster_size to BLOCK_SIZE to prevent a gcc uninitialized variable compiler warning. In reality we always initialize cluster_size in a conditional but gcc doesn't know that. --Stefan] Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Lidong Chen <lidongchen@tencent.com> Message-id: 1492050868-16200-1-git-send-email-lidongchen@tencent.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-04-21 10:36:12 +01:00
Kevin Wolf	d35ff5e6b3	block: Ignore guest dev permissions during incoming migration Usually guest devices don't like other writers to the same image, so they use blk_set_perm() to prevent this from happening. In the migration phase before the VM is actually running, though, they don't have a problem with writes to the image. On the other hand, storage migration needs to be able to write to the image in this phase, so the restrictive blk_set_perm() call of qdev devices breaks it. This patch flags all BlockBackends with a qdev device as blk->disable_perm during incoming migration, which means that the requested permissions are stored in the BlockBackend, but not actually applied to its root node yet. Once migration has finished and the VM should be resumed, the permissions are applied. If they cannot be applied (e.g. because the NBD server used for block migration hasn't been shut down), resuming the VM fails. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Tested-by: Kashyap Chamarthy <kchamart@redhat.com>	2017-04-07 14:44:05 +02:00
Dr. David Alan Gilbert	8679638b0e	postcopy: Check for shared memory Postcopy doesn't support migration of RAM shared with another process yet (we've got a bunch of things to understand). Check for the case and don't allow postcopy to be enabled. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-03-16 09:02:26 +01:00
QingFeng Hao	e1e686c1fa	vmstate: fix failed iotests case 68 and 91 This problem affects s390x only if we are running without KVM. Basically, S390CPU.irqstate is unused if we do not use KVM, and thus no buffer is allocated. This causes size=0, first_elem=NULL and n_elems=1 in vmstate_load_state and vmstate_save_state. And the assert fails. With this fix we can go back to the old behavior and support VMS_VBUFFER with size 0 and nullptr. Signed-off-by: QingFeng Hao <haoqf@linux.vnet.ibm.com> Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-03-16 08:59:52 +01:00
Lidong Chen	1cf6aa74b3	migration/block: Avoid invoking blk_drain too frequently Increase bmds->cur_dirty after submit io, so reduce the frequency involve into blk_drain, and improve the performance obviously when block migration. The performance test result of this patch: During the block dirty save phase, this patch improve guest os IOPS from 4.0K to 9.5K. and improve the migration speed from 505856 rsec/s to 855756 rsec/s. Signed-off-by: Lidong Chen <jemmy858585@gmail.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-03-16 08:58:27 +01:00
Daniel P. Berrange	4af245dc3e	migration: use "" as the default for tls-creds/hostname The tls-creds parameter has a default value of NULL indicating that TLS should not be used. Setting it to non-NULL enables use of TLS. Once tls-creds are set to a non-NULL value via the monitor, it isn't possible to set them back to NULL again, due to current implementation limitations. The empty string is not a valid QObject identifier, so this switches to use "" as the default, indicating that TLS will not be used The tls-hostname parameter has a default value of NULL indicating the the hostname from the migrate connection URI should be used. Again, once tls-hostname is set non-NULL, to override the default hostname for x509 cert validation, it isn't possible to reset it back to NULL via the monitor. The empty string is not a valid hostname, so this switches to use "" as the default, indicating that the migrate URI hostname should be used. Using "" as the default for both, also means that the monitor commands "info migrate_parameters" / "query-migrate-parameters" will report existance of tls-creds/tls-parameters even when set to their default values. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-03-16 08:57:08 +01:00
Chao Fan	1ffb5dfd35	Change the method to calculate dirty-pages-rate In function cpu_physical_memory_sync_dirty_bitmap, file include/exec/ram_addr.h: if (src[idx][offset]) { unsigned long bits = atomic_xchg(&src[idx][offset], 0); unsigned long new_dirty; new_dirty = ~dest[k]; dest[k] \|= bits; new_dirty &= bits; num_dirty += ctpopl(new_dirty); } After these codes executed, only the pages not dirtied in bitmap(dest), but dirtied in dirty_memory[DIRTY_MEMORY_MIGRATION] will be calculated. For example: When ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION] = 0b00001111, and atomic_rcu_read(&migration_bitmap_rcu)->bmap = 0b00000011, the new_dirty will be 0b00001100, and this function will return 2 but not 4 which is expected. the dirty pages in dirty_memory[DIRTY_MEMORY_MIGRATION] are all new, so these should be calculated also. Signed-off-by: Chao Fan <fanc.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-03-16 08:55:56 +01:00
Eric Blake	7d66b1fbd2	migration: Document handling of bdrv_is_allocated() errors Migration is the only code left in the tree that does not react to bdrv_is_allocated() failures. But as there is no useful way to react to the failure, and we are merely skipping unallocated sectors on success, just document that our choice of handling is intended. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2017-03-13 12:49:33 +01:00
Peter Maydell	251501a371	Migration pull Note: The 'postcopy: Update userfaultfd.h header' is part of Paolo's header update and will disappear if applied after it. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJYtW9KAAoJEAUWMx68W/3nF/MQAIaMjoCkeVnCZsGCJ2VbJZcZ Fb5gMNjnHUKwOeDhGuSBP7FGRMalYM2JNRufSiBZnUowCQMXi3Emjad6pGUiPTr0 B+L1czjw0FDdskQpE1U/StdruLiYBJ98oktnjlla00f+E9rylY0cMmrHpmqfRwDn IXTa4qm77aw47Y2MYku1nce27gjA3JEko6Lg2fB7gTtwYzTi/uRrKa+ilbnTPoEZ /ZzK8hcUYiV8oDAOtEmKSG3Azo+6ylzDG4r/ldwEecJPhZxeUk39AhDOoU0mx98N OE8oOk2t/0Bo+mS7iOw9gZ8sr9p5L2myQkmoxxLuAXAcD9sHVlcp0eKi5lLYNmUa oWnnYo3QeCvqrcZzhvSX0b4rLXoY4GP+qKpQo21eKIPEyq3v6EDhrk10UCTXaiBO zxHblLgXSrX6VqYcEJGj2oUR/RjH9ouw3hjI5cDy/d/hRmNLCl8lwvPmVmv3tRer 6X1gcZSUs6hY/drs2/v6maJ0CqK/bx6/OBfkiUJUEN4Dg1ldgO2r1v8pBLukvM6c De2aNRezl821HK487EvRlluUq0nO6L3LkqDTBql4/4Rf4HoTRXxoJ68sB0LBqym5 PwD/C3mQuvlWg8tKJtaHVtS0ESuSCSroaSk1FB648mSs8nJYYFjstc/XovuePqTl 6UT2OQbUdWITILoWSlI5 =PCYv -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20170228a' into staging Migration pull Note: The 'postcopy: Update userfaultfd.h header' is part of Paolo's header update and will disappear if applied after it. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Tue 28 Feb 2017 12:38:34 GMT # gpg: using RSA key 0x0516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20170228a: (27 commits) postcopy: Add extra check for COPY function postcopy: Add doc about hugepages and postcopy postcopy: Check for userfault+hugepage feature postcopy: Update userfaultfd.h header postcopy: Allow hugepages postcopy: Send whole huge pages postcopy: Mask fault addresses to huge page boundary postcopy: Load huge pages in one go postcopy: Use temporary for placing zero huge pages postcopy: Plumb pagesize down into place helpers postcopy: Record largest page size postcopy: enhance ram_block_discard_range for hugepages exec: ram_block_discard_range postcopy: Chunk discards for hugepages postcopy: Transmit and compare individual page sizes postcopy: Transmit ram size summary word migration: fix use-after-free of to_dst_file migration: Update docs to discourage version bumps migration: fix id leak regression migrate: Introduce a 'dc->vmsd' check to avoid segfault for --only-migratable ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-02 17:39:12 +00:00
Peter Maydell	b9fe31392b	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJYtd8UAAoJEH8JsnLIjy/WuK0P/i3yi29JOZTEeUIUc+QXHm4b iXRj/3uLwLD5IJqfjLeDmOPUfI+w0SAkGTidSlKrS5maypW366ke/+CY8QZtbxiH qTazKrXNumhvJuJVXQ0L8bAwDC/r8pPlrvLJ3bfBOK6t6fu//M/nUZP/N92UMX87 w1xA+PNq1TNZEKC6VwdY50fsrhOxIR1ZoS2rEKseEBuYqTm2q5Q0Y8EXhbReO0LE Pb0vAOEmZg6K2Awv4Cg2L8BZlz+tGgUd+3eWV8zzCshHkAc2J/OaxjLCG/KUgXb4 MbdACnIHOUOGqhLfM9wjrVYJAhM/4F1sR0hDm8uUngFWlWsGwTdhH5bRlfat3QQs E4oNt6ORHiEis0l7pXfwHTDC3ChockArxdJlOvmpRm6EUSBs132IJS3TrlKBCr+R CBdoC3k1eXC6uHF4KCrsnW2u26D4Ju9V6Yb5h/RqYPJCc+o16BsD31/WOLhH4WTq M7A9ZbH4jBDHTO3A5zho0x7AGbVGDVMzKssU9MUOkUuPB+yM2KMg/Kr0kUDhrl0k aOdRfCF+OcsiXzM97U3msv0udbHr0/tUtE+/3tL3kcXjEHoaFECUdV8mU9F3KleZ 5MQP2vNcAZA96zuV1qMhPlDwWBoEwBebDnvM3LuTvkHdbBQZv+6A6XiCw/hXdh6o zyUn+/2KKGoPiNv/B6PD =g3ew -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Tue 28 Feb 2017 20:35:32 GMT # gpg: using RSA key 0x7F09B272C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (46 commits) block: Add Error parameter to bdrv_append() block: Add Error parameter to bdrv_set_backing_hd() block: Assertions for resize permission block: Assertions for write permissions block: Pass BdrvChild to bdrv_aligned_preadv/pwritev and copy-on-read tests: Remove FIXME comments nbd/server: Use real permissions for NBD exports migration/block: Use real permissions hmp: Request permissions in qemu-io commit: Add filter-node-name to block-commit mirror: Add filter-node-name to blockdev-mirror stream: Use real permissions in streaming block job mirror: Use real permissions in mirror/active commit block job blockjob: Factor out block_job_remove_all_bdrv() block: Allow backing file links in change_parent_backing_link() block: BdrvChildRole.attach/detach() callbacks block: Fix pending requests check in bdrv_append() backup: Use real permissions in backup block job commit: Use real permissions for HMP 'commit' commit: Use real permissions in commit block job ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2017-03-01 23:09:46 +00:00
Kevin Wolf	6f5ef23a3f	migration/block: Use real permissions Request BLK_PERM_CONSISTENT_READ for the source of block migration, and handle potential permission errors as good as we can in this place (which is not very good, but it matches the other failure cases). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2017-02-28 20:47:50 +01:00
Kevin Wolf	d7086422b1	block: Add error parameter to blk_insert_bs() Now that blk_insert_bs() requests the BlockBackend permissions for the node it attaches to, it can fail. Instead of aborting, pass the errors to the callers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Kevin Wolf	6d0eb64d5c	block: Add permissions to blk_new() We want every user to be specific about the permissions it needs, so we'll pass the initial permissions as parameters to blk_new(). A user only needs to call blk_set_perm() if it wants to change the permissions after the fact. The permissions are stored in the BlockBackend and applied whenever a BlockDriverState should be attached in blk_insert_bs(). This does not include actually choosing the right set of permissions everywhere yet. Instead, the usual FIXME comment is added to each place and will be addressed in individual patches. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com>	2017-02-28 20:40:36 +01:00
Zhang Chen	daa33c5215	Add a new qmp command to do checkpoint, query xen replication status We can call this qmp command to do checkpoint outside of qemu. Xen colo will need this function. Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Wen Congyang <wencongyang@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>	2017-02-28 11:02:12 -08:00
Zhang Chen	2c9639ecab	Add a new qmp command to start/stop replication We can call this qmp command to start/stop replication outside of qemu. Like Xen colo need this function. Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Wen Congyang <wencongyang@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>	2017-02-28 11:01:56 -08:00
Dr. David Alan Gilbert	665414ad06	postcopy: Add extra check for COPY function As an extra sanity check, make sure the region we're registering can perform UFFDIO_COPY; the COPY will fail later but this gives a cleaner failure. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-17-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:24 +00:00
Dr. David Alan Gilbert	7e8cafb713	postcopy: Check for userfault+hugepage feature We need extra Linux kernel support (~4.11) to support userfaults on hugetlbfs; check for them. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-15-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:24 +00:00
Dr. David Alan Gilbert	433bd0223c	postcopy: Allow hugepages Allow huge pages in postcopy. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-13-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:24 +00:00
Dr. David Alan Gilbert	4c011c37ec	postcopy: Send whole huge pages The RAM save code uses ram_save_host_page to send whole host pages at a time; change this to use the host page size associated with the RAM Block which may be a huge page. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-12-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:24 +00:00
Dr. David Alan Gilbert	332847f075	postcopy: Mask fault addresses to huge page boundary Currently the fault address received by userfault is rounded to the host page boundary and a host page is requested from the source. Use the current RAMBlock page size instead of the general host page size so that for RAMBlocks backed by huge pages we request the whole huge page. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-11-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:24 +00:00
Dr. David Alan Gilbert	28abd20014	postcopy: Load huge pages in one go The existing postcopy RAM load loop already ensures that it glues together whole host-pages from the target page size chunks sent over the wire. Modify the definition of host page that it uses to be the RAM block page size and thus be huge pages where appropriate. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-10-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Dr. David Alan Gilbert	41d84210d4	postcopy: Use temporary for placing zero huge pages The kernel can't do UFFDIO_ZEROPAGE for huge pages, so we have to allocate a temporary (always zero) page and use UFFDIO_COPYPAGE on it. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-9-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Dr. David Alan Gilbert	df9ff5e1e3	postcopy: Plumb pagesize down into place helpers Now we deal with normal size pages and huge pages we need to tell the place handlers the size we're dealing with and make sure the temporary page is large enough. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-8-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Dr. David Alan Gilbert	67f11b5c23	postcopy: Record largest page size Record the largest page size in use; we'll need it soon for allocating temporary buffers. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-7-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Dr. David Alan Gilbert	d3a5038c46	exec: ram_block_discard_range Create ram_block_discard_range in exec.c to replace postcopy_ram_discard_range and most of ram_discard_range. Those two routines are a bit of a weird combination, and ram_discard_range is about to get more complex for hugepages. It's OS dependent code (so shouldn't be in migration/ram.c) but it needs quite a bit of the innards of RAMBlock so doesn't belong in the os*.c. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Dr. David Alan Gilbert	29c5917201	postcopy: Chunk discards for hugepages At the start of the postcopy phase, partially sent huge pages must be discarded. The code for dealing with host page sizes larger than the target page size can be reused for this case. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20170224182844.32452-4-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Dr. David Alan Gilbert	ef08fb389f	postcopy: Transmit and compare individual page sizes When using postcopy with hugepages, we require the source and destination page sizes for any RAMBlock to match; note that different RAMBlocks in the same VM can have different page sizes. Transmit them as part of the RAM information header and fail if there's a difference. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20170224182844.32452-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Dr. David Alan Gilbert	e8ca1db29b	postcopy: Transmit ram size summary word Replace the host page-size in the 'advise' command by a pagesize summary bitmap; if the VM is just using normal RAM then this will be exactly the same as before, however if they're using huge pages they'll be different, and thus: a) Migration from/to old qemu's that don't understand huge pages will fail early. b) Migrations with different size RAMBlocks will also fail early. This catches it very early; earlier than the detailed per-block check in the next patch. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20170224182844.32452-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Vladimir Sementsov-Ogievskiy	f9c8caa04f	migration: fix use-after-free of to_dst_file hmp_savevm calls qemu_savevm_state(f), which sets to_dst_file=f in global migration state. Then hmp_savevm closes f (g_free called). Next access to to_dst_file in migration state (for example, qmp_migrate_set_speed) will use it after it was freed. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170225193155.447462-5-vsementsov@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:23 +00:00
Marc-André Lureau	128e4e1089	migration: fix id leak regression This leak was introduced in commit `581f08bac2`. (it stands out quickly with ASAN once the rest of the leaks are also removed from make check with this series) Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20170221141451.28305-31-marcandre.lureau@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:22 +00:00
Ashijeet Acharya	7562f90707	migrate: Introduce a 'dc->vmsd' check to avoid segfault for --only-migratable Commit `a3a3d8c7` introduced a segfault bug while checking for 'dc->vmsd->unmigratable' which caused QEMU to crash when trying to add devices which do no set their 'dc->vmsd' yet while initialization. Place a 'dc->vmsd' check prior to it so that we do not segfault for such devices. NOTE: This doesn't compromise the functioning of --only-migratable option as all the unmigratable devices do set their 'dc->vmsd'. Introduce a new function check_migratable() and move the only_migratable check inside it, also use stubs to avoid user-mode qemu build failures. Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1487009088-23891-1-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:30:22 +00:00
Halil Pasic	07d4e69147	migration/vmstate: fix array of ptr with nullptrs Make VMS_ARRAY_OF_POINTER cope with null pointers. Previously the reward for trying to migrate an array with some null pointers in it was an illegal memory access, that is a swift and painless death of the process. Let's make vmstate cope with this scenario. The general approach is, when we encounter a null pointer (element), instead of following the pointer to save/load the data behind it, we save/load a placeholder. This way we can detect if we expected a null pointer at the load side but not null data was saved instead. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Guenther Hutzl <hutzl@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170222160119.52771-4-pasic@linux.vnet.ibm.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:29:00 +00:00
Halil Pasic	cbfda0e6cf	migration/vmstate: split up vmstate_base_addr Currently vmstate_base_addr does several things: it pinpoints the field within the struct, possibly allocates memory and possibly does the first pointer dereference. Obviously allocation is needed only for load. Let us split up the functionality in vmstate_base_addr and move the address manipulations (that is everything but the allocation logic) to load and save so it becomes more obvious what is actually going on. Like this all the address calculations (and the handling of the flags controlling these) is in one place and the sequence is more obvious. The newly introduced function vmstate_handle_alloc also fixes the allocation for the unused VMS_VBUFFER\|VMS_MULTIPLY\|VMS_ALLOC scenario and is substantially simpler than the original vmstate_base_addr. In load and save some asserts are added so it's easier to debug situations where we would end up with a null pointer dereference. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170222160119.52771-3-pasic@linux.vnet.ibm.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:29:00 +00:00
Halil Pasic	e84641f73d	migration/vmstate: renames in (load\|save)_state The vmstate_(load\|save)_state start out with an a void opaque pointing to some struct, and manipulate one or more elements of one field within that struct. First the field within the struct is pinpointed as opaque + offset, then if this is a pointer the pointer is dereferenced to obtain a pointer to the first element of the vmstate field. Pointers to further elements if any are calculated as first_element + i element_size (where i is the zero based index of the element in question). Currently base_addr and addr is used as a variable name for the pointer to the first element and the pointer to the current element being processed. This is suboptimal because base_addr is somewhat counter-intuitive (because obtained as base + offset) and both base_addr and addr not very descriptive (that we have a pointer should be clear from the fact that it is declared as a pointer). Let make things easier to understand by renaming base_addr to first_elem and addr to curr_elem. This has the additional benefit of harmonizing with other names within the scope (n_elems, vmstate_n_elems). Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170222160119.52771-2-pasic@linux.vnet.ibm.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:29:00 +00:00
Daniel Henrique Barboza	87c9cc1c30	Changing error message of QMP 'migrate_set_downtime' to seconds Using QMP, the error message of 'migrate_set_downtime' was displaying the values in milliseconds, being misleading with the command that accepts the value in seconds: { "execute": "migrate_set_downtime", "arguments": {"value": 3000}} {"error": {"class": "GenericError", "desc": "Parameter 'downtime_limit' expects an integer in the range of 0 to 2000000 milliseconds"}} This message is also seen in HMP when trying to set the same parameter: (qemu) migrate_set_parameter downtime-limit 3000000 Parameter 'downtime_limit' expects an integer in the range of 0 to 2000000 milliseconds To allow for a proper error message when using QMP, a validation of the user input was added in 'qmp_migrate_set_downtime'. Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com> Message-Id: <20170222151729.5812-1-danielhb@linux.vnet.ibm.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-28 11:29:00 +00:00
Dr. David Alan Gilbert	bcf4513129	migration: Add VMSTATE_WITH_TMP VMSTATE_WITH_TMP is for handling structures where some calculation or rearrangement of the data needs to be performed before the data hits the wire. For example, where the value on the wire is an offset from a non-migrated base, but the data in the structure is the actual pointer. To use it, a temporary type is created and a vmsd used on that type. The first element of the type must be 'parent' a pointer back to the type of the main structure. VMSTATE_WITH_TMP takes care of allocating and freeing the temporary before running the child vmsd. The post_load/pre_save on the child vmsd can copy things from the parent to the temporary using the parent pointer and do any other calculations needed; it can then use normal VMSD entries to do the actual data storage without having to fiddle around with qemu_get_/qemu_put_ Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20170203160651.19917-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-13 17:27:14 +00:00
zhanghailiang	a8664ba510	COLO: Don't process failover request while loading VM's state We should not do failover work while the main thread is loading VM's state. Otherwise the consistent of VM's memory and device state will be broken. We will restart the loading process after jump over the stage, The new failover status 'RELAUNCH' will help to record if we need to restart the process. Cc: Eric Blake <eblake@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1484657864-21708-4-git-send-email-zhang.zhanghailiang@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Added a missing '(Since 2.9)'	2017-02-13 17:27:13 +00:00
zhanghailiang	c937b9a6db	COLO: Shutdown related socket fd while do failover If the net connection between primary host and secondary host breaks while COLO/COLO incoming threads are doing read() or write(). It will block until connection is timeout, and the failover process will be blocked because of it. So it is necessary to shutdown all the socket fds used by COLO to avoid this situation. Besides, we should close the corresponding file descriptors after failvoer BH shutdown them, Or there will be an error. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1484657864-21708-3-git-send-email-zhang.zhanghailiang@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-13 17:27:13 +00:00
zhanghailiang	479125d53e	COLO: fix setting checkpoint-delay not working properly If we set checkpoint-delay through command 'migrate-set-parameters', It will not take effect until we finish last sleep chekpoint-delay, That's will be offensive espeically when we want to change its value from an extreme big one to a proper value. Fix it by using timer to realize checkpoint-delay. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Message-Id: <1484657864-21708-2-git-send-email-zhang.zhanghailiang@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-13 17:27:13 +00:00
Halil Pasic	59046ec29a	migration: consolidate VMStateField.start The member VMStateField.start is used for two things, partial data migration for VBUFFER data (basically provide migration for a sub-buffer) and for locating next in QTAILQ. The implementation of the VBUFFER feature is broken when VMSTATE_ALLOC is used. This however goes unnoticed because actually partial migration for VBUFFER is not used at all. Let's consolidate the usage of VMStateField.start by removing support for partial migration for VBUFFER. Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com> Message-Id: <20170203175217.45562-1-pasic@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-13 17:27:13 +00:00
Ashijeet Acharya	0827b9e97d	migrate: Introduce zero RAM checks to skip RAM migration Migration of a "none" machine with no RAM crashes abruptly as bitmap_new() fails and thus aborts. Instead place zero RAM checks at appropriate places to skip migration of RAM in this case and complete migration successfully for devices only. Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1486564125-31366-1-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-13 17:27:13 +00:00
Pavel Butsykin	ced1c6166e	migration: discard non-dirty ram pages after the start of postcopy After the start of postcopy migration there are some non-dirty pages which have already been migrated. These pages are no longer needed on the source vm so that we can free them and it doen't hurt to complete the migration. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Message-Id: <20170203152321.19739-4-pbutsykin@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-13 17:27:13 +00:00
Pavel Butsykin	53f09a1076	add 'release-ram' migrate capability This feature frees the migrated memory on the source during postcopy-ram migration. In the second step of postcopy-ram migration when the source vm is put on pause we can free unnecessary memory. It will allow, in particular, to start relaxing the memory stress on the source host in a load-balancing scenario. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Message-Id: <20170203152321.19739-3-pbutsykin@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Manually merged in Pavel's 'migration: madvise error_report fixup!'	2017-02-13 17:27:13 +00:00
Pavel Butsykin	9eb1476610	migration: add MigrationState arg for ram_save_/compressed_/page() Cosmetic patch. The use of ms variable instead of migrate_get_current() looks nicer, especially when there reuse. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Message-Id: <20170203152321.19739-2-pbutsykin@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-02-13 17:27:13 +00:00
Dr. David Alan Gilbert	ef8d6488d2	postcopy: Recover block devices on early failure An early postcopy failure can be recovered from as long as we know we haven't sent the command to run the destination. We have to undo the bdrv_inactivate_all by calling bdrv_invalidate_cache_all Note that I'm not using ms->block_inactive because once we've sent the postcopy package we dont want anything else to try and recover the block storage on the source; the destination might have started writing to it. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170202155909.31784-3-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-02-06 13:36:49 +01:00
Dr. David Alan Gilbert	328d4d8528	Postcopy: Reset state to avoid cleanup assert On a destination host with no userfault support an incoming postcopy would cause the state to enter ADVISE before it realised there was no support, and because it was in ADVISE state it would perform a cleanup at the end. Since there was no support the cleanup function should be unreachable, but ends up being called and asserting. Reset the state when we realise we have no support, thus the cleanup doesn't happen. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20170202155909.31784-2-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-02-06 13:36:49 +01:00
Dr. David Alan Gilbert	581f08bac2	migration: Check for ID length The qdev id of a device can be huge if it's on the end of a chain of bridges; in reality such chains shouldn't occur but they can be made to by chaining PCIe bridges together. The migration format has a number of 256 character long format limits; check we don't hit them (we already use pstrcat/cpy but that just protects us from buffer overruns, we fairly quickly hit an assert). Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20170202125956.21942-3-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-02-06 13:36:49 +01:00
Dr. David Alan Gilbert	bc5c4f2196	vmstate_register_with_alias_id: Take an Error ** I'll be adding an error to it in a subsequent patch. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20170202125956.21942-2-dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-02-06 13:36:49 +01:00
Juan Quintela	b4b076daf3	migration: create Migration Incoming State at init time Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1485207141-1941-3-git-send-email-quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2017-02-06 13:36:49 +01:00
Stefan Hajnoczi	7f4076c1bb	trace: clean up trace-events files There are a number of unused trace events that scripts/cleanup-trace-events.pl finds. The "hw/vfio/pci-quirks.c" filename was typoed and "qapi/qapi-visit-core.c" was missing the qapi/ directory prefix. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20170126171613.1399-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-01-31 17:12:15 +00:00
Pavel Dovgalyuk	ac8c19ba74	savevm: add public save_vmstate function This patch introduces save_vmstate function to allow saving and loading vmstates from the replay module. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20170124071741.4572.13714.stgit@PASHA-ISP> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-27 18:07:30 +01:00
Dr. David Alan Gilbert	46a02a7451	migration/tracing: Add tracing on save Add some tracing to vmstate_subsection_save and vmstate_save_state to help in debugging when you're not sure if a conditional piece of data is being saved. In vmstate_subsection_save I renamed the inner vmsd to avoid the aliasing and be able to print both names. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20161212125838.14425-1-dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 18:00:32 +00:00
Juan Quintela	55c4446b8b	migration: transform remaining DPRINTF into trace_ So we can remove DPRINTF() macro Signed-off-by: Juan Quintela <quintela@redhat.com> Message-Id: <1485207141-1941-2-git-send-email-quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Fixed up 'remained/remaining' as requested by Eric	2017-01-24 18:00:31 +00:00
Pankaj Gupta	009fad7f4c	migration: Change name of live migration thread Change the name of live migration thread from 'migration' to 'live_migration' to identify it clearly. 'migration' is a generic word and kernel also has tasks for process migration with the name 'migration/cpu#'. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Message-Id: <1485178976-15225-1-git-send-email-pagupta@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 18:00:31 +00:00
zhanghailiang	1d2acc3162	migration: re-active images while migration been canceled after inactive them commit `fe904ea824` fixed a case which migration aborted QEMU because it didn't regain the control of images while some errors happened. Actually, there are another two cases can trigger the same error reports: " bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed", Case 1, codes path: migration_thread() migration_completion() bdrv_inactivate_all() ----------------> inactivate images qemu_savevm_state_complete_precopy() socket_writev_buffer() --------> error because destination fails qemu_fflush() ----------------> set error on migration stream -> qmp_migrate_cancel() ----------------> user cancelled migration concurrently -> migrate_set_state() ------------------> set migrate CANCELLIN migration_completion() -----------------> go on to fail_invalidate if (s->state == MIGRATION_STATUS_ACTIVE) -> Jump this branch Case 2, codes path: migration_thread() migration_completion() bdrv_inactivate_all() ----------------> inactivate images migreation_completion() finished -> qmp_migrate_cancel() ---------------> user cancelled migration concurrently qemu_mutex_lock_iothread(); qemu_bh_schedule (s->cleanup_bh); As we can see from above, qmp_migrate_cancel can slip in whenever migration_thread does not hold the global lock. If this happens after bdrv_inactive_all() been called, the above error reports will appear. To prevent this, we can call bdrv_invalidate_cache_all() in qmp_migrate_cancel() directly if we find images become inactive. Besides, bdrv_invalidate_cache_all() in migration_completion() doesn't have the protection of big lock, fix it by add the missing qemu_mutex_lock_iothread(); Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Message-Id: <1485244792-11248-1-git-send-email-zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 18:00:31 +00:00
Ashijeet Acharya	b67b8c3a9d	migration: Fail migration blocker for --only-migratable migrate_add_blocker should rightly fail if the '--only-migratable' option was specified and the device in use should not be able to perform the action which results in an unmigratable VM. Make migrate_add_blocker return -EACCES in this case. Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1484566314-3987-6-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 18:00:31 +00:00
Ashijeet Acharya	fe44dc9180	migration: disallow migrate_add_blocker during migration If a migration is already in progress and somebody attempts to add a migration blocker, this should rightly fail. Add an errp parameter and a retcode return value to migrate_add_blocker. Signed-off-by: John Snow <jsnow@redhat.com> Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Message-Id: <1484566314-3987-5-git-send-email-ashijeetacharya@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Greg Kurz <groug@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Merged with recent 'Allow invtsc migration' change	2017-01-24 18:00:30 +00:00
Jianjun Duan	cde7ee5974	migration: add error_report Added error_report where version_ids do not match in vmstate_load_state. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com> Message-Id: <1484852453-12728-5-git-send-email-duanj@linux.vnet.ibm.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 17:54:47 +00:00
Jianjun Duan	94869d5c52	migration: migrate QTAILQ Currently we cannot directly transfer a QTAILQ instance because of the limitation in the migration code. Here we introduce an approach to transfer such structures. We created VMStateInfo vmstate_info_qtailq for QTAILQ. Similar VMStateInfo can be created for other data structures such as list. When a QTAILQ is migrated from source to target, it is appended to the corresponding QTAILQ structure, which is assumed to have been properly initialized. This approach will be used to transfer pending_events and ccs_list in spapr state. We also create some macros in qemu/queue.h to access a QTAILQ using pointer arithmetic. This ensures that we do not depend on the implementation details about QTAILQ in the migration code. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com> Message-Id: <1484852453-12728-3-git-send-email-duanj@linux.vnet.ibm.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 17:54:47 +00:00
Jianjun Duan	2c21ee769e	migration: extend VMStateInfo Current migration code cannot handle some data structures such as QTAILQ in qemu/queue.h. Here we extend the signatures of put/get in VMStateInfo so that customized handling is supported. put now will return int type. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Jianjun Duan <duanj@linux.vnet.ibm.com> Message-Id: <1484852453-12728-2-git-send-email-duanj@linux.vnet.ibm.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-24 17:54:47 +00:00
Daniel P. Berrange	60e705c51c	io: change the QIOTask callback signature Currently the QIOTaskFunc signature takes an Object * for the source, and an Error * for any error. We also need to be able to provide a result pointer. Rather than continue to add parameters to QIOTaskFunc, remove the existing ones and simply pass the QIOTask object instead. This has methods to access all the other data items required in the callback impl. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2017-01-23 15:32:18 +00:00
Paolo Bonzini	a15215f3e1	build: remove --enable-colo/--disable-colo No need to provide this knob, so remove it and stubs/migration-colo.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-01-16 17:52:35 +01:00
Peter Xu	f37bc03623	migration: allow to prioritize save state entries During migration, save state entries are saved/loaded without a specific order - we just traverse the savevm_state.handlers list and do it one by one. This might not be enough. There are requirements that we need to load specific device's vmstate first before others. For example, VT-d IOMMU contains DMA address remapping information, which is required by all the PCI devices to do address translations. We need to make sure IOMMU's device state is loaded before the rest of the PCI devices, so that DMA address translation can work properly. This patch provide a VMStateDescription.priority value to allow specify the priority of the saved states. The loadvm operation will be done with those devices with higher vmsd priority. Before this patch, we are possibly achieving the ordering requirement by an assumption that the ordering will be the same with the ordering that objects are created. A better way is to mark it out explicitly in the VMStateDescription table, like what this patch does. Current ordering logic is still naive and slow, but after all that's not a critical path so IMO it's a workable solution for now. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2017-01-10 05:56:57 +02:00
Thomas Huth	5c90308f07	migration: Fix return code of ram_save_iterate() qemu_savevm_state_iterate() expects the iterators to return 1 when they are done, and 0 if there is still something left to do. However, ram_save_iterate() does not obey this rule and returns the number of saved pages instead. This causes a fatal hang with ppc64 guests when you run QEMU like this (also works with TCG): qemu-img create -f qcow2 /tmp/test.qcow2 1M qemu-system-ppc64 -nographic -nodefaults -m 256 \ -hda /tmp/test.qcow2 -serial mon:stdio ... then switch to the monitor by pressing CTRL-a c and try to save a snapshot with "savevm test1" for example. After the first iteration, ram_save_iterate() always returns 0 here, so that qemu_savevm_state_iterate() hangs in an endless loop and you can only "kill -9" the QEMU process. Fix it by using proper return values in ram_save_iterate(). Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-11-14 19:35:41 +01:00
zhanghailiang	fe39a4d440	migration: fix missing assignment for has_x_checkpoint_delay We forgot to assign true to params->has_x_checkpoint_delay parameter in qmp_query_migrate_parameters. Without this, qmp command 'query-migrate-parameters' doesn't show the default value for x-checkpoint-delay option. This also fixes the fact that HMP was relying on unspecified behavior by reading x_checkpoint_delay without checking has_x_checkpoint_delay. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-11-14 14:50:56 +01:00
Jeff Cody	02ba9265e8	migration: fix compiler warning on uninitialized variable Some older GCC versions (e.g. 4.4.7) report a warning on an uninitialized variable for 'request', even though all possible code paths that reference 'request' will be initialized. To appease these versions, initialize the variable to 0. Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Message-id: 259818682e41b95ae60f1423b87954a3fe377639.1477950393.git.jcody@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-11-01 09:31:53 +00:00
Peter Maydell	eab9e9629c	Migration bits from the COLO project -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQItBAABCAAXBQJYFc37EBxhbWl0QGtlcm5lbC5vcmcACgkQ6wtN/GV+9nDdqQ/9 H+di+q/zF6aOEuXRCN0ud9R81fZ7nLve3e9EbKCNZXnxN3M3+zQj0At+e/SBEoTc rRwqJmNlRX3TWViWsAYmddgtopZ9R9DWwW/VsKjS2Ng230YShrA/o20hu2dJkwl3 CN4vAObzc/gxM59NWUlMnTXOG+Z9fI1NlEf0vZ2484a59KPVIE7W1zZccT1F8MNq sfa3RlOGBchNO2Rfzrr6cFGGH7UTfWiftPs7EDOfN/YBcpld6V4DfPWdPP8r2DSX gG7D3DJuuqPxZCjl0Nm1OjaIunYfVrRpMPMeNyo1+kTVbvvhDPFjah/MSWu8XJ2c N5lSqikIAWfMbQVW9gDpQ0495eRQTWA7VIlCbwN6mqNxyBvQMA+licFq6UFFrCMp quC3gO+daz3fKvUhi23TpebbqKLHA5OZA5ZvkjGDgkKPMCJyQRBZHLb81t5xsulZ cXuzAOeRcXK9aEJvcrDgWwxzi3PN8zg74RF1ZV8gxM4DkHKohlsnbgshWqGkFh3M +S5tEPqVlOlDO9juf6rlwNnVbWhFDFEGMKjI9XMTWwVWTREbqyP86fP66h2C4qc6 34yAHi5G2i43dxzHgpHC0MpU0XenO0EYdu+8Tcx35LSBSfkOeD7pU1DeZQgbI51m ZQSnLDJqv2HVdoZT7vaIjwuhtEl84xqelFdVg+cY7Gw= =sur7 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/amit-migration/tags/migration-for-2.8' into staging Migration bits from the COLO project # gpg: Signature made Sun 30 Oct 2016 10:39:55 GMT # gpg: using RSA key 0xEB0B4DFC657EF670 # gpg: Good signature from "Amit Shah <amit@amitshah.net>" # gpg: aka "Amit Shah <amit@kernel.org>" # gpg: aka "Amit Shah <amitshah@gmx.net>" # Primary key fingerprint: 48CA 3722 5FE7 F4A8 B337 2735 1E9A 3B5F 8540 83B6 # Subkey fingerprint: CC63 D332 AB8F 4617 4529 6534 EB0B 4DFC 657E F670 * remotes/amit-migration/tags/migration-for-2.8: MAINTAINERS: Add maintainer for COLO framework related files configure: Support enable/disable COLO feature docs: Add documentation for COLO feature COLO: Implement failover work for secondary VM COLO: Implement the process of failover for primary VM COLO: Introduce state to record failover process COLO: Add 'x-colo-lost-heartbeat' command to trigger failover COLO: Synchronize PVM's state to SVM periodically COLO: Add checkpoint-delay parameter for migrate-set-parameters COLO: Load VMState into QIOChannelBuffer before restore it COLO: Send PVM state to secondary side when do checkpoint COLO: Add a new RunState RUN_STATE_COLO COLO: Introduce checkpointing protocol COLO: Establish a new communicating path for COLO migration: Switch to COLO process after finishing loadvm migration: Enter into COLO mode after migration if COLO is enabled COLO: migrate COLO related info to secondary node migration: Introduce capability 'x-colo' to migration Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-10-31 13:06:38 +00:00
Peter Maydell	277d44f5a6	trivial patches for 2016-10-28 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCAAGBQJYE2wfAAoJEHAbT2saaT5ZGYUH/3QWJ4OFWbqGo1YYN5AIAheF v1bQGTh1HGbLk46ajhUvzB0bMHb1FC1KoOruU2wFYuKK/J5zQ+4X9EmaC/fD7hyx nGTcPWAyxKOlqOq3In9ro+xWQNzEhfoypKCQQVC4Y3quzub48wAro8fuFSNXLyBq ERvAsjgj0TrLEHoWtJl2bPYiqSd6KAHZAKPFW3Jw8MmsBcTLmnF2PVW3LBfdcHe7 6vlhqX7lPzVlHRaUsaxRkFxYd2YGisbe3bPRDw2fTxrtOYyEkopQq7xi2Q6Yq5N0 z0yM2oJ7o1QtUOXYa7KBf03WZ7e119HimaUkGLg+0LVhQNbeG3hd3gNwApXa5og= =tYml -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mjt/tags/trivial-patches-fetch' into staging trivial patches for 2016-10-28 # gpg: Signature made Fri 28 Oct 2016 16:17:51 BST # gpg: using RSA key 0x701B4F6B1A693E59 # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" # gpg: aka "Michael Tokarev <mjt@corpit.ru>" # gpg: aka "Michael Tokarev <mjt@debian.org>" # Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5 # Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59 * remotes/mjt/tags/trivial-patches-fetch: (23 commits) Fix build for less common build directories names clean-up: removed duplicate #includes scripts/clean-includes: added duplicate #include check monitor: deprecate 'default' option qemu-ga: Remove stray 'q' in documentation Makefile: Fix help text for target 'installer' s390: avoid always-true comparison in s390_pci_generate_fid() migration: Remove unneeded NULL check from migrate_fd_error() scripts/hxtool: fix undefined behavour of echo qemu-options.hx: set: fix copy-paste error usb: Change *_exitfn return type from int to void MAINTAINERS: qemu-trivial information colo-compare: remove unused struct CompareChardevProps and 'props' variable milkymist-pfpu: fix potential integer overflow hw/block/nvme: Simplify if-statements a little bit target-lm32: rewrite gen_compare() lm32: milkymist-tmu2: fix integer overflow target-lm32: disable asm logging via LOG_DIS() target-lm32: swap operand of wcsr in LOG_DIS() target-lm32: fix LOG_DIS operand order ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-10-31 11:58:30 +00:00
zhanghailiang	180fb75000	configure: Support enable/disable COLO feature configure --enable-colo/--disable-colo to switch COLO support on/off. COLO feature doesn't depend on any other external libraries, So here it is reasonable to enable COLO by default, to avoid re-compile QEMU if users want to use this capability. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	9d2db3760b	COLO: Implement failover work for secondary VM If users require SVM to takeover work, COLO incoming thread should exit from loop while failover BH helps backing to migration incoming coroutine. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	b3f7f0c5e6	COLO: Implement the process of failover for primary VM For primary side, if COLO gets failover request from users. To be exact, gets 'x_colo_lost_heartbeat' command. COLO thread will exit the loop while the failover BH does the cleanup work and resumes VM. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	aef060850b	COLO: Introduce state to record failover process When handling failover, COLO processes differently according to the different stage of failover process, here we introduce a global atomic variable to record the status of failover. We add four failover status to indicate the different stage of failover process. You should use the helpers to get and set the value. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	d89e666e06	COLO: Add 'x-colo-lost-heartbeat' command to trigger failover We leave users to choose whatever heartbeat solution they want, if the heartbeat is lost, or other errors they detect, they can use experimental command 'x_colo_lost_heartbeat' to tell COLO to do failover, COLO will do operations accordingly. For example, if the command is sent to the Primary side, the Primary side will exit COLO mode, does cleanup work, and then, PVM will take over the service work. If sent to the Secondary side, the Secondary side will run failover work, then takes over PVM's service work. Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	18cc23d72c	COLO: Synchronize PVM's state to SVM periodically Do checkpoint periodically, the default interval is 200ms. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	68b5359187	COLO: Add checkpoint-delay parameter for migrate-set-parameters Add checkpoint-delay parameter for migrate-set-parameters, so that we can control the checkpoint frequency when COLO is in periodic mode. Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	4291d372e2	COLO: Load VMState into QIOChannelBuffer before restore it We should not destroy the state of SVM (Secondary VM) until we receive the complete data of PVM's state, in case the primary fails in the process of sending the state, so we cache the VM's state in secondary side before load it into SVM. Besides, we should call qemu_system_reset() before load VM state, which can ensure the data is intact. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	a91246c95f	COLO: Send PVM state to secondary side when do checkpoint VM checkpointing is to synchronize the state of PVM to SVM, just like migration does, we re-use save helpers to achieve migrating PVM's state to Secondary side. COLO need to cache the data of VM's state in the secondary side before synchronize it to SVM. COLO need the size of the data to determine how much data should be read in the secondary side. So here, we can get the size of the data by saving it into I/O channel before send it to the secondary side. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	4f97558e10	COLO: Introduce checkpointing protocol We need communications protocol of user-defined to control the checkpointing process. The new checkpointing request is started by Primary VM, and the interactive process like below: Checkpoint synchronizing points: Primary Secondary initial work 'checkpoint-ready' <-------------------- @ 'checkpoint-request' @ --------------------> Suspend (Only in hybrid mode) 'checkpoint-reply' <-------------------- @ Suspend&Save state 'vmstate-send' @ --------------------> Send state Receive state 'vmstate-received' <-------------------- @ Release packets Load state 'vmstate-load' <-------------------- @ Resume Resume (Only in hybrid mode) Start Comparing (Only in hybrid mode) NOTE: 1) '@' who sends the message 2) Every sync-point is synchronized by two sides with only one handshake(single direction) for low-latency. If more strict synchronization is required, a opposite direction sync-point should be added. 3) Since sync-points are single direction, the remote side may go forward a lot when this side just receives the sync-point. 4) For now, we only support 'periodic' checkpoint, for which the Secondary VM is not running, later we will support 'hybrid' mode. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Cc: Eric Blake <eblake@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	56ba83d2a8	COLO: Establish a new communicating path for COLO This new communication path will be used for returning messages from Secondary side to Primary side. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	25d0c16f62	migration: Switch to COLO process after finishing loadvm Switch from normal migration loadvm process into COLO checkpoint process if COLO mode is enabled. We add three new members to struct MigrationIncomingState, 'have_colo_incoming_thread' and 'colo_incoming_thread' record the COLO related thread for secondary VM, 'migration_incoming_co' records the original migration incoming coroutine. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	0b827d5e72	migration: Enter into COLO mode after migration if COLO is enabled Add a new migration state: MIGRATION_STATUS_COLO. Migration source side enters this state after the first live migration successfully finished if COLO is enabled by command 'migrate_set_capability x-colo on'. We reuse migration thread, so the process of checkpointing will be handled in migration thread. Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	5821ebf93b	COLO: migrate COLO related info to secondary node We can determine whether or not VM in destination should go into COLO mode by referring to the info that was migrated. We skip this section if COLO is not enabled (i.e. migrate_set_capability colo off), so that, It doesn't break compatibility with migration no matter whether users configure the --enable-colo/disable-colo on the source/destination side or not; Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
zhanghailiang	35a6ed4f71	migration: Introduce capability 'x-colo' to migration We add helper function colo_supported() to indicate whether colo is supported or not, with which we use to control whether or not showing 'x-colo' string to users, they can use qmp command 'query-migrate-capabilities' or hmp command 'info migrate_capabilities' to learn if colo is supported. The default value for COLO (COarse-Grain LOck Stepping) is disabled. Cc: Juan Quintela <quintela@redhat.com> Cc: Amit Shah <amit.shah@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Amit Shah <amit@amitshah.net>	2016-10-30 15:17:39 +05:30
Peter Maydell	25174055f4	migration: Remove unneeded NULL check from migrate_fd_error() All the callers of migrate_fd_error() pass a non-NULL error parameter, and if any did pass NULL then we would segfault in error_copy(), so remove the unnecessary NULL check earlier in the function. (Spotted by Coverity.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-10-28 18:17:23 +03:00
Peter Maydell	01b601f061	Merge qio 2016/10/27 v1 -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJYEfjrAAoJEL6G67QVEE/fdU4P/i7yBJo436OpkdgeWS8AWuFr ptZ+Fj/weGka5GU9E3KQu36kbSgrtfcgwTHphCMXnZ0YCeKQDuM57f7LNiN6qheB nqgJvJioLbUvLTQvCHOISM7bWOnYvASBmYtLJFtUcP/jhdOy61KaADnJ+7MbliNv yJSW2RN+s/y9nUb+dxEpIXXUVMRa6BX+wHW3O44c1oLn6/Pe20aJeHTyDx3qiBhD 8RYXUgRZopH2bouBSzXgMQTbn/QMD/dC81WQlHKlt4swffyei2D/1pciOcuc0SXz +SZdkTre5JB5Kd6DU8zQ6PrrIt1nPmLSptSyhQvNxm+uWNWHnFcW1s2aYmf/ikjl 4boW37ayJx09mns8yv7TerzEPbL5qJvVX8Dsnb6telkvrS9hy9S1xuIB5xHbt6/h vwFmCdwaZoGpDDaoXRL+9k9TOI9BbEMKX33nAPDqvEXLMIf+og4fmweTKcY4XTRL /Fdg1H71v8Ayv+r5TJOKwFg3PNNjnvqkbk1psS+aaW7dup43iaYGIKWy+VFaCufk hPXLOtR5lUsYC2qm+nkjPIgoP7D8oZx4AGkCHbYsqzi+l1lynZH3rBIs8ggLr72o FFk4g0sNYe1ccAa89jFEgWIQbS0N6ckUXCv12g3eyF/UIC1F35/mGGugSRnTXuc2 a/WsvgU7pGBrtqXcg7lF =gsxL -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-2016-10-27-1' into staging Merge qio 2016/10/27 v1 # gpg: Signature made Thu 27 Oct 2016 13:54:03 BST # gpg: using RSA key 0xBE86EBB415104FDF # gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" # gpg: aka "Daniel P. Berrange <berrange@redhat.com>" # Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF * remotes/berrange/tags/pull-qio-2016-10-27-1: main: set names for main loop sources created vnc: set name for all I/O channels created migration: set name for all I/O channels created char: set name for all I/O channels created nbd: set name for all I/O channels created io: add ability to set a name for IO channels io: Add a QIOChannelSocket cleanup test io: set LISTEN flag explicitly for listen sockets io: Introduce a qio_channel_set_feature() helper io: Use qio_channel_has_feature() where applicable io: Fix double shift usages on QIOChannel features Conflicts: qemu-char.c Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-10-28 15:30:55 +01:00
Daniel P. Berrange	6f01f136af	migration: set name for all I/O channels created Ensure that all I/O channels created for migration are given names to distinguish their respective roles. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>	2016-10-27 09:13:10 +02:00
Peter Maydell	59811a320d	migration/savevm.c: migrate non-default page size Add a subsection to vmstate_configuration which is present only if the guest is using a target page size which is different from the default. This allows us to helpfully diagnose attempts to migrate between machines which are using different target page sizes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net>	2016-10-24 16:26:50 +01:00
Vijaya Kumar K	adb65dec4b	migration: Remove static allocation of xzblre cache buffer Allocate xzblre zero page cache buffer dynamically. Remove dependency on TARGET_PAGE_SIZE to make run-time page size detection for arm platforms. Signed-off-by: Vijaya Kumar K <vijayak@cavium.com> Message-id: 1465808915-4887-2-git-send-email-vijayak@caviumnetworks.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-10-24 16:26:49 +01:00
Ashijeet Acharya	2ff3025797	migrate: move max-bandwidth and downtime-limit to migrate_set_parameter Mark the old commands 'migrate_set_speed' and 'migrate_set_downtime' as deprecated. Move max-bandwidth and downtime-limit into migrate-set-parameters for setting maximum migration speed and expected downtime limit parameters respectively. Change downtime units to milliseconds (only for new-command) and set its upper bound limit to 2000 seconds. Update the query part in both hmp and qmp qemu control interfaces. Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:23:53 +02:00
Dr. David Alan Gilbert	9308ae5485	migration: Fix seg with missing port The command : migrate tcp:localhost: currently segs; fix it so it now says: error parsing address 'localhost:' and the same for -incoming. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:23:53 +02:00
Dr. David Alan Gilbert	5cf0f48d2a	migration/postcopy: Explicitly disallow huge pages At the moment postcopy will fail as soon as qemu tries to register userfault on the RAMBlock pages that are backed by hugepages. However, the kernel is going to get userfault support for hugepage at some point, and we've not got the rest of the QEMU code to support it yet, so fail neatly with an error like: Postcopy doesn't support hugetlbfs yet (/objects/mem1) Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:23:53 +02:00
Dr. David Alan Gilbert	2ebeaec012	Postcopy vs xbzrle: Don't send xbzrle pages once in postcopy [for 2.8] xbzrle relies on reading pages that have already been sent to the destination and then applying the modifications; we can't do that in postcopy because the destination may well have modified the page already or the page has been discarded. I already didn't allow reception of xbzrle pages, but I forgot to add the test to stop them being sent. Enabling both xbzrle and postcopy can make some sense; if you think that your migration might finish if you have xbzrle, then when it doesn't complete you flick over to postcopy and stop xbzrle'ing. This corresponds to RH bug: https://bugzilla.redhat.com/show_bug.cgi?id=1368422 Symptom is: Unknown combination of migration flags: 0x60 (postcopy mode) (either 0x60 or 0x40) Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:23:53 +02:00
Ashijeet Acharya	091ecc8b69	migrate: Fix bounds check for migration parameters in migration.c This patch fixes the out-of-bounds check of migration parameters in qmp_migrate_set_parameters() for cpu-throttle-initial and cpu-throttle-increment by adding a return statement for both as they were broken since their introduction in 2.5 via commit `1626fee`. Due to the missing return statements, parameters were getting set to out-of-bounds values despite the error. Signed-off-by: Ashijeet Acharya <ashijeetacharya@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:23:53 +02:00
Eric Blake	7f375e0446	migrate: Use boxed qapi for migrate-set-parameters Now that QAPI makes it easy to pass a struct around, we don't have to declare as many parameters or local variables. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:23:53 +02:00
Eric Blake	de63ab6124	migrate: Share common MigrationParameters struct It is rather verbose, and slightly error-prone, to repeat the same set of parameters for input (migrate-set-parameters) as for output (query-migrate-parameters), where the only difference is whether the members are optional. We can just document that the optional members will always be present on output, and then share a common struct between both commands. The next patch can then reduce the amount of code needed on input. Also, we made a mistake in qemu 2.7 of returning an empty string during 'query-migrate-parameters' when there is no TLS, rather than omitting TLS details entirely. Technically, this change risks breaking any 2.7 client that is hard-coded to expect the parameter's existence; on the other hand, clients that are portable to 2.6 already must be prepared for those members to not be present. And this gets rid of yet one more place where the QMP output visitor is silently converting a NULL string into "" (which is a hack I ultimately want to kill off). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:23:53 +02:00
Dr. David Alan Gilbert	cd5ea07064	migration/rdma: Don't flag an error when we've been told about one If the other side tells us there's been an error and we fail the migration, we don't need to signal that failure to the other side because it already knew. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael R. Hines <michael@hinespot.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:22:38 +02:00
Dr. David Alan Gilbert	ccb783c312	migration: Make failed migration load set file error If an error occurs in a section load, set the file error flag so that the transport can get notified to do a cleanup. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael R. Hines <michael@hinespot.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:22:38 +02:00
Dr. David Alan Gilbert	12c67ffb1f	migration/rdma: Pass qemu_file errors across link If we fail for some reason (e.g. a mismatched RAMBlock) and it's set the qemu_file error flag, pass that error back to the peer so it can clean up rather than waiting for some higher level progress. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Michael R. Hines <michael@hinespot.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:22:38 +02:00
Dr. David Alan Gilbert	49228e17ed	migration: Report values for comparisons Report the values when a comparison fails; together with the previous patch that prints the device and field names this should give a good idea of why loading the migration failed. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:22:38 +02:00
Dr. David Alan Gilbert	a1771070e7	migration: report an error giving the failed field When a field fails to load (typically due to a limit check, or a call to a get/put) report the device and field to give an indication of the cause. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2016-10-13 17:22:38 +02:00
Paolo Bonzini	9c1f8f4493	migration: sync all address spaces Migrating a VM during reboot sometimes results in differences between the source and destination in the SMRAM area. This is because migration_bitmap_sync() only fetches from KVM the dirty log of address_space_memory. SMRAM memory slots are ignored and the modifications to SMRAM are not sent to the destination. Reported-by: He Rongguang <herongguang.he@huawei.com> Reviewed-by: He Rongguang <herongguang.he@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-27 11:57:29 +02:00
Richard Henderson	a1febc4950	cutils: Export only buffer_is_zero Since the two users don't make use of the returned offset, beyond ensuring that the entire buffer is zero, consider the can_use_buffer_find_nonzero_offset and buffer_find_nonzero_offset functions internal. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Message-Id: <1472496380-19706-4-git-send-email-rth@twiddle.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-09-13 19:09:45 +02:00
Laurent Vivier	e723b87103	trace-events: fix first line comment in trace-events Documentation is docs/tracing.txt instead of docs/trace-events.txt. find . -name trace-events -exec \ sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \ {} \; Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-08-12 10:36:01 +01:00
Cao jin	474c624ddf	migration/socket: fix typo in file header Code of inet socket & unix socket is merged together. Also add some newlines, make code block well separated. Cc: Daniel P. Berrange <berrange@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Cc: Amit Shah <amit.shah@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Message-Id: <1469696074-12744-4-git-send-email-caoj.fnst@cn.fujitsu.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-08-11 17:03:51 +05:30
Liang Li	787d134fb1	migration: fix live migration failure with compression Because of commit `11808bb0c4`, which remove some condition checks of 'f->ops->writev_buffer', 'qemu_put_qemu_file' should be enhanced to clear the 'f_src->iovcnt', or 'f_src->iovcnt' may exceed the MAX_IOV_SIZE which will break live migration. This should be fixed. Signed-off-by: Liang Li <liang.z.li@intel.com> Reported-by: Jinshi Zhang <jinshi.c.zhang@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1470702146-24399-1-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-08-11 16:59:53 +05:30
Evgeny Yakovlev	0e8b3cdfbc	migration: mmap error check fix mmap man page: "On success, mmap() returns a pointer to the mapped area. On error, the value MAP_FAILED (that is, (void *) -1) is returned, and errno is set to indicate the cause of the error." The check in postcopy_get_tmp_page is definitely wrong and should be fixed. Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Amit Shah <amit.shah@redhat.com> Message-Id: <1469785705-16670-1-git-send-email-den@openvz.org> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-08-11 16:59:38 +05:30
Cao jin	e110aa919a	migration/ram: fix typo Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Message-Id: <1469776231-23820-1-git-send-email-caoj.fnst@cn.fujitsu.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-08-11 16:59:33 +05:30
Marc-André Lureau	df37dd6ffe	qjson: free str Release the qstring allocated in qjson_new(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-08-08 00:00:24 +04:00
Dr. David Alan Gilbert	42da5550d6	migration: set state to post-migrate on failure If a migration fails/is cancelled during the postcopy stage we currently end up with the runstate as finish-migrate, where it should be post-migrate. There's a small window in precopy where I think the same thing can happen, but I've never seen it. It rarely matters; the only postcopy case is if you restart a migration, which again is a case that rarely matters in postcopy because it's only safe to restart the migration if you know the destination hasn't been running (which you might if you started the destination with -S and hadn't got around to 'c' ing it before the postcopy failed). Even then it's a small window but potentially you could hit if there's a problem loading the devices on the destination. This corresponds to: https://bugzilla.redhat.com/show_bug.cgi?id=1355683 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Message-Id: <1468601086-32117-1-git-send-email-dgilbert@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-07-22 13:23:09 +05:30
Lin Ma	0c204cc810	hmp: show all of snapshot info on every block dev in output of 'info snapshots' Currently, the output of 'info snapshots' shows fully available snapshots. It's opaque, hides some snapshot information to users. It's not convenient if users want to know more about all of snapshot information on every block device via monitor. Follow Kevin's and Max's proposals, The patch makes the output more detailed: (qemu) info snapshots List of snapshots present on all disks: ID TAG VM SIZE DATE VM CLOCK -- checkpoint-1 165M 2016-05-22 16:58:07 00:02:06.813 List of partial (non-loadable) snapshots on 'drive_image1': ID TAG VM SIZE DATE VM CLOCK 1 snap1 0 2016-05-22 16:57:31 00:01:30.567 Signed-off-by: Lin Ma <lma@suse.com> Message-id: 1467869164-26688-3-git-send-email-lma@suse.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:41:39 +02:00
Lin Ma	3a1ee71190	hmp: use snapshot name to determine whether a snapshot is 'fully available' Currently qemu uses snapshot id to determine whether a snapshot is fully available, It causes incorrect output in some scenario. For instance: (qemu) info block drive_image1 (#block113): /opt/vms/SLES12-SP1-JeOS-x86_64-GM/disk0.qcow2 (qcow2) Cache mode: writeback drive_image2 (#block349): /opt/vms/SLES12-SP1-JeOS-x86_64-GM/disk1.qcow2 (qcow2) Cache mode: writeback (qemu) (qemu) info snapshots There is no snapshot available. (qemu) (qemu) snapshot_blkdev_internal drive_image1 snap1 (qemu) (qemu) info snapshots There is no suitable snapshot available (qemu) (qemu) savevm checkpoint-1 (qemu) (qemu) info snapshots ID TAG VM SIZE DATE VM CLOCK 1 snap1 0 2016-05-22 16:57:31 00:01:30.567 (qemu) $ qemu-img snapshot -l disk0.qcow2 Snapshot list: ID TAG VM SIZE DATE VM CLOCK 1 snap1 0 2016-05-22 16:57:31 00:01:30.567 2 checkpoint-1 165M 2016-05-22 16:58:07 00:02:06.813 $ qemu-img snapshot -l disk1.qcow2 Snapshot list: ID TAG VM SIZE DATE VM CLOCK 1 checkpoint-1 0 2016-05-22 16:58:07 00:02:06.813 The patch uses snapshot name instead of snapshot id to determine whether a snapshot is fully available and uses '--' instead of snapshot id in output because the snapshot id is not guaranteed to be the same on all images. For instance: (qemu) info snapshots List of snapshots present on all disks: ID TAG VM SIZE DATE VM CLOCK -- checkpoint-1 165M 2016-05-22 16:58:07 00:02:06.813 Signed-off-by: Lin Ma <lma@suse.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 1467869164-26688-2-git-send-email-lma@suse.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-07-13 13:41:39 +02:00
Paolo Bonzini	0b8b8753e4	coroutine: move entry argument to qemu_coroutine_create In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine co = qemu_coroutine_create(entry); + Coroutine co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-07-13 13:26:02 +02:00
Daniel P. Berrange	521d47c6dc	trace: split out trace events for migration/ directory Move all trace-events for files in the migration/ directory to their own file. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 1466066426-16657-6-git-send-email-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-20 17:22:14 +01:00
Liang Li	0d9f9a5c52	migration: code clean up Use 'QemuMutex comp_done_lock' and 'QemuCond comp_done_cond' instead of 'QemuMutex *comp_done_lock' and 'QemuCond comp_done_cond'. To keep consistent with 'QemuMutex decomp_done_lock' and 'QemuCond comp_done_cond'. Signed-off-by: Liang Li <liang.z.li@intel.com> Message-Id: <1462433579-13691-10-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:31 +05:30
Liang Li	33d151f418	migration: refine the decompression code The current code for multi-thread decompression is not clear, especially in the aspect of using lock. Refine the code to make it clear. Signed-off-by: Liang Li <liang.z.li@intel.com> Message-Id: <1462433579-13691-9-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:28 +05:30
Liang Li	a7a9a88f9d	migration: refine the compression code The current code for multi-thread compression is not clear, especially in the aspect of using lock. Refine the code to make it clear. Signed-off-by: Liang Li <liang.z.li@intel.com> Message-Id: <1462433579-13691-8-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:26 +05:30
Liang Li	90e56fb46d	migration: protect the quit flag by lock quit_comp_thread and quit_decomp_thread are accessed by several thread, it's better to protect them with locks. We use a per thread flag to replace the global one, and the new flag is protected by a lock. Signed-off-by: Liang Li <liang.z.li@intel.com> Message-Id: <1462433579-13691-7-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:23 +05:30
Liang Li	fc50438ed0	migration: refine ram_save_compressed_page Use qemu_put_compression_data to do the compression directly instead of using do_compress_ram_page, avoid some data copy. very small improvement, at the same time, add code to check if the compression is successful. Signed-off-by: Liang Li <liang.z.li@intel.com> Message-Id: <1462433579-13691-6-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:21 +05:30
Liang Li	b3be28969b	qemu-file: Fix qemu_put_compression_data flaw Current qemu_put_compression_data can only work with no writable QEMUFile, and can't work with the writable QEMUFile. But it does not provide any measure to prevent users from using it with a writable QEMUFile. We should fix this flaw to make it works with writable QEMUFile. Signed-off-by: Liang Li <liang.z.li@intel.com> Suggested-by: Juan Quintela <quintela@redhat.com> Message-Id: <1462433579-13691-5-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:18 +05:30
Liang Li	e7bb92e21a	migration: remove useless code page_buffer is set twice repeatedly, remove the previous set. Signed-off-by: Liang Li <liang.z.li@intel.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1462433579-13691-4-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:14 +05:30
Liang Li	5533b2e9bc	migration: Fix a potential issue At the end of live migration and before vm_start() on the destination side, we should make sure all the decompression tasks are finished, if this can not be guaranteed, the VM may get the incorrect memory data, or the updated memory may be overwritten by the decompression thread. Add the code to fix this potential issue. Suggested-by: David Alan Gilbert <dgilbert@redhat.com> Suggested-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Liang Li <liang.z.li@intel.com> Message-Id: <1462433579-13691-3-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:09 +05:30
Liang Li	73a8912b8a	migration: Fix multi-thread compression bug Recently, a bug related to multiple thread compression feature for live migration is reported. The destination side will be blocked during live migration if there are heavy workload in host and memory intensive workload in guest, this is most likely to happen when there is one decompression thread. Some parts of the decompression code are incorrect: 1. The main thread receives data from source side will enter a busy loop to wait for a free decompression thread. 2. A lock is needed to protect the decomp_param[idx]->start, because it is checked in the main thread and is updated in the decompression thread. Fix these two issues by following the code pattern for compression. Signed-off-by: Liang Li <liang.z.li@intel.com> Reported-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Liang Li <liang.z.li@intel.com> Message-Id: <1462433579-13691-2-git-send-email-liang.z.li@intel.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:24:01 +05:30
Denis V. Lunev	6dcf66681a	migration: fix inability to save VM after snapshot The following sequence of operations fails: virsh start vm virsh snapshot-create vm virshh save vm --file file with the following error error: Failed to save domain vm to file error: internal error: unable to execute QEMU command 'migrate': There's a migration process in progress The problem is that qemu_savevm_state() calls migrate_init() which sets migration state to MIGRATION_STATUS_SETUP and never cleaned it up. This patch do the job. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> CC: Juan Quintela <quintela@redhat.com> CC: Amit Shah <amit.shah@redhat.com> Message-Id: <1466003203-26263-1-git-send-email-den@openvz.org> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:23:57 +05:30
Dr. David Alan Gilbert	023ad1a6e9	migration: Trace improvements A couple of improvements to tracing that have come out of helping people with migration problems: * vmstate_n_elems trace the count/name - for when you have problems getting array counts right * vmstate_subsection_load_bad - add the idstr, for when you receive a subsection you weren't expecting. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1465896986-16132-1-git-send-email-dgilbert@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:23:53 +05:30
Peter Maydell	4d88513157	migration: Don't use _to_cpup() and cpu_to_w() The _to_cpup() and cpu_to_w() functions just compose a pointer dereference with a byteswap. Instead use ld_p() and st_p(), which handle potential pointer misalignment and avoid the need to cast the pointer. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Message-Id: <1465574962-2710-1-git-send-email-peter.maydell@linaro.org> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-17 18:23:49 +05:30
Paolo Bonzini	02d0e09503	os-posix: include sys/mman.h qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check is bogus without a previous inclusion of sys/mman.h. Include it in sysemu/os-posix.h and remove it from everywhere else. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 18:39:03 +02:00
Daniel P. Berrange	22724f4921	migration: rename functions to starting migrations Apply the following renames for starting incoming migration: process_incoming_migration -> migration_fd_process_incoming migration_set_incoming_channel -> migration_channel_process_incoming migration_tls_set_incoming_channel -> migration_tls_channel_process_incoming and for starting outgoing migration: migration_set_outgoing_channel -> migration_channel_connect migration_tls_set_outgoing_channel -> migration_tls_channel_connect Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1464776234-9910-3-git-send-email-berrange@redhat.com Message-Id: <1464776234-9910-3-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-16 09:51:37 +05:30
Dr. David Alan Gilbert	096631bd95	Postcopy: Check for support when setting the capability Knowing whether the destination host supports migration with postcopy can be tricky. The destination doesn't need the capability set, however if we set it then use the opportunity to do the test and tell the user/management layer early. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-id: 1465816605-29488-7-git-send-email-dgilbert@redhat.com Message-Id: <1465816605-29488-7-git-send-email-dgilbert@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-16 09:50:07 +05:30
Dr. David Alan Gilbert	d3bf5418e2	Postcopy: Add stats on page requests On the source, add a count of page requests received from the destination. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-id: 1465816605-29488-4-git-send-email-dgilbert@redhat.com Message-Id: <1465816605-29488-4-git-send-email-dgilbert@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-16 09:50:07 +05:30
Dr. David Alan Gilbert	a22463a5dc	Migration: Split out ram part of qmp_query_migrate The RAM section of qmp_query_migrate is reasonably complex and repeated 3 times. Split it out into a helper. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1465816605-29488-3-git-send-email-dgilbert@redhat.com Reviwed-by: Denis V. Lunev <den@openvz.org> Message-Id: <1465816605-29488-3-git-send-email-dgilbert@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-16 09:50:07 +05:30
Dr. David Alan Gilbert	d688c62d09	Postcopy: Avoid 0 length discards The discard code in migration/ram.c would send request for zero length discards in the case where no discards were needed. It doesn't appear to have had any bad effect. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-id: 1465816605-29488-2-git-send-email-dgilbert@redhat.com Message-Id: <1465816605-29488-2-git-send-email-dgilbert@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-06-16 09:50:07 +05:30
Wen Congyang	88c16567d2	Introduce "xen-load-devices-state" Introduce a "xen-load-devices-state" QAPI command that can be used to load the state of all devices, but not the RAM or the block devices of the VM. We only have hmp commands savevm/loadvm, and qmp commands xen-save-devices-state. We use this new command for COLO: 1. suspend both primary vm and secondary vm 2. sync the state 3. resume both primary vm and secondary vm In such case, we need to update all devices' state in any time. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>	2016-06-13 11:50:53 +01:00
Kevin Wolf	ebd2f9e7db	migration/block: Convert saving to BlockBackend This creates a new BlockBackend for copying data from an images to the migration stream on the source host. All I/O for block migration goes through BlockBackend now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-08 10:21:08 +02:00
Kevin Wolf	ad2964b4ff	migration/block: Convert load to BlockBackend This converts the loading part of block migration to use BlockBackend interfaces rather than accessing the BlockDriverState directly. Note that this takes a lazy shortcut. We should really use a separate BlockBackend that is configured for the migration rather than for the guest (e.g. writethrough caching is unnecessary) and holds its own reference to the BlockDriverState, but the impact isn't that big and we didn't have a separate migration reference before either, so it must be good enough, I guess... Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-06-08 10:21:08 +02:00
Eric Blake	74021bc497	block: Switch bdrv_write_zeroes() to byte interface Rename to bdrv_pwrite_zeroes() to let the compiler ensure we cater to the updated semantics. Do the same for bdrv_co_write_zeroes(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-06-08 10:21:08 +02:00
Peter Maydell	030c98aff1	all: Remove unnecessary glib.h includes Remove glib.h includes, as it is provided by osdep.h. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-06-07 18:19:24 +03:00
Paolo Bonzini	f615f39616	exec: remove ram_addr argument from qemu_ram_block_from_host Of the two callers, one does not use it, and the other can compute it itself based on the other output argument (offset) and the RAMBlock. Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-29 09:11:12 +02:00
Peter Maydell	aef11b8d33	migration: add TLS support to the migration data channel This is a big refactoring of the migration backend code - moving away from QEMUFile to the new QIOChannel framework introduced here. This brings a good level of abstraction and reduction of many lines of code. This series also adds the ability for many backends (all except RDMA) to use TLS for encrypting the migration data between the endpoints. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXRpKMAAoJEOsLTfxlfvZw2FMQAJmrp8ijvJNtdBa51bNY+xVx zvrHDpWco/HxxqyBIGxG7g8Iq+wpNsdgoRxoQkfgIz9RkZiNrzb1kGRiqNFFBKFX ziK1QQQ12ETUXwQ6VguBuwLDvCCenyUti0HfKkceG+Zu5263fyp+VzL+PuEtteT3 M0pZRrifj/TQqCBXR8yhBAo2dCiFETLVoruE+iNg2ipI3JDizxy8bdOU2gfnTayf na7lE53pI+Wy8KE+qrhtsEgjHFp48uJ0HwQIIumvVndXFpIhRzCcN/aeVCjNYRjo GeI18OJxcimRDwsnfuOwuZKhRcjWfa8WEIKsi8LdRTZFpFL6y9R57XNTBIFfbjOF 0lkmFTqJTBi3OTPjj0hMjpjOfXhyKUnwdqCAYlAxeuWHhqPDDhtEcnNtGdmQzx4Z KvYzc3t31o1gPin024UUfA528PNREszaXhTM90/Dj0dhVSMoG1VQsQjxzkPXxdM1 wemfic+77Bk4oUrSplhdvvk4nySDWeseEjfdyVU2ixqldy8Ib1+6H+PCjWNotpQ0 YiDOHBy3rrUh6NhIqb0C2PWvd/9Aqs0nHQHJ8QKYK574MDbVo8mKTACFdoSYoZ1u wuif7NL6qkyS55szf0dm8zPBCJ5nIR5SQE98E7+ptXNa8AipfFsTkZrr3aOjcdey 98AWF9KaZOWRfwgIm3Ft =AYEK -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/amit-migration/tags/migration-2.7-2' into staging migration: add TLS support to the migration data channel This is a big refactoring of the migration backend code - moving away from QEMUFile to the new QIOChannel framework introduced here. This brings a good level of abstraction and reduction of many lines of code. This series also adds the ability for many backends (all except RDMA) to use TLS for encrypting the migration data between the endpoints. # gpg: Signature made Thu 26 May 2016 07:07:08 BST using RSA key ID 657EF670 # gpg: Good signature from "Amit Shah <amit@amitshah.net>" # gpg: aka "Amit Shah <amit@kernel.org>" # gpg: aka "Amit Shah <amitshah@gmx.net>" * remotes/amit-migration/tags/migration-2.7-2: (28 commits) migration: remove qemu_get_fd method from QEMUFile migration: remove support for non-iovec based write handlers migration: add support for encrypting data with TLS migration: define 'tls-creds' and 'tls-hostname' migration parameters migration: don't use an array for storing migrate parameters migration: move definition of struct QEMUFile back into qemu-file.c migration: delete QEMUFile stdio implementation migration: delete QEMUFile sockets implementation migration: delete QEMUSizedBuffer struct migration: delete QEMUFile buffer implementation migration: convert savevm to use QIOChannel for writing to files migration: convert RDMA to use QIOChannel interface migration: convert exec socket protocol to use QIOChannel migration: convert fd socket protocol to use QIOChannel migration: convert tcp socket protocol to use QIOChannel migration: rename unix.c to socket.c migration: convert unix socket protocol to use QIOChannel migration: convert post-copy to use QIOChannelBuffer migration: add reporting of errors for outgoing migration migration: add helpers for creating QEMUFile from a QIOChannel ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-05-26 16:09:27 +01:00
Daniel P. Berrange	12992c16d9	migration: remove qemu_get_fd method from QEMUFile Now that there is a set_blocking callback in QEMUFileOps, and all users needing non-blocking support have been converted to QIOChannel, there is no longer any codepath requiring the qemu_get_fd() method for QEMUFile. Remove it to avoid further code being introduced with an expectation of direct file handle access. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-29-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:32:21 +05:30
Daniel P. Berrange	11808bb0c4	migration: remove support for non-iovec based write handlers All the remaining QEMUFile implementations provide an iovec based write handler, so the put_buffer callback can be removed to simplify the code. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-28-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:32:18 +05:30
Daniel P. Berrange	e122636562	migration: add support for encrypting data with TLS This extends the migration_set_incoming_channel and migration_set_outgoing_channel methods so that they will automatically wrap the QIOChannel in a QIOChannelTLS instance if TLS credentials are configured in the migration parameters. This allows TLS to work for tcp, unix, fd and exec migration protocols. It does not (currently) work for RDMA since it does not use these APIs, but it is unlikely that TLS would be desired with RDMA anyway since it would degrade the performance to that seen with TCP defeating the purpose of using RDMA. On the target host, QEMU would be launched with a set of TLS credentials for a server endpoint $ qemu-system-x86_64 -monitor stdio -incoming defer \ -object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=server,id=tls0 \ ...other args... To enable incoming TLS migration 2 monitor commands are then used (qemu) migrate_set_str_parameter tls-creds tls0 (qemu) migrate_incoming tcp:myhostname:9000 On the source host, QEMU is launched in a similar manner but using client endpoint credentials $ qemu-system-x86_64 -monitor stdio \ -object tls-creds-x509,dir=/home/berrange/security/qemutls,endpoint=client,id=tls0 \ ...other args... To enable outgoing TLS migration 2 monitor commands are then used (qemu) migrate_set_str_parameter tls-creds tls0 (qemu) migrate tcp:otherhostname:9000 Thanks to earlier improvements to error reporting, TLS errors can be seen 'info migrate' when doing a detached migration. For example: (qemu) info migrate capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off Migration status: failed total time: 0 milliseconds error description: TLS handshake failed: The TLS connection was non-properly terminated. Or (qemu) info migrate capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off Migration status: failed total time: 0 milliseconds error description: Certificate does not match the hostname localhost Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-27-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:32:13 +05:30
Daniel P. Berrange	69ef1f36b0	migration: define 'tls-creds' and 'tls-hostname' migration parameters Define two new migration parameters to be used with TLS encryption. The 'tls-creds' parameter provides the ID of an instance of the 'tls-creds' object type, or rather a subclass such as 'tls-creds-x509'. Providing these credentials will enable use of TLS on the migration data stream. If using x509 certificates, together with a migration URI that does not include a hostname, the 'tls-hostname' parameter provides the hostname to use when verifying the server's x509 certificate. This allows TLS to be used in combination with fd: and exec: protocols where a TCP connection is established by a 3rd party outside of QEMU. NB, this requires changing the migrate_set_parameter method in the HMP to accept a 's' (string) value instead of 'i' (integer). This is backwards compatible, because the parsing of strings allows the quotes to be optional, thus any integer is also a valid string. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-26-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:32:10 +05:30
Daniel P. Berrange	2594f56d4c	migration: don't use an array for storing migrate parameters The MigrateState struct uses an array for storing migration parameters. This presumes that all future parameters will be integers too, which is not going to be the case. There is no functional reason why an array is used, if anything it makes the code less clear. The QAPI schema already defines a struct - MigrationParameters - capable of storing all the individual parameters, so just use that instead of an array. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-25-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:32:07 +05:30
Daniel P. Berrange	a24939f279	migration: move definition of struct QEMUFile back into qemu-file.c Now that the memory buffer based QEMUFile impl is gone, there is no need for any backend to be accessing internals of the QEMUFile struct, so it can be moved back into qemu-file.c Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-24-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:32:05 +05:30
Daniel P. Berrange	7fdc61c75d	migration: delete QEMUFile stdio implementation Now that the exec migration backend and savevm have converted to use the QIOChannel based QEMUFile, there is no user remaining for the stdio based QEMUFile impl and it can be deleted. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-23-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:32:03 +05:30
Daniel P. Berrange	40946ae40b	migration: delete QEMUFile sockets implementation Now that the tcp, unix and fd migration backends have converted to use the QIOChannel based QEMUFile, there is no user remaining for the sockets based QEMUFile impl and it can be deleted. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-22-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:32:00 +05:30
Daniel P. Berrange	2a22b4f370	migration: delete QEMUSizedBuffer struct Now that we don't have have a buffer based QemuFile implementation, the QEMUSizedBuffer code is also unused and can be deleted. A simpler buffer class also exists in util/buffer.c which other code can used as needed. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-21-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:58 +05:30
Daniel P. Berrange	8b7c5c0f52	migration: delete QEMUFile buffer implementation The qemu_bufopen() method is no longer used, so the memory buffer based QEMUFile backend can be deleted entirely. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-20-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:55 +05:30
Daniel P. Berrange	8925839f00	migration: convert savevm to use QIOChannel for writing to files Convert the exec savevm code to use QIOChannel and QEMUFileChannel, instead of the stdio APIs. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-19-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:53 +05:30
Daniel P. Berrange	6ddd2d76ca	migration: convert RDMA to use QIOChannel interface This converts the RDMA code to provide a subclass of QIOChannel that uses RDMA for the data transport. This implementation of RDMA does not correctly handle non-blocking mode. Reads might block if there was not already some pending data and writes will block until all data is sent. This flawed behaviour was already present in the existing impl, so appears to not be a critical problem at this time. It should be on the list of things to fix in the future though. The RDMA code would be much better off it it could be split up in a generic RDMA layer, a QIOChannel impl based on RMDA, and then the RMDA migration glue. This is left as a future exercise for the brave. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-18-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:50 +05:30
Daniel P. Berrange	527792fae6	migration: convert exec socket protocol to use QIOChannel Convert the exec socket migration protocol driver to use QIOChannel and QEMUFileChannel, instead of the stdio popen APIs. It can be unconditionally built because the QIOChannelCommand class can report suitable error messages on platforms which can't fork processes. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-17-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:47 +05:30
Daniel P. Berrange	64802ee57f	migration: convert fd socket protocol to use QIOChannel Convert the fd socket migration protocol driver to use QIOChannel and QEMUFileChannel, instead of plain sockets APIs. It can be unconditionally built because the QIOChannel APIs it uses will take care to report suitable error messages if needed. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-16-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:45 +05:30
Daniel P. Berrange	e65c67e4da	migration: convert tcp socket protocol to use QIOChannel Drop the current TCP socket migration driver and extend the new generic socket driver to cope with the TCP address format Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-15-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:42 +05:30
Daniel P. Berrange	6f860ae755	migration: rename unix.c to socket.c The unix.c file will be nearly the same as the tcp.c file, only differing in the initial SocketAddress creation code. Rename unix.c to socket.c and refactor it a little to prepare for merging the TCP code. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-14-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:40 +05:30
Daniel P. Berrange	d984464eb9	migration: convert unix socket protocol to use QIOChannel Convert the unix socket migration protocol driver to use QIOChannel and QEMUFileChannel, instead of plain sockets APIs. It can be unconditionally built, since the socket impl of QIOChannel will report a suitable error on platforms where UNIX sockets are unavailable. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-13-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:37 +05:30
Daniel P. Berrange	61b67d473d	migration: convert post-copy to use QIOChannelBuffer The post-copy code does some I/O to/from an intermediate in-memory buffer rather than direct to the underlying I/O channel. Switch this code to use QIOChannelBuffer instead of QEMUSizedBuffer. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-12-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:34 +05:30
Daniel P. Berrange	d59ce6f344	migration: add reporting of errors for outgoing migration Currently if an application initiates an outgoing migration, it may or may not, get an error reported back on failure. If the error occurs synchronously to the 'migrate' command execution, the client app will see the error message. This is the case for DNS lookup failures. If the error occurs asynchronously to the monitor command though, the error will be thrown away and the client left guessing about what went wrong. This is the case for failure to connect to the TCP server (eg due to wrong port, or firewall rules, or other similar errors). In the future we'll be adding more scope for errors to happen asynchronously with the TLS protocol handshake. TLS errors are hard to diagnose even when they are well reported, so discarding errors entirely will make it impossible to debug TLS connection problems. Management apps which do migration are already using 'query-migrate' / 'info migrate' to check up on progress of background migration operations and to see their end status. This is a fine place to also include the error message when things go wrong. This patch thus adds an 'error-desc' field to the MigrationInfo struct, which will be populated when the 'status' is set to 'failed': (qemu) migrate -d tcp:localhost:9001 (qemu) info migrate capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off x-postcopy-ram: off Migration status: failed (Error connecting to socket: Connection refused) total time: 0 milliseconds In the HMP, when doing non-detached migration, it is also possible to display this error message directly to the app. (qemu) migrate tcp:localhost:9001 Error connecting to socket: Connection refused Or with QMP { "execute": "query-migrate", "arguments": {} } { "return": { "status": "failed", "error-desc": "address resolution failed for myhost:9000: No address associated with hostname" } } Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1461751518-12128-11-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:30 +05:30
Daniel P. Berrange	48f07489ed	migration: add helpers for creating QEMUFile from a QIOChannel Currently creating a QEMUFile instance from a QIOChannel is quite simple only requiring a single call to qemu_fopen_channel_input or qemu_fopen_channel_output depending on the end of migration connection. When QEMU gains TLS support, however, there will need to be a TLS negotiation done inbetween creation of the QIOChannel and creation of the final QEMUFile. Introduce some helper methods that will encapsulate this logic, isolating the migration protocol drivers from knowledge about TLS. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Acked-by: Juan Quintela <quintela@redhat.com> Message-Id: <1461751518-12128-10-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:27 +05:30
Daniel P. Berrange	a9cfeb33bb	migration: introduce a new QEMUFile impl based on QIOChannel Introduce a new QEMUFile implementation that is based on the QIOChannel objects. This impl is different from existing impls in that there is no file descriptor that can be made available, as some channels may be based on higher level protocols such as TLS. Although the QIOChannel based implementation can trivially provide a bi-directional stream, initially we have separate functions for opening input & output directions to fit with the expectation of the current QEMUFile interface. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1461751518-12128-9-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:24 +05:30
Daniel P. Berrange	9e4d2b98ee	migration: force QEMUFile to blocking mode for outgoing migration Instead of relying on the default QEMUFile I/O blocking flag state, explicitly turn on blocking I/O for outgoing migration since it takes place in a background thread. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1461751518-12128-8-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:21 +05:30
Daniel P. Berrange	06ad513532	migration: introduce set_blocking function in QEMUFileOps Remove the assumption that every QEMUFile implementation has a file descriptor available by introducing a new function in QEMUFileOps to change the blocking state of a QEMUFile. If not set, it will fallback to the original code using the get_fd method. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1461751518-12128-7-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:19 +05:30
Daniel P. Berrange	0436e09f96	migration: split migration hooks out of QEMUFileOps The QEMUFileOps struct contains the I/O subsystem callbacks and the migration stage hooks. Split the hooks out into a separate QEMUFileHooks struct to make it easier to refactor the I/O side of QEMUFile without affecting the hooks. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1461751518-12128-6-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:16 +05:30
Daniel P. Berrange	baf51e7739	migration: ensure qemu_fflush() always writes full data amount The QEMUFile writev_buffer / put_buffer functions are expected to write out the full set of requested data, blocking until complete. The qemu_fflush() caller does not expect to deal with partial writes. Clarify the function comments and add a sanity check to the code to catch mistaken implementations. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <1461751518-12128-5-git-send-email-berrange@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-26 11:31:14 +05:30
Kevin Wolf	88be7b4be4	block: Fix bdrv_next() memory leak The bdrv_next() users all leaked the BdrvNextIterator after completing the iteration. Simply changing bdrv_next() to free the iterator before returning NULL at the end of list doesn't work because some callers exit the loop before looking at all BDSes. This patch moves the BdrvNextIterator from the heap to the stack of the caller and switches to a bdrv_first()/bdrv_next() interface for initialising the iterator. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-25 19:04:10 +02:00
Peter Maydell	99694362ee	migration fixes: - ensure src block devices continue fine after a failed migration - fail on migration blockers; helps 9p savevm/loadvm - move autoconverge commands out of experimental state - move the migration-specific qjson in migration/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJXQzqdAAoJEOsLTfxlfvZwMDgP/0WjJc6tcrRYWPnZ0I4+6/1A MByxfBf0LBeST5/A8HDOg8KrTasNHXKisMAQ5kHUxxWLuzF9GYScLdZ2Sf+2VrP2 rRLJXW2c56cVPsc3j4ZU5t93SO5Q2Dd1hZ2uabu5XMMH2IhtO5H05wfPkkMdRZO2 XzRt97z0LRBHOvh4O/ZfGjtEaMlmUTpl5X/PpPUW+o6yeDZU00kWFUz7BR7D9q27 Adru6G8N3pN3KJEMWMqIdmlgoSTEdebTItwLLJ7XwKlKF+bPwr/gsqM6i66C0ahB HjpS2T4ly7U33B2JdWElDCZSwlFXAy3Tv7oB0mHgCEqgfryabQXRupVpK0Vyk2EV yV7Hf+R/DdkHBNeCCl+rduQiA6ed/DFHSa62vt796Yilf2vUlvdeuh4d1aNp5uxo M4QCuxOUsvp75b9mBEuVhz/CCgkq/Hm8HlMZX6/lDTyvNc7qKQnVKWCx95zGsKem vPMKxfrKNPY6J08LcjXtqfNNdJEQ5Z1St2a9HiDg5eWuWT2vCgRrjizkMH5zbKEx 5BJbJlifY1JN7f5+guh9trQRRfB4CTAuuTOLrOH7xbST7jGNaFKAlmzsV0s0xDxF /47GcSz5uzLY4T2S4BMSu88mt3gVMTUIaZYxphHvCHqiOMuYG33HHLm8FyAdMBS2 hhyG4UcKTJtxiO5ymqv5 =RpPT -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/amit-migration/tags/migration-2.7-1' into staging migration fixes: - ensure src block devices continue fine after a failed migration - fail on migration blockers; helps 9p savevm/loadvm - move autoconverge commands out of experimental state - move the migration-specific qjson in migration/ # gpg: Signature made Mon 23 May 2016 18:15:09 BST using RSA key ID 657EF670 # gpg: Good signature from "Amit Shah <amit@amitshah.net>" # gpg: aka "Amit Shah <amit@kernel.org>" # gpg: aka "Amit Shah <amitshah@gmx.net>" * remotes/amit-migration/tags/migration-2.7-1: migration: regain control of images when migration fails to complete savevm: fail if migration blockers are present migration: Promote improved autoconverge commands out of experimental state migration/qjson: Drop gratuitous use of QOM migration: Move qjson.[ch] to migration/ Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-05-24 12:21:07 +01:00
Greg Kurz	fe904ea824	migration: regain control of images when migration fails to complete We currently have an error path during migration that can cause the source QEMU to abort: migration_thread() migration_completion() runstate_is_running() ----------------> true if guest is running bdrv_inactivate_all() ----------------> inactivate images qemu_savevm_state_complete_precopy() ... qemu_fflush() socket_writev_buffer() --------> error because destination fails qemu_fflush() -------------------> set error on migration stream migration_completion() -----------------> set migrate state to FAILED migration_thread() -----------------------> break migration loop vm_start() -----------------------------> restart guest with inactive images and you get: qemu-system-ppc64: socket_writev_buffer: Got err=104 for (32768/18446744073709551615) qemu-system-ppc64: /home/greg/Work/qemu/qemu-master/block/io.c:1342:bdrv_co_do_pwritev: Assertion `!(bs->open_flags & 0x0800)' failed. Aborted (core dumped) If we try postcopy with a similar scenario, we also get the writev error message but QEMU leaves the guest paused because entered_postcopy is true. We could possibly do the same with precopy and leave the guest paused. But since the historical default for migration errors is to restart the source, this patch adds a call to bdrv_invalidate_cache_all() instead. Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Message-Id: <146357896785.6003.11983081732454362715.stgit@bahia.huguette.org> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-23 22:19:36 +05:30
Greg Kurz	24f3902b08	savevm: fail if migration blockers are present QEMU has currently two ways to prevent migration to occur: - migration blocker when it depends on runtime state - VMStateDescription.unmigratable when migration is not supported at all This patch gathers all the logic into a single function to be called from both the savevm and the migrate paths. This fixes a bug with 9p, at least, where savevm would succeed and the following would happen in the guest after loadvm: $ ls /host ls: cannot access /host: Protocol error With this patch: (qemu) savevm foo Migration is disabled when VirtFS export path '/' is mounted in the guest using mount_tag 'host' Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <146239057139.11271.9011797645454781543.stgit@bahia.huguette.org> [Update subject according to Paolo's suggestion - Amit] Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-23 21:44:08 +05:30
Gonglei	fa53a0e53e	memory: drop find_ram_block() On the one hand, we have already qemu_get_ram_block() whose function is similar. On the other hand, we can directly use mr->ram_block but searching RAMblock by ram_addr which is a kind of waste. Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-Id: <1462845901-89716-2-git-send-email-arei.gonglei@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-23 16:53:44 +02:00
Jason J. Herne	d85a31d1f4	migration: Promote improved autoconverge commands out of experimental state The new autoconverge throttling commands have been tested for a release now. It is time to move them out of the experimental state. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Message-Id: <1461262038-8197-1-git-send-email-jjherne@linux.vnet.ibm.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-23 16:05:09 +05:30
Markus Armbruster	b72fe9e690	migration/qjson: Drop gratuitous use of QOM All the use of QOM buys us here is the ability to destroy the thing with object_unref(OBJECT(vmdesc)). Not worth the notational overhead. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1462380558-2030-3-git-send-email-armbru@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-23 14:16:12 +05:30
Markus Armbruster	17b74b9867	migration: Move qjson.[ch] to migration/ Type QJSON lets you build JSON text. Its interface mirrors (a subset of) abstract JSON syntax. QAPI output visitors also produce JSON text. They assert their preconditions and invariants, and therefore abort on incorrect use. Contrastingly, QJSON does not detect incorrect use. It happily produces invalid JSON then. This is what migration wants. QJSON was designed for migration, and migration is its only user. Move it to migration/ for proper coverage by MAINTAINERS, and to deter accidental use outside migration. [Pointed out by Eric: QJSON was added in commits 0457d07..b174257 -- Amit] Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1462380558-2030-2-git-send-email-armbru@redhat.com> Signed-off-by: Amit Shah <amit.shah@redhat.com>	2016-05-23 14:16:09 +05:30
Peter Maydell	6bd8ab6889	Block layer patches -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJXPdcnAAoJEH8JsnLIjy/WPEoQAK5vlRYqvQrrevMJviT4ZPUX cGGbabOcmfTBHGAgGwRLg+vQ043Sgu14JjtNbrsoSsBwAl9eAhAVGOimiieaY3vR 35OOUxECswArJzK8I4XRx4KhI871Yq+8kHILPoXpF8L7YU38Zqa1D5z2dcOKYrL8 Oy5IEfd1+Qfpxg/txKIioP5BzKVpz3V9/8GRNo0iAl7c806NoYFpnM0TXsed9Fjr YvUn1AdGHUF0/pV6vU46Qxz4yy1Q+cuoh923z6+YvXTcwok7PbjhAQWWA0qvSTuG otnPKMPBhYa6g7XOPD9Mra986vs6vBEGiPS5uqXoM5FqxF4Hc9LIeHEr+3hb+m53 NLOmGqfct0USY9r6rXsOhZQb7nZCDuhaedv33ZfgE0T0cYxIilHs5PhgFAWfthhP aNJYlzbJUhqhTi7CJrJcFoGbNQDxux5qtlFo43M4vz/WYYDrwu8P7O3YO+sH0jU1 EXJnbtztQvwfsiIEbIzvBRQl3XD9QmCfYO3lRbOwdCnd3ZLy47E2bze4gV3DwzK7 CsBr+sa49xI8LMswPxTms+A+Inndn8O0mGI32Zi4nBKapjpy5Fb4YG6z8+WPfTKp Il1PsSgG84wm4YxGWty/UI4DoPY+hqlIIz1CNuRRNQtZTybLgNCK8ZKYbVlRppmf pGPpQ8pmqkeFLmx8hecm =ntKz -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches # gpg: Signature made Thu 19 May 2016 16:09:27 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: (31 commits) qemu-iotests: Fix regression in 136 on aio_read invalid qemu-iotests: Simplify 109 with unaligned qemu-img compare qemu-io: Fix recent UI updates block: clarify error message for qmp-eject qemu-iotests: Some more write_zeroes tests qcow2: Fix write_zeroes with partially allocated backing file cluster qcow2: fix condition in is_zero_cluster block: Propagate AioContext change to all children block: Remove BlockDriverState.blk block: Don't return throttling info in query-named-block-nodes block: Avoid bs->blk in bdrv_next() block: Add bdrv_has_blk() block: Remove bdrv_aio_multiwrite() blockjob: Don't touch BDS iostatus blockjob: Don't set iostatus of target block: User BdrvChild callback for device name block: Use BdrvChild callbacks for change_media/resize block: Don't check throttled reqs in bdrv_requests_pending() Revert "block: Forbid I/O throttling on nodes with multiple parents for 2.6" block: Remove bdrv_move_feature_fields() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-05-19 16:54:12 +01:00
Kevin Wolf	7c8eece45b	block: Avoid bs->blk in bdrv_next() We need to introduce a separate BdrvNextIterator struct that can keep more state than just the current BDS in order to avoid using the bs->blk pointer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-05-19 16:45:31 +02:00
Paolo Bonzini	33c11879fd	qemu-common: push cpu.h inclusion out of qemu-common.h Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-05-19 16:42:29 +02:00
Stefan Weil	cb8d4c8f54	Fix some typos found by codespell Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>	2016-05-18 15:04:27 +03:00

... 17 18 19 20 21 ...

2095 Commits