mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Leonardo Bras	d59c40cc48	migration/multifd: Report to user when zerocopy not working Some errors, like the lack of Scatter-Gather support by the network interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using zero-copy, which causes it to fall back to the default copying mechanism. After each full dirty-bitmap scan there should be a zero-copy flush happening, which checks for errors each of the previous calls to sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then increment dirty_sync_missed_zero_copy migration stat to let the user know about it. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20220711211112.18951-4-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-07-20 12:15:09 +01:00
Peter Xu	85a8578ea5	migration: Add helpers to detect TLS capability Add migrate_channel_requires_tls() to detect whether the specific channel requires TLS, leveraging the recently introduced migrate_use_tls(). No functional change intended. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220707185513.27421-1-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-07-20 12:15:08 +01:00
Daniel P. Berrangé	bc698c367d	migration: rename qemu_file_update_transfer to qemu_file_acct_rate_limit The qemu_file_update_transfer name doesn't give a clear guide on what its purpose is, and how it differs from the qemu_file_credit_transfer method. The latter is specifically for accumulating for total migration traffic, while the former is specifically for accounting in thue rate limit calculations. The new name give better guidance on its usage. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-06-22 19:33:43 +01:00
Leonardo Bras	5b1d9bab2d	multifd: Implement zero copy write in multifd migration (multifd-zero-copy) Implement zero copy send on nocomp_send_write(), by making use of QIOChannel writev + flags & flush interface. Change multifd_send_sync_main() so flush_zero_copy() can be called after each iteration in order to make sure all dirty pages are sent before a new iteration is started. It will also flush at the beginning and at the end of migration. Also make it return -1 if flush_zero_copy() fails, in order to cancel the migration process, and avoid resuming the guest in the target host without receiving all current RAM. This will work fine on RAM migration because the RAM pages are not usually freed, and there is no problem on changing the pages content between writev_zero_copy() and the actual sending of the buffer, because this change will dirty the page and cause it to be re-sent on a next iteration anyway. A lot of locked memory may be needed in order to use multifd migration with zero-copy enabled, so disabling the feature should be necessary for low-privileged users trying to perform multifd migrations. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220513062836.965425-9-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-05-16 13:56:24 +01:00
Leonardo Bras	b7dbdd8e76	multifd: Send header packet without flags if zero-copy-send is enabled Since `d48c3a0445` ("multifd: Use a single writev on the send side"), sending the header packet and the memory pages happens in the same writev, which can potentially make the migration faster. Using channel-socket as example, this works well with the default copying mechanism of sendmsg(), but with zero-copy-send=true, it will cause the migration to often break. This happens because the header packet buffer gets reused quite often, and there is a high chance that by the time the MSG_ZEROCOPY mechanism get to send the buffer, it has already changed, sending the wrong data and causing the migration to abort. It means that, as it is, the buffer for the header packet is not suitable for sending with MSG_ZEROCOPY. In order to enable zero copy for multifd, send the header packet on an individual write(), without any flags, and the remanining pages with a writev(), as it was happening before. This only changes how a migration with zero-copy-send=true works, not changing any current behavior for migrations with zero-copy-send=false. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220513062836.965425-8-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-05-16 13:56:24 +01:00
Leonardo Bras	33d70973a3	multifd: multifd_send_sync_main now returns negative on error Even though multifd_send_sync_main() currently emits error_reports, it's callers don't really check it before continuing. Change multifd_send_sync_main() to return -1 on error and 0 on success. Also change all it's callers to make use of this change and possibly fail earlier. (This change is important to next patch on multifd zero copy implementation, to make it sure an error in zero-copy flush does not go unnoticed. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20220513062836.965425-7-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-05-16 13:56:24 +01:00
Leonardo Bras	d2fafb6a68	migration: Add migrate_use_tls() helper A lot of places check parameters.tls_creds in order to evaluate if TLS is in use, and sometimes call migrate_get_current() just for that test. Add new helper function migrate_use_tls() in order to simplify testing for TLS usage. Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220513062836.965425-6-leobras@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-05-16 13:56:24 +01:00
Peter Xu	f444eeda71	migration: Move migrate_allow_multifd and helpers into migration.c This variable, along with its helpers, is used to detect whether multiple channel will be supported for migration. In follow up patches, there'll be other capability that requires multi-channels. Hence move it outside multifd specific code and make it public. Meanwhile rename it from "multifd" to "multi_channels" to show its real meaning. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220331150857.74406-5-peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-04-21 19:36:46 +01:00
Peter Xu	7f692ec79a	migration: Drop multifd tls_hostname cache The hostname is cached N times, N equals to the multifd channels. Drop that cache because after previous patch we've got s->hostname being alive for the whole lifecycle of migration procedure. Cc: Juan Quintela <quintela@redhat.com> Cc: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220331150857.74406-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-04-21 19:36:46 +01:00
Juan Quintela	8c0ec0b2b0	multifd: Rename pages_used to normal_pages Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	faf60935df	multifd: recv side only needs the RAMBlock host address So we can remove the MultiFDPages. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	cf2d4aa8a2	multifd: Use normal pages array on the recv side Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Rename num_normal_pages to total_normal_pages (peter)	2022-01-28 15:38:23 +01:00
Juan Quintela	815956f039	multifd: Use normal pages array on the send side We are only sending normal pages through multifd channels. Later on this series, we are going to also send zero pages. We are going to detect if a page is zero or non zero in the multifd channel thread, not on the main thread. So we receive an array of pages page->offset[N] And we will end with: p->normal[N - zero_pages] p->zero[zero_pages]. In this patch, we just copy all the pages in offset to normal. for (i = 0; i < pages->num; i++) { p->narmal[p->normal_num] = pages->offset[i]; p->normal_num++: } Later in the series this becomes: for (i = 0; i < pages->num; i++) { if (buffer_is_zero(page->offset[i])) { p->zerol[p->zero_num] = pages->offset[i]; p->zero_num++: } else { p->narmal[p->normal_num] = pages->offset[i]; p->normal_num++: } } Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- Improving comment (dave) Renaming num_normal_pages to total_normal_pages (peter)	2022-01-28 15:38:23 +01:00
Juan Quintela	c27779a215	multifd: Unfold "used" variable by its value Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	d48c3a0445	multifd: Use a single writev on the send side Until now, we wrote the packet header with write(), and the rest of the pages with writev(). Just increase the size of the iovec and do a single writev(). Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	468fcb5dd0	multifd: Remove send_write() method Everything use now iov's. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	226468ba3d	multifd: Move iov from pages to params This will allow us to reduce the number of system calls on the next patch. Signed-off-by: Juan Quintela <quintela@redhat.com>	2022-01-28 15:38:23 +01:00
Juan Quintela	04e1140494	migration: All this fields are unsigned So printing it as %d is wrong. Notice that for the channel id, that is an uint8_t, but I changed it anyways for consistency. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>	2022-01-28 15:38:22 +01:00
Li Zhang	077fbb5942	multifd: Shut down the QIO channels to avoid blocking the send threads when they are terminated. When doing live migration with multifd channels 8, 16 or larger number, the guest hangs in the presence of the network errors such as missing TCP ACKs. At sender's side: The main thread is blocked on qemu_thread_join, migration_fd_cleanup is called because one thread fails on qio_channel_write_all when the network problem happens and other send threads are blocked on sendmsg. They could not be terminated. So the main thread is blocked on qemu_thread_join to wait for the threads terminated. (gdb) bt 0 0x00007f30c8dcffc0 in __pthread_clockjoin_ex () at /lib64/libpthread.so.0 1 0x000055cbb716084b in qemu_thread_join (thread=0x55cbb881f418) at ../util/qemu-thread-posix.c:627 2 0x000055cbb6b54e40 in multifd_save_cleanup () at ../migration/multifd.c:542 3 0x000055cbb6b4de06 in migrate_fd_cleanup (s=0x55cbb8024000) at ../migration/migration.c:1808 4 0x000055cbb6b4dfb4 in migrate_fd_cleanup_bh (opaque=0x55cbb8024000) at ../migration/migration.c:1850 5 0x000055cbb7173ac1 in aio_bh_call (bh=0x55cbb7eb98e0) at ../util/async.c:141 6 0x000055cbb7173bcb in aio_bh_poll (ctx=0x55cbb7ebba80) at ../util/async.c:169 7 0x000055cbb715ba4b in aio_dispatch (ctx=0x55cbb7ebba80) at ../util/aio-posix.c:381 8 0x000055cbb7173ffe in aio_ctx_dispatch (source=0x55cbb7ebba80, callback=0x0, user_data=0x0) at ../util/async.c:311 9 0x00007f30c9c8cdf4 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0 10 0x000055cbb71851a2 in glib_pollfds_poll () at ../util/main-loop.c:232 11 0x000055cbb718521c in os_host_main_loop_wait (timeout=42251070366) at ../util/main-loop.c:255 12 0x000055cbb7185321 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531 13 0x000055cbb6e6ba27 in qemu_main_loop () at ../softmmu/runstate.c:726 14 0x000055cbb6ad6fd7 in main (argc=68, argv=0x7ffc0c578888, envp=0x7ffc0c578ab0) at ../softmmu/main.c:50 To make sure that the send threads could be terminated, IO channels should be shut down to avoid waiting IO. Signed-off-by: Li Zhang <lizhang@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	01102a2ef6	multifd: Fill offset and block for reception We were using the iov directly, but we will need this info on the following patch. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	40a4bfe9d3	multifd: remove used parameter from send_recv_pages() method It is already there as p->pages->num. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	02fb81043e	multifd: remove used parameter from send_prepare() method It is already there as p->pages->num. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	1943c11a62	multifd: The variable is only used inside the loop Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	18ede636bc	multifd: Add missing documention Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	90a3d2f9d5	multifd: Rename used field to num We will need to split it later in zero_num (number of zero pages) and normal_num (number of normal pages). This name is better. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Juan Quintela	144fa06b34	migration: Never call twice qemu_target_page_size() Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-12-15 10:31:42 +01:00
Li Zhijian	5ad15e8614	migration: allow enabling mutilfd for specific protocol only To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:52 +0800 (5 weeks, 4 days, 17 hours ago) And change the default to true so that in '-incoming defer' case, user is able to change multifd capability. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Li Zhijian	b7acd65707	migration: allow multifd for socket protocol only To: <quintela@redhat.com>, <dgilbert@redhat.com>, <qemu-devel@nongnu.org> CC: Li Zhijian <lizhijian@cn.fujitsu.com> Date: Sat, 31 Jul 2021 22:05:51 +0800 (5 weeks, 4 days, 17 hours ago) multifd with unsupported protocol will cause a segment fault. (gdb) bt #0 0x0000563b4a93faf8 in socket_connect (addr=0x0, errp=0x7f7f02675410) at ../util/qemu-sockets.c:1190 #1 0x0000563b4a797a03 in qio_channel_socket_connect_sync (ioc=0x563b4d16e8c0, addr=0x0, errp=0x7f7f02675410) at ../io/channel-socket.c:145 #2 0x0000563b4a797abf in qio_channel_socket_connect_worker (task=0x563b4cd86c30, opaque=0x0) at ../io/channel-socket.c:168 #3 0x0000563b4a792631 in qio_task_thread_worker (opaque=0x563b4cd86c30) at ../io/task.c:124 #4 0x0000563b4a91da69 in qemu_thread_start (args=0x563b4c44bb80) at ../util/qemu-thread-posix.c:541 #5 0x00007f7fe9b5b3f9 in ?? () #6 0x0000000000000000 in ?? () It's enough to check migrate_multifd_is_allowed() in multifd cleanup() and multifd setup() though there are so many other places using migrate_use_multifd(). Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Lukas Straub	e9ab82b858	multifd: Unconditionally unregister yank function To: qemu-devel <qemu-devel@nongnu.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares Passos <lsoaresp@redhat.com> Date: Wed, 4 Aug 2021 21:26:32 +0200 (5 weeks, 11 hours, 52 minutes ago) [[PGP Signed Part:No public key for 35AB0B289C5DB258 created at 2021-08-04T21:26:32+0200 using RSA]] Unconditionally unregister yank function in multifd_load_cleanup(). If it is not unregistered here, it will leak and cause a crash in yank_unregister_instance(). Now if the ioc is still in use afterwards, it will only lead to qemu not being able to recover from a hang related to that ioc. After checking the code, i am pretty sure that ref is always 1 when arriving here. So all this currently does is remove the unneeded check. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Lukas Straub	20171ea895	multifd: Implement yank for multifd send side To: qemu-devel <qemu-devel@nongnu.org> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>, Peter Xu <peterx@redhat.com>, Leonardo Bras Soares Passos <lsoaresp@redhat.com> Date: Wed, 1 Sep 2021 17:58:57 +0200 (1 week, 15 hours, 17 minutes ago) [[PGP Signed Part:No public key for 35AB0B289C5DB258 created at 2021-09-01T17:58:57+0200 using RSA]] When introducing yank functionality in the migration code I forgot to cover the multifd send side. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2021-10-19 08:39:04 +02:00
Peter Xu	18711405b5	migration: Introduce migration_ioc_[un]register_yank() There're plenty of places in migration/* that checks against either socket or tls typed ioc for yank operations. Provide two helpers to hide all these information. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-4-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-07-26 12:44:54 +01:00
Dr. David Alan Gilbert	a59136f3b1	migration/socket: Close the listener at the end Delay closing the listener until the cleanup hook at the end; mptcp needs the listener to stay open while the other paths come in. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-5-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 19:36:19 +01:00
Leonardo Bras	7de2e85653	yank: Unregister function when using TLS migration After yank feature was introduced in migration, whenever migration is started using TLS, the following error happens in both source and destination hosts: (qemu) qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. This happens because of a missing yank_unregister_function() when using qio-channel-tls. Fix this by also allowing TYPE_QIO_CHANNEL_TLS object type to perform yank_unregister_function() in channel_close() and multifd_load_cleanup(). Also, inside migration_channel_connect() and migration_channel_process_incoming() move yank_register_function() so it only runs once on a TLS migration. Fixes: `b5eea99ec2` ("migration: Add yank feature", 2021-01-13) Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1964326 Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Peter Xu <peterx@redhat.com> -- Changes since v2: - Dropped all references to ioc->master - yank_register_function() and yank_unregister_function() now only run once in a TLS migration. Changes since v1: - Cast p->c to QIOChannelTLS into multifd_load_cleanup() Message-Id: <20210601054030.1153249-1-leobras.c@gmail.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-06-08 18:50:03 +01:00
David Hildenbrand	c1668bde5c	migration/multifd: Print used_length of memory block We actually want to print the used_length, against which we check. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-10-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-05-13 18:21:14 +01:00
Lukas Straub	1a92d6d500	yank: Remove dependency on qiochannel Remove dependency on qiochannel by removing yank_generic_iochannel and letting migration and chardev use their own yank function for iochannel. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20ff143fc2db23e27cd41d38043e481376c9cec1.1616521341.git.lukasstraub2@web.de>	2021-04-01 15:27:44 +04:00
Hao Wang	fca676429c	migration/tls: add error handling in multifd_tls_handshake_thread If any error happens during multifd send thread creating (e.g. channel broke because new domain is destroyed by the dst), multifd_tls_handshake_thread may exit silently, leaving main migration thread hanging (ram_save_setup -> multifd_send_sync_main -> qemu_sem_wait(&p->sem_sync)). Fix that by adding error handling in multifd_tls_handshake_thread. Signed-off-by: Hao Wang <wanghao232@huawei.com> Message-Id: <20210209104237.2250941-3-wanghao232@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-03-15 20:01:55 +00:00
Hao Wang	a339149afa	migration/tls: fix inverted semantics in multifd_channel_connect Function multifd_channel_connect() return "true" to indicate failure, which is rather confusing. Fix that. Signed-off-by: Hao Wang <wanghao232@huawei.com> Message-Id: <20210209104237.2250941-2-wanghao232@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2021-03-15 20:01:55 +00:00
Lukas Straub	b5eea99ec2	migration: Add yank feature Register yank functions on sockets to shut them down. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <484c6a14cc2506bebedd5a237259b91363ff8f88.1609167865.git.lukasstraub2@web.de> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2021-01-13 10:21:17 +01:00
Chuan Zheng	9e8424088c	multifd/tls: fix memoryleak of the QIOChannelSocket object when cancelling migration When creating new tls client, the tioc->master will be referenced which results in socket leaking after multifd_save_cleanup if we cancel migration. Fix it by do object_unref() after tls client creation. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1605104763-118687-1-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 15:52:20 +00:00
Chuan Zheng	a1af605bd5	migration/multifd: fix hangup with TLS-Multifd due to blocking handshake The qemu main loop could hang up forever when we enable TLS+Multifd. The Src multifd_send_0 invokes tls handshake, it sends hello to sever and wait response. However, the Dst main qemu loop has been waiting recvmsg() for multifd_recv_1. Both of Src and Dst main qemu loop are blocking and waiting for reponse which results in hanging up forever. Src: (multifd_send_0) Dst: (multifd_recv_1) multifd_channel_connect migration_channel_process_incoming multifd_tls_channel_connect migration_tls_channel_process_incoming multifd_tls_channel_connect qio_channel_tls_handshake_task qio_channel_tls_handshake gnutls_handshake qio_channel_tls_handshake_task ... qcrypto_tls_session_handshake ... gnutls_handshake ... ... ... recvmsg (Blocking I/O waiting for response) recvmsg (Blocking I/O waiting for response) Fix this by offloadinig handshake work to a background thread. Reported-by: Yan Jin <jinyan12@huawei.com> Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Message-Id: <1604643893-8223-1-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-11-12 14:35:29 +00:00
Peter Maydell	92d0950267	Trivial Patches Pull request 20200928 -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEzS913cjjpNwuT1Fz8ww4vT8vvjwFAl9xqZQSHGxhdXJlbnRA dml2aWVyLmV1AAoJEPMMOL0/L748nFQQAJL/5ydfeo+t9RVTfatMI6bRP+loqQzc Y/t/xYgOfCQsMjXRU5HHNre+FtltqP0lVU2Ey2zAon7MjJDXfQOCv6aOK+vsOGXi FITC6vZ050VrVy7iPfXaJR5aIbkJme4NXLgJj9mqaFZELgoTAMCuCkGN+km3n/Uw pf0lI43VSkLt3pvHvGjy2UT51OjH6/LxaXcgY2w67nIH+KcLgxdh7fXlD+5Gxdug 458gbMbtqAPb6qNV7jBbrgUMRx5hpUKoa5QvL0DWkIsboemPJGsTlw0nhON5ZPQ7 XYNPyb9ELPYE5V8I9Ki+ESzsWFMVMdu0Hj/MNbnSDdg+2uR0xgCSIRnnuEwpSmLB jbhKa8b3B3nPFFNQAqQ6FPpOW76PpwFVKRAPT3p3rnDqrtXLUJpGdGzTz8ltMZry pOxdbSkuEl+79D1i5Lt5mfCqRNqOjYk1awPO4K/JdmhJxk9dFmY5X21edaIe6lN+ GiZlE43fF+GM3HelplnTCIwlRAjhUX/PRSDBkeLuPYj0qFob27MFauMcsGvC28FI CQIY3CmIFCmzf8c1DUE3TVYWpJj0e+OnKU02D89/FF4M4TOGTa1/CtpNcpSDgE5A TCEw4cyEG1LEvDtw4DRdLBKVDnFW8XiUPz2xVC87/dZSC88CTllLnxtsaSfDHuui 0Y0BBZ3MJxgs =t5Q1 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/vivier2/tags/trivial-branch-for-5.2-pull-request' into staging Trivial Patches Pull request 20200928 # gpg: Signature made Mon 28 Sep 2020 10:15:00 BST # gpg: using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C # gpg: issuer "laurent@vivier.eu" # gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full] # gpg: aka "Laurent Vivier <laurent@vivier.eu>" [full] # gpg: aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full] # Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F 5173 F30C 38BD 3F2F BE3C * remotes/vivier2/tags/trivial-branch-for-5.2-pull-request: docs/system/deprecated: Move lm32 and unicore32 to the right section migration/multifd: Remove superfluous semicolons timer: Fix timer_mod_anticipate() documentation vhost-vdpa: remove useless variable Add *.pyc back to the .gitignore file virtio: vdpa: omit check return of g_malloc meson: fix static flag summary vhost-vdpa: fix indentation in vdpa_ops Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-09-28 14:03:09 +01:00
Chuan Zheng	894f021411	migration/tls: add trace points for multifd-tls add trace points for multifd-tls for debug. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <1600139042-104593-7-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	2964714015	migration/tls: add support for multifd tls-handshake Similar like migration main thread, we need to do handshake for each multifd thread. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <1600139042-104593-6-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	03c7a42d0d	migration/tls: extract cleanup function for common-use multifd channel cleanup is need if multifd handshake failed, let's extract it. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <1600139042-104593-5-git-send-email-zhengchuan@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Chuan Zheng	8e5fa05932	migration/tls: add tls_hostname into MultiFDSendParams Since multifd creation is async with migration_channel_connect, we should pass the hostname from MigrationState to MultiFDSendParams. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Message-Id: <1600139042-104593-4-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-09-25 12:45:58 +01:00
Philippe Mathieu-Daudé	df55509470	migration/multifd: Remove superfluous semicolons checkpatch.pl report superfluous semicolons since commit `ee0f3c09e0`, but this one was missed: scripts/checkpatch.pl d32ca5ad798~..d32ca5ad798 ERROR: superfluous trailing semicolon #498: FILE: migration/multifd.c:308: + ram_counters.transferred += transferred;; total: 1 errors, 1 warnings, 2073 lines checked Fixes: `d32ca5ad79` ("multifd: Split multifd code into its own file") Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Li Qiang <liq3ea@gmail.com> Message-Id: <20200921040231.437653-1-f4bug@amsat.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-09-23 19:13:56 +02:00
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int ' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic$64$\?_[a-z0-9_]\+' include/qemu/atomic.h \| \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
zhaolichang	3a4452d896	migration/: fix some comment spelling errors I found that there are many spelling errors in the comments of qemu, so I used the spellcheck tool to check the spelling errors and finally found some spelling errors in the migration folder. Signed-off-by: zhaolichang <zhaolichang@huawei.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20200917075029.313-3-zhaolichang@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2020-09-17 20:36:32 +02:00
Laurent Vivier	7e89a1401a	migration: fix multifd_send_pages() next channel multifd_send_pages() loops around the available channels, the next channel to use between two calls to multifd_send_pages() is stored inside a local static variable, next_channel. It works well, except if the number of channels decreases between two calls to multifd_send_pages(). In this case, the loop can try to access the data of a channel that doesn't exist anymore. The problem can be triggered if we start a migration with a given number of channels and then we cancel the migration to restart it with a lower number. This ends generally with an error like: qemu-system-ppc64: .../util/qemu-thread-posix.c:77: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed. This patch fixes the error by capping next_channel with the current number of channels before using it. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20200617113154.593233-1-lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-06-17 17:48:39 +01:00
Pan Nengyuan	13f2cb21e5	migration/multifd: Do error_free after migrate_set_error to avoid memleaks When error happen in multifd_send_thread, it use error_copy to set migrate error in multifd_send_terminate_threads(). We should call error_free after it. Similarly, fix another two places in multifd_recv_thread/multifd_save_cleanup. The leak stack: Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x7f781af07cf0 in calloc (/lib64/libasan.so.5+0xefcf0) #1 0x7f781a2ce22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d) #2 0x55ee1d075c17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61 #3 0x55ee1d076464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109 #4 0x55ee1cef066e in qio_channel_socket_writev /mnt/sdb/backup/qemu/io/channel-socket.c:569 #5 0x55ee1cee806b in qio_channel_writev /mnt/sdb/backup/qemu/io/channel.c:207 #6 0x55ee1cee806b in qio_channel_writev_all /mnt/sdb/backup/qemu/io/channel.c:171 #7 0x55ee1cee8248 in qio_channel_write_all /mnt/sdb/backup/qemu/io/channel.c:257 #8 0x55ee1ca12c9a in multifd_send_thread /mnt/sdb/backup/qemu/migration/multifd.c:657 #9 0x55ee1d0607fc in qemu_thread_start /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519 #10 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd) #11 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2) Indirect leak of 52 byte(s) in 1 object(s) allocated from: #0 0x7f781af07f28 in __interceptor_realloc (/lib64/libasan.so.5+0xeff28) #1 0x7f78156f07d9 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d7d9) #2 0x7f781a30ea6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c) #3 0x7f781a2e7cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0) #4 0x7f781a2e7d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c) #5 0x55ee1d075c86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65 #6 0x55ee1d076464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109 #7 0x55ee1cef066e in qio_channel_socket_writev /mnt/sdb/backup/qemu/io/channel-socket.c:569 #8 0x55ee1cee806b in qio_channel_writev /mnt/sdb/backup/qemu/io/channel.c:207 #9 0x55ee1cee806b in qio_channel_writev_all /mnt/sdb/backup/qemu/io/channel.c:171 #10 0x55ee1cee8248 in qio_channel_write_all /mnt/sdb/backup/qemu/io/channel.c:257 #11 0x55ee1ca12c9a in multifd_send_thread /mnt/sdb/backup/qemu/migration/multifd.c:657 #12 0x55ee1d0607fc in qemu_thread_start /mnt/sdb/backup/qemu/util/qemu-thread-posix.c:519 #13 0x7f78159ae2dd in start_thread (/lib64/libpthread.so.0+0x82dd) #14 0x7f78156df4b2 in __GI___clone (/lib64/libc.so.6+0xfc4b2) Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Message-Id: <20200506095416.26099-3-pannengyuan@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-07 17:40:24 +01:00
Pan Nengyuan	ad31b8af73	migration/multifd: fix memleaks in multifd_new_send_channel_async When error happen in multifd_new_send_channel_async, 'sioc' will not be used to create the multifd_send_thread. Let's free it to avoid a memleak. And also do error_free after migrate_set_error() to avoid another leak in the same place. The leak stack: Direct leak of 2880 byte(s) in 8 object(s) allocated from: #0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8) #1 0x7f20b44df1d5 in g_malloc (/lib64/libglib-2.0.so.0+0x531d5) #2 0x564133bce18b in object_new_with_type /mnt/sdb/backup/qemu/qom/object.c:683 #3 0x564133eea950 in qio_channel_socket_new /mnt/sdb/backup/qemu/io/channel-socket.c:56 #4 0x5641339cfe4f in socket_send_channel_create /mnt/sdb/backup/qemu/migration/socket.c:37 #5 0x564133a10328 in multifd_save_setup /mnt/sdb/backup/qemu/migration/multifd.c:772 #6 0x5641339cebed in migrate_fd_connect /mnt/sdb/backup/qemu/migration/migration.c:3530 #7 0x5641339d15e4 in migration_channel_connect /mnt/sdb/backup/qemu/migration/channel.c:92 #8 0x5641339cf5b7 in socket_outgoing_migration /mnt/sdb/backup/qemu/migration/socket.c:108 Direct leak of 384 byte(s) in 8 object(s) allocated from: #0 0x7f20b5118cf0 in calloc (/lib64/libasan.so.5+0xefcf0) #1 0x7f20b44df22d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5322d) #2 0x56413406fc17 in error_setv /mnt/sdb/backup/qemu/util/error.c:61 #3 0x564134070464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109 #4 0x5641340851be in inet_connect_addr /mnt/sdb/backup/qemu/util/qemu-sockets.c:379 #5 0x5641340851be in inet_connect_saddr /mnt/sdb/backup/qemu/util/qemu-sockets.c:458 #6 0x5641340870ab in socket_connect /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105 #7 0x564133eeaabf in qio_channel_socket_connect_sync /mnt/sdb/backup/qemu/io/channel-socket.c:145 #8 0x564133eeabf5 in qio_channel_socket_connect_worker /mnt/sdb/backup/qemu/io/channel-socket.c:168 Indirect leak of 360 byte(s) in 8 object(s) allocated from: #0 0x7f20b5118ae8 in __interceptor_malloc (/lib64/libasan.so.5+0xefae8) #1 0x7f20af901817 in __GI___vasprintf_chk (/lib64/libc.so.6+0x10d817) #2 0x7f20b451fa6c in g_vasprintf (/lib64/libglib-2.0.so.0+0x93a6c) #3 0x7f20b44f8cd0 in g_strdup_vprintf (/lib64/libglib-2.0.so.0+0x6ccd0) #4 0x7f20b44f8d8c in g_strdup_printf (/lib64/libglib-2.0.so.0+0x6cd8c) #5 0x56413406fc86 in error_setv /mnt/sdb/backup/qemu/util/error.c:65 #6 0x564134070464 in error_setg_errno_internal /mnt/sdb/backup/qemu/util/error.c:109 #7 0x5641340851be in inet_connect_addr /mnt/sdb/backup/qemu/util/qemu-sockets.c:379 #8 0x5641340851be in inet_connect_saddr /mnt/sdb/backup/qemu/util/qemu-sockets.c:458 #9 0x5641340870ab in socket_connect /mnt/sdb/backup/qemu/util/qemu-sockets.c:1105 #10 0x564133eeaabf in qio_channel_socket_connect_sync /mnt/sdb/backup/qemu/io/channel-socket.c:145 #11 0x564133eeabf5 in qio_channel_socket_connect_worker /mnt/sdb/backup/qemu/io/channel-socket.c:168 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Message-Id: <20200506095416.26099-2-pannengyuan@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-05-07 17:40:24 +01:00
Daniel Brodsky	6e8a355de6	lockable: replaced locks with lock guard macros where appropriate - ran regexp "qemu_mutex_lock$.$.\n.*if" to find targets - replaced result with QEMU_LOCK_GUARD if all unlocks at function end - replaced result with WITH_QEMU_LOCK_GUARD if unlock not at end Signed-off-by: Daniel Brodsky <dnbrdsky@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20200404042108.389635-3-dnbrdsky@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2020-05-04 16:07:43 +01:00
Juan Quintela	7ec2c2b3c1	multifd: Add zlib compression multifd support Signed-off-by: Juan Quintela <quintela@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-02-28 09:24:43 +01:00
Juan Quintela	ab7cbb0b9a	multifd: Make no compression operations into its own structure It will be used later. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> --- No comp value needs to be zero.	2020-02-28 09:24:43 +01:00
Juan Quintela	d32ca5ad79	multifd: Split multifd code into its own file Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>	2020-01-29 11:28:59 +01:00

1 2 3 4

155 Commits