mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Ilya Leoshkevich	3faebbcd64	test-util-filemonitor: Adapt to the FreeBSD inotify rename semantics Unlike on Linux, on FreeBSD renaming a file when the destination already exists results in an IN_DELETE event for that existing file: $ FILEMONITOR_DEBUG=1 build/tests/unit/test-util-filemonitor Rename /tmp/test-util-filemonitor-K13LI2/fish/one.txt -> /tmp/test-util-filemonitor-K13LI2/two.txt Event id=200000000 event=2 file=one.txt Queue event id 200000000 event 2 file one.txt Queue event id 100000000 event 2 file two.txt Queue event id 100000002 event 2 file two.txt Queue event id 100000000 event 0 file two.txt Queue event id 100000002 event 0 file two.txt Event id=100000000 event=0 file=two.txt Expected event 0 but got 2 This difference in behavior is not expected to break the real users, so teach the test to accept it. Suggested-by: "Daniel P. Berrange" <berrange@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240206002344.12372-4-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-06 10:27:50 +01:00
Ilya Leoshkevich	fceffd6b3f	tests/vm/freebsd: Reload the sshd configuration After console_sshd_config(), the SSH server needs to be nudged to pick up the new configs. The scripts for the other BSD flavors already do this with a reboot, but a simple reload is sufficient. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240206002344.12372-3-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-06 10:27:50 +01:00
Ilya Leoshkevich	7f0bf5ea41	tests/vm: Set UseDNS=no in the sshd configuration make vm-build-freebsd sometimes fails with "Connection timed out during banner exchange". The client strace shows: 13:59:30 write(3, "SSH-2.0-OpenSSH_9.3\r\n", 21) = 21 13:59:30 getpid() = 252655 13:59:30 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}]) 13:59:32 read(3, "S", 1) = 1 13:59:32 poll([{fd=3, events=POLLIN}], 1, 3625) = 1 ([{fd=3, revents=POLLIN}]) 13:59:32 read(3, "S", 1) = 1 13:59:32 poll([{fd=3, events=POLLIN}], 1, 3625) = 1 ([{fd=3, revents=POLLIN}]) 13:59:32 read(3, "H", 1) = 1 There is a 2s delay during connection, and ConnectTimeout is set to 1. Raising it makes the issue go away, but we can do better. The server truss shows: 888: 27.811414714 socket(PF_INET,SOCK_DGRAM\|SOCK_CLOEXEC,0) = 5 (0x5) 888: 27.811765030 connect(5,{ AF_INET 10.0.2.3:53 },16) = 0 (0x0) 888: 27.812166941 sendto(5,"\^Z/\^A\0\0\^A\0\0\0\0\0\0\^A2"...,39,0,NULL,0) = 39 (0x27) 888: 29.363970743 poll({ 5/POLLRDNORM },1,5000) = 1 (0x1) So the delay is due to a DNS query. Disable DNS queries in the server config. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240206002344.12372-2-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-06 10:27:50 +01:00
Philippe Mathieu-Daudé	d0143fa9ee	target/s390x: Prefer fast cpu_env() over slower CPU QOM cast macro Mechanical patch produced running the command documented in scripts/coccinelle/cpu_env.cocci_template header. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-ID: <20240129164514.73104-25-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-06 10:27:50 +01:00
Ilya Leoshkevich	eb14b021f8	tests/tcg/s390x: Test CONVERT TO BINARY Check the CVB's, CVBY's, and CVBG's corner cases. Co-developed-by: Pavel Zbitskiy <pavel.zbitskiy@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240205205830.6425-5-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-06 09:51:37 +01:00
Ilya Leoshkevich	5b003b59ac	tests/tcg/s390x: Test CONVERT TO DECIMAL Check the CVD's, CVDY's, and CVDG's corner cases. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240205205830.6425-4-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-06 09:51:37 +01:00
Ilya Leoshkevich	b4b8d58e56	target/s390x: Emulate CVB, CVBY and CVBG Convert to Binary - counterparts of the already implemented Convert to Decimal (CVD*) instructions. Example from the Principles of Operation: 25594C becomes 63FA. Co-developed-by: Pavel Zbitskiy <pavel.zbitskiy@gmail.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-ID: <20240205205830.6425-3-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-06 09:51:37 +01:00
Ilya Leoshkevich	a6e55a82e9	target/s390x: Emulate CVDG CVDG is the same as CVD, except that it converts 64 bits into 128, rather than 32 into 64. Create a new helper, which uses Int128 wrappers. Reported-by: Ido Plat <Ido.Plat@ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240205205830.6425-2-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-06 09:51:37 +01:00
Mark Kanda	04accf43df	oslib-posix: initialize backend memory objects in parallel QEMU initializes preallocated backend memory as the objects are parsed from the command line. This is not optimal in some cases (e.g. memory spanning multiple NUMA nodes) because the memory objects are initialized in series. Allow the initialization to occur in parallel (asynchronously). In order to ensure optimal thread placement, asynchronous initialization requires prealloc context threads to be in use. Signed-off-by: Mark Kanda <mark.kanda@oracle.com> Message-ID: <20240131165327.3154970-2-mark.kanda@oracle.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2024-02-06 08:15:22 +01:00
David Hildenbrand	540a1abbf0	memory-device: reintroduce memory region size check We used to check that the memory region size is multiples of the overall requested address alignment for the device memory address. We removed that check, because there are cases (i.e., hv-balloon) where devices unconditionally request an address alignment that has a very large alignment (i.e., 32 GiB), but the actual memory device size might not be multiples of that alignment. However, this change: (a) allows for some practically impossible DIMM sizes, like "1GB+1 byte". (b) allows for DIMMs that partially cover hugetlb pages, previously reported in [1]. Both scenarios don't make any sense: we might even waste memory. So let's reintroduce that check, but only check that the memory region size is multiples of the memory region alignment (i.e., page size, huge page size), but not any additional memory device requirements communicated using md->get_min_alignment(). The following examples now fail again as expected: (a) 1M with 2M THP qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \ -object memory-backend-ram,id=mem1,size=1M \ -device pc-dimm,id=dimm1,memdev=mem1 -> backend memory size must be multiple of 0x200000 (b) 1G+1byte qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \ -object memory-backend-ram,id=mem1,size=1073741825B \ -device pc-dimm,id=dimm1,memdev=mem1 -> backend memory size must be multiple of 0x200000 (c) Unliagned hugetlb size (2M) qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \ -object memory-backend-file,id=mem1,mem-path=/dev/hugepages/tmp,size=511M \ -device pc-dimm,id=dimm1,memdev=mem1 backend memory size must be multiple of 0x200000 (d) Unliagned hugetlb size (1G) qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \ -object memory-backend-file,id=mem1,mem-path=/dev/hugepages1G/tmp,size=2047M \ -device pc-dimm,id=dimm1,memdev=mem1 -> backend memory size must be multiple of 0x40000000 Note that this fix depends on a hv-balloon change to communicate its additional alignment requirements using get_min_alignment() instead of through the memory region. [1] https://lkml.kernel.org/r/f77d641d500324525ac036fe1827b3070de75fc1.1701088320.git.mprivozn@redhat.com Message-ID: <20240117135554.787344-3-david@redhat.com> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> Reported-by: Michal Privoznik <mprivozn@redhat.com> Fixes: `eb1b7c4bd4` ("memory-device: Drop size alignment check") Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2024-02-06 08:14:59 +01:00
Peter Xu	488c84acb4	migration/multifd: Optimize sender side to be lockless When reviewing my attempt to refactor send_prepare(), Fabiano suggested we try out with dropping the mutex in multifd code [1]. I thought about that before but I never tried to change the code. Now maybe it's time to give it a stab. This only optimizes the sender side. The trick here is multifd has a clear provider/consumer model, that the migration main thread publishes requests (either pending_job/pending_sync), while the multifd sender threads are consumers. Here we don't have a lot of complicated data sharing, and the jobs can logically be submitted lockless. Arm the code with atomic weapons. Two things worth mentioning: - For multifd_send_pages(): we can use qatomic_load_acquire() when trying to find a free channel, but that's expensive if we attach one ACQUIRE per channel. Instead, keep the qatomic_read() on reading the pending_job flag as we do already, meanwhile use one smp_mb_acquire() after the loop to guarantee the memory ordering. - For pending_sync: it doesn't have any extra data required since now p->flags are never touched, it should be safe to not use memory barrier. That's different from pending_job. Provide rich comments for all the lockless operations to state how they are paired. With that, we can remove the mutex. [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-06 11:05:49 +08:00
Richard Henderson	23c5692abc	tcg/tci: Support TCG_COND_TST{EQ,NE} Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-05 22:45:41 +00:00
Richard Henderson	585b7a4247	tcg/s390x: Support TCG_COND_TST{EQ,NE} Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-05 22:45:41 +00:00
Thomas Huth	6a41a62171	docs/about: Deprecate the old "power5+" and "power7+" CPU names For consistency we should drop the names with a "+" in it in the long run. Message-ID: <20240117141054.73841-3-thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-05 14:21:21 +01:00
Thomas Huth	5bfb75f152	target/ppc/cpu-models: Rename power5+ and power7+ for new QOM naming rules The character "+" is now forbidden in QOM device names (see commit `b447378e12` - "Limit type names to alphanumerical and some few special characters"). For the "power5+" and "power7+" CPU names, there is currently a hack in type_name_is_valid() to still allow them for compatibility reasons. However, there is a much nicer solution for this: Simply use aliases! This way we can still support the old names without the need for the ugly hack in type_name_is_valid(). Message-ID: <20240117141054.73841-2-thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-05 14:21:21 +01:00
Sven Schnelle	8b09b7fe47	hw/scsi/lsi53c895a: add missing decrement of reentrancy counter When the maximum count of SCRIPTS instructions is reached, the code stops execution and returns, but fails to decrement the reentrancy counter. This effectively renders the SCSI controller unusable because on next entry the reentrancy counter is still above the limit. This bug was seen on HP-UX 10.20 which seems to trigger SCRIPTS loops. Fixes: `b987718bbb` ("hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI controller (CVE-2023-0330)") Signed-off-by: Sven Schnelle <svens@stackframe.org> Message-ID: <20240128202214.2644768-1-svens@stackframe.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Thomas Huth <thuth@redhat.com>	2024-02-05 14:21:21 +01:00
Peter Xu	98ea497d8b	migration/multifd: Fix MultiFDSendParams.packet_num race As reported correctly by Fabiano [1] (while per Fabiano, it sourced back to Elena's initial report in Oct 2023), MultiFDSendParams.packet_num is buggy to be assigned and stored. Consider two consequent operations of: (1) queue a job into multifd send thread X, then (2) queue another sync request to the same send thread X. Then the MultiFDSendParams.packet_num will be assigned twice, and the first assignment can get lost already. To avoid that, we move the packet_num assignment from p->packet_num into where the thread will fill in the packet. Use atomic operations to protect the field, making sure there's no race. Note that atomic fetch_add() may not be good for scaling purposes, however multifd should be fine as number of threads should normally not go beyond 16 threads. Let's leave that concern for later but fix the issue first. There's also a trick on how to make it always work even on 32 bit hosts for uint64_t packet number. Switching to uintptr_t as of now to simply the case. It will cause packet number to overflow easier on 32 bit, but that shouldn't be a major concern for now as 32 bit systems is not the major audience for any performance concerns like what multifd wants to address. We also need to move multifd_send_state definition upper, so that multifd_send_fill_packet() can reference it. [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:11 +08:00
Peter Xu	cde85c37ca	migration/multifd: Stick with send/recv on function names Most of the multifd code uses send/recv to represent the two sides, but some rare cases use save/load. Since send/recv is the majority, replacing the save/load use cases to use send/recv globally. Now we reach a consensus on the naming. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:11 +08:00
Peter Xu	5e6ea8a1d6	migration/multifd: Cleanup multifd_load_cleanup() Use similar logic to cleanup the recv side. Note that multifd_recv_terminate_threads() may need some similar rework like the sender side, but let's leave that for later. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	12808db3b8	migration/multifd: Cleanup multifd_save_cleanup() Shrink the function by moving relevant works into helpers: move the thread join()s into multifd_send_terminate_threads(), then create two more helpers to cover channel/state cleanups. Add a TODO entry for the thread terminate process because p->running is still buggy. We need to fix it at some point but not yet covered. Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-20-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	f88f86c4ee	migration/multifd: Rewrite multifd_queue_page() The current multifd_queue_page() is not easy to read and follow. It is not good with a few reasons: - No helper at all to show what exactly does a condition mean; in short, readability is low. - Rely on pages->ramblock being cleared to detect an empty queue. It's slightly an overload of the ramblock pointer, per Fabiano [1], which I also agree. - Contains a self recursion, even if not necessary.. Rewrite this function. We add some comments to make it even clearer on what it does. [1] https://lore.kernel.org/r/87wmrpjzew.fsf@suse.de Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-19-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	3b40964a86	migration/multifd: Change retval of multifd_send_pages() Using int is an overkill when there're only two options. Change it to a boolean. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-18-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	d6556d174a	migration/multifd: Change retval of multifd_queue_page() Using int is an overkill when there're only two options. Change it to a boolean. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	3ab4441d97	migration/multifd: Split multifd_send_terminate_threads() Split multifd_send_terminate_threads() into two functions: - multifd_send_set_error(): used when an error happened on the sender side, set error and quit state only - multifd_send_terminate_threads(): used only by the main thread to kick all multifd send threads out of sleep, for the last recycling. Use multifd_send_set_error() in the three old call sites where only the error will be set. Use multifd_send_terminate_threads() in the last one where the main thread will kick the multifd threads at last in multifd_save_cleanup(). Both helpers will need to set quitting=1. Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	859ebaf346	migration/multifd: Forbid spurious wakeups Now multifd's logic is designed to have no spurious wakeup. I still remember a talk to Juan and he seems to agree we should drop it now, and if my memory was right it was there because multifd used to hit that when still debugging. Let's drop it and see what can explode; as long as it's not reaching soft-freeze. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	25a1f87875	migration/multifd: Move header prepare/fill into send_prepare() This patch redefines the interfacing of ->send_prepare(). It further simplifies multifd_send_thread() especially on zero copy. Now with the new interface, we require the hook to do all the work for preparing the IOVs to send. After it's completed, the IOVs should be ready to be dumped into the specific multifd QIOChannel later. So now the API looks like: p->pages -----------> send_prepare() -------------> IOVs This also prepares for the case where the input can be extended to even not any p->pages. But that's for later. This patch will achieve similar goal of what Fabiano used to propose here: https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de However the send() interface may not be necessary. I'm boldly attaching a "Co-developed-by" for Fabiano. Co-developed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	452b205702	migration/multifd: multifd_send_prepare_header() Introduce a helper multifd_send_prepare_header() to setup the header packet for multifd sender. It's fine to setup the IOV[0] _before_ send_prepare() because the packet buffer is already ready, even if the content is to be filled in. With this helper, we can already slightly clean up the zero copy path. Note that I explicitly put it into multifd.h, because I want it inlined directly into multifd*.c where necessary later. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	8a9ef17380	migration/multifd: Move trace_multifd_send\|recv() Move them into fill/unfill of packets. With that, we can further cleanup the send/recv thread procedure, and remove one more temp var. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	db7e1cc510	migration/multifd: Move total_normal_pages accounting Just like the previous patch, move the accounting for total_normal_pages on both src/dst sides into the packet fill/unfill procedures. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	05b7ec1890	migration/multifd: Rename p->num_packets and clean it up This field, no matter whether on src or dest, is only used for debugging purpose. They can even be removed already, unless it still more or less provide some accounting on "how many packets are sent/recved for this thread". The other more important one is called packet_num, which is embeded in the multifd packet headers (MultiFDPacket_t). So let's keep them for now, but make them much easier to understand, by doing below: - Rename both of them to packets_sent / packets_recved, the old name (num_packets) are waaay too confusing when we already have MultiFDPacket_t.packets_num. - Avoid worrying on the "initial packet": we know we will send it, that's good enough. The accounting won't matter a great deal to start with 0 or with 1. - Move them to where we send/recv the packets. They're: - multifd_send_fill_packet() for senders. - multifd_recv_unfill_packet() for receivers. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	83c560fb42	migration/multifd: Drop pages->num check in sender thread Now with a split SYNC handler, we always have pages->num set for pending_job==true. Assert it instead. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	e3cce9af10	migration/multifd: Simplify locking in sender thread The sender thread will yield the p->mutex before IO starts, trying to not block the requester thread. This may be unnecessary lock optimizations, because the requester can already read pending_job safely even without the lock, because the requester is currently the only one who can assign a task. Drop that lock complication on both sides: (1) in the sender thread, always take the mutex until job done (2) in the requester thread, check pending_job clear lockless Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	f5f48a7891	migration/multifd: Separate SYNC request with normal jobs Multifd provide a threaded model for processing jobs. On sender side, there can be two kinds of job: (1) a list of pages to send, or (2) a sync request. The sync request is a very special kind of job. It never contains a page array, but only a multifd packet telling the dest side to synchronize with sent pages. Before this patch, both requests use the pending_job field, no matter what the request is, it will boost pending_job, while multifd sender thread will decrement it after it finishes one job. However this should be racy, because SYNC is special in that it needs to set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request. Consider a sequence of operations where: - migration thread enqueue a job to send some pages, pending_job++ (0->1) - [...before the selected multifd sender thread wakes up...] - migration thread enqueue another job to sync, pending_job++ (1->2), setup p->flags=MULTIFD_FLAG_SYNC - multifd sender thread wakes up, found pending_job==2 - send the 1st packet with MULTIFD_FLAG_SYNC and list of pages - send the 2nd packet with flags==0 and no pages This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done after all the pages are received. Meanwhile, the 2nd packet will be completely useless, which contains zero information. I didn't verify above, but I think this issue is still benign in that at least on the recv side we always receive pages before handling MULTIFD_FLAG_SYNC. However that's not always guaranteed and just tricky. One other reason I want to separate it is using p->flags to communicate between the two threads is also not clearly defined, it's very hard to read and understand why accessing p->flags is always safe; see the current impl of multifd_send_thread() where we tried to cache only p->flags. It doesn't need to be that complicated. This patch introduces pending_sync, a separate flag just to show that the requester needs a sync. Alongside, we remove the tricky caching of p->flags now because after this patch p->flags should only be used by multifd sender thread now, which will be crystal clear. So it is always thread safe to access p->flags. With that, we can also safely convert the pending_job into a boolean, because we don't support >1 pending jobs anyway. Always use atomic ops to access both flags to make sure no cache effect. When at it, drop the initial setting of "pending_job = 0" because it's always allocated using g_new0(). Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	efd8c5439d	migration/multifd: Drop MultiFDSendParams.normal[] array This array is redundant when p->pages exists. Now we extended the life of p->pages to the whole period where pending_job is set, it should be safe to always use p->pages->offset[] rather than p->normal[]. Drop the array. Alongside, the normal_num is also redundant, which is the same to p->pages->num. This doesn't apply to recv side, because there's no extra buffering on recv side, so p->normal[] array is still needed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	836eca47f6	migration/multifd: Postpone reset of MultiFDPages_t Now we reset MultiFDPages_t object in the multifd sender thread in the middle of the sending job. That's not necessary, because the "pages" struct will not be reused anyway until pending_job is cleared. Move that to the end after the job is completed, provide a helper to reset a "pages" object. Use that same helper when free the object too. This prepares us to keep using p->pages in the follow up patches, where we may drop p->normal[]. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	15f3f21d59	migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths Multifd send side has two fields to indicate error quits: - MultiFDSendParams.quit - &multifd_send_state->exiting Merge them into the global one. The replacement is done by changing all p->quit checks into the global var check. The global check doesn't need any lock. A few more things done on top of this altogether: - multifd_send_terminate_threads() Moving the xchg() of &multifd_send_state->exiting upper, so as to cover the tracepoint, migrate_set_error() and migrate_set_state(). - multifd_send_sync_main() In the 2nd loop, add one more check over the global var to make sure we don't keep the looping if QEMU already decided to quit. - multifd_tls_outgoing_handshake() Use multifd_send_terminate_threads() to set the error state. That has a benefit of updating MigrationState.error to that error too, so we can persist that 1st error we hit in that specific channel. - multifd_new_send_channel_async() Take similar approach like above, drop the migrate_set_error() because multifd_send_terminate_threads() already covers that. Unwrap the helper multifd_new_send_channel_cleanup() along the way; not really needed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	48c0f5d56f	migration/multifd: multifd_send_kick_main() When a multifd sender thread hit errors, it always needs to kick the main thread by kicking all the semaphores that it can be waiting upon. Provide a helper for it and deduplicate the code. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
Peter Xu	8888a552bf	migration/multifd: Drop stale comment for multifd zero copy We've already done that with multifd_flush_after_each_section, for multifd in general. Drop the stale "TODO-like" comment. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:42:10 +08:00
William Roche	06152b89db	migration: prevent migration when VM has poisoned memory A memory page poisoned from the hypervisor level is no longer readable. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a similar stack trace: Program terminated with signal SIGBUS, Bus error. #0 _mm256_loadu_si256 #1 buffer_zero_avx2 #2 select_accel_fn #3 buffer_is_zero #4 save_zero_page #5 ram_save_target_page_legacy #6 ram_save_host_page #7 ram_find_and_save_block #8 ram_save_iterate #9 qemu_savevm_state_iterate #10 migration_iteration_run #11 migration_thread #12 qemu_thread_start To avoid this VM crash during the migration, prevent the migration when a known hardware poison exists on the VM. Signed-off-by: William Roche <william.roche@oracle.com> Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>	2024-02-05 14:41:58 +08:00
David Hildenbrand	f77c5f38f4	hv-balloon: use get_min_alignment() to express 32 GiB alignment Let's implement the get_min_alignment() callback for memory devices, and copy for the device memory region the alignment of the host memory region. This mimics what virtio-mem does, and allows for re-introducing proper alignment checks for the memory region size (where we don't care about additional device requirements) in memory device core. Message-ID: <20240117135554.787344-2-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2024-02-04 17:42:03 +01:00
Richard Henderson	d95b51d3fb	tcg/s390x: Add TCG_CT_CONST_CMP Better constraint for tcg_out_cmp, based on the comparison. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	cbaddf3094	tcg/s390x: Split constraint A into J+U Signed 33-bit == signed 32-bit + unsigned 32-bit. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	ad788aebba	tcg/ppc: Support TCG_COND_TST{EQ,NE} Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	282ef7e8ef	tcg/ppc: Add TCG_CT_CONST_CMP Better constraint for tcg_out_cmp, based on the comparison. We can't yet remove the fallback to load constants into a scratch because of tcg_out_cmp2, but that path should not be as frequent. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	2f2faef6b0	tcg/ppc: Tidy up tcg_target_const_match Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	740f1d40e5	tcg/ppc: Use cr0 in tcg_to_bc and tcg_to_isel Using cr0 means we could choose to use rc=1 to compute the condition. Adjust the tables and tcg_out_cmp that feeds them. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	2e7eafcc40	tcg/ppc: Sink tcg_to_bc usage into tcg_out_bc Rename the current tcg_out_bc function to tcg_out_bc_lab, and create a new function that takes an integer displacement + link. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	b9ddaf5618	tcg/sparc64: Support TCG_COND_TST{EQ,NE} Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	905afe37ab	tcg/sparc64: Pass TCGCond to tcg_out_cmp Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00
Richard Henderson	6bc74a5387	tcg/sparc64: Hoist read of tcg_cond_to_rcond Use a non-zero value here (an illegal encoding) as a better condition than is_unsigned_cond for when MOVR/BPR is usable. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-02-03 23:53:49 +00:00

1 2 3 4 5 ...

110659 Commits