Commit Graph

206 Commits

Author SHA1 Message Date
Isaku Yamahata
c5d9425ef4 kvm/tdx: Don't complain when converting vMMIO region to shared
Because a vMMIO region needs to be a shared region, a guest TD may
explicitly convert such a region from private to shared.  Don't complain
about such a conversion.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-34-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:26 +02:00
Chao Peng
c15e568407 kvm: handle KVM_EXIT_MEMORY_FAULT
Upon a KVM_EXIT_MEMORY_FAULT exit, userspace needs to do the memory
conversion on the RAMBlock to turn the memory into the desired attribute,
switching between private and shared.

Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens.

Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has a
guest_memfd memory backend.

Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is
added.

When a page is converted from shared to private, the original shared
memory can be discarded via ram_block_discard_range(). Note, shared
memory can be discarded only when it is not backed by hugetlb, because
hugetlb is supposed to be pre-allocated and has no need for discarding.
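
As an illustration, the exit handling ends up roughly of this shape (a
sketch rather than a verbatim quote of the patch; kvm_convert_memory() is
the conversion helper used by this series):

    case KVM_EXIT_MEMORY_FAULT:
        /* Only the private/shared flag is currently defined for this exit. */
        if (run->memory_fault.flags & ~KVM_MEMORY_EXIT_FLAG_PRIVATE) {
            error_report("KVM_EXIT_MEMORY_FAULT: Unknown flag 0x%" PRIx64,
                         (uint64_t)run->memory_fault.flags);
            ret = -1;
            break;
        }
        /* Convert the faulting range to the attribute KVM asks for. */
        ret = kvm_convert_memory(run->memory_fault.gpa,
                                 run->memory_fault.size,
                                 run->memory_fault.flags &
                                 KVM_MEMORY_EXIT_FLAG_PRIVATE);
        break;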

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>

Message-ID: <20240320083945.991426-13-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:26 +02:00
Xiaoyao Li
bd3bcf6962 kvm/memory: Make memory type private by default if it has guest memfd backend
The KVM side leaves the memory as shared by default, which may incur the
overhead of a page conversion on the first access to each page, because
the expectation is that a page is likely to be private for VMs that
require private memory (i.e. have a guest memfd).

Explicitly set the memory to private when the memory region has a valid
guest memfd backend.
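
As a sketch of the idea (helper names as introduced elsewhere in this
series; treat the snippet as illustrative, not as the literal hunk):

    /* In kvm_set_phys_mem(), after the new slot has been registered: */
    if (memory_region_has_guest_memfd(mr)) {
        /*
         * Default the region to private, so VMs that are expected to run
         * mostly on private memory don't pay one conversion per page on
         * first access.
         */
        kvm_set_memory_attributes_private(start_addr, slot_size);
    }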

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-16-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Chao Peng
ce5a983233 kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot
Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.

With KVM_SET_USER_MEMORY_REGION2, QEMU can set up a memory region that
is backed both by hva-based shared memory and by guest memfd based
private memory.
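
For reference, the extended region layout comes from the Linux UAPI
(include/uapi/linux/kvm.h); the two guest_memfd fields are what make the
dual backing possible:

    struct kvm_userspace_memory_region2 {
        __u32 slot;
        __u32 flags;
        __u64 guest_phys_addr;
        __u64 memory_size;        /* bytes */
        __u64 userspace_addr;     /* hva of the shared backing */
        __u64 guest_memfd_offset; /* offset into the private backing */
        __u32 guest_memfd;        /* fd of the guest_memfd */
        __u32 pad1;
        __u64 pad2[14];
    };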

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-10-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Xiaoyao Li
15f7a80c49 RAMBlock: Add support of KVM private guest memfd
Add KVM guest_memfd support to RAMBlock so that both normal hva-based
memory and KVM guest memfd based private memory can be associated with
one RAMBlock.

Introduce a new flag, RAM_GUEST_MEMFD. When it is set, QEMU calls the KVM
ioctl to create a private guest_memfd during RAMBlock setup.

Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd
is more flexible and extensible than simply relying on the VM type, because
in the future we may have the case where not all of a VM's memory needs a
guest memfd. As a benefit, it also avoids pulling MachineState into the
memory subsystem.

Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
confidential guests, such as TDX VMs. How and when to set it for memory
backends will be implemented in the following patches.

Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
KVM guest_memfd allocated.
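
The query helper is essentially a one-liner (a sketch; the RAMBlock field
name follows this series):

    bool memory_region_has_guest_memfd(MemoryRegion *mr)
    {
        return mr->ram_block && mr->ram_block->guest_memfd >= 0;
    }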

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20240320083945.991426-7-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Xiaoyao Li
0811baed49 kvm: Introduce support for memory_attributes
Introduce helper functions to set the attributes of a range of memory
to private or shared.

This is necessary to notify KVM of the private/shared attribute of each
gpa range. KVM needs the information to decide whether a GPA should be
mapped to hva-based shared memory or to guest_memfd based private memory.
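
Underneath, the helpers wrap the KVM_SET_MEMORY_ATTRIBUTES VM ioctl from
the Linux UAPI; a minimal sketch of the private/shared wrappers could look
like this (error reporting omitted):

    static int kvm_set_memory_attributes(hwaddr start, uint64_t size,
                                         uint64_t attr)
    {
        struct kvm_memory_attributes attrs = {
            .address    = start,
            .size       = size,
            .attributes = attr,
        };

        return kvm_vm_ioctl(kvm_state, KVM_SET_MEMORY_ATTRIBUTES, &attrs);
    }

    int kvm_set_memory_attributes_private(hwaddr start, uint64_t size)
    {
        return kvm_set_memory_attributes(start, size,
                                         KVM_MEMORY_ATTRIBUTE_PRIVATE);
    }

    int kvm_set_memory_attributes_shared(hwaddr start, uint64_t size)
    {
        /* Clearing the private attribute makes the range shared again. */
        return kvm_set_memory_attributes(start, size, 0);
    }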

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-11-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Xiaoyao Li
72853afc63 trace/kvm: Split address space and slot id in trace_kvm_set_user_memory()
The upper 16 bits of kvm_userspace_memory_region::slot are the address
space id. Parse it separately in trace_kvm_set_user_memory().
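
In other words, the call site decodes the combined field before tracing,
roughly (argument list abbreviated for illustration):

    /* Bits 31..16 of the slot id encode the address space. */
    trace_kvm_set_user_memory(mem->slot >> 16, (uint16_t)mem->slot,
                              mem->flags, mem->guest_phys_addr,
                              mem->memory_size, mem->userspace_addr, ret);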

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-5-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Paolo Bonzini
a99c0c66eb KVM: remove kvm_arch_cpu_check_are_resettable
Board reset requires writing a fresh CPU state.  As far as KVM is
concerned, the only thing that blocks reset is that CPU state is
encrypted; therefore, kvm_cpus_are_resettable() can simply check
if that is the case.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Paolo Bonzini
5c3131c392 KVM: track whether guest state is encrypted
So far, KVM has allowed KVM_GET/SET_* ioctls to execute even if the
guest state is encrypted, in which case they do nothing.  For the new
API using VM types, instead, the ioctls will fail which is a safer and
more robust approach.

The new API will be the only one available for SEV-SNP and TDX, but it
is also usable for SEV and SEV-ES.  In preparation for that, require
architecture-specific KVM code to communicate the point at which guest
state is protected (which must be after kvm_cpu_synchronize_post_init(),
though that might change in the future in order to support migration).
From that point, skip reading registers so that cpu->vcpu_dirty is
never true: if it ever becomes true, kvm_arch_put_registers() will
fail miserably.
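
A sketch of the resulting gate on the generic side (field and helper names
are illustrative of the approach):

    void kvm_cpu_synchronize_state(CPUState *cpu)
    {
        /* Once guest state is protected, registers cannot be read back. */
        if (!cpu->vcpu_dirty && !kvm_state->guest_state_protected) {
            run_on_cpu(cpu, do_kvm_cpu_synchronize_state, RUN_ON_CPU_NULL);
        }
    }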

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23 17:35:25 +02:00
Paolo Bonzini
1e1e48792a kvm: use configs/ definition to conditionalize debug support
If an architecture adds support for KVM_CAP_SET_GUEST_DEBUG but QEMU does not
have the necessary code, QEMU will fail to build after updating kernel headers.
Avoid this by using a #define in config-target.h instead of KVM_CAP_SET_GUEST_DEBUG.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-18 11:17:27 +02:00
Igor Mammedov
e34f4d87e8 kvm: error out of kvm_irqchip_add_msi_route() in case of full route table
kvm_irqchip_add_msi_route() calls kvm_add_routing_entry(), which simply
extends
  KVMState::irq_routes::entries[]
but doesn't check whether the number of routes exceeds the limit the
kernel is willing to accept. This later leads to the assert

  qemu-kvm: ../accel/kvm/kvm-all.c:1833: kvm_irqchip_commit_routes: Assertion `ret == 0' failed

Typically this happens during guest boot for a large enough guest.

Reproduced with:
  ./qemu --enable-kvm -m 8G -smp 64 -machine pc \
     `for b in {1..2}; do echo -n "-device pci-bridge,id=pci$b,chassis_nr=$b ";
        for i in {0..31}; do touch /tmp/vblk$b$i;
           echo -n "-drive file=/tmp/vblk$b$i,if=none,id=drive$b$i,format=raw
                    -device virtio-blk-pci,drive=drive$b$i,bus=pci$b ";
      done; done`

While a crash at boot time is bad, the same might happen at hotplug time,
which is unacceptable.
So instead of calling kvm_add_routing_entry() unconditionally, check first
that the number of routes won't exceed the KVM_CAP_IRQ_ROUTING limit. This
way a virtio device, instead of killing qemu, will gracefully fail to
initialize as expected, with the following warnings on the console:
    virtio-blk failed to set guest notifier (-28), ensure -accel kvm is set.
    virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower).
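
One plausible shape of such a check, as a sketch (the kernel reports the
route limit through the KVM_CAP_IRQ_ROUTING extension; exact placement and
error handling differ in the real patch):

    /* In kvm_irqchip_add_msi_route(), before adding the entry: */
    if (s->irq_routes->nr >= kvm_check_extension(s, KVM_CAP_IRQ_ROUTING)) {
        /* Table full: return -ENOSPC instead of asserting later on commit. */
        return -ENOSPC;
    }
    kvm_add_routing_entry(s, &kroute);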

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-ID: <20240408110956.451558-1-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-08 21:22:00 +02:00
William Roche
06152b89db migration: prevent migration when VM has poisoned memory
A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash QEMU when it tries to read the
memory address space and stumbles on the poisoned page, with a stack
trace similar to:

Program terminated with signal SIGBUS, Bus error.
#0  _mm256_loadu_si256
#1  buffer_zero_avx2
#2  select_accel_fn
#3  buffer_is_zero
#4  save_zero_page
#5  ram_save_target_page_legacy
#6  ram_save_host_page
#7  ram_find_and_save_block
#8  ram_save_iterate
#9  qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start

To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.

Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 14:41:58 +08:00
Peter Maydell
3f2a357b95 HW core patch queue
. Deprecate unmaintained SH-4 models (Samuel)
 . HPET: Convert DPRINTF calls to trace events (Daniel)
 . Implement buffered block writes in Intel PFlash (Gerd)
 . Ignore ELF loadable segments with zero size (Bin)
 . ESP/NCR53C9x: PCI DMA fixes (Mark)
 . PIIX: Simplify Xen PCI IRQ routing (Bernhard)
 . Restrict CPU 'start-powered-off' property to sysemu (Phil)
 
 . target/alpha: Only build sys_helper.c on system emulation (Phil)
 . target/xtensa: Use generic instruction breakpoint API & add test (Max)
 . Restrict icount to system emulation (Phil)
 . Do not set CPUState TCG-specific flags in non-TCG accels (Phil)
 . Cleanup TCG tb_invalidate API (Phil)
 . Correct LoongArch/KVM include path (Bibo)
 . Do not ignore throttle errors in crypto backends (Phil)
 
 . MAINTAINERS updates (Raphael, Zhao)

Merge tag 'hw-cpus-20240119' of https://github.com/philmd/qemu into staging

* tag 'hw-cpus-20240119' of https://github.com/philmd/qemu: (36 commits)
  configure: Add linux header compile support for LoongArch
  MAINTAINERS: Update hw/core/cpu.c entry
  MAINTAINERS: Update Raphael Norwitz email
  hw/elf_ops: Ignore loadable segments with zero size
  hw/scsi/esp-pci: set DMA_STAT_BCMBLT when BLAST command issued
  hw/scsi/esp-pci: synchronise setting of DMA_STAT_DONE with ESP completion interrupt
  hw/scsi/esp-pci: generate PCI interrupt from separate ESP and PCI sources
  hw/scsi/esp-pci: use correct address register for PCI DMA transfers
  target/riscv: Rename tcg_cpu_FOO() to include 'riscv'
  target/i386: Rename tcg_cpu_FOO() to include 'x86'
  hw/s390x: Rename cpu_class_init() to include 'sclp'
  hw/core/cpu: Rename cpu_class_init() to include 'common'
  accel: Rename accel_init_ops_interfaces() to include 'system'
  cpus: Restrict 'start-powered-off' property to system emulation
  system/watchpoint: Move TCG specific code to accel/tcg/
  system/replay: Restrict icount to system emulation
  hw/pflash: implement update buffer for block writes
  hw/pflash: use ldn_{be,le}_p and stn_{be,le}_p
  hw/pflash: refactor pflash_data_write()
  hw/i386/pc_piix: Make piix_intx_routing_notifier_xen() more device independent
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-01-19 11:39:38 +00:00
Philippe Mathieu-Daudé
396f66f99d accel: Do not set CPUState::can_do_io in non-TCG accels
'can_do_io' is specific to TCG. It was added to other
accelerators in 626cf8f4c6 ("icount: set can_do_io outside
TB execution"), then likely copy/pasted in commit c97d6d2cdf
("i386: hvf: add code base from Google's QEMU repository").
Having it set in non-TCG code is confusing, so remove it from
QTest / HVF / KVM.

Fixes: 626cf8f4c6 ("icount: set can_do_io outside TB execution")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231129205037.16849-1-philmd@linaro.org>
2024-01-19 12:28:59 +01:00
Daan De Meyer
aef158b093 Add class property to configure KVM device node to use
This allows passing the KVM device node to use as a file descriptor
via /dev/fdset/XX. Passing the device node as a file descriptor allows
running qemu unprivileged even when the user running qemu is not in the
kvm group, on distributions where access to /dev/kvm is gated behind
membership of the kvm group (as long as the process invoking qemu is
able to open /dev/kvm and passes the file descriptor to qemu).
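
A hypothetical invocation (the 'device' accel property and fd/set numbers
here are illustrative):

    # A privileged parent opens /dev/kvm and hands it to qemu as fd 3:
    qemu-system-x86_64 \
        -add-fd fd=3,set=2,opaque="rdwr:/dev/kvm" \
        -accel kvm,device=/dev/fdset/2 ...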

Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Message-ID: <20231021134015.1119597-1-daan.j.demeyer@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-18 10:43:14 +01:00
Stefan Hajnoczi
195801d700 system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Jai Arora
9cdfb1e3a5 accel/kvm: Turn DPRINTF macro use into tracepoints
This patch removes the DPRINTF macro and adds multiple tracepoints
to capture different kvm events.

We also drop the DPRINTFs that don't add any information beyond what
trace_kvm_run_exit already provides.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827

Signed-off-by: Jai Arora <arorajai2798@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-12-23 19:39:35 +03:00
Richard Henderson
16617c3cba accel/kvm: Make kvm_has_guest_debug static
This variable is not used or declared outside kvm-all.c.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-12-19 17:57:38 +00:00
Paolo Bonzini
1a44a79ddf kvm: i386: require KVM_CAP_SET_VCPU_EVENTS and KVM_CAP_X86_ROBUST_SINGLESTEP
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 19:53:30 +02:00
Paolo Bonzini
f57a4dd311 kvm: i386: require KVM_CAP_DEBUGREGS
This was introduced in KVM in Linux 2.6.35; we can require it unconditionally.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:15 +02:00
Paolo Bonzini
2cb81af0b1 kvm: unify listeners for PIO address space
Since we now assume that ioeventfds are present, kvm_io_listener is always
registered.  Merge it with kvm_coalesced_pio_listener into a single
listener.  Since PIO space does not have KVM memslots attached to it,
the priority is irrelevant.

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:15 +02:00
Paolo Bonzini
126e7f7803 kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH
KVM_CAP_IOEVENTFD_ANY_LENGTH was added in Linux 4.4, released in 2016.
Assume that it is present.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:15 +02:00
Paolo Bonzini
5d9ec1f4c7 kvm: assume that many ioeventfds can be created
NR_IOBUS_DEVS was increased to 200 in Linux 2.6.34.  By Linux 3.5 it had
increased to 1000 and later ioeventfds were changed to not count against
the limit.  But the earlier limit of 200 would already be enough for
kvm_check_many_ioeventfds() to be true, so remove the check.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:15 +02:00
Paolo Bonzini
d19fe67ba8 kvm: drop reference to KVM_CAP_PCI_2_3
This is a remnant of pre-VFIO device assignment; it is not defined
anymore by Linux and not used by QEMU.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:15 +02:00
Paolo Bonzini
a788260b20 kvm: require KVM_IRQFD for kernel irqchip
KVM_IRQFD was introduced in Linux 2.6.32, and since then it has always been
available on architectures that support an in-kernel interrupt controller.
We can require it unconditionally.

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:15 +02:00
Paolo Bonzini
cc5e719e2c kvm: require KVM_CAP_SIGNAL_MSI
This was introduced in KVM in Linux 3.5; we can require it unconditionally
in kvm_irqchip_send_msi().  However, not all architectures have to implement
it, so check it only on x86, the only architecture that ever had MSI injection
but not KVM_CAP_SIGNAL_MSI.

ARM uses it to detect the presence of the ITS emulation in the kernel,
introduced in Linux 4.8.  Assume that it's there and possibly fail when
realizing the arm-its-kvm device.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:14 +02:00
Paolo Bonzini
aacec9aee1 kvm: require KVM_CAP_INTERNAL_ERROR_DATA
This was introduced in KVM in Linux 2.6.33; we can require it unconditionally.

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:14 +02:00
David Hildenbrand
16ab2eda57 kvm: Add stub for kvm_get_max_memslots()
We'll soon need the stub from the memory device context.

While at it, use "unsigned int" as return value and place the
declaration next to kvm_get_free_memslots().

Message-ID: <20230926185738.277351-11-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12 14:15:22 +02:00
David Hildenbrand
5b23186a95 kvm: Return number of free memslots
Let's return the number of free slots instead of only checking if there
is a free slot. While at it, check all address spaces, which will also
consider SMM under x86 correctly.

This is a preparation for memory devices that consume multiple memslots.

Message-ID: <20230926185738.277351-5-david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12 14:15:22 +02:00
Richard Henderson
464dacf609 accel/tcg: Move can_do_io to CPUNegativeOffsetState
Minimize the displacement to can_do_io, since it may
be touched at the start of each TranslationBlock.
It fits into other padding within the substructure.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-03 08:01:02 -07:00
Akihiko Odaki
7191f24c7f accel/kvm/kvm-all: Handle register access errors
A register access error typically means something seriously wrong
happened, such that anything bad can happen after that and recovery is
impossible.
Even failing one register access is catastrophic, as the
architecture-specific code is not written to tolerate such
failures.

Make sure the VM stops and nothing worse happens if such an error occurs.
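
The handling amounts to stopping the VM with an internal error instead of
running on with half-synchronized state; sketched (exact call sites and
messages vary):

    ret = kvm_arch_put_registers(cpu, KVM_PUT_RUNTIME_STATE);
    if (ret < 0) {
        error_report("Failed to put registers: %s", strerror(-ret));
        /* Stop the VM rather than continue with inconsistent vcpu state. */
        vm_stop(RUN_STATE_INTERNAL_ERROR);
    }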

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-ID: <20221201102728.69751-1-akihiko.odaki@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-29 09:33:10 +02:00
Shameer Kolothum
c8f2eb5d41 arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE
Now that we have Eager Page Split support added for ARM in the kernel,
enable it in QEMU. This adds:
 - eager-split-size to the -accel sub-options, to set the eager page split chunk size.
 - enabling of KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE.

The chunk size specifies how many pages to break at a time, using a
single allocation. The bigger the chunk size, the more pages need to be
allocated ahead of time.
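
For example, the chunk size might be set on the command line like this
(value illustrative; it must be a power of two supported by the host):

    qemu-system-aarch64 -accel kvm,eager-split-size=16k ...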

Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Message-id: 20230905091246.1931-1-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-09-08 16:41:36 +01:00
Anton Johansson
b67be03e3a accel/kvm: Widen pc/saved_insn for kvm_sw_breakpoint
Widens the pc and saved_insn fields of kvm_sw_breakpoint from
target_ulong to vaddr. The pc argument of kvm_find_sw_breakpoint is also
widened to match.

Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230807155706.9580-2-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-24 11:21:22 -07:00
Akihiko Odaki
43a5e377f4 accel/kvm: Make kvm_dirty_ring_reaper_init() void
The returned value was always zero and had no meaning.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-7-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22 17:31:04 +01:00
Akihiko Odaki
4625742cd2 accel/kvm: Free as when an error occurred
An error may occur after s->as is allocated, for example if the
KVM_CREATE_VM ioctl call fails.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-6-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-08-22 17:31:04 +01:00
Akihiko Odaki
bc3e41a0e8 accel/kvm: Use negative KVM type for error propagation
On MIPS, kvm_arch_get_default_type() returns a negative value when an
error occurs, so handle that case. Also, let other machines return
negative values when errors occur, and declare returning a negative
value as the correct way to propagate an error that happened while
determining the KVM type.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22 17:31:03 +01:00
Akihiko Odaki
5e0d65909c kvm: Introduce kvm_arch_get_default_type hook
kvm_arch_get_default_type() returns the default KVM type. This hook is
particularly useful to derive a KVM type that is valid for the "none"
machine model, which is used by libvirt to probe the availability of
KVM.

For MIPS, the existing mips_kvm_type() is reused. This function ensures
the availability of VZ, which is mandatory for using KVM with the
current QEMU.
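
The hook has roughly this contract (a sketch of the declaration; the doc
comment paraphrases the rationale above):

    /*
     * Return the default KVM VM type for this architecture, so that even
     * a "none" machine without a kvm_type() callback can create a probe
     * VM.
     */
    int kvm_arch_get_default_type(MachineState *ms);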

Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-2-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: added doc comment for new function]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22 17:31:02 +01:00
Gavin Shan
fe6bda58e0 kvm: Fix crash due to access uninitialized kvm_state
QEMU runs into a core dump on arm64; the backtrace extracted from the
core dump is shown below. It's caused by accessing the uninitialized
@kvm_state in kvm_flush_coalesced_mmio_buffer(), due to commit 176d073029
("hw/arm/virt: Use machine_memory_devices_init()"), where the machine's
memory region is added earlier than before.

    main
    qemu_init
    configure_accelerators
    qemu_opts_foreach
    do_configure_accelerator
    accel_init_machine
    kvm_init
    virt_kvm_type
    virt_set_memmap
    machine_memory_devices_init
    memory_region_add_subregion
    memory_region_add_subregion_common
    memory_region_update_container_subregions
    memory_region_transaction_begin
    qemu_flush_coalesced_mmio_buffer
    kvm_flush_coalesced_mmio_buffer

Fix it by bailing early in kvm_flush_coalesced_mmio_buffer() on the
uninitialized @kvm_state. With this applied, no crash is observed on
arm64.
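
The fix is an early-return guard at the top of the function; essentially
(rest of the body elided):

    void kvm_flush_coalesced_mmio_buffer(void)
    {
        KVMState *s = kvm_state;

        /* The memory API can call in before kvm_init() has set kvm_state. */
        if (!s || s->coalesced_flush_in_progress) {
            return;
        }

        /* ... existing flush of the coalesced MMIO ring ... */
    }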

Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230731125946.2038742-1-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-07-31 14:19:43 +01:00
Isaku Yamahata
14a868c626 exec/memory: Add symbol for the min value of memory listener priority
Add MEMORY_LISTENER_PRIORITY_MIN as the symbolic value for the minimum
memory listener priority, instead of the hard-coded magic value 0.  Add
explicit initialization.

No functional change intended.
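
Together with the two related patches below, the symbolic priorities end
up along these lines (values mirror the previous hard-coded numbers):

    /* include/exec/memory.h */
    #define MEMORY_LISTENER_PRIORITY_MIN          0
    #define MEMORY_LISTENER_PRIORITY_ACCEL        10
    #define MEMORY_LISTENER_PRIORITY_DEV_BACKEND  10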

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <29f88477fe82eb774bcfcae7f65ea21995f865f2.1687279702.git.isaku.yamahata@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28 14:27:59 +02:00
Isaku Yamahata
8be0461d37 exec/memory: Add symbol for memory listener priority for device backend
Add MEMORY_LISTENER_PRIORITY_DEV_BACKEND as the symbolic value for the
memory listener priority of the device backend, replacing the hard-coded
value 10.

No functional change intended.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <8314d91688030d7004e96958f12e2c83fb889245.1687279702.git.isaku.yamahata@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28 14:27:59 +02:00
Isaku Yamahata
5369a36c4f exec/memory: Add symbolic value for memory listener priority for accel
Add MEMORY_LISTENER_PRIORITY_ACCEL as the symbolic value for the memory
listener priority of accel, replacing the hard-coded value 10.

No functional change intended.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <feebe423becc6e2aa375f59f6abce9a85bc15abb.1687279702.git.isaku.yamahata@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-06-28 14:27:59 +02:00
Marcelo Tosatti
3b6f485275 kvm: reuse per-vcpu stats fd to avoid vcpu interruption
A regression has been detected in latency testing of KVM guests.
More specifically, it was observed that the cyclictest numbers inside an
isolated vcpu (running on an isolated pcpu) exceed the acceptable maximum
of 50us.

The implementation of KVM_GET_STATS_FD uses run_on_cpu to query
per vcpu statistics, which interrupts the vcpu (and is unnecessary).

To fix this, open the per vcpu stats fd on vcpu initialization,
and read from that fd from QEMU's main thread.
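
Conceptually (a sketch; the stats fd ioctl is standard KVM UAPI, the field
name follows this change):

    /* At vcpu creation time, in kvm_init_vcpu(): */
    cpu->kvm_vcpu_stats_fd = kvm_vcpu_ioctl(cpu, KVM_GET_STATS_FD, NULL);

    /*
     * Later, the stats query code in QEMU's main thread reads the binary
     * stats descriptors/values from that fd directly instead of bouncing
     * through run_on_cpu(), so the vcpu is never interrupted.
     */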

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:01 +02:00
Gavin Shan
856e23a0fb kvm: Enable dirty ring for arm64
arm64 uses a different capability from x86 to enable the dirty ring:
KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Besides, arm64 also needs the backup
bitmap extension (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) when the 'kvm-arm-gicv3'
or 'arm-its-kvm' device is enabled. Here the extension is always enabled,
which introduces unnecessary overhead for the last stage of dirty log
synchronization when those two devices aren't used, but the overhead should
be very small and acceptable. The benefit is that future cases where those
two devices are used are covered without modifying the code.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230509022122.20888-5-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:51 +02:00
Gavin Shan
3794cb9485 kvm: Add helper kvm_dirty_ring_init()
There are multiple capabilities associated with the dirty ring for
different architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for
x86 and arm64 respectively, and there is more to be done in order to
support the dirty ring for arm64.

Let's add a helper kvm_dirty_ring_init() to enable the dirty ring. With
this, the code looks a bit cleaner.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20230509022122.20888-4-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Gavin Shan
b20cc77692 kvm: Synchronize the backup bitmap in the last stage
In the last stage of live migration or memory slot removal, the
backup bitmap needs to be synchronized when it has been enabled.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20230509022122.20888-3-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Gavin Shan
1e493be587 migration: Add last stage indicator to global dirty log
The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap need to be
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20230509022122.20888-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Peter Xu
56adee407f kvm: dirty-ring: Fix race with vcpu creation
It's possible that we want to reap a dirty ring on a vcpu that is still
being created, because the vcpu is put onto the list (CPU_FOREACH visible)
before its structures are initialized.  In this case:

qemu_init_vcpu
    x86_cpu_realizefn
        cpu_exec_realizefn
            cpu_list_add      <---- can be probed by CPU_FOREACH
        qemu_init_vcpu
            cpus_accel->create_vcpu_thread(cpu);
                kvm_init_vcpu
                    map kvm_dirty_gfns  <--- kvm_dirty_gfns valid

Don't try to reap dirty ring on vcpus during creation or it'll crash.
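
The guard simply verifies that the vcpu finished creation before touching
its ring; roughly (a sketch):

    /* In kvm_dirty_ring_reap_one(), before walking the ring: */
    if (!cpu->created) {
        /*
         * The vcpu is already visible via CPU_FOREACH, but kvm_init_vcpu()
         * has not mapped kvm_dirty_gfns yet -- nothing to reap.
         */
        return 0;
    }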

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124756
Reported-by: Xiaohui Li <xiaohli@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1d14deb6684bcb7de1c9633c5bd21113988cc698.1676563222.git.huangy81@chinatelecom.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-04 18:46:46 +02:00
Mads Ynddal
412ae12647 gdbstub: move update guest debug to accel ops
Continuing the refactor of a48e7d9e52 (gdbstub: move guest debug support
check to ops) by removing the hardcoded kvm_enabled() from generic cpu.c
code and replacing it with a property of AccelOpsClass.

Signed-off-by: Mads Ynddal <m.ynddal@samsung.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230207131721.49233-1-mads@ynddal.dk>
[AJB: add ifdef around update_guest_debug_ops, fix brace]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230302190846.2593720-27-alex.bennee@linaro.org>
Message-Id: <20230303025805.625589-30-richard.henderson@linaro.org>
2023-03-07 20:44:09 +00:00
David Woodhouse
e16aff4cc2 kvm/i386: Add xen-evtchn-max-pirq property
The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI
devices. Allow it to be increased if the user desires.
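
A hypothetical command line raising the limit (the Xen emulation options
and values shown are illustrative):

    qemu-system-x86_64 -accel kvm,xen-version=0x4000e,xen-evtchn-max-pirq=1024 ...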

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:09:22 +00:00
David Woodhouse
6f43f2ee49 kvm/i386: Add xen-gnttab-max-frames property
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-03-01 09:07:52 +00:00