mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Paolo Bonzini	1a44a79ddf	kvm: i386: require KVM_CAP_SET_VCPU_EVENTS and KVM_CAP_X86_ROBUST_SINGLESTEP Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 19:53:30 +02:00
Paolo Bonzini	f57a4dd311	kvm: i386: require KVM_CAP_DEBUGREGS This was introduced in KVM in Linux 2.6.35, we can require it unconditionally. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:15 +02:00
Paolo Bonzini	2cb81af0b1	kvm: unify listeners for PIO address space Since we now assume that ioeventfds are present, kvm_io_listener is always registered. Merge it with kvm_coalesced_pio_listener in a single listener. Since PIO space does not have KVM memslots attached to it, the priority is irrelevant. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:15 +02:00
Paolo Bonzini	126e7f7803	kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH KVM_CAP_IOEVENTFD_ANY_LENGTH was added in Linux 4.4, released in 2016. Assume that it is present. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:15 +02:00
Paolo Bonzini	5d9ec1f4c7	kvm: assume that many ioeventfds can be created NR_IOBUS_DEVS was increased to 200 in Linux 2.6.34. By Linux 3.5 it had increased to 1000 and later ioeventfds were changed to not count against the limit. But the earlier limit of 200 would already be enough for kvm_check_many_ioeventfds() to be true, so remove the check. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:15 +02:00
Paolo Bonzini	d19fe67ba8	kvm: drop reference to KVM_CAP_PCI_2_3 This is a remnant of pre-VFIO device assignment; it is not defined anymore by Linux and not used by QEMU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:15 +02:00
Paolo Bonzini	a788260b20	kvm: require KVM_IRQFD for kernel irqchip KVM_IRQFD was introduced in Linux 2.6.32, and since then it has always been available on architectures that support an in-kernel interrupt controller. We can require it unconditionally. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:15 +02:00
Paolo Bonzini	cc5e719e2c	kvm: require KVM_CAP_SIGNAL_MSI This was introduced in KVM in Linux 3.5, we can require it unconditionally in kvm_irqchip_send_msi(). However, not all architectures have to implement it so check it only in x86, the only architecture that ever had MSI injection but not KVM_CAP_SIGNAL_MSI. ARM uses it to detect the presence of the ITS emulation in the kernel, introduced in Linux 4.8. Assume that it's there and possibly fail when realizing the arm-its-kvm device. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:14 +02:00
Paolo Bonzini	aacec9aee1	kvm: require KVM_CAP_INTERNAL_ERROR_DATA This was introduced in KVM in Linux 2.6.33, we can require it unconditionally. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-10-25 17:35:14 +02:00
David Hildenbrand	16ab2eda57	kvm: Add stub for kvm_get_max_memslots() We'll need the stub soon from memory device context. While at it, use "unsigned int" as return value and place the declaration next to kvm_get_free_memslots(). Message-ID: <20230926185738.277351-11-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2023-10-12 14:15:22 +02:00
David Hildenbrand	5b23186a95	kvm: Return number of free memslots Let's return the number of free slots instead of only checking if there is a free slot. While at it, check all address spaces, which will also consider SMM under x86 correctly. This is a preparation for memory devices that consume multiple memslots. Message-ID: <20230926185738.277351-5-david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>	2023-10-12 14:15:22 +02:00
Akihiko Odaki	7191f24c7f	accel/kvm/kvm-all: Handle register access errors A register access error typically means something seriously wrong happened so that anything bad can happen after that and recovery is impossible. Even failing one register access is catastorophic as architecture-specific code are not written so that it torelates such failures. Make sure the VM stop and nothing worse happens if such an error occurs. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-ID: <20221201102728.69751-1-akihiko.odaki@daynix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-09-29 09:33:10 +02:00
Shameer Kolothum	c8f2eb5d41	arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE Now that we have Eager Page Split support added for ARM in the kernel, enable it in Qemu. This adds, -eager-split-size to -accel sub-options to set the eager page split chunk size. -enable KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE. The chunk size specifies how many pages to break at a time, using a single allocation. Bigger the chunk size, more pages need to be allocated ahead of time. Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Message-id: 20230905091246.1931-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2023-09-08 16:41:36 +01:00
Anton Johansson	b67be03e3a	accel/kvm: Widen pc/saved_insn for kvm_sw_breakpoint Widens the pc and saved_insn fields of kvm_sw_breakpoint from target_ulong to vaddr. The pc argument of kvm_find_sw_breakpoint is also widened to match. Signed-off-by: Anton Johansson <anjo@rev.ng> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230807155706.9580-2-anjo@rev.ng> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2023-08-24 11:21:22 -07:00
Akihiko Odaki	43a5e377f4	accel/kvm: Make kvm_dirty_ring_reaper_init() void The returned value was always zero and had no meaning. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-7-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-08-22 17:31:04 +01:00
Akihiko Odaki	4625742cd2	accel/kvm: Free as when an error occurred An error may occur after s->as is allocated, for example if the KVM_CREATE_VM ioctl call fails. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-6-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: tweaked commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2023-08-22 17:31:04 +01:00
Akihiko Odaki	bc3e41a0e8	accel/kvm: Use negative KVM type for error propagation On MIPS, kvm_arch_get_default_type() returns a negative value when an error occurred so handle the case. Also, let other machines return negative values when errors occur and declare returning a negative value as the correct way to propagate an error that happened when determining KVM type. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-08-22 17:31:03 +01:00
Akihiko Odaki	5e0d65909c	kvm: Introduce kvm_arch_get_default_type hook kvm_arch_get_default_type() returns the default KVM type. This hook is particularly useful to derive a KVM type that is valid for "none" machine model, which is used by libvirt to probe the availability of KVM. For MIPS, the existing mips_kvm_type() is reused. This function ensures the availability of VZ which is mandatory to use KVM on the current QEMU. Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-2-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: added doc comment for new function] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-08-22 17:31:02 +01:00
Gavin Shan	fe6bda58e0	kvm: Fix crash due to access uninitialized kvm_state Runs into core dump on arm64 and the backtrace extracted from the core dump is shown as below. It's caused by accessing uninitialized @kvm_state in kvm_flush_coalesced_mmio_buffer() due to commit `176d073029` ("hw/arm/virt: Use machine_memory_devices_init()"), where the machine's memory region is added earlier than before. main qemu_init configure_accelerators qemu_opts_foreach do_configure_accelerator accel_init_machine kvm_init virt_kvm_type virt_set_memmap machine_memory_devices_init memory_region_add_subregion memory_region_add_subregion_common memory_region_update_container_subregions memory_region_transaction_begin qemu_flush_coalesced_mmio_buffer kvm_flush_coalesced_mmio_buffer Fix it by bailing early in kvm_flush_coalesced_mmio_buffer() on the uninitialized @kvm_state. With this applied, no crash is observed on arm64. Fixes: `176d073029` ("hw/arm/virt: Use machine_memory_devices_init()") Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20230731125946.2038742-1-gshan@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2023-07-31 14:19:43 +01:00
Isaku Yamahata	14a868c626	exec/memory: Add symbol for the min value of memory listener priority Add MEMORY_LISTNER_PRIORITY_MIN for the symbolic value for the min value of the memory listener instead of the hard-coded magic value 0. Add explicit initialization. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <29f88477fe82eb774bcfcae7f65ea21995f865f2.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-06-28 14:27:59 +02:00
Isaku Yamahata	8be0461d37	exec/memory: Add symbol for memory listener priority for device backend Add MEMORY_LISTENER_PRIORITY_DEV_BACKEND for the symbolic value for memory listener to replace the hard-coded value 10 for the device backend. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <8314d91688030d7004e96958f12e2c83fb889245.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-06-28 14:27:59 +02:00
Isaku Yamahata	5369a36c4f	exec/memory: Add symbolic value for memory listener priority for accel Add MEMORY_LISTNER_PRIORITY_ACCEL for the symbolic value for the memory listener to replace the hard-coded value 10 for accel. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <feebe423becc6e2aa375f59f6abce9a85bc15abb.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2023-06-28 14:27:59 +02:00
Marcelo Tosatti	3b6f485275	kvm: reuse per-vcpu stats fd to avoid vcpu interruption A regression has been detected in latency testing of KVM guests. More specifically, it was observed that the cyclictest numbers inside of an isolated vcpu (running on isolated pcpu) are: Where a maximum of 50us is acceptable. The implementation of KVM_GET_STATS_FD uses run_on_cpu to query per vcpu statistics, which interrupts the vcpu (and is unnecessary). To fix this, open the per vcpu stats fd on vcpu initialization, and read from that fd from QEMU's main thread. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-06-26 10:23:01 +02:00
Gavin Shan	856e23a0fb	kvm: Enable dirty ring for arm64 arm64 has different capability from x86 to enable the dirty ring, which is KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Besides, arm64 also needs the backup bitmap extension (KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP) when 'kvm-arm-gicv3' or 'arm-its-kvm' device is enabled. Here the extension is always enabled and the unnecessary overhead to do the last stage of dirty log synchronization when those two devices aren't used is introduced, but the overhead should be very small and acceptable. The benefit is cover future cases where those two devices are used without modifying the code. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20230509022122.20888-5-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-05-18 08:53:51 +02:00
Gavin Shan	3794cb9485	kvm: Add helper kvm_dirty_ring_init() Due to multiple capabilities associated with the dirty ring for different architectures: KVM_CAP_DIRTY_{LOG_RING, LOG_RING_ACQ_REL} for x86 and arm64 separately. There will be more to be done in order to support the dirty ring for arm64. Lets add helper kvm_dirty_ring_init() to enable the dirty ring. With this, the code looks a bit clean. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-4-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-05-18 08:53:50 +02:00
Gavin Shan	b20cc77692	kvm: Synchronize the backup bitmap in the last stage In the last stage of live migration or memory slot removal, the backup bitmap needs to be synchronized when it has been enabled. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-3-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-05-18 08:53:50 +02:00
Gavin Shan	1e493be587	migration: Add last stage indicator to global dirty log The global dirty log synchronization is used when KVM and dirty ring are enabled. There is a particularity for ARM64 where the backup bitmap is used to track dirty pages in non-running-vcpu situations. It means the dirty ring works with the combination of ring buffer and backup bitmap. The dirty bits in the backup bitmap needs to collected in the last stage of live migration. In order to identify the last stage of live migration and pass it down, an extra parameter is added to the relevant functions and callbacks. This last stage indicator isn't used until the dirty ring is enabled in the subsequent patches. No functional change intended. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Message-Id: <20230509022122.20888-2-gshan@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-05-18 08:53:50 +02:00
Peter Xu	56adee407f	kvm: dirty-ring: Fix race with vcpu creation It's possible that we want to reap a dirty ring on a vcpu that is during creation, because the vcpu is put onto list (CPU_FOREACH visible) before initialization of the structures. In this case: qemu_init_vcpu x86_cpu_realizefn cpu_exec_realizefn cpu_list_add <---- can be probed by CPU_FOREACH qemu_init_vcpu cpus_accel->create_vcpu_thread(cpu); kvm_init_vcpu map kvm_dirty_gfns <--- kvm_dirty_gfns valid Don't try to reap dirty ring on vcpus during creation or it'll crash. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2124756 Reported-by: Xiaohui Li <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <1d14deb6684bcb7de1c9633c5bd21113988cc698.1676563222.git.huangy81@chinatelecom.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-04-04 18:46:46 +02:00
David Woodhouse	e16aff4cc2	kvm/i386: Add xen-evtchn-max-pirq property The default number of PIRQs is set to 256 to avoid issues with 32-bit MSI devices. Allow it to be increased if the user desires. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>	2023-03-01 09:09:22 +00:00
David Woodhouse	6f43f2ee49	kvm/i386: Add xen-gnttab-max-frames property Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>	2023-03-01 09:07:52 +00:00
David Woodhouse	61491cf441	i386/kvm: Add xen-version KVM accelerator property and init KVM Xen support This just initializes the basic Xen support in KVM for now. Only permitted on TYPE_PC_MACHINE because that's where the sysbus devices for Xen heap overlay, event channel, grant tables and other stuff will exist. There's no point having the basic hypercall support if nothing else works. Provide sysemu/kvm_xen.h and a kvm_xen_get_caps() which will be used later by support devices. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>	2023-03-01 08:22:49 +00:00
Philippe Mathieu-Daudé	2459d4209f	accel/kvm: Silent -Wmissing-field-initializers warning Silent when compiling with -Wextra: ../accel/kvm/kvm-all.c:2291:17: warning: missing field 'num' initializer [-Wmissing-field-initializers] { NULL, } ^ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20221220143532.24958-3-philmd@linaro.org>	2023-02-27 22:29:01 +01:00
Philippe Mathieu-Daudé	55b5b8e928	gdbstub: Use vaddr type for generic insert/remove_breakpoint() API Both insert/remove_breakpoint() handlers are used in system and user emulation. We can not use the 'hwaddr' type on user emulation, we have to use 'vaddr' which is defined as "wide enough to contain any #target_ulong virtual address". gdbstub.c doesn't require to include "exec/hwaddr.h" anymore. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221216215519.5522-4-philmd@linaro.org>	2023-02-27 22:29:01 +01:00
Markus Armbruster	aa09b3d5f8	stats: Move QMP commands from monitor/ to stats/ This moves these commands from MAINTAINERS section "QMP" to new section "Stats". Status is Orphan. Volunteers welcome! Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230124121946.1139465-23-armbru@redhat.com>	2023-02-04 07:56:54 +01:00
David Hildenbrand	f39b7d2b96	kvm: Atomic memslot updates If we update an existing memslot (e.g., resize, split), we temporarily remove the memslot to re-add it immediately afterwards. These updates are not atomic, especially not for KVM VCPU threads, such that we can get spurious faults. Let's inhibit most KVM ioctls while performing relevant updates, such that we can perform the update just as if it would happen atomically without additional kernel support. We capture the add/del changes and apply them in the notifier commit stage instead. There, we can check for overlaps and perform the ioctl inhibiting only if really required (-> overlap). To keep things simple we don't perform additional checks that wouldn't actually result in an overlap -- such as !RAM memory regions in some cases (see kvm_set_phys_mem()). To minimize cache-line bouncing, use a separate indicator (in_ioctl_lock) per CPU. Also, make sure to hold the kvm_slots_lock while performing both actions (removing+re-adding). We have to wait until all IOCTLs were exited and block new ones from getting executed. This approach cannot result in a deadlock as long as the inhibitor does not hold any locks that might hinder an IOCTL from getting finished and exited - something fairly unusual. The inhibitor will always hold the BQL. AFAIKs, one possible candidate would be userfaultfd. If a page cannot be placed (e.g., during postcopy), because we're waiting for a lock, or if the userfaultfd thread cannot process a fault, because it is waiting for a lock, there could be a deadlock. However, the BQL is not applicable here, because any other guest memory access while holding the BQL would already result in a deadlock. Nothing else in the kernel should block forever and wait for userspace intervention. Note: pause_all_vcpus()/resume_all_vcpus() or start_exclusive()/end_exclusive() cannot be used, as they either drop the BQL or require to be called without the BQL - something inhibitors cannot handle. We need a low-level locking mechanism that is deadlock-free even when not releasing the BQL. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-01-11 09:59:39 +01:00
Emanuele Giuseppe Esposito	a27dd2de68	KVM: keep track of running ioctls Using the new accel-blocker API, mark where ioctls are being called in KVM. Next, we will implement the critical section that will take care of performing memslots modifications atomically, therefore preventing any new ioctl from running and allowing the running ones to finish. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-3-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2023-01-11 09:59:39 +01:00
Markus Armbruster	d1c81c3496	qapi: Use returned bool to check for failure (again) Commit `012d4c96e2` changed the visitor functions taking Error ** to return bool instead of void, and the commits following it used the new return value to simplify error checking. Since then a few more uses in need of the same treatment crept in. Do that. All pretty mechanical except for * balloon_stats_get_all() This is basically the same transformation commit `012d4c96e2` applied to the virtual walk example in include/qapi/visitor.h. * set_max_queue_size() Additionally replace "goto end of function" by return. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221121085054.683122-10-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2022-12-14 16:19:35 +01:00
Chenyi Qiang	e2e69f6bb9	i386: add notify VM exit support There are cases that malicious virtual machine can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. Notify VM exit is introduced to mitigate such kind of attacks, which will generate a VM exit if no event window occurs in VM non-root mode for a specified amount of time (notify window). A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space so that the user can query the capability and set the expected notify window when creating VMs. The format of the argument when enabling this capability is as follows: Bit 63:32 - notify window specified in qemu command Bit 31:0 - some flags (e.g. KVM_X86_NOTIFY_VMEXIT_ENABLED is set to enable the feature.) Users can configure the feature by a new (x86 only) accel property: qemu -accel kvm,notify-vmexit=run\|internal-error\|disable,notify-window=n The default option of notify-vmexit is run, which will enable the capability and do nothing if the exit happens. The internal-error option raises a KVM internal error if it happens. The disable option does not enable the capability. The default value of notify-window is 0. It is valid only when notify-vmexit is not disabled. The valid range of notify-window is non-negative. It is even safe to set it to zero since there's an internal hardware threshold to be added to ensure no false positive. Because a notify VM exit may happen with VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated that would set this bit), which means VM context is corrupted. It would be reflected in the flags of KVM_EXIT_NOTIFY exit. If KVM_NOTIFY_CONTEXT_INVALID bit is set, raise a KVM internal error unconditionally. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220929072014.20705-5-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-11 09:36:00 +02:00
Chenyi Qiang	5f8a6bce1f	kvm: expose struct KVMState Expose struct KVMState out of kvm-all.c so that the field of struct KVMState can be accessed when defining target-specific accelerator properties. Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220929072014.20705-4-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-11 09:36:00 +02:00
Paolo Bonzini	3dba0a335c	kvm: allow target-specific accelerator properties Several hypervisor capabilities in KVM are target-specific. When exposed to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they should not be available for all targets. Add a hook for targets to add their own properties to -accel kvm, for now no such property is defined. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220929072014.20705-3-chenyi.qiang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-10-10 09:23:16 +02:00
Alex Bennée	c7f1c53735	accel/kvm: move kvm_update_guest_debug to inline stub Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20220929114231.583801-47-alex.bennee@linaro.org>	2022-10-06 11:53:41 +01:00
Alex Bennée	a48e7d9e52	gdbstub: move guest debug support check to ops This removes the final hard coding of kvm_enabled() in gdbstub and moves the check to an AccelOps. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-46-alex.bennee@linaro.org>	2022-10-06 11:53:41 +01:00
Alex Bennée	ae7467b1ac	gdbstub: move breakpoint logic to accel ops As HW virtualization requires specific support to handle breakpoints lets push out special casing out of the core gdbstub code and into AccelOpsClass. This will make it easier to add other accelerator support and reduces some of the stub shenanigans. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-45-alex.bennee@linaro.org>	2022-10-06 11:53:41 +01:00
Alex Bennée	3b7a93880a	gdbstub: move sstep flags probing into AccelClass The support of single-stepping is very much dependent on support from the accelerator we are using. To avoid special casing in gdbstub move the probing out to an AccelClass function so future accelerators can put their code there. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Mads Ynddal <mads@ynddal.dk> Message-Id: <20220929114231.583801-44-alex.bennee@linaro.org>	2022-10-06 11:53:41 +01:00
Paolo Bonzini	21adec30f6	kvm: fix memory leak on failure to read stats descriptors Reported by Coverity as CID 1490142. Since the size is constant and the lifetime is the same as the StatsDescriptors struct, embed the struct directly instead of using a separate allocation. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-18 09:17:40 +02:00
Paolo Bonzini	52281c6d11	KVM: use store-release to mark dirty pages as harvested The following scenario can happen if QEMU sets more RESET flags while the KVM_RESET_DIRTY_RINGS ioctl is ongoing on another host CPU: CPU0 CPU1 CPU2 ------------------------ ------------------ ------------------------ fill gfn0 store-rel flags for gfn0 fill gfn1 store-rel flags for gfn1 load-acq flags for gfn0 set RESET for gfn0 load-acq flags for gfn1 set RESET for gfn1 do ioctl! -----------> ioctl(RESET_RINGS) fill gfn2 store-rel flags for gfn2 load-acq flags for gfn2 set RESET for gfn2 process gfn0 process gfn1 process gfn2 do ioctl! etc. The three load-acquire in CPU0 synchronize with the three store-release in CPU2, but CPU0 and CPU1 are only synchronized up to gfn1 and CPU1 may miss gfn2's fields other than flags. The kernel must be able to cope with invalid values of the fields, and userspace will invoke the ioctl once more. However, once the RESET flag is cleared on gfn2, it is lost forever, therefore in the above scenario CPU1 must read the correct value of gfn2's fields. Therefore RESET must be set with a store-release, that will synchronize with KVM's load-acquire in CPU1. Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-18 09:17:40 +02:00
Paolo Bonzini	4802bf910e	KVM: dirty ring: add missing memory barrier The KVM_DIRTY_GFN_F_DIRTY flag ensures that the entry is valid. If the read of the fields are not ordered after the read of the flag, QEMU might see stale values. Cc: Gavin Shan <gshan@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-09-01 08:37:04 +02:00
Paolo Bonzini	a9197ad210	kvm: fix segfault with query-stats-schemas and -M none -M none creates a guest without a vCPU, causing the following error: $ ./qemu-system-x86_64 -qmp stdio -M none -accel kvm {execute:qmp_capabilities} {"return": {}} {execute: query-stats-schemas} Segmentation fault (core dumped) Fix it by not querying the vCPU stats if first_cpu is NULL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-08-18 14:08:24 +02:00
Cornelia Huck	47c182fe8b	kvm: don't use perror() without useful errno perror() is designed to append the decoded errno value to a string. This, however, only makes sense if we called something that actually sets errno prior to that. For the callers that check for split irqchip support that is not the case, and we end up with confusing error messages that end in "success". Use error_report() instead. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20220728142446.438177-1-cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-29 00:15:02 +02:00
Peter Maydell	d12dd9c7ee	accel/kvm: Avoid Coverity warning in query_stats() Coverity complains that there is a codepath in the query_stats() function where it can leak the memory pointed to by stats_list. This can only happen if the caller passes something other than STATS_TARGET_VM or STATS_TARGET_VCPU as the 'target', which no callsite does. Enforce this assumption using g_assert_not_reached(), so that if we have a future bug we hit the assert rather than silently leaking memory. Resolves: Coverity CID 1490140 Fixes: `cc01a3f4ca` ("kvm: Support for querying fd-based stats") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20220719134853.327059-1-peter.maydell@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2022-07-22 19:01:44 +02:00

1 2 3 4

178 Commits