There are distinct helpers for creating and parking a KVM vCPU.
However, there can be cases where a platform needs to create and
immediately park the vCPU during early stages of vcpu init which
can later be reused when vcpu thread gets initialized. This would
help detect failures with kvm_create_vcpu at an early stage.
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
* hpet: emulation improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaelL4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMXoQf+K77lNlHLETSgeeP3dr7yZPOmXjjN
qFY/18jiyLw7MK1rZC09fF+n9SoaTH8JDKupt0z9M1R10HKHLIO04f8zDE+dOxaE
Rou3yKnlTgFPGSoPPFr1n1JJfxtYlLZRoUzaAcHUaa4W7JR/OHJX90n1Rb9MXeDk
jV6P0v1FWtIDdM6ERm9qBGoQdYhj6Ra2T4/NZKJFXwIhKEkxgu4yO7WXv8l0dxQz
jE4fKotqAvrkYW1EsiVZm30lw/19duhvGiYeQXoYhk8KKXXjAbJMblLITSNWsCio
3l6Uud/lOxekkJDAq5nH3H9hCBm0WwvwL+0vRf3Mkr+/xRGvrhtmUdp8NQ==
=00mB
-----END PGP SIGNATURE-----
Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging
* target/i386/kvm: support for reading RAPL MSRs using a helper program
* hpet: emulation improvements
# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaelL4UHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroMXoQf+K77lNlHLETSgeeP3dr7yZPOmXjjN
# qFY/18jiyLw7MK1rZC09fF+n9SoaTH8JDKupt0z9M1R10HKHLIO04f8zDE+dOxaE
# Rou3yKnlTgFPGSoPPFr1n1JJfxtYlLZRoUzaAcHUaa4W7JR/OHJX90n1Rb9MXeDk
# jV6P0v1FWtIDdM6ERm9qBGoQdYhj6Ra2T4/NZKJFXwIhKEkxgu4yO7WXv8l0dxQz
# jE4fKotqAvrkYW1EsiVZm30lw/19duhvGiYeQXoYhk8KKXXjAbJMblLITSNWsCio
# 3l6Uud/lOxekkJDAq5nH3H9hCBm0WwvwL+0vRf3Mkr+/xRGvrhtmUdp8NQ==
# =00mB
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 23 Jul 2024 03:19:58 AM AEST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
hpet: avoid timer storms on periodic timers
hpet: store full 64-bit target value of the counter
hpet: accept 64-bit reads and writes
hpet: place read-only bits directly in "new_val"
hpet: remove unnecessary variable "index"
hpet: ignore high bits of comparator in 32-bit mode
hpet: fix and cleanup persistence of interrupt status
Add support for RAPL MSRs in KVM/Qemu
tools: build qemu-vmsr-helper
qio: add support for SO_PEERCRED for socket channel
target/i386: do not crash if microvm guest uses SGX CPUID leaves
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread
is spawned. This is common to all the architectures as of now.
Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the
corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't
support vCPU removal. Therefore, its representative KVM vCPU object/context in
Qemu is parked.
Refactor architecture common logic so that some APIs could be reused by vCPU
Hotplug code of some architectures likes ARM, Loongson etc. Update new/old APIs
with trace events. New APIs qemu_{create,park,unpark}_vcpu() can be externally
called. No functional change is intended here.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-2-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL
interface (Running Average Power Limit) for advertising the accumulated
energy consumption of various power domains (e.g. CPU packages, DRAM,
etc.).
The consumption is reported via MSRs (model specific registers) like
MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are
64 bits registers that represent the accumulated energy consumption in
micro Joules. They are updated by microcode every ~1ms.
For now, KVM always returns 0 when the guest requests the value of
these MSRs. Use the KVM MSR filtering mechanism to allow QEMU handle
these MSRs dynamically in userspace.
To limit the amount of system calls for every MSR call, create a new
thread in QEMU that updates the "virtual" MSR values asynchronously.
Each vCPU has its own vMSR to reflect the independence of vCPUs. The
thread updates the vMSR values with the ratio of energy consumed of
the whole physical CPU package the vCPU thread runs on and the
thread's utime and stime values.
All other non-vCPU threads are also taken into account. Their energy
consumption is evenly distributed among all vCPUs threads running on
the same physical CPU package.
To overcome the problem that reading the RAPL MSR requires priviliged
access, a socket communication between QEMU and the qemu-vmsr-helper is
mandatory. You can specified the socket path in the parameter.
This feature is activated with -accel kvm,rapl=true,path=/path/sock.sock
Actual limitation:
- Works only on Intel host CPU because AMD CPUs are using different MSR
adresses.
- Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at
the moment.
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Link: https://lore.kernel.org/r/20240522153453.1230389-4-aharivel@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This is an experiment to further reduce the amount we throw into the
exec headers. It might not be as useful as I initially thought because
just under half of the users also need gdbserver_start().
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240620152220.2192768-3-alex.bennee@linaro.org>
Since the kvm_dirty_ring_enabled function accesses a null kvm_state
pointer when the KVM acceleration parameter is not specified, running
calc_dirty_rate with the -r or -b option causes a segmentation fault.
Signed-off-by: Masato Imai <mii@sfc.wide.ad.jp>
Message-ID: <20240507025010.1968881-1-mii@sfc.wide.ad.jp>
[Assert kvm_state when kvm_dirty_ring_enabled was called to fix it. - Hyman]
Signed-off-by: Hyman Huang <yong.huang@smartx.com>
In kvm-all.c, two lines have been accidentally indented with
hard-coded tabs rather than spaces. Normalise to match the rest
of the file.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240531170952.505323-1-peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Let hw/hyperv/hyperv.c and hw/intc/s390_flic.c handle (respectively)
SynIC and adapter routes, removing the code from target-independent
files. This also removes the only occurrence of AdapterInfo outside
s390 code, so remove that from typedefs.h.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
TDX requires vMMIO region to be shared. For KVM, MMIO region is the region
which kvm memslot isn't assigned to (except in-kernel emulation).
qemu has the memory region for vMMIO at each device level.
While OVMF issues MapGPA(to-shared) conservatively on 32bit PCI MMIO
region, qemu doesn't find corresponding vMMIO region because it's before
PCI device allocation and memory_region_find() finds the device region, not
PCI bus region. It's safe to ignore MapGPA(to-shared) because when guest
accesses those region they use GPA with shared bit set for vMMIO. Ignore
memory conversion request of non-assigned region to shared and return
success. Otherwise OVMF is confused and panics there.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-35-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because vMMIO region needs to be shared region, guest TD may explicitly
convert such region from private to shared. Don't complain such
conversion.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-34-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Upon an KVM_EXIT_MEMORY_FAULT exit, userspace needs to do the memory
conversion on the RAMBlock to turn the memory into desired attribute,
switching between private and shared.
Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when
KVM_EXIT_MEMORY_FAULT happens.
Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has
guest_memfd memory backend.
Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is
added.
When page is converted from shared to private, the original shared
memory can be discarded via ram_block_discard_range(). Note, shared
memory can be discarded only when it's not back'ed by hugetlb because
hugetlb is supposed to be pre-allocated and no need for discarding.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-13-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM side leaves the memory to shared by default, which may incur the
overhead of paging conversion on the first visit of each page. Because
the expectation is that page is likely to private for the VMs that
require private memory (has guest memfd).
Explicitly set the memory to private when memory region has valid
guest memfd backend.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-ID: <20240320083945.991426-16-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM.
With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that
backend'ed both by hva-based shared memory and guest memfd based private
memory.
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-10-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add KVM guest_memfd support to RAMBlock so both normal hva based memory
and kvm guest memfd based private memory can be associated in one RAMBlock.
Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to
create private guest_memfd during RAMBlock setup.
Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd
is more flexible and extensible than simply relying on the VM type because
in the future we may have the case that not all the memory of a VM need
guest memfd. As a benefit, it also avoid getting MachineState in memory
subsystem.
Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of
confidential guests, such as TDX VM. How and when to set it for memory
backends will be implemented in the following patches.
Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has
KVM guest_memfd allocated.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20240320083945.991426-7-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce the helper functions to set the attributes of a range of
memory to private or shared.
This is necessary to notify KVM the private/shared attribute of each gpa
range. KVM needs the information to decide the GPA needs to be mapped at
hva-based shared memory or guest_memfd based private memory.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240320083945.991426-11-michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The upper 16 bits of kvm_userspace_memory_region::slot are
address space id. Parse it separately in trace_kvm_set_user_memory().
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240229063726.610065-5-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Board reset requires writing a fresh CPU state. As far as KVM is
concerned, the only thing that blocks reset is that CPU state is
encrypted; therefore, kvm_cpus_are_resettable() can simply check
if that is the case.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
So far, KVM has allowed KVM_GET/SET_* ioctls to execute even if the
guest state is encrypted, in which case they do nothing. For the new
API using VM types, instead, the ioctls will fail which is a safer and
more robust approach.
The new API will be the only one available for SEV-SNP and TDX, but it
is also usable for SEV and SEV-ES. In preparation for that, require
architecture-specific KVM code to communicate the point at which guest
state is protected (which must be after kvm_cpu_synchronize_post_init(),
though that might change in the future in order to suppor migration).
From that point, skip reading registers so that cpu->vcpu_dirty is
never true: if it ever becomes true, kvm_arch_put_registers() will
fail miserably.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If an architecture adds support for KVM_CAP_SET_GUEST_DEBUG but QEMU does not
have the necessary code, QEMU will fail to build after updating kernel headers.
Avoid this by using a #define in config-target.h instead of KVM_CAP_SET_GUEST_DEBUG.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
subj is calling kvm_add_routing_entry() which simply extends
KVMState::irq_routes::entries[]
but doesn't check if number of routes goes beyond limit the kernel
is willing to accept. Which later leads toi the assert
qemu-kvm: ../accel/kvm/kvm-all.c:1833: kvm_irqchip_commit_routes: Assertion `ret == 0' failed
typically it happens during guest boot for large enough guest
Reproduced with:
./qemu --enable-kvm -m 8G -smp 64 -machine pc \
`for b in {1..2}; do echo -n "-device pci-bridge,id=pci$b,chassis_nr=$b ";
for i in {0..31}; do touch /tmp/vblk$b$i;
echo -n "-drive file=/tmp/vblk$b$i,if=none,id=drive$b$i,format=raw
-device virtio-blk-pci,drive=drive$b$i,bus=pci$b ";
done; done`
While crash at boot time is bad, the same might happen at hotplug time
which is unacceptable.
So instead calling kvm_add_routing_entry() unconditionally, check first
that number of routes won't exceed KVM_CAP_IRQ_ROUTING. This way virtio
device insteads killin qemu, will gracefully fail to initialize device
as expected with following warnings on console:
virtio-blk failed to set guest notifier (-28), ensure -accel kvm is set.
virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-ID: <20240408110956.451558-1-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A memory page poisoned from the hypervisor level is no longer readable.
The migration of a VM will crash Qemu when it tries to read the
memory address space and stumbles on the poisoned page with a similar
stack trace:
Program terminated with signal SIGBUS, Bus error.
#0 _mm256_loadu_si256
#1 buffer_zero_avx2
#2 select_accel_fn
#3 buffer_is_zero
#4 save_zero_page
#5 ram_save_target_page_legacy
#6 ram_save_host_page
#7 ram_find_and_save_block
#8 ram_save_iterate
#9 qemu_savevm_state_iterate
#10 migration_iteration_run
#11 migration_thread
#12 qemu_thread_start
To avoid this VM crash during the migration, prevent the migration
when a known hardware poison exists on the VM.
Signed-off-by: William Roche <william.roche@oracle.com>
Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
This allows passing the KVM device node to use as a file
descriptor via /dev/fdset/XX. Passing the device node to
use as a file descriptor allows running qemu unprivileged
even when the user running qemu is not in the kvm group
on distributions where access to /dev/kvm is gated behind
membership of the kvm group (as long as the process invoking
qemu is able to open /dev/kvm and passes the file descriptor
to qemu).
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Message-ID: <20231021134015.1119597-1-daan.j.demeyer@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().
The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.
The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)
There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Patch removes DPRINTF macro and adds multiple tracepoints
to capture different kvm events.
We also drop the DPRINTFs that don't add any additional
information than trace_kvm_run_exit already does.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827
Signed-off-by: Jai Arora <arorajai2798@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This variable is not used or declared outside kvm-all.c.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Since we now assume that ioeventfds are present, kvm_io_listener is always
registered. Merge it with kvm_coalesced_pio_listener in a single
listener. Since PIO space does not have KVM memslots attached to it,
the priority is irrelevant.
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
NR_IOBUS_DEVS was increased to 200 in Linux 2.6.34. By Linux 3.5 it had
increased to 1000 and later ioeventfds were changed to not count against
the limit. But the earlier limit of 200 would already be enough for
kvm_check_many_ioeventfds() to be true, so remove the check.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This is a remnant of pre-VFIO device assignment; it is not defined
anymore by Linux and not used by QEMU.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM_IRQFD was introduced in Linux 2.6.32, and since then it has always been
available on architectures that support an in-kernel interrupt controller.
We can require it unconditionally.
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This was introduced in KVM in Linux 3.5, we can require it unconditionally
in kvm_irqchip_send_msi(). However, not all architectures have to implement
it so check it only in x86, the only architecture that ever had MSI injection
but not KVM_CAP_SIGNAL_MSI.
ARM uses it to detect the presence of the ITS emulation in the kernel,
introduced in Linux 4.8. Assume that it's there and possibly fail when
realizing the arm-its-kvm device.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This was introduced in KVM in Linux 2.6.33, we can require it unconditionally.
Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We'll need the stub soon from memory device context.
While at it, use "unsigned int" as return value and place the
declaration next to kvm_get_free_memslots().
Message-ID: <20230926185738.277351-11-david@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Let's return the number of free slots instead of only checking if there
is a free slot. While at it, check all address spaces, which will also
consider SMM under x86 correctly.
This is a preparation for memory devices that consume multiple memslots.
Message-ID: <20230926185738.277351-5-david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
A register access error typically means something seriously wrong
happened so that anything bad can happen after that and recovery is
impossible.
Even failing one register access is catastorophic as
architecture-specific code are not written so that it torelates such
failures.
Make sure the VM stop and nothing worse happens if such an error occurs.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-ID: <20221201102728.69751-1-akihiko.odaki@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that we have Eager Page Split support added for ARM in the kernel,
enable it in Qemu. This adds,
-eager-split-size to -accel sub-options to set the eager page split chunk size.
-enable KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE.
The chunk size specifies how many pages to break at a time, using a
single allocation. Bigger the chunk size, more pages need to be
allocated ahead of time.
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Message-id: 20230905091246.1931-1-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Widens the pc and saved_insn fields of kvm_sw_breakpoint from
target_ulong to vaddr. The pc argument of kvm_find_sw_breakpoint is also
widened to match.
Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230807155706.9580-2-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The returned value was always zero and had no meaning.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-7-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
An error may occur after s->as is allocated, for example if the
KVM_CREATE_VM ioctl call fails.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-6-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On MIPS, kvm_arch_get_default_type() returns a negative value when an
error occurred so handle the case. Also, let other machines return
negative values when errors occur and declare returning a negative
value as the correct way to propagate an error that happened when
determining KVM type.
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
kvm_arch_get_default_type() returns the default KVM type. This hook is
particularly useful to derive a KVM type that is valid for "none"
machine model, which is used by libvirt to probe the availability of
KVM.
For MIPS, the existing mips_kvm_type() is reused. This function ensures
the availability of VZ which is mandatory to use KVM on the current
QEMU.
Cc: qemu-stable@nongnu.org
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20230727073134.134102-2-akihiko.odaki@daynix.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: added doc comment for new function]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Runs into core dump on arm64 and the backtrace extracted from the
core dump is shown as below. It's caused by accessing uninitialized
@kvm_state in kvm_flush_coalesced_mmio_buffer() due to commit 176d073029
("hw/arm/virt: Use machine_memory_devices_init()"), where the machine's
memory region is added earlier than before.
main
qemu_init
configure_accelerators
qemu_opts_foreach
do_configure_accelerator
accel_init_machine
kvm_init
virt_kvm_type
virt_set_memmap
machine_memory_devices_init
memory_region_add_subregion
memory_region_add_subregion_common
memory_region_update_container_subregions
memory_region_transaction_begin
qemu_flush_coalesced_mmio_buffer
kvm_flush_coalesced_mmio_buffer
Fix it by bailing early in kvm_flush_coalesced_mmio_buffer() on the
uninitialized @kvm_state. With this applied, no crash is observed on
arm64.
Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20230731125946.2038742-1-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Add MEMORY_LISTNER_PRIORITY_MIN for the symbolic value for the min value of
the memory listener instead of the hard-coded magic value 0. Add explicit
initialization.
No functional change intended.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <29f88477fe82eb774bcfcae7f65ea21995f865f2.1687279702.git.isaku.yamahata@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add MEMORY_LISTENER_PRIORITY_DEV_BACKEND for the symbolic value
for memory listener to replace the hard-coded value 10 for the
device backend.
No functional change intended.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <8314d91688030d7004e96958f12e2c83fb889245.1687279702.git.isaku.yamahata@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Add MEMORY_LISTNER_PRIORITY_ACCEL for the symbolic value for the memory
listener to replace the hard-coded value 10 for accel.
No functional change intended.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <feebe423becc6e2aa375f59f6abce9a85bc15abb.1687279702.git.isaku.yamahata@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
A regression has been detected in latency testing of KVM guests.
More specifically, it was observed that the cyclictest
numbers inside of an isolated vcpu (running on isolated pcpu) are:
Where a maximum of 50us is acceptable.
The implementation of KVM_GET_STATS_FD uses run_on_cpu to query
per vcpu statistics, which interrupts the vcpu (and is unnecessary).
To fix this, open the per vcpu stats fd on vcpu initialization,
and read from that fd from QEMU's main thread.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>