qemu/include/sysemu
Peter Xu 5504a81261 KVM: Dynamic sized kvm memslots array
Zhiyi reported an infinite loop issue in VFIO use case.  The cause of that
was a separate discussion, however during that I found a regression of
dirty sync slowness when profiling.

Each KVMMemoryListerner maintains an array of kvm memslots.  Currently it's
statically allocated to be the max supported by the kernel.  However after
Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"),
the max supported memslots reported now grows to some number large enough
so that it may not be wise to always statically allocate with the max
reported.

What's worse, QEMU kvm code still walks all the allocated memslots entries
to do any form of lookups.  It can drastically slow down all memslot
operations because each of such loop can run over 32K times on the new
kernels.

Fix this issue by making the memslots to be allocated dynamically.

Here the initial size was set to 16 because it should cover the basic VM
usages, so that the hope is the majority VM use case may not even need to
grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default
it'll consume 9 memslots), however not too large to waste memory.

There can also be even better way to address this, but so far this is the
simplest and should be already better even than before we grow the max
supported memslots.  For example, in the case of above issue when VFIO was
attached on a 32GB system, there are only ~10 memslots used.  So it could
be good enough as of now.

In the above VFIO context, measurement shows that the precopy dirty sync
shrinked from ~86ms to ~3ms after this patch applied.  It should also apply
to any KVM enabled VM even without VFIO.

NOTE: we don't have a FIXES tag for this patch because there's no real
commit that regressed this in QEMU. Such behavior existed for a long time,
but only start to be a problem when the kernel reports very large
nr_slots_max value.  However that's pretty common now (the kernel change
was merged in 2021) so we attached cc:stable because we'll want this change
to be backported to stable branches.

Cc: qemu-stable <qemu-stable@nongnu.org>
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240917163835.194664-2-peterx@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-17 19:41:30 +02:00
..
accel-blocker.h bulk: Do not declare function prototypes using 'extern' keyword 2023-08-31 19:47:43 +02:00
accel-ops.h sysemu: add set_virtual_time to accel ops 2024-06-24 10:14:34 +01:00
arch_init.h system: Remove support for CRIS target 2024-09-13 20:11:13 +02:00
balloon.h qapi: Restrict balloon-related commands to machine code 2020-09-29 15:41:35 +02:00
block-backend-common.h block: drain from main loop thread in bdrv_co_yield_to_drain() 2023-05-30 17:32:02 +02:00
block-backend-global-state.h block-backend: Remove deadcode 2024-10-03 17:26:06 +03:00
block-backend-io.h util/defer-call: move defer_call() to util/ 2023-10-31 15:41:42 +01:00
block-backend.h include/sysemu/block-backend: split header into I/O and global state (GS) API 2022-03-04 18:18:25 +01:00
block-ram-registrar.h block: add BlockRAMRegistrar 2022-10-26 14:56:42 -04:00
blockdev.h include/sysemu/blockdev.h: global state API 2022-03-04 18:18:25 +01:00
cpu-throttle.h cpu-throttle: new module, extracted from cpus.c 2020-07-10 18:04:49 -04:00
cpu-timers-internal.h system: Rename softmmu/ directory as system/ 2023-10-08 21:08:08 +02:00
cpu-timers.h sysemu: add set_virtual_time to accel ops 2024-06-24 10:14:34 +01:00
cpus.h cpus: Remove unused smp_cores/smp_threads declarations 2023-10-12 00:37:39 +03:00
cryptodev-vhost-user.h cryptodev: Fix Lesser GPL version number 2020-10-27 16:48:49 +01:00
cryptodev-vhost.h include/: spelling fixes 2023-09-08 13:08:52 +03:00
cryptodev.h qapi/cryptodev: Rename QCryptodevBackendAlgType to *Algo, and drop prefix 2024-09-10 14:03:30 +02:00
device_tree.h kconfig: allow compiling out QEMU device tree code per target 2024-05-10 15:45:15 +02:00
dirtylimit.h migration: Extend query-migrate to provide dirty page limit info 2023-07-26 10:55:56 +02:00
dirtyrate.h include: Include headers where needed 2023-01-08 01:54:22 -05:00
dma.h dma: Fix function names in documentation 2024-10-15 15:16:17 +01:00
dump-arch.h dump: Add arch cleanup function 2023-11-14 10:42:32 +01:00
dump.h dump: Allow directly outputting raw kdump format 2023-11-02 18:05:02 +04:00
event-loop-base.h Don't include headers already included by qemu/osdep.h 2023-02-08 07:28:05 +01:00
host_iommu_device.h vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps 2024-07-23 17:14:52 +02:00
hostmem.h backends/hostmem: Report error when memory size is unaligned 2024-06-08 10:33:38 +02:00
hvf_int.h hvf: Split up hv_vm_create logic per arch 2024-09-13 15:31:46 +01:00
hvf.h exec: Rename NEED_CPU_H -> COMPILING_PER_TARGET 2024-04-26 09:49:51 +02:00
hw_accel.h accel: Remove HAX accelerator 2023-08-31 19:46:43 +02:00
iommufd.h vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support 2024-07-23 17:14:52 +02:00
iothread.h include/: spelling fixes 2023-09-08 13:08:52 +03:00
kvm_int.h KVM: Dynamic sized kvm memslots array 2024-10-17 19:41:30 +02:00
kvm_xen.h hw/xen: select kernel mode for per-vCPU event channel upcall vector 2023-11-06 10:03:45 +00:00
kvm.h kvm: Allow kvm_arch_get/put_registers to accept Error** 2024-10-03 22:04:19 +02:00
memory_mapping.h memory: follow Error API guidelines 2023-10-19 23:13:27 +02:00
numa.h numa: remove types from typedefs.h 2024-05-03 15:47:48 +02:00
nvmm.h exec: Rename NEED_CPU_H -> COMPILING_PER_TARGET 2024-04-26 09:49:51 +02:00
os-posix.h qemu_init: increase NOFILE soft limit on POSIX 2024-02-09 12:47:58 +00:00
os-win32.h qemu_init: increase NOFILE soft limit on POSIX 2024-02-09 12:47:58 +00:00
qtest.h qtest: move qtest_{get, set}_virtual_clock to accel/qtest/qtest.c 2024-06-24 10:14:56 +01:00
replay.h replay: Remove unused replay_disable_events 2024-10-03 17:26:06 +03:00
reset.h reset: Use ResetType for qemu_devices_reset() and MachineClass::reset() 2024-09-24 11:33:34 +02:00
rng-random.h Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
rng.h qom: Remove module_obj_name parameter from OBJECT_DECLARE* macros 2020-09-18 14:12:32 -04:00
rtc.h rtc: Use time_t for passing and returning time offsets 2023-08-31 09:45:18 +01:00
runstate-action.h system: Rename softmmu/ directory as system/ 2023-10-08 21:08:08 +02:00
runstate.h replay: allow runstate shutdown->running when replaying trace 2024-08-16 14:04:19 +01:00
seccomp.h sandbox: disable -sandbox if CONFIG_SECCOMP undefined 2018-06-01 13:44:15 +02:00
spdm-socket.h backends: Initial support for SPDM socket support 2024-07-22 20:15:42 -04:00
stats.h include/: spelling fixes 2023-09-08 13:08:52 +03:00
sysemu.h vl.c: Remove pxa2xx-specific -portrait and -rotate options 2024-10-15 15:16:17 +01:00
tcg.h accel: Document generic accelerator headers 2023-06-28 13:55:35 +02:00
tpm_backend.h include/: spelling fixes 2023-09-08 13:08:52 +03:00
tpm_util.h tpm: Fix Lesser GPL version number 2020-11-15 16:44:18 +01:00
tpm.h sysemu/tpm: Clean up global variable shadowing 2023-10-06 13:27:48 +02:00
vhost-user-backend.h qom: Remove module_obj_name parameter from OBJECT_DECLARE* macros 2020-09-18 14:12:32 -04:00
watchdog.h watchdog: remove -watchdog option 2022-09-29 11:40:28 +02:00
whpx.h exec: Rename NEED_CPU_H -> COMPILING_PER_TARGET 2024-04-26 09:49:51 +02:00
xen-mapcache.h xen: mapcache: Pass the ram_addr offset to xen_map_cache() 2024-06-09 20:16:14 +02:00
xen.h xen: mapcache: Add support for grant mappings 2024-06-09 20:16:14 +02:00