qemu/include/sysemu
David Hildenbrand f39b7d2b96 kvm: Atomic memslot updates
If we update an existing memslot (e.g., resize, split), we temporarily
remove the memslot to re-add it immediately afterwards. These updates
are not atomic, especially not for KVM VCPU threads, such that we can
get spurious faults.

Let's inhibit most KVM ioctls while performing relevant updates, such
that we can perform the update just as if it happened atomically,
without additional kernel support.

We capture the add/del changes and apply them in the notifier commit
stage instead. There, we can check for overlaps and perform the ioctl
inhibiting only if really required (-> overlap).
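
The overlap check described above could look roughly like the following.
This is a minimal sketch, not the code from this commit: the SlotRange
type and the ranges_overlap()/need_inhibit() helpers are hypothetical
names chosen for illustration, and the sketch assumes that
start + size does not wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one queued memslot change (not a QEMU type). */
typedef struct {
    uint64_t start;  /* guest physical start address */
    uint64_t size;   /* length in bytes, must be non-zero */
} SlotRange;

/*
 * Two half-open ranges [start, start + size) overlap iff each one
 * starts before the other one ends. Assumes start + size cannot wrap.
 */
static bool ranges_overlap(const SlotRange *a, const SlotRange *b)
{
    assert(a->size > 0 && b->size > 0);
    return a->start < b->start + b->size &&
           b->start < a->start + a->size;
}

/* Return true if any queued removal overlaps any queued addition. */
static bool need_inhibit(const SlotRange *del, size_t n_del,
                         const SlotRange *add, size_t n_add)
{
    for (size_t i = 0; i < n_del; i++) {
        for (size_t j = 0; j < n_add; j++) {
            if (ranges_overlap(&del[i], &add[j])) {
                return true;
            }
        }
    }
    return false;
}
```

Only when need_inhibit() reports an overlap would the commit stage have
to pay for inhibiting ioctls; disjoint additions and removals can be
applied directly.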

To keep things simple we don't perform additional checks that wouldn't
actually result in an overlap -- such as !RAM memory regions in some
cases (see kvm_set_phys_mem()).

To minimize cache-line bouncing, use a separate indicator
(in_ioctl_lock) per CPU.  Also, make sure to hold the kvm_slots_lock
while performing both actions (removing+re-adding).

We have to wait until all in-flight ioctls have exited and block new
ones from being executed.
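
To illustrate the idea of a per-CPU indicator plus an inhibitor that
blocks new ioctls and waits for in-flight ones, here is a minimal
sketch using C11 atomics. It is not the implementation from this
commit: VcpuGate, ioctl_try_begin(), ioctl_begin(), ioctl_end(),
inhibit_begin() and inhibit_end() are hypothetical names, the 64-byte
alignment is an assumed cache-line size, and the busy-wait loops stand
in for the sleeping and vCPU kicking that real code would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU state; each counter gets its own cache line. */
typedef struct {
    _Alignas(64) atomic_int in_ioctl;  /* > 0 while inside an ioctl */
} VcpuGate;

/* Set while an inhibitor is active; read-mostly, so cheap to share. */
static atomic_bool inhibited;

/* Try to enter an ioctl; returns false without entering if inhibited. */
static bool ioctl_try_begin(VcpuGate *g)
{
    /* Announce first, then check: pairs with set-then-scan below. */
    atomic_fetch_add(&g->in_ioctl, 1);
    if (atomic_load(&inhibited)) {
        atomic_fetch_sub(&g->in_ioctl, 1);
        return false;
    }
    return true;
}

/* Enter an ioctl, waiting for as long as an inhibitor is active. */
static void ioctl_begin(VcpuGate *g)
{
    while (!ioctl_try_begin(g)) {
        while (atomic_load(&inhibited)) {
            /* busy-wait; real code would sleep on an event instead */
        }
    }
}

/* Leave an ioctl that was entered successfully. */
static void ioctl_end(VcpuGate *g)
{
    int prev = atomic_fetch_sub(&g->in_ioctl, 1);
    assert(prev > 0);
    (void)prev;
}

/* Block new ioctls, then wait until every vCPU has left its ioctl. */
static void inhibit_begin(VcpuGate *gates, size_t n)
{
    atomic_store(&inhibited, true);
    for (size_t i = 0; i < n; i++) {
        while (atomic_load(&gates[i].in_ioctl) > 0) {
            /* busy-wait; real code would also kick the vCPU */
        }
    }
}

/* Allow ioctls again. */
static void inhibit_end(void)
{
    atomic_store(&inhibited, false);
}
```

Because a vCPU thread increments its own counter before checking the
flag, and the inhibitor sets the flag before scanning the counters
(all with sequentially consistent atomics), at least one side always
observes the other. An ioctl therefore either backs off or is waited
for, and in the common uninhibited case each vCPU writes only to its
own cache line.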

This approach cannot result in a deadlock as long as the inhibitor does
not hold any locks that might hinder an IOCTL from getting finished and
exited - something fairly unusual. The inhibitor will always hold the BQL.

AFAIK, one possible candidate would be userfaultfd. If a page cannot be
placed (e.g., during postcopy), because we're waiting for a lock, or if the
userfaultfd thread cannot process a fault, because it is waiting for a
lock, there could be a deadlock. However, the BQL is not applicable here,
because any other guest memory access while holding the BQL would already
result in a deadlock.

Nothing else in the kernel should block forever and wait for userspace
intervention.

Note: pause_all_vcpus()/resume_all_vcpus() or
start_exclusive()/end_exclusive() cannot be used, as they either drop
the BQL or must be called without the BQL - something inhibitors
cannot handle. We need a low-level locking mechanism that is
deadlock-free even when not releasing the BQL.

Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221111154758.1372674-4-eesposit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 09:59:39 +01:00
accel-blocker.h accel: introduce accelerator blocker API 2023-01-11 09:59:39 +01:00
accel-ops.h gdbstub: move guest debug support check to ops 2022-10-06 11:53:41 +01:00
arch_init.h hw/loongarch: Add support loongson3 virt machine type. 2022-06-06 18:09:03 +00:00
balloon.h qapi: Restrict balloon-related commands to machine code 2020-09-29 15:41:35 +02:00
block-backend-common.h block-backend-common.h: split function pointers in BlockDevOps 2022-03-04 18:18:26 +01:00
block-backend-global-state.h block: return errors from bdrv_register_buf() 2022-10-26 14:56:42 -04:00
block-backend-io.h block: rename generated_co_wrapper in co_wrapper_mixed 2022-12-15 16:07:43 +01:00
block-backend.h include/sysemu/block-backend: split header into I/O and global state (GS) API 2022-03-04 18:18:25 +01:00
block-ram-registrar.h block: add BlockRAMRegistrar 2022-10-26 14:56:42 -04:00
blockdev.h include/sysemu/blockdev.h: global state API 2022-03-04 18:18:25 +01:00
cpu-throttle.h
cpu-timers.h replay: notify vCPU when BH is scheduled 2022-06-06 09:26:53 +02:00
cpus.h gdbstub: move breakpoint logic to accel ops 2022-10-06 11:53:41 +01:00
cryptodev-vhost-user.h cryptodev: Fix Lesser GPL version number 2020-10-27 16:48:49 +01:00
cryptodev-vhost.h cryptodev: Fix Lesser GPL version number 2020-10-27 16:48:49 +01:00
cryptodev.h cryptodev: Add a lkcf-backend for cryptodev 2022-11-02 06:56:32 -04:00
device_tree.h device-tree: add re-randomization helper function 2022-10-27 11:34:31 +01:00
dirtylimit.h softmmu/dirtylimit: Implement virtual CPU throttle 2022-07-20 12:15:08 +01:00
dirtyrate.h include: Include headers where needed 2023-01-08 01:54:22 -05:00
dma.h hw/dma: Let dma_buf_read() / dma_buf_write() propagate MemTxResult 2022-01-18 12:56:29 +01:00
dump-arch.h dump: Add architecture section and section string table support 2022-10-24 22:30:10 +04:00
dump.h include: Include headers where needed 2023-01-08 01:54:22 -05:00
event-loop-base.h util/event-loop-base: Introduce options to set the thread pool size 2022-05-09 10:43:23 +01:00
hax.h accel/hax: Introduce CONFIG_HAX_IS_POSSIBLE 2022-03-06 13:15:42 +01:00
hostmem.h hostmem: Allow for specifying a ThreadContext for preallocation 2022-10-27 11:01:03 +02:00
hvf_int.h arm/hvf: Add a WFI handler 2021-09-21 16:28:26 +01:00
hvf.h include/sysemu: Poison all accelerator CONFIG switches in common code 2021-05-14 12:31:44 +02:00
hw_accel.h accel: Introduce AccelOpsClass::cpus_are_resettable() 2022-03-06 13:15:42 +01:00
iothread.h Introduce event-loop-base abstract class 2022-05-09 10:43:23 +01:00
kvm_int.h kvm: Atomic memslot updates 2023-01-11 09:59:39 +01:00
kvm.h kvm: allow target-specific accelerator properties 2022-10-10 09:23:16 +02:00
memory_mapping.h sysemu/memory_mapping: Become target-agnostic 2022-03-06 13:15:42 +01:00
numa.h numa: drop support for '-numa node' (without memory specified) 2020-09-30 19:09:20 +02:00
nvmm.h Only check CONFIG_NVMM when NEED_CPU_H is defined 2021-09-13 13:56:26 +02:00
os-posix.h block: move fcntl_setfl() 2022-05-03 15:17:53 +04:00
os-win32.h util/qemu-sockets: Enable unix socket support on Windows 2022-09-02 15:54:46 +04:00
qtest.h cpu-timers, icount: new modules 2020-10-05 16:41:22 +02:00
replay.h chardev: src buffer const for write functions 2022-09-29 14:38:05 +04:00
reset.h reset: allow registering handlers that aren't called by snapshot loading 2022-10-27 11:34:31 +01:00
rng-random.h Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
rng.h qom: Remove module_obj_name parameter from OBJECT_DECLARE* macros 2020-09-18 14:12:32 -04:00
rtc.h rtc: Move RTC function prototypes to their own header 2022-01-28 14:29:46 +00:00
runstate-action.h vl: Add option to avoid stopping VM upon guest panic 2020-12-15 12:51:58 -05:00
runstate.h whpx: Added support for breakpoints and stepping 2022-04-06 14:31:55 +02:00
seccomp.h
sysemu.h pci: Move HMP command from hw/pci/pcie_aer.c to pci-hmp-cmds.c 2022-12-19 16:21:56 +01:00
tcg.h accel/tcg: Merge tcg_exec_init into tcg_init_machine 2021-06-11 09:26:28 -07:00
tpm_backend.h sysemu: Make TPM structures inaccessible if CONFIG_TPM is not defined 2021-06-15 10:55:12 -04:00
tpm_util.h tpm: Fix Lesser GPL version number 2020-11-15 16:44:18 +01:00
tpm.h sysemu: tpm: Add a stub function for TPM_IS_CRB 2022-05-06 09:06:50 -06:00
vhost-user-backend.h qom: Remove module_obj_name parameter from OBJECT_DECLARE* macros 2020-09-18 14:12:32 -04:00
watchdog.h watchdog: remove -watchdog option 2022-09-29 11:40:28 +02:00
whpx.h include/sysemu: Poison all accelerator CONFIG switches in common code 2021-05-14 12:31:44 +02:00
xen-mapcache.h
xen.h sysemu/xen: Add missing 'exec/cpu-common.h' header for ram_addr_t type 2020-09-30 19:11:36 +02:00