Peter Xu ae4e46cd20 KVM: Dynamic sized kvm memslots array
Zhiyi reported an infinite loop issue in a VFIO use case.  The cause of that
was a separate discussion; however, while profiling during that discussion I
found a regression that slows down dirty sync.

Each KVMMemoryListener maintains an array of kvm memslots.  Currently it's
statically allocated to be the max supported by the kernel.  However after
Linux commit 4fc096a99e ("KVM: Raise the maximum number of user memslots"),
the max supported memslots reported has grown large enough that it may not
be wise to always statically allocate with the max reported.

What's worse, QEMU kvm code still walks all the allocated memslot entries
to do any form of lookup.  This can drastically slow down all memslot
operations because each such loop can run over 32K times on the new
kernels.

Fix this issue by allocating the memslots dynamically.

Here the initial size is set to 16 because it should cover basic VM
usage, so the hope is that the majority of VM use cases will not even need
to grow at all (e.g. if one starts a VM with ./qemu-system-x86_64 by default
it'll consume 9 memslots), while not being so large as to waste memory.
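
The idea can be sketched in C roughly as follows.  This is a minimal,
self-contained illustration and not the code from this patch: the names
Slot, SlotArray, SLOTS_DEFAULT_SIZE and slot_array_*() are hypothetical,
and the doubling growth policy is an assumption.  Only the initial size of
16 and the kernel-reported cap come from the description above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not QEMU's actual code. */
#define SLOTS_DEFAULT_SIZE 16

typedef struct Slot {
    unsigned int id;
    unsigned long memory_size;      /* 0 means the slot is free */
} Slot;

typedef struct SlotArray {
    Slot *slots;
    unsigned int nr_allocated;      /* entries currently allocated */
    unsigned int nr_max;            /* cap reported by the kernel */
} SlotArray;

/* Allocate a small initial array instead of nr_max entries. */
static int slot_array_init(SlotArray *a, unsigned int nr_max)
{
    unsigned int i, n;

    if (nr_max == 0) {
        return -1;
    }
    n = nr_max < SLOTS_DEFAULT_SIZE ? nr_max : SLOTS_DEFAULT_SIZE;
    a->slots = calloc(n, sizeof(Slot));
    if (!a->slots) {
        return -1;
    }
    for (i = 0; i < n; i++) {
        a->slots[i].id = i;
    }
    a->nr_allocated = n;
    a->nr_max = nr_max;
    return 0;
}

/* Double the array (assumed policy), never exceeding nr_max. */
static int slot_array_grow(SlotArray *a)
{
    unsigned int old = a->nr_allocated, new_size, i;
    Slot *p;

    if (old >= a->nr_max) {
        return -1;                  /* already at the kernel limit */
    }
    new_size = old * 2;
    if (new_size > a->nr_max || new_size < old) {
        new_size = a->nr_max;       /* clamp, also covers overflow */
    }
    p = realloc(a->slots, (size_t)new_size * sizeof(Slot));
    if (!p) {
        return -1;
    }
    memset(p + old, 0, (size_t)(new_size - old) * sizeof(Slot));
    for (i = old; i < new_size; i++) {
        p[i].id = i;
    }
    a->slots = p;
    a->nr_allocated = new_size;
    return 0;
}

/* Lookups walk only the allocated entries, not all nr_max of them. */
static Slot *slot_array_get_free(SlotArray *a)
{
    unsigned int i, old = a->nr_allocated;

    for (i = 0; i < old; i++) {
        if (a->slots[i].memory_size == 0) {
            return &a->slots[i];
        }
    }
    if (slot_array_grow(a) < 0) {
        return NULL;
    }
    return &a->slots[old];          /* first entry of the new region */
}
```

With this shape, a VM that uses around 10 slots only ever loops over 16
entries, and the array grows only when a 17th slot is requested.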

There may be even better ways to address this, but so far this is the
simplest and should already be better than even before the max supported
memslots grew.  For example, in the case of the above issue when VFIO was
attached on a 32GB system, there are only ~10 memslots used.  So it could
be good enough as of now.

In the above VFIO context, measurement shows that the precopy dirty sync
shrank from ~86ms to ~3ms after this patch is applied.  It should also apply
to any KVM enabled VM even without VFIO.

NOTE: we don't have a Fixes tag for this patch because there's no real
commit that regressed this in QEMU. Such behavior has existed for a long
time, but only started to be a problem when the kernel reports a very large
nr_slots_max value.  However that's pretty common now (the kernel change
was merged in 2021) so we attached cc:stable because we'll want this change
to be backported to stable branches.

Cc: qemu-stable <qemu-stable@nongnu.org>
Reported-by: Zhiyi Guo <zhguo@redhat.com>
Tested-by: Zhiyi Guo <zhguo@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20240917163835.194664-2-peterx@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 5504a81261)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fixup in accel/kvm/kvm-all.c and accel/kvm/trace-events;
 also remove now-unused local variable `KVMState *s` in kvm-all.c:kvm_log_sync_global() )
2024-11-08 13:02:21 +03:00

===========
QEMU README
===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Documentation
=============

Documentation can be found hosted online at
`<https://www.qemu.org/documentation/>`_. The documentation for the
current development version that is available at
`<https://www.qemu.org/docs/master/>`_ is generated from the ``docs/``
folder in the source tree, and is built by `Sphinx
<https://www.sphinx-doc.org/en/master/>`_.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, macOS, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:


.. code-block:: shell

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

* `<https://wiki.qemu.org/Hosts/Linux>`_
* `<https://wiki.qemu.org/Hosts/Mac>`_
* `<https://wiki.qemu.org/Hosts/W32>`_


Submitting patches
==================

The QEMU source code is maintained under the Git version control system.

.. code-block:: shell

  git clone https://gitlab.com/qemu-project/qemu.git

When submitting patches, one common approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the `style section
<https://www.qemu.org/docs/master/devel/style.html>`_ of
the Developers Guide.

Additional information on submitting patches can be found online via
the QEMU website:

* `<https://wiki.qemu.org/Contribute/SubmitAPatch>`_
* `<https://wiki.qemu.org/Contribute/TrivialPatches>`_

The QEMU website is also maintained under source control.

.. code-block:: shell

  git clone https://gitlab.com/qemu-project/qemu-web.git

* `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_

A 'git-publish' utility was created to make the above process less
cumbersome, and is highly recommended for making regular contributions,
or even just for sending consecutive patch series revisions. It also
requires a working 'git send-email' setup, and by default doesn't
automate everything, so you may want to go through the above steps
manually once.

For installation instructions, please go to:

* `<https://github.com/stefanha/git-publish>`_

The workflow with 'git-publish' is:

.. code-block:: shell

  $ git checkout master -b my-feature
  $ # work on new commits, add your 'Signed-off-by' lines to each
  $ git publish

Your patch series will be sent and tagged as my-feature-v1, so that you can
refer back to it in the future.

Sending v2:

.. code-block:: shell

  $ git checkout my-feature # same topic branch
  $ # making changes to the commits (using 'git rebase', for example)
  $ git publish

Your patch series will be sent with a 'v2' tag in the subject and the git tip
will be tagged as my-feature-v2.

Bug reporting
=============

The QEMU project uses GitLab issues to track bugs. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

* `<https://gitlab.com/qemu-project/qemu/-/issues>`_

If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect latest upstream code, it can also be
reported via GitLab.

For additional information on bug reporting consult:

* `<https://wiki.qemu.org/Contribute/ReportABug>`_


ChangeLog
=========

For version history and release notes, please visit
`<https://wiki.qemu.org/ChangeLog/>`_ or look at the git history for
more detailed information.


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

* `<mailto:qemu-devel@nongnu.org>`_
* `<https://lists.nongnu.org/mailman/listinfo/qemu-devel>`_
* #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

* `<https://wiki.qemu.org/Contribute/StartHere>`_