Stefan Hajnoczi 86a637e481 coroutine: cap per-thread local pool size
The coroutine pool implementation can hit the Linux vm.max_map_count
limit, causing QEMU to abort with "failed to allocate memory for stack"
or "failed to set up stack guard page" during coroutine creation.

This happens because per-thread pools can grow to tens of thousands of
coroutines. Each coroutine causes 2 virtual memory areas to be created.
Eventually vm.max_map_count is reached and memory-related syscalls fail.
The per-thread pool sizes are non-uniform and depend on past coroutine
usage in each thread, so it's possible for one thread to have a large
pool while another thread's pool is empty.
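
The arithmetic behind this can be sketched in a few lines of C. This is
only an illustration: the helper names are hypothetical, the figure of 2
virtual memory areas per coroutine is taken from the description above,
and the estimate ignores every other mapping in the process, so the real
ceiling is lower.

```c
#include <assert.h>
#include <stdio.h>

/* From the description above: each coroutine creates 2 VMAs. */
#define VMAS_PER_COROUTINE 2

/*
 * Upper bound on the number of coroutines that fit under a given
 * vm.max_map_count, ignoring all other mappings in the process.
 * (Hypothetical helper, not part of QEMU.)
 */
static long max_coroutines_for_limit(long max_map_count)
{
    assert(max_map_count >= 0);
    return max_map_count / VMAS_PER_COROUTINE;
}

/*
 * Read the current limit from procfs on Linux.
 * Returns -1 if the file cannot be read or parsed.
 */
static long read_max_map_count(void)
{
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    long value = -1;

    if (!f) {
        return -1;
    }
    if (fscanf(f, "%ld", &value) != 1) {
        value = -1;
    }
    fclose(f);
    return value;
}
```

With a limit of 65530 (a common Linux default; distributions may set a
different value), max_coroutines_for_limit(65530) gives 32765, which is
within reach of pools that grow to tens of thousands of coroutines.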

Switch to a new coroutine pool implementation with a global pool that
grows to a maximum number of coroutines and per-thread local pools that
are capped at a small hardcoded number of coroutines.

This approach does not leave large numbers of coroutines pooled in a
thread that may not use them again. In order to perform well it
amortizes the cost of global pool accesses by working in batches of
coroutines instead of individual coroutines.

The global pool is a list. Threads donate batches of coroutines to it
when they have too many and take batches from it when they have too few:

.-----------------------------------.
| Batch 1 | Batch 2 | Batch 3 | ... | global_pool
`-----------------------------------'

Each thread has up to 2 batches of coroutines:

.-------------------.
| Batch 1 | Batch 2 | per-thread local_pool (maximum 2 batches)
`-------------------'
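
The two diagrams above can be read as the following minimal sketch. It is
not the code from this commit: all names, the batch size and the global
cap are hypothetical, and a plain linked-list node stands in for a
coroutine. Only the cap of 2 local batches comes from the description.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All names and sizes are hypothetical, except the local cap of 2 batches. */
enum { BATCH_SIZE = 4, LOCAL_MAX_BATCHES = 2, GLOBAL_MAX_BATCHES = 8 };

typedef struct Item { struct Item *next; } Item;  /* stand-in for a coroutine */

typedef struct Batch {
    struct Batch *next;
    Item *items;      /* singly linked stack of pooled items */
    unsigned count;   /* 1..BATCH_SIZE while the batch is in a pool */
} Batch;

static Batch *global_pool;                 /* shared, protected by global_lock */
static unsigned global_batches;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static _Thread_local Batch *local_pool;    /* per thread, no locking needed */
static _Thread_local unsigned local_batches;

/* Take one whole batch from the global pool, or NULL if it is empty. */
static Batch *global_take(void)
{
    pthread_mutex_lock(&global_lock);
    Batch *b = global_pool;
    if (b) {
        global_pool = b->next;
        global_batches--;
    }
    pthread_mutex_unlock(&global_lock);
    return b;
}

/* Donate one whole batch; if the global pool is full, free the batch. */
static void global_donate(Batch *b)
{
    pthread_mutex_lock(&global_lock);
    if (global_batches < GLOBAL_MAX_BATCHES) {
        b->next = global_pool;
        global_pool = b;
        global_batches++;
        b = NULL;
    }
    pthread_mutex_unlock(&global_lock);
    while (b && b->items) {
        Item *it = b->items;
        b->items = it->next;
        free(it);
    }
    free(b);
}

/* Return a pooled item, or NULL if the caller must allocate a fresh one. */
static Item *pool_get(void)
{
    if (!local_pool) {
        local_pool = global_take();   /* too few: fetch a batch */
        if (!local_pool) {
            return NULL;
        }
        local_pool->next = NULL;
        local_batches = 1;
    }
    Batch *b = local_pool;
    assert(b->items && b->count > 0);
    Item *it = b->items;
    b->items = it->next;
    if (--b->count == 0) {            /* batch drained: unlink and free it */
        local_pool = b->next;
        local_batches--;
        free(b);
    }
    return it;
}

/* Return an item to the pool, donating a full batch when over the cap. */
static void pool_put(Item *it)
{
    if (!local_pool || local_pool->count == BATCH_SIZE) {
        if (local_batches == LOCAL_MAX_BATCHES) {
            Batch *full = local_pool; /* too many: donate a batch */
            local_pool = full->next;
            local_batches--;
            global_donate(full);
        }
        Batch *b = calloc(1, sizeof(*b));
        if (!b) {
            free(it);
            return;
        }
        b->next = local_pool;
        local_pool = b;
        local_batches++;
    }
    it->next = local_pool->items;
    local_pool->items = it;
    local_pool->count++;
}
```

In this sketch the lock is taken once per batch rather than once per
item, which is the amortization described above: with BATCH_SIZE items
per batch, a thread touches the global pool at most once every
BATCH_SIZE calls to pool_get() or pool_put().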

The goal of this change is to reduce the excessive number of pooled
coroutines that cause QEMU to abort when vm.max_map_count is reached
without losing the performance of an adequately sized coroutine pool.

Here are virtio-blk disk I/O benchmark results:

      RW BLKSIZE IODEPTH    OLD    NEW CHANGE
randread      4k       1 113725 117451 +3.3%
randread      4k       8 192968 198510 +2.9%
randread      4k      16 207138 209429 +1.1%
randread      4k      32 212399 215145 +1.3%
randread      4k      64 218319 221277 +1.4%
randread    128k       1  17587  17535 -0.3%
randread    128k       8  17614  17616 +0.0%
randread    128k      16  17608  17609 +0.0%
randread    128k      32  17552  17553 +0.0%
randread    128k      64  17484  17484 +0.0%

See files/{fio.sh,test.xml.j2} for the benchmark configuration:
https://gitlab.com/stefanha/virt-playbooks/-/tree/coroutine-pool-fix-sizing

Buglink: https://issues.redhat.com/browse/RHEL-28947
Reported-by: Sanjay Rao <srao@redhat.com>
Reported-by: Boaz Ben Shabat <bbenshab@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240318183429.1039340-1-stefanha@redhat.com>
2024-03-19 10:49:31 -04:00
.github/workflows github: fix config mistake preventing repo lockdown commenting 2022-04-26 16:12:26 +01:00
.gitlab/issue_templates .gitlab/issue_templates: Move suggestions into comments 2022-12-15 15:19:24 +01:00
.gitlab-ci.d gitlab-ci: add manual job to run Coverity 2024-03-08 19:11:00 +01:00
accel bulk: Call in place single use cpu_env() 2024-03-12 11:46:16 +01:00
audio audio: Depend on dbus_display1_dep 2024-02-16 17:27:22 +04:00
authz error: Drop superfluous #include "qapi/qmp/qerror.h" 2023-02-23 13:56:14 +01:00
backends backends/iommufd: Fix missing ERRP_GUARD() for error_prepend() 2024-03-12 11:45:33 +01:00
block qemu-img: Fix Column Width and Improve Formatting in snapshot list 2024-03-18 13:30:34 +01:00
bsd-user gdbstub: Save target's siginfo 2024-03-13 11:43:52 +00:00
chardev char: Slightly better error reporting when chardev is in use 2024-03-09 18:56:37 +03:00
common-user common-user/host/ppc: Implement safe-syscall.inc.S 2023-01-23 14:39:48 -10:00
configs mips: do not list individual devices from configs/ 2024-03-08 15:51:22 +01:00
contrib contrib/elf2dmp: Ensure phdrs fit in file 2024-03-11 17:06:27 +00:00
crypto crypto: Introduce 'detached-header' field in QCryptoBlockInfoLUKS 2024-02-09 12:50:37 +00:00
disas target/riscv: honour show_opcodes when disassembling 2024-03-06 12:35:51 +00:00
docs * Clarify s390x CPU topology docs and CPU compatibility error messages 2024-03-19 10:25:25 +00:00
dump dump: Fix HMP dump-guest-memory -z without -R 2024-01-30 21:20:20 +03:00
ebpf ebpf: Updated eBPF program and skeleton. 2024-03-12 19:31:47 +08:00
fpu fpu: Handle m68k extended precision denormals properly 2023-09-16 14:57:16 +00:00
fsdev configure, meson: rename targetos to host_os 2023-12-31 09:11:29 +01:00
gdb-xml gdb-xml: fix duplicate register in arm-neon.xml 2023-11-08 15:15:23 +00:00
gdbstub gdbstub: Fix double close() of the follow-fork-mode socket 2024-03-13 11:43:52 +00:00
host/include host/include/generic/host/atomic128: Fix compilation problem with Clang 17 2023-11-13 11:35:47 +01:00
hw smbios: add extra comments to smbios_get_table_legacy() 2024-03-18 08:42:46 -04:00
include virtio,pc,pci: bugfixes 2024-03-19 10:25:15 +00:00
io io: Introduce qio_channel_file_new_dupfd 2024-03-12 15:22:23 -04:00
libdecnumber libdecnumber/dpd/decimal64: Fix compiler warning from Clang 15 2022-11-11 09:13:52 +01:00
linux-headers linux-headers: Update to Linux v6.8-rc6 2024-03-08 20:48:03 +10:00
linux-user final updates for 9.0 (testing, gdbstub): 2024-03-13 15:12:14 +00:00
migration Migration pull for 9.0-rc0 2024-03-18 17:16:00 +00:00
monitor monitor/target: Include missing 'exec/memory.h' header 2024-02-13 10:59:25 +03:00
nbd nbd/server: Fix race in draining the export 2024-03-18 12:38:02 +01:00
net virtio,pc,pci: features, cleanups, fixes 2024-03-13 15:11:53 +00:00
pc-bios pc-bios/README: Add information about hppa-firmware 2024-03-03 06:41:19 +01:00
plugins plugins: cleanup codepath for previous inline operation 2024-03-06 12:35:50 +00:00
po po: add ukrainian translation 2022-07-05 10:15:49 +02:00
python buildsys: Bump known good meson version to v1.2.3 2023-11-24 16:21:55 +01:00
qapi smbios: extend smbios-entry-point-type with 'auto' value 2024-03-18 08:42:46 -04:00
qga qga-win: Add support of Windows Server 2025 in get-osinfo command 2024-03-11 18:24:39 +02:00
qobject docs/interop: Convert qmp-spec.txt to rST 2023-05-22 10:21:01 +02:00
qom hw/acpi: move object_resolve_type_unambiguous to core QOM 2024-02-27 09:36:41 +01:00
replay replay: Improve error messages about configuration conflicts 2024-03-09 18:56:36 +03:00
roms roms/hppa: Add build rules for hppa-firmware 2024-03-03 06:41:19 +01:00
scripts tracetool: remove redundant --target-type / --target-name args 2024-03-12 14:52:07 -04:00
scsi configure, meson: rename targetos to host_os 2023-12-31 09:11:29 +01:00
semihosting {linux,bsd}-user: Introduce get_task_state() 2024-03-06 12:35:19 +00:00
stats meson: Replace softmmu_ss -> system_ss 2023-06-20 10:01:30 +02:00
storage-daemon meson: remove config_targetos 2023-12-31 09:11:28 +01:00
stubs migration: privatize colo interfaces 2024-03-11 16:28:59 -04:00
subprojects libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMP 2024-03-12 17:56:55 -04:00
system physmem: Factor cpu_physical_memory_dirty_bits_cleared() out 2024-03-12 17:39:40 -04:00
target target/sparc/cpu: Improve the CPU help text 2024-03-18 17:11:19 +01:00
tcg tcg/aarch64: Fix tcg_out_brcond for test comparisons 2024-03-12 04:09:21 -10:00
tests virtio,pc,pci: bugfixes 2024-03-19 10:25:15 +00:00
tools ebpf: Updated eBPF program and skeleton. 2024-03-12 19:31:47 +08:00
trace tracing: install trace events file only if necessary 2023-12-27 05:01:55 -05:00
ui ui/dbus: filter out pending messages when scanout 2024-03-12 17:57:58 +04:00
util coroutine: cap per-thread local pool size 2024-03-19 10:49:31 -04:00
.dir-locals.el Add .dir-locals.el file to configure emacs coding style 2015-10-08 19:46:01 +03:00
.editorconfig .editorconfig: update the automatic mode setting for Emacs 2021-03-10 15:34:11 +00:00
.exrc qemu: add .exrc 2012-09-07 09:02:44 +03:00
.gdbinit .gdbinit: load QEMU sub-commands when gdb starts 2017-06-07 14:38:45 +01:00
.git-blame-ignore-revs metadata: add .git-blame-ignore-revs 2023-04-04 15:56:44 +01:00
.gitattributes gitattributes: Cover Objective-C source files 2022-03-29 00:15:14 +02:00
.gitignore configure: rename --enable-pypi to --enable-download, control subprojects too 2023-06-06 16:30:01 +02:00
.gitlab-ci.yml docs: Document GitLab custom CI/CD variables 2021-07-29 07:56:01 +02:00
.gitmodules meson: subprojects: replace berkeley-{soft,test}float-3 with wraps 2023-06-06 16:30:01 +02:00
.gitpublish Add a git-publish configuration file 2018-03-05 09:03:17 +00:00
.mailmap mailmap: Fix Stefan Weil email 2024-01-30 21:20:20 +03:00
.patchew.yml scripts/checkpatch: roll diff tweaking into checkpatch itself 2021-06-25 10:08:33 +01:00
.readthedocs.yml readthodocs: fully specify a build environment 2024-01-12 13:23:48 +00:00
.travis.yml travis-ci: Rename SOFTMMU -> SYSTEM 2024-03-18 17:18:05 +01:00
block.c error: Move ERRP_GUARD() to the beginning of the function 2024-03-12 11:45:45 +01:00
blockdev-nbd.c qapi block: Elide redundant has_FOO in generated C 2022-12-14 20:03:25 +01:00
blockdev.c blockdev: Fix blockdev-snapshot-sync error reporting for no medium 2024-03-18 13:13:08 +01:00
blockjob.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
configure configure: put all symlink creation together 2024-02-16 13:56:09 +01:00
COPYING COPYING: update from FSF 2008-10-12 17:54:42 +00:00
COPYING.LIB COPYING.LIB: Synchronize the LGPL 2.1 with the version from gnu.org 2019-01-30 11:01:22 +01:00
cpu-common.c system/cpus: rename qemu_mutex_lock_iothread() to bql_lock() 2024-01-08 10:45:43 -05:00
cpu-target.c cpu: Remove page_size_init 2024-02-29 11:35:37 -10:00
event-loop-base.c util/event-loop-base: Introduce options to set the thread pool size 2022-05-09 10:43:23 +01:00
gitdm.config contrib/gitdm: add group map for AMD 2023-03-22 15:08:26 +00:00
hmp-commands-info.hx hmp: Add option to info qtree to omit details 2024-03-09 19:17:01 +01:00
hmp-commands.hx hmp: Remove deprecated 'singlestep' command 2024-01-19 11:38:32 +01:00
iothread.c iothread: Simplify expression in qemu_in_iothread() 2024-02-13 10:59:25 +03:00
job-qmp.c qapi job: Elide redundant has_FOO in generated C 2022-12-14 20:04:47 +01:00
job.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
Kconfig meson: Introduce target-specific Kconfig 2021-07-09 18:21:34 +02:00
Kconfig.host build-sys: add a "pixman" feature 2023-11-07 14:04:24 +04:00
LICENSE tcg/LICENSE: Remove out of date claim about TCG subdirectory licensing 2019-11-11 15:11:21 +01:00
MAINTAINERS virtio,pc,pci: features, cleanups, fixes 2024-03-13 15:11:53 +00:00
Makefile Makefile: clean qemu-iotests output 2023-12-31 09:11:28 +01:00
meson_options.txt meson: fix type of "relocatable" option 2023-12-31 09:11:27 +01:00
meson.build Pull request 2024-03-13 12:37:15 +00:00
module-common.c all: Clean up includes 2016-02-04 17:41:30 +00:00
os-posix.c qemu_init: increase NOFILE soft limit on POSIX 2024-02-09 12:47:58 +00:00
os-win32.c Remove qemu-common.h include from most units 2022-04-06 14:31:55 +02:00
page-vary-common.c Remove qemu-common.h include from most units 2022-04-06 14:31:55 +02:00
page-vary-target.c exec: Rename target specific page-vary.c -> page-vary-target.c 2023-10-04 11:03:54 -07:00
pythondeps.toml buildsys: Bump known good meson version to v1.2.3 2023-11-24 16:21:55 +01:00
qemu-bridge-helper.c qemu-bridge-helper: relocate path to default ACL 2020-09-30 19:11:36 +02:00
qemu-edid.c qemu-edid: Restrict input parameter -d to avoid division by zero 2022-10-12 13:38:15 +02:00
qemu-img-cmds.hx docs/devel/docs: Document .hx file syntax 2024-01-15 17:12:22 +00:00
qemu-img.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
qemu-io-cmds.c block: Mark bdrv_get_specific_info() and callers GRAPH_RDLOCK 2023-10-12 16:31:33 +02:00
qemu-io.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
qemu-keymap.c qemu-keymap: properly check return from xkb_keymap_mod_get_index 2023-07-03 12:51:21 +01:00
qemu-nbd.c qemu-nbd: mention --tls-hostname option in qemu-nbd --help 2024-02-13 10:59:25 +03:00
qemu-options.hx qemu-options.hx: Document the virtio-iommu-pci aw-bits option 2024-03-12 17:59:10 -04:00
qemu.nsi nsis installer: Fix mouse-over descriptions for emulators 2022-03-18 10:55:15 +00:00
qemu.sasl sasl: remove comment about obsolete kerberos versions 2021-06-14 13:28:50 +01:00
README.rst README.rst: fix link formatting 2022-08-04 13:44:21 +02:00
replication.c replication: move include out of root directory 2021-05-26 14:49:46 +02:00
trace-events trace-events: remove the remaining vcpu trace events 2023-06-01 11:05:05 -04:00
VERSION Open 9.0 development tree 2023-12-19 09:46:22 -05:00
version.rc configure: remove CONFIG_FILEVERSION and CONFIG_PRODUCTVERSION 2021-01-02 21:03:37 +01:00

===========
QEMU README
===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Documentation
=============

Documentation can be found hosted online at
`<https://www.qemu.org/documentation/>`_. The documentation for the
current development version, available at
`<https://www.qemu.org/docs/master/>`_, is generated from the ``docs/``
folder in the source tree and is built by `Sphinx
<https://www.sphinx-doc.org/en/master/>`_.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, macOS, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:


.. code-block:: shell

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

* `<https://wiki.qemu.org/Hosts/Linux>`_
* `<https://wiki.qemu.org/Hosts/Mac>`_
* `<https://wiki.qemu.org/Hosts/W32>`_


Submitting patches
==================

The QEMU source code is maintained under the Git version control system.

.. code-block:: shell

   git clone https://gitlab.com/qemu-project/qemu.git

When submitting patches, one common approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the `style section
<https://www.qemu.org/docs/master/devel/style.html>`_ of
the Developers Guide.

Additional information on submitting patches can be found online via
the QEMU website:

* `<https://wiki.qemu.org/Contribute/SubmitAPatch>`_
* `<https://wiki.qemu.org/Contribute/TrivialPatches>`_

The QEMU website is also maintained under source control.

.. code-block:: shell

  git clone https://gitlab.com/qemu-project/qemu-web.git

* `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_

A 'git-publish' utility was created to make the above process less
cumbersome, and is highly recommended for making regular contributions,
or even just for sending consecutive patch series revisions. It also
requires a working 'git send-email' setup, and by default doesn't
automate everything, so you may want to go through the above steps
manually once.

For installation instructions, please go to

* `<https://github.com/stefanha/git-publish>`_

The workflow with 'git-publish' is:

.. code-block:: shell

  $ git checkout master -b my-feature
  $ # work on new commits, add your 'Signed-off-by' lines to each
  $ git publish

Your patch series will be sent and tagged as my-feature-v1, in case you need
to refer back to it in the future.

Sending v2:

.. code-block:: shell

  $ git checkout my-feature # same topic branch
  $ # making changes to the commits (using 'git rebase', for example)
  $ git publish

Your patch series will be sent with a 'v2' tag in the subject and the git tip
will be tagged as my-feature-v2.

Bug reporting
=============

The QEMU project uses GitLab issues to track bugs. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

* `<https://gitlab.com/qemu-project/qemu/-/issues>`_

If using QEMU via an operating system vendor's pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect the latest upstream code, it can also be
reported via GitLab.

For additional information on bug reporting consult:

* `<https://wiki.qemu.org/Contribute/ReportABug>`_


ChangeLog
=========

For version history and release notes, please visit
`<https://wiki.qemu.org/ChangeLog/>`_ or look at the git history for
more detailed information.


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

* `<mailto:qemu-devel@nongnu.org>`_
* `<https://lists.nongnu.org/mailman/listinfo/qemu-devel>`_
* #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

* `<https://wiki.qemu.org/Contribute/StartHere>`_