qemu/util
Stefan Hajnoczi 86a637e481 coroutine: cap per-thread local pool size
The coroutine pool implementation can hit the Linux vm.max_map_count
limit, causing QEMU to abort with "failed to allocate memory for stack"
or "failed to set up stack guard page" during coroutine creation.

This happens because per-thread pools can grow to tens of thousands of
coroutines. Each coroutine causes 2 virtual memory areas to be created.
Eventually vm.max_map_count is reached and memory-related syscalls fail.
The per-thread pool sizes are non-uniform and depend on past coroutine
usage in each thread, so it's possible for one thread to have a large
pool while another thread's pool is empty.
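
As a rough illustration of why tens of thousands of pooled coroutines are a
problem, the sketch below computes how many coroutines fit under the limit at
2 virtual memory areas each. The value 65530 is an assumption (a common Linux
default for vm.max_map_count, not stated in this commit message), and the
helper name max_coroutines() is hypothetical.

```c
/*
 * Assumption: a common Linux default for vm.max_map_count.
 * Check the real value on the host with `sysctl vm.max_map_count`.
 */
#define ASSUMED_MAX_MAP_COUNT 65530u

/* From the commit message: each coroutine creates 2 virtual memory areas. */
#define VMAS_PER_COROUTINE 2u

/*
 * Upper bound on the number of coroutines that can exist before the limit
 * is hit, ignoring every other mapping in the process (so the real bound
 * is lower).
 */
static unsigned max_coroutines(unsigned max_map_count)
{
    return max_map_count / VMAS_PER_COROUTINE;
}
```

With the assumed default, max_coroutines(ASSUMED_MAX_MAP_COUNT) is 32765, so a
few threads each holding tens of thousands of pooled coroutines can exhaust
the limit.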

Switch to a new coroutine pool implementation with a global pool that
grows to a maximum number of coroutines and per-thread local pools that
are capped at a hardcoded small number of coroutines.

This approach does not leave large numbers of coroutines pooled in a
thread that may not use them again. In order to perform well it
amortizes the cost of global pool accesses by working in batches of
coroutines instead of individual coroutines.

The global pool is a list. Threads donate batches of coroutines to it when
they have too many and take batches from it when they have too few:

.-----------------------------------.
| Batch 1 | Batch 2 | Batch 3 | ... | global_pool
`-----------------------------------'

Each thread has up to 2 batches of coroutines:

.-------------------.
| Batch 1 | Batch 2 | per-thread local_pool (maximum 2 batches)
`-------------------'
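
The donate/take scheme described above can be sketched with plain counters.
This is a simplified model and not the code in qemu-coroutine.c: the names
LocalPool, GlobalPool, pool_get() and pool_put() and the constants BATCH_SIZE
and GLOBAL_MAX_BATCHES are hypothetical, and locking of the global pool is
left out. Only the limit of 2 batches per thread comes from the commit
message.

```c
#include <stdbool.h>

enum {
    BATCH_SIZE = 4,         /* hypothetical batch size */
    LOCAL_MAX_BATCHES = 2,  /* from the commit message */
    GLOBAL_MAX_BATCHES = 8, /* hypothetical global cap */
};

/* Shared by all threads; a real implementation would protect it with a lock. */
typedef struct {
    unsigned nbatches;      /* full batches currently in the global pool */
} GlobalPool;

/* One per thread; never holds more than LOCAL_MAX_BATCHES * BATCH_SIZE. */
typedef struct {
    unsigned ncoroutines;   /* coroutines pooled in this thread */
} LocalPool;

/*
 * Return a finished coroutine to the pool.
 * Returns false if both pools are full and the caller must free it.
 */
static bool pool_put(LocalPool *local, GlobalPool *global)
{
    if (local->ncoroutines == LOCAL_MAX_BATCHES * BATCH_SIZE) {
        if (global->nbatches == GLOBAL_MAX_BATCHES) {
            return false;
        }
        /* Too many locally: donate one whole batch to the global pool. */
        global->nbatches++;
        local->ncoroutines -= BATCH_SIZE;
    }
    local->ncoroutines++;
    return true;
}

/*
 * Take a pooled coroutine.
 * Returns false if both pools are empty and the caller must allocate one.
 */
static bool pool_get(LocalPool *local, GlobalPool *global)
{
    if (local->ncoroutines == 0) {
        if (global->nbatches == 0) {
            return false;
        }
        /* Too few locally: take one whole batch from the global pool. */
        global->nbatches--;
        local->ncoroutines += BATCH_SIZE;
    }
    local->ncoroutines--;
    return true;
}
```

In this model a thread touches the global pool at most once per BATCH_SIZE
operations in the same direction, which is the amortization the commit
message describes.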

The goal of this change is to reduce the excessive number of pooled
coroutines that cause QEMU to abort when vm.max_map_count is reached
without losing the performance of an adequately sized coroutine pool.

Here are virtio-blk disk I/O benchmark results:

      RW BLKSIZE IODEPTH    OLD    NEW CHANGE
randread      4k       1 113725 117451 +3.3%
randread      4k       8 192968 198510 +2.9%
randread      4k      16 207138 209429 +1.1%
randread      4k      32 212399 215145 +1.3%
randread      4k      64 218319 221277 +1.4%
randread    128k       1  17587  17535 -0.3%
randread    128k       8  17614  17616 +0.0%
randread    128k      16  17608  17609 +0.0%
randread    128k      32  17552  17553 +0.0%
randread    128k      64  17484  17484 +0.0%

See files/{fio.sh,test.xml.j2} for the benchmark configuration:
https://gitlab.com/stefanha/virt-playbooks/-/tree/coroutine-pool-fix-sizing

Buglink: https://issues.redhat.com/browse/RHEL-28947
Reported-by: Sanjay Rao <srao@redhat.com>
Reported-by: Boaz Ben Shabat <bbenshab@redhat.com>
Reported-by: Joe Mario <jmario@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20240318183429.1039340-1-stefanha@redhat.com>
2024-03-19 10:49:31 -04:00
aio-posix.c iothread: Remove unused Error** argument in aio_context_set_aio_params 2024-01-08 10:45:34 -05:00
aio-posix.h aio: remove aio_disable_external() API 2023-05-30 17:37:26 +02:00
aio-wait.c aio-wait: avoid AioContext lock in aio_wait_bh_oneshot() 2023-05-10 14:15:13 +02:00
aio-win32.c iothread: Remove unused Error** argument in aio_context_set_aio_params 2024-01-08 10:45:34 -05:00
aiocb.c
async.c util/async: Only call icount_notify_exit() if icount is enabled 2024-01-19 12:28:59 +01:00
atomic64.c osdep: Move memalign-related functions to their own header 2022-03-07 13:16:49 +00:00
base64.c nomaintainer: Fix Lesser GPL version number 2020-11-15 17:04:40 +01:00
bitmap.c migration: Use non-atomic ops for clear log bitmap 2022-11-21 11:58:10 +01:00
bitops.c replace TABs with spaces 2023-03-20 12:43:50 +01:00
block-helpers.c
block-helpers.h
buffer.c nomaintainer: Fix Lesser GPL version number 2020-11-15 17:04:40 +01:00
bufferiszero.c util/bufferiszero: Use i386 host/cpuinfo.h 2023-05-23 16:51:13 -07:00
cacheflush.c util/cacheflush: Avoid possible redundant dcache flush on Darwin 2023-06-13 11:28:58 +02:00
chardev_open.c util/char_dev: Add open_cdev() 2023-12-19 19:03:38 +01:00
compatfd.c util: replace pipe()+cloexec with g_unix_open_pipe() 2022-05-03 15:18:14 +04:00
coroutine-sigaltstack.c osdep: set _FORTIFY_SOURCE=2 when optimization is enabled 2023-10-04 09:52:06 -04:00
coroutine-ucontext.c coroutine-ucontext: Save fake stack for pooled coroutine 2024-01-22 11:00:12 -05:00
coroutine-windows.c build: move coroutine backend selection to meson 2023-05-18 08:53:52 +02:00
cpuinfo-aarch64.c *: Delete checks for old host definitions 2023-09-19 13:20:54 -04:00
cpuinfo-i386.c host/include/i386: Implement clmul.h 2023-09-15 13:57:00 +00:00
cpuinfo-loongarch.c util: Add cpuinfo for loongarch64 2023-11-06 08:27:21 -08:00
cpuinfo-ppc.c util: fix build with musl libc on ppc64le 2024-01-11 08:48:16 +11:00
crc32c.c igb: Implement Rx SCTP CSO 2023-05-23 15:20:15 +08:00
crc-ccitt.c util: Add CRC16 (CCITT) calculation routines 2021-01-24 20:10:54 +01:00
cutils.c cutils: Fix get_relocated_path on Windows 2023-10-19 23:13:27 +02:00
dbus.c
defer-call.c util/defer-call: move defer_call() to util/ 2023-10-31 15:41:42 +01:00
drm.c
envlist.c replace TABs with spaces 2023-03-20 12:43:50 +01:00
error-report.c util/error: add G_GNUC_PRINTF for various functions 2023-01-11 10:44:34 +01:00
error.c util/error: Fix use-after-free errors reported by Coverity 2023-04-06 12:38:42 -04:00
event_notifier-posix.c Replace qemu_pipe() with g_unix_open_pipe() 2022-05-03 15:17:56 +04:00
event_notifier-win32.c Remove qemu-common.h include from most units 2022-04-06 14:31:55 +02:00
fdmon-epoll.c aio: remove aio_disable_external() API 2023-05-30 17:37:26 +02:00
fdmon-io_uring.c remove unnecessary casts from uintptr_t 2024-01-18 10:43:51 +01:00
fdmon-poll.c aio: remove aio_disable_external() API 2023-05-30 17:37:26 +02:00
fifo8.c util/fifo8: Introduce fifo8_peek_buf() 2024-01-10 06:58:50 +00:00
filemonitor-inotify.c util/filemonitor-inotify.c: spelling fix: kenel 2023-11-15 12:06:05 +03:00
filemonitor-stub.c nomaintainer: Fix Lesser GPL version number 2020-11-15 17:04:40 +01:00
getauxval.c
guest-random.c util/guest-random: Clean up global variable shadowing 2023-10-06 13:27:48 +02:00
hbitmap.c hbitmap: fix hbitmap_status() return value for first dirty bit case 2023-02-17 14:34:24 +01:00
hexdump.c include: move C/util-related declarations to cutils.h 2022-04-06 14:31:43 +02:00
host-utils.c host-utils: Implemented signed 256-by-128 division 2022-06-20 08:38:58 -03:00
id.c net: Use id_generate() in the network subsystem, too 2021-03-09 21:47:45 +01:00
int128.c include/qemu/int128: Use Int128 structure for TCI 2023-02-04 06:19:42 -10:00
interval-tree.c util/interval-tree: Check root for null in interval_tree_iter_first 2023-08-09 09:26:32 -07:00
iov.c util/iov: Avoid dynamic stack allocation 2023-09-07 20:32:11 -05:00
iova-tree.c util: accept iova_tree_remove_parameter by value 2022-09-02 10:22:39 +08:00
keyval.c include: add qemu/keyval.h 2022-04-21 17:03:51 +04:00
lockcnt.c
log.c util/log: re-allow switching away from stderr log file 2023-10-07 19:02:33 +02:00
main-loop.c system/cpus: rename qemu_mutex_lock_iothread() to bql_lock() 2024-01-08 10:45:43 -05:00
memalign.c osdep: Move memalign-related functions to their own header 2022-03-07 13:16:49 +00:00
memfd.c
meson.build meson: Link with libinotify on FreeBSD 2024-02-06 10:27:50 +01:00
mmap-alloc.c util/mmap-alloc: qemu_fd_getfs() 2023-04-24 11:29:00 +02:00
module.c module: add Error arguments to module_load and module_load_qom 2022-11-06 09:48:50 +01:00
notify.c notify: pass error to notifier with return 2024-02-28 11:31:28 +08:00
nvdimm-utils.c Clean up includes 2020-12-10 17:16:44 +01:00
osdep.c error handling: Use RETRY_ON_EINTR() macro where applicable 2023-01-09 13:50:47 +01:00
oslib-posix.c oslib-posix: fix memory leak in touch_all_pages 2024-03-08 15:51:22 +01:00
oslib-win32.c oslib-posix: initialize backend memory objects in parallel 2024-02-06 08:15:22 +01:00
path.c
qdist.c util: spelling fixes 2023-08-31 19:47:43 +02:00
qemu-co-shared-resource.c co-shared-resource: protect with a mutex 2021-06-25 14:24:24 +03:00
qemu-co-timeout.c util: add qemu-co-timeout 2022-06-29 10:56:12 +03:00
qemu-config.c error: Drop superfluous #include "qapi/qmp/qerror.h" 2023-02-23 13:56:14 +01:00
qemu-coroutine-io.c aio: remove aio_disable_external() API 2023-05-30 17:37:26 +02:00
qemu-coroutine-lock.c atomics: eliminate mb_read/mb_set 2023-06-06 09:42:14 +02:00
qemu-coroutine-sleep.c coroutine: Clean up superfluous inclusion of qemu/coroutine.h 2023-01-19 10:18:28 +01:00
qemu-coroutine.c coroutine: cap per-thread local pool size 2024-03-19 10:49:31 -04:00
qemu-option.c qemu-option: Allow deleting opts during qemu_opts_foreach() 2021-10-15 16:11:22 +02:00
qemu-print.c
qemu-progress.c include: move progress API to qemu-progress.h 2022-04-06 14:31:43 +02:00
qemu-sockets.c qapi: Improve documentation of file descriptor socket addresses 2024-02-12 10:04:32 +01:00
qemu-thread-common.h
qemu-thread-posix.c qemu-thread-posix: cleanup, fix, document QemuEvent 2023-03-07 12:38:40 +01:00
qemu-thread-win32.c qemu-thread-win32: cleanup, fix, document QemuEvent 2023-03-07 12:38:40 +01:00
qemu-timer-common.c semihosting: Implement SYS_ELAPSED and SYS_TICKFREQ 2021-01-18 10:05:06 +00:00
qemu-timer.c qemu-timer: Skip empty timer lists before locking in qemu_clock_deadline_ns_all 2022-06-21 09:24:34 -07:00
qht.c util/qht: use striped locks under TSAN 2023-02-02 11:48:20 +00:00
qsp.c system/cpus: rename qemu_mutex_lock_iothread() to bql_lock() 2024-01-08 10:45:43 -05:00
qtree.c tcg: use QTree instead of GTree 2023-03-28 15:23:10 -07:00
range.c util/range.c: spelling fix: inbetween 2023-11-15 12:06:05 +03:00
rcu.c Replace "iothread lock" with "BQL" in comments 2024-01-08 10:45:43 -05:00
readline.c readline: Extract readline_add_completion_of() from monitor 2023-02-04 07:56:54 +01:00
reserved-region.c util/reserved-region: Add new ReservedRegion helpers 2023-11-03 09:20:31 +01:00
selfmap.c util/selfmap: Use dev_t and ino_t in MapInfo 2023-09-01 13:34:03 -07:00
stats64.c stat64: Add stat64_set() operation 2023-04-27 16:39:43 +02:00
sys_membarrier.c
systemd.c systemd: Also clear LISTEN_FDNAMES during systemd socket activation 2023-05-03 14:00:08 -05:00
thread-context.c qapi: Use returned bool to check for failure (again) 2022-12-14 16:19:35 +01:00
thread-pool.c virtio: use defer_call() in virtio_irqfd_notify() 2023-10-31 15:42:14 +01:00
throttle.c throttle: use THROTTLE_MAX/ARRAY_SIZE for hard code 2023-08-29 10:49:24 +02:00
timed-average.c
trace-events console/win32: allocate shareable display surface 2023-06-27 17:08:56 +02:00
trace.h
transactions.c transactions: Invoke clean() after everything else 2021-11-16 09:43:44 +01:00
unicode.c
uri.c util/uri: Remove unused macros ISA_RESERVED() and ISA_GEN_DELIM() 2024-01-24 09:54:05 +01:00
userfaultfd.c misc: Clean up includes 2024-01-30 21:20:20 +03:00
uuid.c util/uuid: Add UUID_STR_LEN definition 2023-11-03 09:20:31 +01:00
vfio-helpers.c util/vfio-helpers: Use g_file_read_link() 2023-05-24 09:21:22 +02:00
vhost-user-server.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
yank.c qapi: Fix dangling references to docs/devel/qapi-code-gen.txt 2024-01-26 07:04:53 +01:00