thomas 681e338ee6 virtio-net: Fix network stall at the host side waiting for kick
Patch 06b1297017 ("virtio-net: fix network stall under load")
added a double-check to test whether the available buffer size
can satisfy the request, in case the guest added some buffers
to the avail ring concurrently after the first check. If the
available buffer size becomes sufficient after the double-check,
the host can send the packet to the guest. If the buffer size
still can't satisfy the request, even though the guest has added
some buffers, virtio-net stalls at the host side forever.

When the available buffers are insufficient, the patch enables
notification and checks whether the guest has added buffers
since the last check of available buffers. If no buffer was added,
return false; otherwise recheck the available buffers in the loop.
If the available buffers are sufficient, disable notification
and return true.

Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
   to int. It returns an opaque value that represents the
   shadow_avail_idx of the virtqueue on success, or -1 on error.
2. Add a new API, virtio_queue_enable_notification_and_check().
   It takes as input the opaque value returned from
   virtqueue_get_avail_bytes(). It first enables notification,
   then uses virtio_queue_poll() to check whether the guest has
   added buffers since the last check of available buffers, and
   returns true if so.
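
The loop described above can be sketched as a self-contained toy model.
This is an illustration under stated assumptions, not the QEMU code:
``struct toy_vq`` and every ``toy_*`` name are hypothetical stand-ins for
the virtqueue and for virtqueue_get_avail_bytes() and
virtio_queue_enable_notification_and_check(), buffers are counted instead
of bytes, and the memory barriers a real implementation needs are omitted.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* Toy stand-in for a virtqueue (hypothetical, not QEMU's VirtQueue). */
   struct toy_vq {
       uint16_t avail_idx;       /* advanced by the guest when it adds buffers */
       uint16_t last_avail_idx;  /* next buffer the host will consume */
       bool notify_enabled;      /* host has asked the guest for a kick */
   };

   /* Count available buffers and report the avail idx the count is based on. */
   static unsigned toy_get_avail(const struct toy_vq *vq, uint16_t *seen_idx)
   {
       *seen_idx = vq->avail_idx;
       return (uint16_t)(*seen_idx - vq->last_avail_idx);
   }

   /*
    * Enable notification first, then report whether the guest has added
    * buffers since seen_idx was sampled.
    */
   static bool toy_enable_notification_and_check(struct toy_vq *vq,
                                                 uint16_t seen_idx)
   {
       vq->notify_enabled = true;
       return vq->avail_idx != seen_idx;
   }

   /* Return true if at least 'needed' buffers are available. */
   static bool toy_has_buffers(struct toy_vq *vq, unsigned needed)
   {
       for (;;) {
           uint16_t seen_idx;

           if (toy_get_avail(vq, &seen_idx) >= needed) {
               /* Enough buffers: no kick is needed, disable notification. */
               vq->notify_enabled = false;
               return true;
           }
           if (!toy_enable_notification_and_check(vq, seen_idx)) {
               /* Nothing new; notification stays enabled, wait for a kick. */
               return false;
           }
           /* The guest added buffers in the meantime: recheck. */
       }
   }

With 3 buffers available and 5 needed, ``toy_has_buffers()`` returns false
and leaves notification enabled. Once ``avail_idx`` advances so that 5 or
more buffers are available, it returns true with notification disabled.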

The patch also reverts patch "06b12970174".

The case below can reproduce the stall.

                                       Guest 0
                                     +--------+
                                     | iperf  |
                    ---------------> | server |
         Host       |                +--------+
       +--------+   |                    ...
       | iperf  |----
       | client |----                  Guest n
       +--------+   |                +--------+
                    |                | iperf  |
                    ---------------> | server |
                                     +--------+

Boot many guests from qemu with virtio network:
 qemu ... -netdev tap,id=net_x \
    -device virtio-net-pci-non-transitional,\
    iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x

Each guest acts as iperf server with commands below:
 iperf3 -s -D -i 10 -p 8001
 iperf3 -s -D -i 10 -p 8002

The host as iperf client:
 iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000

After some time, the host loses its connection to the guest:
the guest can send packets to the host, but can't receive
packets from the host.

It's more likely to happen if SWIOTLB is enabled in the guest:
allocating and freeing bounce buffers takes some CPU ticks, and
copying from/to bounce buffers takes more, compared with a
guest that has no bounce buffer.
Once the rate at which the host produces packets approximates
the rate at which the guest receives them, the guest loops
in NAPI.

         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               |  need kick the host?
               |  NAPI continues
               v
         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               v
              ...           ...

On the other hand, the host fetches free buffers from the avail
ring. If the avail ring does not hold enough buffers, the
host notifies the guest of the event by writing the avail
idx read from the avail ring to the event idx of the used ring,
then goes to sleep, waiting for the kick signal
from the guest.

Once the guest finds that the host is waiting for the kick signal
(in virtqueue_kick_prepare_split()), it kicks the host.

The host may stall forever in the sequence below:

         Host                        Guest
     ------------                 -----------
 fetch buf, send packet           receive packet ---
         ...                          ...         |
 fetch buf, send packet             add buf       |
         ...                        add buf   virtnet_poll
    buf not enough      avail idx-> add buf       |
    read avail idx                  add buf       |
                                    add buf      ---
                                  receive packet ---
    write event idx                   ...         |
    wait for kick                   add buf   virtnet_poll
                                      ...         |
                                                 ---
                                 no more packet, exit NAPI

In the first loop of NAPI above, indicated by the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buffers.
 step 1: The buffers are not enough; for example, a big packet
         needs 5 buffers, but only 3 are available.
         The host reads the current avail idx.
 step 2: The guest adds some buffers, then checks whether the
         host is waiting for the kick signal; it is not at this
         time. The used ring is not empty, so the guest continues
         with the second loop of NAPI.
 step 3: The host writes the avail idx read from the avail
         ring to the used ring as event idx via
         virtio_queue_set_notification(q->rx_vq, 1).
 step 4: At the end of the second loop of NAPI, the guest rechecks
         whether a kick is needed. As the event idx in the
         used ring written by the host is beyond the
         range of the kick condition, the guest does not
         send the kick signal to the host.
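
The kick condition in step 4 can be illustrated with the event-index
arithmetic used for split virtqueues. The sketch below assumes the guest
applies the usual vring_need_event() comparison; ``need_event`` and the
index values in the note that follows are hypothetical and chosen only to
mirror the steps above.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /*
    * A kick is needed only if event_idx lies in the window of entries
    * added by this batch, i.e. old_idx <= event_idx < new_idx, evaluated
    * modulo 2^16.
    */
   static bool need_event(uint16_t event_idx, uint16_t new_idx,
                          uint16_t old_idx)
   {
       return (uint16_t)(new_idx - event_idx - 1) <
              (uint16_t)(new_idx - old_idx);
   }

Suppose the host read avail idx 10 in step 1, the first NAPI loop advanced
the avail idx from 10 to 14, and the host then wrote the stale value 10 as
event idx in step 3. When the second NAPI loop advances the avail idx from
14 to 18, ``need_event(10, 18, 14)`` is false, so no kick is sent. Had the
host published the up-to-date value 14, ``need_event(14, 18, 14)`` would be
true and the guest would kick.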

Fixes: 06b1297017 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit f937309fbd)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fixup in include/hw/virtio/virtio.h)
2024-08-06 17:18:25 +03:00
.github/workflows github: fix config mistake preventing repo lockdown commenting 2022-04-26 16:12:26 +01:00
.gitlab/issue_templates GitLab: Add "Feature Request" issue template. 2021-06-25 10:08:37 +01:00
.gitlab-ci.d gitlab-ci: Disable the riscv64-debian-cross-container by default 2024-06-30 18:11:21 +03:00
accel accel/tcg: Revert mapping of PCREL translation block to multiple virtual addresses 2024-01-25 19:14:18 +03:00
audio audio: Depend on dbus_display1_dep 2024-02-21 14:18:06 +03:00
authz configure, meson: convert pam detection to meson 2021-06-25 10:54:10 +02:00
backends backends/cryptodev-builtin: Fix local_error leaks 2024-04-30 20:10:14 +03:00
block qcow2: Don't open data_file with BDRV_O_NO_IO 2024-07-03 23:51:20 +03:00
bsd-user bsd-user: Catch up with sys/param.h requirement for machine/pmap.h 2022-10-26 14:09:17 -06:00
chardev chardev/char-win-stdio.c: restore old console mode 2024-07-24 07:47:42 +03:00
common-user common-user: Only compile the common user code if have_user is set 2022-06-28 11:12:05 +02:00
configs hw/isa/Kconfig: Fix dependencies of piix4 southbridge 2022-10-31 11:32:07 +01:00
contrib contrib/plugins: protect execlog's last_exec expansion 2022-10-31 20:37:59 +00:00
crypto crypto: Support export akcipher to pkcs8 2022-11-02 06:56:32 -04:00
disas disas/riscv: Decode all of the pmpcfg and pmpaddr CSRs 2024-06-05 13:07:40 +03:00
docs docs/sphinx/depfile.py: Handle env.doc2path() returning a Path not a str 2024-08-02 10:27:13 +03:00
dtc@b6910bec11 dtc: Update to version 1.6.1 2021-10-14 08:08:11 +02:00
dump dump: kdump-zlib data pages not dumped with pvtime/aarch64 2023-09-11 10:53:50 +03:00
ebpf ebpf: replace deprecated bpf_program__set_socket_filter 2022-07-06 11:39:09 +08:00
fpu softfloat: Fix the incorrect computation in float32_exp2 2023-05-18 21:09:59 +03:00
fsdev 9pfs: prevent opening special files (CVE-2023-2861) 2023-06-08 23:52:29 +03:00
gdb-xml gdb-xml: Fix size of EFER register on i386 architecture when debugged by GDB 2022-11-06 09:48:26 +01:00
gdbstub gdbstub: move guest debug support check to ops 2022-10-06 11:53:41 +01:00
hw virtio-net: Fix network stall at the host side waiting for kick 2024-08-06 17:18:25 +03:00
include virtio-net: Fix network stall at the host side waiting for kick 2024-08-06 17:18:25 +03:00
io io: remove io watch if TLS channel is closed during handshake 2023-08-02 17:22:20 +03:00
libdecnumber libdecnumber/dpd/decimal64: Fix compiler warning from Clang 15 2022-11-11 09:13:52 +01:00
linux-headers Update linux headers to v6.0-rc4 2022-09-26 17:23:47 +02:00
linux-user linux-user: Make TARGET_NR_setgroups affect only the current thread 2024-06-20 15:22:30 +03:00
meson@3a9b285a55 meson: require 0.61.3 2022-10-01 21:16:36 +02:00
migration migration: Skip only empty block devices 2024-03-19 19:23:00 +03:00
monitor monitor/hmp-cmds-target: Append a space in error message in gpa2hva() 2024-04-09 20:09:20 +03:00
nbd nbd/server: Mark negotiation functions as coroutine_fn 2024-04-28 15:16:47 +03:00
net net: Update MemReentrancyGuard for NIC 2023-11-29 16:20:11 +03:00
pc-bios optionrom: Remove build-id section 2023-10-03 18:21:41 +03:00
plugins plugins: add [pre|post]fork helpers to linux-user 2022-10-06 11:53:41 +01:00
po po: add ukrainian translation 2022-07-05 10:15:49 +02:00
python python: drop pipenv 2023-09-11 10:53:50 +03:00
qapi qapi/qom: Document feature unstable of @x-vfio-user-server 2024-07-19 19:59:34 +03:00
qga qga/win32: Use rundll for VSS installation 2023-08-02 16:07:32 +03:00
qobject include/qapi: add g_autoptr support for qobject types 2022-04-06 10:50:38 +02:00
qom module: add Error arguments to module_load and module_load_qom 2022-11-06 09:48:50 +01:00
replay replay: Fix declaration of replay_read_next_clock 2022-11-29 11:09:11 -05:00
roms target/hppa: Update to SeaBIOS-hppa version 8 2023-06-26 19:35:39 +03:00
scripts make-release: switch to .xz format by default 2024-03-13 23:09:00 +03:00
scsi QIOChannel: Add flags on io_writev and introduce io_flush callback 2022-05-16 13:56:24 +01:00
semihosting semihosting/arm-compat-semi: Avoid using hardcoded /tmp 2022-10-31 20:37:58 +00:00
softmmu system/qdev-monitor: move drain_call_rcu call under if (!dev) in qmp_device_add() 2024-03-13 23:09:00 +03:00
storage-daemon qsd: Unlink absolute PID file path 2022-07-12 14:30:38 +02:00
stubs qga: Allow building of the guest agent without system emulators or tools 2022-11-11 09:17:45 +01:00
subprojects libvhost-user: check for NULL when allocating a virtqueue element 2023-03-29 10:20:04 +03:00
target target/arm: Handle denormals correctly for FMOPA (widening) 2024-08-02 14:01:04 +03:00
tcg tcg/loongarch64: Fix tcg_out_movi vs some pcrel pointers 2024-06-20 15:25:32 +03:00
tests iotests/270: Don't store data-file with json: prefix in image 2024-07-03 23:51:32 +03:00
tools virtiofsd: Add sigreturn to the seccomp whitelist 2022-11-25 13:56:05 -05:00
trace include/hw/core: Create struct CPUJumpCache 2022-10-04 12:13:12 -07:00
ui ui/sdl2: Allow host to power down screen 2024-06-06 14:20:13 +03:00
util util/async.c: Forbid negative min/max in aio_context_set_thread_pool_params() 2024-07-27 22:20:09 +03:00
.cirrus.yml ci: Upgrade msys2 release to 20220603 2022-07-29 10:33:29 -07:00
.dir-locals.el Add .dir-locals.el file to configure emacs coding style 2015-10-08 19:46:01 +03:00
.editorconfig .editorconfig: update the automatic mode setting for Emacs 2021-03-10 15:34:11 +00:00
.exrc
.gdbinit .gdbinit: load QEMU sub-commands when gdb starts 2017-06-07 14:38:45 +01:00
.gitattributes gitattributes: Cover Objective-C source files 2022-03-29 00:15:14 +02:00
.gitignore .gitignore: add multiple items to .gitignore 2022-10-22 22:48:17 +02:00
.gitlab-ci.yml docs: Document GitLab custom CI/CD variables 2021-07-29 07:56:01 +02:00
.gitmodules Remove the slirp submodule (i.e. compile only with an external libslirp) 2022-09-26 17:23:47 +02:00
.gitpublish Add a git-publish configuration file 2018-03-05 09:03:17 +00:00
.mailmap MAINTAINERS: Replace my amsat.org email address 2022-10-17 17:21:22 -04:00
.patchew.yml scripts/checkpatch: roll diff tweaking into checkpatch itself 2021-06-25 10:08:33 +01:00
.readthedocs.yml readthodocs: fully specify a build environment 2024-01-23 18:48:46 +03:00
.travis.yml Revert "gitlab: disable accelerated zlib for s390x" 2022-07-20 12:15:09 +01:00
block.c block: Parse filenames only when explicitly requested 2024-07-04 00:08:21 +03:00
blockdev-nbd.c nbd/server: Allow MULTI_CONN for shared writable exports 2022-05-12 13:10:52 +02:00
blockdev.c block: Fix use after free in blockdev_mark_auto_del() 2023-05-18 21:09:59 +03:00
blockjob.c block: Make bdrv_child_get_parent_aio_context I/O 2022-11-10 14:58:34 +01:00
configure configure: fix GLIB_VERSION for cross-compilation 2023-03-29 10:20:04 +03:00
COPYING
COPYING.LIB COPYING.LIB: Synchronize the LGPL 2.1 with the version from gnu.org 2019-01-30 11:01:22 +01:00
cpu.c accel/tcg: Complete cpu initialization before registration 2022-11-01 08:31:41 +11:00
cpus-common.c cpus: Introduce cpu_list_generation_id 2022-07-20 12:15:08 +01:00
disas.c disas: use result of ->read_memory_func 2022-10-06 11:53:40 +01:00
event-loop-base.c util/event-loop-base: Introduce options to set the thread pool size 2022-05-09 10:43:23 +01:00
gitdm.config contrib/gitdm: add a new interns group-map for GSoC/Outreachy work 2021-07-23 17:22:16 +01:00
hmp-commands-info.hx hmp: add virtio commands 2022-10-09 16:38:45 -04:00
hmp-commands.hx qapi: net: add stream and dgram netdevs 2022-10-28 13:28:52 +08:00
iothread.c util/event-loop-base: Introduce options to set the thread pool size 2022-05-09 10:43:23 +01:00
job-qmp.c job.c: enable job lock/unlock and remove Aiocontext locks 2022-10-07 12:11:41 +02:00
job.c block: remove bdrv_try_set_aio_context and replace it with bdrv_try_change_aio_context 2022-10-27 20:14:11 +02:00
Kconfig meson: Introduce target-specific Kconfig 2021-07-09 18:21:34 +02:00
Kconfig.host vfio-user: build library 2022-06-15 16:42:33 +01:00
LICENSE tcg/LICENSE: Remove out of date claim about TCG subdirectory licensing 2019-11-11 15:11:21 +01:00
MAINTAINERS gitlab-ci: Remove job building EDK2 firmware binaries 2024-04-24 12:29:57 +03:00
Makefile configure: cleanup creation of tests/tcg target config 2022-10-06 11:53:40 +01:00
memory_ldst.c.inc exec/memory_ldst: Use correct type sizes 2021-05-26 08:35:51 -07:00
meson_options.txt gtk: disable GTK Clipboard with a new meson option 2022-11-23 12:15:06 +01:00
meson.build Revert "linux-user: fix compat with glibc >= 2.36 sys/mount.h" 2023-11-02 15:09:22 +03:00
module-common.c all: Clean up includes 2016-02-04 17:41:30 +00:00
os-posix.c os-posix: asynchronous teardown for shutdown on Linux 2022-10-31 09:46:34 +01:00
os-win32.c Remove qemu-common.h include from most units 2022-04-06 14:31:55 +02:00
page-vary-common.c Remove qemu-common.h include from most units 2022-04-06 14:31:55 +02:00
page-vary.c include: move target page bits declaration to page-vary.h 2022-04-06 14:31:43 +02:00
qemu-bridge-helper.c qemu-bridge-helper: relocate path to default ACL 2020-09-30 19:11:36 +02:00
qemu-edid.c qemu-edid: Restrict input parameter -d to avoid division by zero 2022-10-12 13:38:15 +02:00
qemu-img-cmds.hx qemu-img: Unify [-b [-F]] documentation 2022-02-01 13:49:15 +01:00
qemu-img.c qemu-img: rebase: stop when reaching EOF of old backing file 2023-11-02 15:04:24 +03:00
qemu-io-cmds.c block: Change blk_pwrite_compressed() param order 2022-07-12 12:14:56 +02:00
qemu-io.c include: move qemu_*_exec_dir() to cutils 2022-05-28 11:42:56 +02:00
qemu-keymap.c qemu-keymap: Add license in generated files 2021-12-17 10:41:50 +01:00
qemu-nbd.c qemu-nbd: regression with arguments passing into nbd_client_thread() 2023-07-31 09:11:17 +03:00
qemu-options.hx qemu-options: Fix CXL Fixed Memory Window interleave-granularity typo 2024-04-10 19:27:46 +03:00
qemu.nsi nsis installer: Fix mouse-over descriptions for emulators 2022-03-18 10:55:15 +00:00
qemu.sasl sasl: remove comment about obsolete kerberos versions 2021-06-14 13:28:50 +01:00
README.rst README.rst: fix link formatting 2022-08-04 13:44:21 +02:00
replication.c replication: move include out of root directory 2021-05-26 14:49:46 +02:00
trace-events gdbstub: move into its own sub directory 2022-10-06 11:53:41 +01:00
VERSION Update version for 7.2.13 release 2024-07-16 08:40:38 +03:00
version.rc configure: remove CONFIG_FILEVERSION and CONFIG_PRODUCTVERSION 2021-01-02 21:03:37 +01:00

===========
QEMU README
===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Documentation
=============

Documentation can be found hosted online at
`<https://www.qemu.org/documentation/>`_. The documentation for the
current development version that is available at
`<https://www.qemu.org/docs/master/>`_ is generated from the ``docs/``
folder in the source tree, and is built by `Sphinx
<https://www.sphinx-doc.org/en/master/>`_.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, OS-X, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:


.. code-block:: shell

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

* `<https://wiki.qemu.org/Hosts/Linux>`_
* `<https://wiki.qemu.org/Hosts/Mac>`_
* `<https://wiki.qemu.org/Hosts/W32>`_


Submitting patches
==================

The QEMU source code is maintained under the Git version control system.

.. code-block:: shell

   git clone https://gitlab.com/qemu-project/qemu.git

When submitting patches, one common approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the `style section
<https://www.qemu.org/docs/master/devel/style.html>`_ of
the Developers Guide.

Additional information on submitting patches can be found online via
the QEMU website:

* `<https://wiki.qemu.org/Contribute/SubmitAPatch>`_
* `<https://wiki.qemu.org/Contribute/TrivialPatches>`_

The QEMU website is also maintained under source control.

.. code-block:: shell

  git clone https://gitlab.com/qemu-project/qemu-web.git

* `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_

A 'git-publish' utility was created to make the above process less
cumbersome, and is highly recommended for making regular contributions,
or even just for sending consecutive patch series revisions. It also
requires a working 'git send-email' setup, and by default doesn't
automate everything, so you may want to go through the above steps
manually once.

For installation instructions, please go to

*  `<https://github.com/stefanha/git-publish>`_

The workflow with 'git-publish' is:

.. code-block:: shell

  $ git checkout master -b my-feature
  $ # work on new commits, add your 'Signed-off-by' lines to each
  $ git publish

Your patch series will be sent and tagged as my-feature-v1 if you need to refer
back to it in the future.

Sending v2:

.. code-block:: shell

  $ git checkout my-feature # same topic branch
  $ # making changes to the commits (using 'git rebase', for example)
  $ git publish

Your patch series will be sent with 'v2' tag in the subject and the git tip
will be tagged as my-feature-v2.

Bug reporting
=============

The QEMU project uses GitLab issues to track bugs. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

* `<https://gitlab.com/qemu-project/qemu/-/issues>`_

If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect latest upstream code, it can also be
reported via GitLab.

For additional information on bug reporting consult:

* `<https://wiki.qemu.org/Contribute/ReportABug>`_


ChangeLog
=========

For version history and release notes, please visit
`<https://wiki.qemu.org/ChangeLog/>`_ or look at the git history for
more detailed information.


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

* `<mailto:qemu-devel@nongnu.org>`_
* `<https://lists.nongnu.org/mailman/listinfo/qemu-devel>`_
* #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

* `<https://wiki.qemu.org/Contribute/StartHere>`_