thomas f937309fbd virtio-net: Fix network stall at the host side waiting for kick
Patch 06b1297017 ("virtio-net: fix network stall under load")
added a double-check to test whether the available buffer size
can satisfy the request, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. With luck, the available buffer size is sufficient
after the double-check, and the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even though the guest has added some buffers, virtio-net would
stall at the host side forever.

The patch enables notification and checks whether the guest has
added buffers since the last check of available buffers when
the available buffers are insufficient. If no buffer was added,
it returns false; otherwise it rechecks the available buffers in
the loop. If the available buffers are sufficient, it disables
notification and returns true.
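
The loop described above can be sketched as follows. This is a
minimal model, not the QEMU code: struct mock_vq, the mock_*
functions and the racing_adds field are hypothetical names that
stand in for the real virtqueue and for a guest adding buffers
right after the host has counted them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a virtqueue; not QEMU's VirtQueue. */
struct mock_vq {
    uint16_t avail_idx;       /* index published by the guest */
    uint16_t last_avail_idx;  /* next entry the host will consume */
    uint16_t racing_adds;     /* bufs the guest adds right after the next count */
    bool notification;        /* true while the host asks for a kick */
    unsigned buf_size;        /* bytes per buffer, fixed for simplicity */
};

/* Count available bytes and return the avail idx the count was based on. */
static int mock_get_avail_bytes(struct mock_vq *vq, unsigned *in_bytes)
{
    uint16_t seen = vq->avail_idx;

    *in_bytes = (unsigned)(uint16_t)(seen - vq->last_avail_idx) * vq->buf_size;
    /* Simulate the guest adding buffers just after the host looked. */
    vq->avail_idx = (uint16_t)(vq->avail_idx + vq->racing_adds);
    vq->racing_adds = 0;
    return seen;
}

/* Enable notification first, then report whether avail idx moved past 'opaque'. */
static bool mock_enable_notification_and_check(struct mock_vq *vq, int opaque)
{
    vq->notification = true;
    return vq->avail_idx != (uint16_t)opaque;
}

/* Returns true when 'needed' bytes are available; false means wait for a kick. */
static bool mock_has_buffers(struct mock_vq *vq, unsigned needed)
{
    unsigned in_bytes;

    assert(vq->buf_size > 0);
    for (;;) {
        int opaque = mock_get_avail_bytes(vq, &in_bytes);

        if (in_bytes >= needed) {
            break;                /* enough buffers */
        }
        if (!mock_enable_notification_and_check(vq, opaque)) {
            return false;         /* nothing new: leave notification enabled */
        }
        /* The guest added buffers in the meantime: count again. */
    }
    vq->notification = false;
    return true;
}
```

In this model, a queue that starts with 3 bufs and gains 2 more
during the race satisfies a 5-buf request on the second pass. A
queue that gains only 1 returns false with notification still
enabled, based on the avail idx that includes the new buffer.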

Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
   to int. It returns an opaque value that represents the
   shadow_avail_idx of the virtqueue on success, or -1 on error.
2. Add a new API, virtio_queue_enable_notification_and_check().
   It takes as input the opaque value returned from
   virtqueue_get_avail_bytes(). It first enables notification,
   then uses virtio_queue_poll() to check whether the guest has
   added buffers since the last check of available buffers, and
   returns true if so.

The patch also reverts patch "06b12970174".

The case below can reproduce the stall.

                                       Guest 0
                                     +--------+
                                     | iperf  |
                    ---------------> | server |
         Host       |                +--------+
       +--------+   |                    ...
       | iperf  |----
       | client |----                  Guest n
       +--------+   |                +--------+
                    |                | iperf  |
                    ---------------> | server |
                                     +--------+

Boot many guests from qemu with virtio network:
 qemu ... -netdev tap,id=net_x \
    -device virtio-net-pci-non-transitional,\
    iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x

Each guest acts as iperf server with commands below:
 iperf3 -s -D -i 10 -p 8001
 iperf3 -s -D -i 10 -p 8002

The host as iperf client:
 iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000

After some time, the host loses its connection to the guest:
the guest can send packets to the host, but can't receive
packets from the host.

The stall is more likely if SWIOTLB is enabled in the guest.
Allocating and freeing bounce buffers takes some CPU ticks, and
copying from/to bounce buffers takes more, compared with a guest
that has no bounce buffers.
Once the rate at which the host produces packets approximates
the rate at which the guest receives them, the guest loops
in NAPI.

         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               |  need kick the host?
               |  NAPI continues
               v
         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               v
              ...           ...

On the other hand, the host fetches free bufs from the avail
ring. If there are not enough bufs in the avail ring, the
host notifies the guest of the event by writing the avail
idx read from the avail ring to the event idx of the used ring,
then goes to sleep, waiting for the kick signal
from the guest.

Once the guest finds that the host is waiting for a kick signal
(in virtqueue_kick_prepare_split()), it kicks the host.

The host may stall forever in the sequence below:

         Host                        Guest
     ------------                 -----------
 fetch buf, send packet           receive packet ---
         ...                          ...         |
 fetch buf, send packet             add buf       |
         ...                        add buf   virtnet_poll
    buf not enough      avail idx-> add buf       |
    read avail idx                  add buf       |
                                    add buf      ---
                                  receive packet ---
    write event idx                   ...         |
    wait for kick                   add buf   virtnet_poll
                                      ...         |
                                                 ---
                                 no more packet, exit NAPI

In the first loop of NAPI above, indicated by the range of
virtnet_poll, the host is sending packets while the
guest is receiving packets and adding buffers.
 step 1: There are not enough bufs; for example, a big packet
         needs 5 bufs, but only 3 are available.
         The host reads the current avail idx.
 step 2: The guest adds some bufs, then checks whether the
         host is waiting for a kick signal; it is not at this
         time. The used ring is not empty, so the guest continues
         with the second loop of NAPI.
 step 3: The host writes the avail idx read from the avail
         ring to the used ring as the event idx via
         virtio_queue_set_notification(q->rx_vq, 1).
 step 4: At the end of the second loop of NAPI, the guest
         rechecks whether a kick is needed. As the event idx in
         the used ring written by the host is beyond the
         range of the kick condition, the guest does not
         send a kick signal to the host.
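
The kick condition in step 4 can be illustrated with a small
sketch. The comparison follows the arithmetic of the
vring_need_event() helper from the virtio ring header. The index
values (103, 104, 105) and the function names are hypothetical,
chosen only to match the shape of the sequence above: the host
read avail idx 103 in step 1, the guest had reached 104 by the
end of the first NAPI loop, and it adds one buf in the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Kick only if event_idx lies in [old_idx, new_idx - 1], using
 * wrap-around 16-bit arithmetic.
 */
static bool need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}

/* Step 4 with hypothetical indices: the guest moved avail idx 104 -> 105. */
static bool kick_after_second_loop(uint16_t event_idx)
{
    const uint16_t old_idx = 104;  /* avail idx after the first NAPI loop */
    const uint16_t new_idx = 105;  /* avail idx after the second NAPI loop */

    return need_event(event_idx, new_idx, old_idx);
}

static void check_stall_example(void)
{
    /* Stale event idx from step 1: outside the window, so no kick. */
    assert(!kick_after_second_loop(103));
    /* An event idx equal to the current avail idx would trigger a kick. */
    assert(kick_after_second_loop(104));
}
```

With the stale value 103 the guest never kicks, although the
host is asleep and waiting for exactly that signal.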

Fixes: 06b1297017 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-08-02 11:09:52 +08:00
.github/workflows github: fix config mistake preventing repo lockdown commenting 2022-04-26 16:12:26 +01:00
.gitlab/issue_templates .gitlab/issue_templates: Move suggestions into comments 2022-12-15 15:19:24 +01:00
.gitlab-ci.d gitlab: display /packages.txt in build jobs 2024-07-30 11:38:34 +01:00
accel accel/kvm/kvm-all: Fixes the missing break in vCPU unpark logic 2024-08-01 10:15:03 +01:00
audio ui: add more tracing for dbus 2024-07-22 12:47:28 +04:00
authz error: Drop superfluous #include "qapi/qmp/qerror.h" 2023-02-23 13:56:14 +01:00
backends vfio queue: 2024-07-24 12:58:46 +10:00
block block/curl: rewrite http header parsing function 2024-07-17 14:04:15 +03:00
bsd-user bsd-user: Add target.h for aarch64. 2024-07-24 16:02:07 -06:00
chardev chardev: add tracing of socket error conditions 2024-07-24 10:39:10 +01:00
common-user common-user/host/ppc: Implement safe-syscall.inc.S 2023-01-23 14:39:48 -10:00
configs bsd-user: Add aarch64 build to tree 2024-07-23 10:56:30 -06:00
contrib contrib/plugins: add compat for g_memdup2 2024-07-30 11:44:21 +01:00
crypto crypto: propagate errors from TLS session I/O callbacks 2024-07-24 10:39:10 +01:00
disas disas/riscv: Add decode for Zawrs extension 2024-07-18 12:08:44 +10:00
docs Revert "pcie_sriov: Ensure VF function number does not overflow" 2024-08-01 04:32:00 -04:00
dump dump: make range overlap check more readable 2024-07-23 20:30:36 +02:00
ebpf ebpf: Added traces back. Changed source set for eBPF to 'system'. 2024-06-04 15:14:26 +08:00
fpu target/sparc: Implement FMAf extension 2024-06-05 09:05:10 -07:00
fsdev configure, meson: rename targetos to host_os 2023-12-31 09:11:29 +01:00
gdb-xml target/loongarch/gdbstub: Add vector registers support 2024-07-19 10:40:04 +08:00
gdbstub virtio,pci,pc: features,fixes 2024-07-24 09:32:04 +10:00
host/include util/cpuinfo-riscv: Support host/cpuinfo.h for riscv 2024-07-03 10:24:12 -07:00
hw virtio-net: Fix network stall at the host side waiting for kick 2024-08-02 11:09:52 +08:00
include virtio-net: Fix network stall at the host side waiting for kick 2024-08-02 11:09:52 +08:00
io crypto: propagate errors from TLS session I/O callbacks 2024-07-24 10:39:10 +01:00
libdecnumber libdecnumber/dpd/decimal64: Fix compiler warning from Clang 15 2022-11-11 09:13:52 +01:00
linux-headers linux-header: PPC: KVM: Update one-reg ids for DEXCR, HASHKEYR and HASHPKEYR 2024-07-26 09:21:06 +10:00
linux-user linux-user: open_self_stat: Implement num_threads 2024-07-30 07:59:23 +10:00
migration migration/postcopy: Add postcopy-recover-setup phase 2024-06-21 09:47:59 -03:00
monitor gdbstub: move enums into separate header 2024-06-24 10:14:17 +01:00
nbd nbd/server: Mark negotiation functions as coroutine_fn 2024-04-25 12:59:19 -05:00
net vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits 2024-07-21 14:45:56 -04:00
pc-bios roms/opensbi: Update to v1.5 2024-07-18 12:08:45 +10:00
plugins plugin/loader: handle basic help query 2024-07-30 11:44:21 +01:00
po po: add ukrainian translation 2022-07-05 10:15:49 +02:00
python python: enable testing for 3.13 2024-07-12 16:36:20 -04:00
qapi s390x updates: 2024-07-30 19:21:58 +10:00
qga qga/linux: Add new api 'guest-network-get-route' 2024-07-23 09:49:07 +03:00
qobject docs/interop: Convert qmp-spec.txt to rST 2023-05-22 10:21:01 +02:00
qom target/sparc/cpu: Rename the CPU models with a "+" in their names 2024-05-05 21:02:47 +01:00
replay replay: Improve error messages about configuration conflicts 2024-03-09 18:56:36 +03:00
roms roms/opensbi: Update to v1.5 2024-07-18 12:08:45 +10:00
scripts trivial patches for 2024-07-17 2024-07-18 10:07:23 +10:00
scsi configure, meson: rename targetos to host_os 2023-12-31 09:11:29 +01:00
semihosting semihosting: Restrict to TCG 2024-07-22 09:38:16 +01:00
stats meson: Replace softmmu_ss -> system_ss 2023-06-20 10:01:30 +02:00
storage-daemon Revert "meson: Propagate gnutls dependency" 2024-07-03 18:41:26 +02:00
stubs meson: Drop the .fa library suffix 2024-07-03 18:41:26 +02:00
subprojects libvhost-user: mask F_INFLIGHT_SHMFD if memfd is not supported 2024-07-02 09:27:56 -04:00
system system/physmem: Where we assume we have a RAM MR, assert it 2024-07-29 17:03:35 +01:00
target target/xtensa: Correct assert condition in handle_interrupt() 2024-08-01 10:59:01 +01:00
tcg * meson: Pass objects and dependencies to declare_dependency(), not static_library() 2024-07-04 09:16:07 -07:00
tests tests/vm/openbsd: Install tomli 2024-07-31 13:13:31 +02:00
tools qemu-vmsr-helper: implement --verbose/-v 2024-07-31 13:15:06 +02:00
trace trace: Remove deprecated 'vcpu' field from QMP trace events 2024-06-04 11:53:43 +02:00
ui util/fifo8: Rename fifo8_pop_buf() -> fifo8_pop_bufptr() 2024-07-23 22:34:54 +02:00
util util/cpuinfo: Make use of elf_aux_info(3) on OpenBSD 2024-07-30 07:59:23 +10:00
.dir-locals.el Add .dir-locals.el file to configure emacs coding style 2015-10-08 19:46:01 +03:00
.editorconfig .editorconfig: update the automatic mode setting for Emacs 2021-03-10 15:34:11 +00:00
.exrc qemu: add .exrc 2012-09-07 09:02:44 +03:00
.gdbinit .gdbinit: load QEMU sub-commands when gdb starts 2017-06-07 14:38:45 +01:00
.git-blame-ignore-revs metadata: add .git-blame-ignore-revs 2023-04-04 15:56:44 +01:00
.gitattributes gitattributes: Cover Objective-C source files 2022-03-29 00:15:14 +02:00
.gitignore configure: rename --enable-pypi to --enable-download, control subprojects too 2023-06-06 16:30:01 +02:00
.gitlab-ci.yml docs: Document GitLab custom CI/CD variables 2021-07-29 07:56:01 +02:00
.gitmodules meson: subprojects: replace berkeley-{soft,test}float-3 with wraps 2023-06-06 16:30:01 +02:00
.gitpublish Add a git-publish configuration file 2018-03-05 09:03:17 +00:00
.mailmap MAINTAINERS: Update Sriram Yagnaraman mail address 2024-04-24 16:03:38 +02:00
.patchew.yml scripts/checkpatch: roll diff tweaking into checkpatch itself 2021-06-25 10:08:33 +01:00
.readthedocs.yml readthodocs: fully specify a build environment 2024-01-12 13:23:48 +00:00
.travis.yml .travis.yml: Install python3-tomli in all build jobs 2024-07-02 09:47:47 +02:00
block.c block: Parse filenames only when explicitly requested 2024-07-02 18:12:30 +02:00
blockdev-nbd.c qapi block: Elide redundant has_FOO in generated C 2022-12-14 20:03:25 +01:00
blockdev.c qapi: blockdev-backup: add discard-source parameter 2024-05-28 15:52:15 +03:00
blockjob.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
configure tests/tcg/aarch64: Add MTE gdbstub tests 2024-07-05 12:35:36 +01:00
COPYING
COPYING.LIB COPYING.LIB: Synchronize the LGPL 2.1 with the version from gnu.org 2019-01-30 11:01:22 +01:00
cpu-common.c cpu-common.c: export cpu_get_free_index to be reused later 2024-07-26 09:21:06 +10:00
cpu-target.c cpu-target: don't set cpu->thread_id to bogus value 2024-06-04 10:02:39 +02:00
event-loop-base.c util/event-loop-base: Introduce options to set the thread pool size 2022-05-09 10:43:23 +01:00
gitdm.config contrib/gitdm: add group map for AMD 2023-03-22 15:08:26 +00:00
hmp-commands-info.hx hmp-commands-info.hx: Add missing info command for stats subcommand 2024-06-30 19:51:44 +03:00
hmp-commands.hx hmp/migration: Fix "migrate" command's documentation 2024-05-08 09:22:37 -03:00
iothread.c iothread: Simplify expression in qemu_in_iothread() 2024-02-13 10:59:25 +03:00
job-qmp.c qapi job: Elide redundant has_FOO in generated C 2022-12-14 20:04:47 +01:00
job.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
Kconfig meson: Introduce target-specific Kconfig 2021-07-09 18:21:34 +02:00
Kconfig.host kconfig: express dependency of individual boards on libfdt 2024-05-10 15:45:15 +02:00
LICENSE tcg/LICENSE: Remove out of date claim about TCG subdirectory licensing 2019-11-11 15:11:21 +01:00
MAINTAINERS Revert "docs: Document composable SR-IOV device" 2024-08-01 04:32:00 -04:00
Makefile Makefile: fix use of -j without an argument 2024-04-12 12:02:12 +02:00
meson_options.txt meson: remove dead optimization option 2024-06-28 14:44:51 +02:00
meson.build util/cpuinfo: Make use of elf_aux_info(3) on OpenBSD 2024-07-30 07:59:23 +10:00
module-common.c all: Clean up includes 2016-02-04 17:41:30 +00:00
os-posix.c os-posix: Expand setrlimit() syscall compatibility 2024-06-30 19:51:44 +03:00
os-win32.c Remove qemu-common.h include from most units 2022-04-06 14:31:55 +02:00
page-target.c exec: Expose 'target_page.h' API to user emulation 2024-04-26 15:28:11 +02:00
page-vary-common.c Remove qemu-common.h include from most units 2022-04-06 14:31:55 +02:00
page-vary-target.c exec: Rename target specific page-vary.c -> page-vary-target.c 2023-10-04 11:03:54 -07:00
pythondeps.toml Python: bump minimum sphinx version to 3.4.3 2024-07-12 16:36:20 -04:00
qemu-bridge-helper.c qemu-bridge-helper: relocate path to default ACL 2020-09-30 19:11:36 +02:00
qemu-edid.c qemu-edid: Restrict input parameter -d to avoid division by zero 2022-10-12 13:38:15 +02:00
qemu-img-cmds.hx docs/devel/docs: Document .hx file syntax 2024-01-15 17:12:22 +00:00
qemu-img.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
qemu-io-cmds.c qemu-io: add cvtnum() error handling for zone commands 2024-06-10 11:05:43 +02:00
qemu-io.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00
qemu-keymap.c qemu-keymap: Make references to allocations static 2024-05-29 12:41:56 +02:00
qemu-nbd.c qemu-nbd: mention --tls-hostname option in qemu-nbd --help 2024-02-13 10:59:25 +03:00
qemu-options.hx trivial patches for 2024-07-17 2024-07-18 10:07:23 +10:00
qemu.nsi nsis installer: Fix mouse-over descriptions for emulators 2022-03-18 10:55:15 +00:00
qemu.sasl sasl: remove comment about obsolete kerberos versions 2021-06-14 13:28:50 +01:00
README.rst README.rst: add the missing punctuations 2024-07-17 14:04:15 +03:00
replication.c replication: move include out of root directory 2021-05-26 14:49:46 +02:00
trace-events tracepoints: move physmem trace points 2024-07-05 12:33:37 +01:00
VERSION Update version for v9.1.0-rc0 release 2024-07-31 16:21:21 +10:00
version.rc configure: remove CONFIG_FILEVERSION and CONFIG_PRODUCTVERSION 2021-01-02 21:03:37 +01:00

===========
QEMU README
===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Documentation
=============

Documentation can be found hosted online at
`<https://www.qemu.org/documentation/>`_. The documentation for the
current development version that is available at
`<https://www.qemu.org/docs/master/>`_ is generated from the ``docs/``
folder in the source tree, and is built by `Sphinx
<https://www.sphinx-doc.org/en/master/>`_.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, macOS, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:


.. code-block:: shell

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

* `<https://wiki.qemu.org/Hosts/Linux>`_
* `<https://wiki.qemu.org/Hosts/Mac>`_
* `<https://wiki.qemu.org/Hosts/W32>`_


Submitting patches
==================

The QEMU source code is maintained under the Git version control system.

.. code-block:: shell

   git clone https://gitlab.com/qemu-project/qemu.git

When submitting patches, one common approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the `style section
<https://www.qemu.org/docs/master/devel/style.html>`_ of
the Developers Guide.

Additional information on submitting patches can be found online via
the QEMU website:

* `<https://wiki.qemu.org/Contribute/SubmitAPatch>`_
* `<https://wiki.qemu.org/Contribute/TrivialPatches>`_

The QEMU website is also maintained under source control.

.. code-block:: shell

  git clone https://gitlab.com/qemu-project/qemu-web.git

* `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_

A 'git-publish' utility was created to make the above process less
cumbersome, and is highly recommended for making regular contributions,
or even just for sending consecutive patch series revisions. It also
requires a working 'git send-email' setup, and by default doesn't
automate everything, so you may want to go through the above steps
manually once.

For installation instructions, please go to:

*  `<https://github.com/stefanha/git-publish>`_

The workflow with 'git-publish' is:

.. code-block:: shell

  $ git checkout master -b my-feature
  $ # work on new commits, add your 'Signed-off-by' lines to each
  $ git publish

Your patch series will be sent and tagged as my-feature-v1 if you need to refer
back to it in the future.

Sending v2:

.. code-block:: shell

  $ git checkout my-feature # same topic branch
  $ # making changes to the commits (using 'git rebase', for example)
  $ git publish

Your patch series will be sent with 'v2' tag in the subject and the git tip
will be tagged as my-feature-v2.

Bug reporting
=============

The QEMU project uses GitLab issues to track bugs. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

* `<https://gitlab.com/qemu-project/qemu/-/issues>`_

If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect latest upstream code, it can also be
reported via GitLab.

For additional information on bug reporting consult:

* `<https://wiki.qemu.org/Contribute/ReportABug>`_


ChangeLog
=========

For version history and release notes, please visit
`<https://wiki.qemu.org/ChangeLog/>`_ or look at the git history for
more detailed information.


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

* `<mailto:qemu-devel@nongnu.org>`_
* `<https://lists.nongnu.org/mailman/listinfo/qemu-devel>`_
* #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

* `<https://wiki.qemu.org/Contribute/StartHere>`_