qemu/hw
thomas 6ce7791497 virtio-net: Fix network stall at the host side waiting for kick
Patch 06b1297017 ("virtio-net: fix network stall under load")
added double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. It will be lucky if the available buffer size becomes
okay after the double-check, then the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even if the guest has added some buffers, viritio-net would
stall at the host side forever.

The patch enables notification and checks whether the guest has
added some buffers since last check of available buffers when
the available buffers are insufficient. If no buffer is added,
return false, else recheck the available buffers in the loop.
If the available buffers are sufficient, disable notification
and return true.

Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
   to int, it returns an opaque that represents the shadow_avail_idx
   of the virtqueue on success, else -1 on error.
2. Add a new API: virtio_queue_enable_notification_and_check(),
   it takes an opaque as input arg which is returned from
   virtqueue_get_avail_bytes(). It enables notification firstly,
   then checks whether the guest has added some buffers since
   last check of available buffers or not by virtio_queue_poll(),
   return ture if yes.

The patch also reverts patch "06b12970174".

The case below can reproduce the stall.

                                       Guest 0
                                     +--------+
                                     | iperf  |
                    ---------------> | server |
         Host       |                +--------+
       +--------+   |                    ...
       | iperf  |----
       | client |----                  Guest n
       +--------+   |                +--------+
                    |                | iperf  |
                    ---------------> | server |
                                     +--------+

Boot many guests from qemu with virtio network:
 qemu ... -netdev tap,id=net_x \
    -device virtio-net-pci-non-transitional,\
    iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x

Each guest acts as iperf server with commands below:
 iperf3 -s -D -i 10 -p 8001
 iperf3 -s -D -i 10 -p 8002

The host as iperf client:
 iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000

After some time, the host loses connection to the guest,
the guest can send packet to the host, but can't receive
packet from the host.

It's more likely to happen if SWIOTLB is enabled in the guest,
allocating and freeing bounce buffer takes some CPU ticks,
copying from/to bounce buffer takes more CPU ticks, compared
with that there is no bounce buffer in the guest.
Once the rate of producing packets from the host approximates
the rate of receiveing packets in the guest, the guest would
loop in NAPI.

         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               |  need kick the host?
               |  NAPI continues
               v
         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               v
              ...           ...

On the other hand, the host fetches free buf from avail
ring, if the buf in the avail ring is not enough, the
host notifies the guest the event by writing the avail
idx read from avail ring to the event idx of used ring,
then the host goes to sleep, waiting for the kick signal
from the guest.

Once the guest finds the host is waiting for kick singal
(in virtqueue_kick_prepare_split()), it kicks the host.

The host may stall forever at the sequences below:

         Host                        Guest
     ------------                 -----------
 fetch buf, send packet           receive packet ---
         ...                          ...         |
 fetch buf, send packet             add buf       |
         ...                        add buf   virtnet_poll
    buf not enough      avail idx-> add buf       |
    read avail idx                  add buf       |
                                    add buf      ---
                                  receive packet ---
    write event idx                   ...         |
    wait for kick                   add buf   virtnet_poll
                                      ...         |
                                                 ---
                                 no more packet, exit NAPI

In the first loop of NAPI above, indicated in the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buffers.
 step 1: The buf is not enough, for example, a big packet
         needs 5 buf, but the available buf count is 3.
         The host read current avail idx.
 step 2: The guest adds some buf, then checks whether the
         host is waiting for kick signal, not at this time.
         The used ring is not empty, the guest continues
         the second loop of NAPI.
 step 3: The host writes the avail idx read from avail
         ring to used ring as event idx via
         virtio_queue_set_notification(q->rx_vq, 1).
 step 4: At the end of the second loop of NAPI, recheck
         whether kick is needed, as the event idx in the
         used ring written by the host is beyound the
         range of kick condition, the guest will not
         send kick signal to the host.

Fixes: 06b1297017 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit f937309fbd)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: context fixup in include/hw/virtio/virtio.h)
2024-08-28 08:37:28 +03:00
..
9pfs * configure: use a native non-cross compiler for linux-user 2024-01-04 19:55:20 +00:00
acpi hmat acpi: Fix out of bounds access due to missing use of indirection 2024-03-12 17:59:46 -04:00
adc hw/adc: Constify VMState 2023-12-29 11:17:30 +11:00
alpha hw/alpha/dp264: use pci_init_nic_devices() 2024-02-02 16:23:47 +00:00
arm hw/arm/mps2-tz.c: fix RX/TX interrupts order 2024-08-28 08:37:28 +03:00
audio virtio-snd: check for invalid param shift operands 2024-07-24 07:38:33 +03:00
avr hw/avr/atmega: Fix wrong initial value of stack pointer 2023-11-28 14:27:12 +01:00
block hw/pflash: fix block write start 2024-05-27 07:37:34 +03:00
char hw/char/bcm2835_aux: Fix assert when receive FIFO fills up 2024-08-28 08:37:28 +03:00
core hw/core: allow parameter=1 for SMP topology on any machine 2024-07-03 11:44:24 +03:00
cpu target/arm: Move GTimer definitions to new 'gtimer.h' header 2024-01-26 11:30:49 +00:00
cris hw/net/etraxfs-eth: use qemu_configure_nic_device() 2024-02-02 16:23:47 +00:00
cxl hw/cxl/cxl-host: Fix segmentation fault when getting cxl-fmw property 2024-07-24 07:31:09 +03:00
display stdvga: fix screen blanking 2024-06-20 10:04:31 +03:00
dma hw/dmax/xlnx_dpdma: fix handling of address_extension descriptor fields 2024-05-02 13:16:29 +03:00
fsi hw/fsi: Aspeed APB2OPB & On-chip peripheral bus 2024-02-01 08:33:18 +01:00
gpio * Fix timeouts in Travis-CI jobs 2024-03-25 14:19:42 +00:00
hppa hw/hppa: do not require CONFIG_USB 2024-02-27 09:37:13 +01:00
hyperv vmbus: Print a warning when enabled without the recommended set of features 2024-03-08 14:18:56 +01:00
i2c hw/i2c: Implement Broadcom Serial Controller (BSC) 2024-03-05 13:22:55 +00:00
i386 hw/i386/amd_iommu: Don't leak memory in amdvi_update_iotlb() 2024-08-28 08:37:28 +03:00
ide hw/ide/ahci: Rename ahci_internal.h to ahci-internal.h 2024-03-11 22:09:42 +01:00
input hw/input/pckbd: Open-code i8042_setup_a20_line() wrapper 2024-02-22 12:47:35 +01:00
intc hw/intc/loongson_ipi: Fix resource leak 2024-07-24 16:14:36 +03:00
ipack hw/ipack: Constify VMState 2023-12-29 11:17:30 +11:00
ipmi hw/ipmi: Constify VMState 2023-12-29 11:17:30 +11:00
isa hw/isa/vt82c686: Keep track of PIRQ/PINT pins separately 2024-04-15 13:07:11 +02:00
loongarch hw/loongarch/virt: Fix FDT memory node address width 2024-05-27 07:50:35 +03:00
m68k virt: set the CPU type in BOOTINFO 2024-03-11 09:38:08 +01:00
mem hw/mem/cxl_type3: Fix missing ERRP_GUARD() in ct3_realize() 2024-03-12 17:56:55 -04:00
microblaze hw/microblaze: Do not allow xlnx-zynqmp-pmu-soc to be created by the user 2024-03-25 09:57:43 +01:00
mips mips: do not list individual devices from configs/ 2024-03-08 15:51:22 +01:00
misc hw/misc/bcm2835_property: Fix handling of FRAMEBUFFER_SET_PALETTE 2024-08-28 08:37:28 +03:00
net virtio-net: Fix network stall at the host side waiting for kick 2024-08-28 08:37:28 +03:00
nios2 target/nios2: Deprecate the Nios II architecture 2023-11-23 14:10:04 +00:00
nubus hw/nubus: add nubus-virtio-mmio device 2024-02-27 09:36:39 +01:00
nvme hw/nvme: fix memory leak in nvme_dsm 2024-07-23 21:00:28 +03:00
nvram hw/nvram/mac_nvram: Report failure to write data 2024-03-25 10:41:01 +00:00
openrisc hw/openrisc/openrisc_sim: use qemu_create_nic_device() 2024-02-02 16:23:47 +00:00
pci virtio,pc,pci: features, cleanups, fixes 2024-03-13 15:11:53 +00:00
pci-bridge virtio,pc,pci: features, cleanups, fixes 2024-03-13 15:11:53 +00:00
pci-host hw/pci-host/ppc440_pcix: Do not expose a bridge device on PCI bus 2024-04-15 13:07:15 +02:00
pcmcia hw/pcmcia/pxa2xx: Inline pxa2xx_pcmcia_init() 2023-10-27 12:48:57 +01:00
ppc hw/ppc/spapr: Include missing 'sysemu/tcg.h' header 2024-03-30 18:50:23 +10:00
rdma hw/rdma/vmw/pvrdma_cmd: Use correct struct in query_port() 2023-10-21 15:00:22 +03:00
remote hw/remote/vfio-user: Fix config space access byte order 2024-05-10 13:20:20 +03:00
riscv target/riscv/kvm: fix timebase-frequency when using KVM acceleration 2024-03-22 15:41:01 +10:00
rtc hw/rtc/sun4v-rtc: Relicense to GPLv2-or-later 2024-03-07 12:54:56 +00:00
rx hw/rx/rx62n: Only call qdev_get_gpio_in() when necessary 2024-02-15 16:58:46 +01:00
s390x hw/core: Declare CPUArchId::cpu as CPUState instead of Object 2024-03-12 11:46:16 +01:00
scsi scsi: fix regression and honor bootindex again for legacy drives 2024-07-19 19:43:05 +03:00
sd hw/sd/sdhci: Do not update TRNMOD when Command Inhibit (DAT) is set 2024-04-10 09:09:34 +02:00
sensor hw/sensor: Constify VMState 2023-12-30 07:38:06 +11:00
sh4 hw/usb: extract sysbus-ohci to a separate file 2024-02-27 09:37:25 +01:00
smbios hw/smbios: add stub for smbios_get_table_legacy() 2024-03-26 14:32:54 +01:00
sparc hw/sparc/leon3: Fix wrong usage of DO_UPCAST macro 2024-02-22 12:47:40 +01:00
sparc64 sun4u: remap ebus BAR0 to use unassigned_io_ops instead of alias to PCI IO space 2024-03-11 22:10:18 +01:00
ssi aspeed/smc: Only wire flash devices at reset 2024-03-19 11:58:15 +01:00
timer misc: pxa2xx_timer: replace qemu_system_reset_request() call with watchdog_perform_action() 2024-02-27 13:01:41 +00:00
tpm hw/tpm: Remove HOST_PAGE_ALIGN from tpm_ppi_init 2024-02-29 11:35:36 -10:00
tricore hw/tricore/testboard: Use qdev_new() instead of QOM basic API 2024-02-22 12:47:40 +01:00
ufs hw/ufs: Fix buffer overflow bug 2024-05-02 13:03:05 +03:00
usb usb-storage: Fix BlockConf defaults 2024-04-16 11:50:52 +01:00
vfio vfio/iommufd: Fix memory leak 2024-03-19 11:56:37 +01:00
virtio virtio-net: Fix network stall at the host side waiting for kick 2024-08-28 08:37:28 +03:00
watchdog hw/watchdog: Constify VMState 2023-12-30 07:38:06 +11:00
xen Xen queue: 2024-03-12 21:32:31 +00:00
xenpv hw/xen: use qemu_create_nic_bus_devices() to instantiate Xen NICs 2024-02-02 16:23:47 +00:00
xtensa hw/xtensa/xtfpga: use qemu_create_nic_device() 2024-02-02 16:23:47 +00:00
Kconfig hw/fsi: Introduce IBM's Local bus 2024-02-01 08:13:30 +01:00
meson.build hw/fsi: Introduce IBM's Local bus 2024-02-01 08:13:30 +01:00