qemu/hw/net
thomas f937309fbd virtio-net: Fix network stall at the host side waiting for kick
Patch 06b1297017 ("virtio-net: fix network stall under load")
added double-check to test whether the available buffer size
can satisfy the request or not, in case the guest has added
some buffers to the avail ring simultaneously after the first
check. It will be lucky if the available buffer size becomes
okay after the double-check, then the host can send the packet
to the guest. If the buffer size still can't satisfy the request,
even if the guest has added some buffers, viritio-net would
stall at the host side forever.

The patch enables notification and checks whether the guest has
added some buffers since last check of available buffers when
the available buffers are insufficient. If no buffer is added,
return false, else recheck the available buffers in the loop.
If the available buffers are sufficient, disable notification
and return true.

Changes:
1. Change the return type of virtqueue_get_avail_bytes() from void
   to int, it returns an opaque that represents the shadow_avail_idx
   of the virtqueue on success, else -1 on error.
2. Add a new API: virtio_queue_enable_notification_and_check(),
   it takes an opaque as input arg which is returned from
   virtqueue_get_avail_bytes(). It enables notification firstly,
   then checks whether the guest has added some buffers since
   last check of available buffers or not by virtio_queue_poll(),
   return ture if yes.

The patch also reverts patch "06b12970174".

The case below can reproduce the stall.

                                       Guest 0
                                     +--------+
                                     | iperf  |
                    ---------------> | server |
         Host       |                +--------+
       +--------+   |                    ...
       | iperf  |----
       | client |----                  Guest n
       +--------+   |                +--------+
                    |                | iperf  |
                    ---------------> | server |
                                     +--------+

Boot many guests from qemu with virtio network:
 qemu ... -netdev tap,id=net_x \
    -device virtio-net-pci-non-transitional,\
    iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x

Each guest acts as iperf server with commands below:
 iperf3 -s -D -i 10 -p 8001
 iperf3 -s -D -i 10 -p 8002

The host as iperf client:
 iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000
 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000

After some time, the host loses connection to the guest,
the guest can send packet to the host, but can't receive
packet from the host.

It's more likely to happen if SWIOTLB is enabled in the guest,
allocating and freeing bounce buffer takes some CPU ticks,
copying from/to bounce buffer takes more CPU ticks, compared
with that there is no bounce buffer in the guest.
Once the rate of producing packets from the host approximates
the rate of receiveing packets in the guest, the guest would
loop in NAPI.

         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               |  need kick the host?
               |  NAPI continues
               v
         receive packets    ---
               |             |
               v             |
           free buf      virtnet_poll
               |             |
               v             |
     add buf to avail ring  ---
               |
               v
              ...           ...

On the other hand, the host fetches free buf from avail
ring, if the buf in the avail ring is not enough, the
host notifies the guest the event by writing the avail
idx read from avail ring to the event idx of used ring,
then the host goes to sleep, waiting for the kick signal
from the guest.

Once the guest finds the host is waiting for kick singal
(in virtqueue_kick_prepare_split()), it kicks the host.

The host may stall forever at the sequences below:

         Host                        Guest
     ------------                 -----------
 fetch buf, send packet           receive packet ---
         ...                          ...         |
 fetch buf, send packet             add buf       |
         ...                        add buf   virtnet_poll
    buf not enough      avail idx-> add buf       |
    read avail idx                  add buf       |
                                    add buf      ---
                                  receive packet ---
    write event idx                   ...         |
    wait for kick                   add buf   virtnet_poll
                                      ...         |
                                                 ---
                                 no more packet, exit NAPI

In the first loop of NAPI above, indicated in the range of
virtnet_poll above, the host is sending packets while the
guest is receiving packets and adding buffers.
 step 1: The buf is not enough, for example, a big packet
         needs 5 buf, but the available buf count is 3.
         The host read current avail idx.
 step 2: The guest adds some buf, then checks whether the
         host is waiting for kick signal, not at this time.
         The used ring is not empty, the guest continues
         the second loop of NAPI.
 step 3: The host writes the avail idx read from avail
         ring to used ring as event idx via
         virtio_queue_set_notification(q->rx_vq, 1).
 step 4: At the end of the second loop of NAPI, recheck
         whether kick is needed, as the event idx in the
         used ring written by the host is beyound the
         range of kick condition, the guest will not
         send kick signal to the host.

Fixes: 06b1297017 ("virtio-net: fix network stall under load")
Cc: qemu-stable@nongnu.org
Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2024-08-02 11:09:52 +08:00
..
can hw/net/can/xlnx-versal-canfd: Fix sorting of the tx queue 2024-06-21 14:01:58 +01:00
fsl_etsec net: Provide MemReentrancyGuard * to qemu_new_nic() 2023-11-21 15:42:34 +08:00
rocker net: Provide MemReentrancyGuard * to qemu_new_nic() 2023-11-21 15:42:34 +08:00
allwinner_emac.c util/fifo8: Rename fifo8_pop_buf() -> fifo8_pop_bufptr() 2024-07-23 22:34:54 +02:00
allwinner-sun8i-emac.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
cadence_gem.c hw/net: cadence_gem: Fix MDIO_OP_xxx values 2024-01-05 22:28:54 +03:00
dp8393x.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
e1000_common.h e1000: Split header files 2023-03-10 15:35:38 +08:00
e1000_regs.h hw/net: spelling fixes 2023-09-20 07:54:34 +03:00
e1000.c hw, target: Add ResetType argument to hold and exit phase methods 2024-04-25 10:21:06 +01:00
e1000e_core.c e1000e: fix link state on resume 2024-03-12 19:28:32 +08:00
e1000e_core.h e1000e: fix link state on resume 2024-03-12 19:28:32 +08:00
e1000e.c tap: Remove qemu_using_vnet_hdr() 2024-06-04 15:14:25 +08:00
e1000x_common.c e1000x: Take CRC into consideration for size check 2023-05-23 15:20:15 +08:00
e1000x_common.h e1000x: Share more Rx filtering logic 2023-05-23 15:20:15 +08:00
e1000x_regs.h hw/net: spelling fixes 2023-09-20 07:54:34 +03:00
eepro100.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
etraxfs_eth.c hw/net/etraxfs-eth: use qemu_configure_nic_device() 2024-02-02 16:23:47 +00:00
ftgmac100.c hw/net:ftgmac100: update TX and RX packet buffers address to 64 bits 2024-07-09 08:05:44 +02:00
i82596.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
i82596.h hw/net: Make NetCanReceive() return a boolean 2020-03-31 21:14:35 +08:00
igb_common.h igb: Add a VF reset handler 2023-11-13 15:33:37 +08:00
igb_core.c igb: fix link state on resume 2024-03-12 19:28:31 +08:00
igb_core.h igb: fix link state on resume 2024-03-12 19:28:31 +08:00
igb_regs.h hw/net: spelling fixes 2023-09-20 07:54:34 +03:00
igb.c Revert "pcie_sriov: Ensure VF function number does not overflow" 2024-08-01 04:32:00 -04:00
igbvf.c hw, target: Add ResetType argument to hold and exit phase methods 2024-04-25 10:21:06 +01:00
imx_fec.c hw/misc/imx: Replace sprintf() by snprintf() 2024-04-25 12:48:12 +02:00
Kconfig kconfig: Add PCIe devices to s390x machines 2023-07-14 11:10:57 +02:00
lan9118.c hw/net/lan9118: Fix overflow in MIL TX FIFO 2024-04-10 09:09:34 +02:00
lance.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
lasi_i82596.c hw/net/lasi_i82596: use qemu_create_nic_device() 2024-02-02 16:23:47 +00:00
mcf_fec.c net: Provide MemReentrancyGuard * to qemu_new_nic() 2023-11-21 15:42:34 +08:00
meson.build target/arm: fix exception syndrome for AArch32 bkpt insn 2024-02-02 18:56:32 +00:00
mipsnet.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
msf2-emac.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
mv88w8618_eth.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
ne2000-isa.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
ne2000-pci.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
ne2000.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
ne2000.h Include hw/hw.h exactly where needed 2019-08-16 13:31:52 +02:00
net_rx_pkt.c igb: Strip the second VLAN tag for extended VLAN 2023-05-23 15:20:15 +08:00
net_rx_pkt.h igb: Strip the second VLAN tag for extended VLAN 2023-05-23 15:20:15 +08:00
net_tx_pkt.c tap: Remove qemu_using_vnet_hdr() 2024-06-04 15:14:25 +08:00
net_tx_pkt.h igb: Implement Tx SCTP CSO 2023-05-23 15:20:15 +08:00
npcm7xx_emc.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
npcm_gmac.c hw/net: GMAC Tx Implementation 2024-02-02 13:51:59 +00:00
opencores_eth.c net: Provide MemReentrancyGuard * to qemu_new_nic() 2023-11-21 15:42:34 +08:00
pcnet-pci.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
pcnet.c Avoid unaligned fetch in ladr_match() 2024-03-12 19:28:32 +08:00
pcnet.h net: Replace TAB indentations with spaces 2022-11-11 09:39:03 +01:00
rtl8139.c rtl8139: Fix behaviour for old kernels. 2024-08-02 11:04:03 +08:00
smc91c111.c hw/net/smc91c111: use qemu_configure_nic_device() 2024-02-02 16:23:47 +00:00
spapr_llan.c hw/net/spapr: prevent potential NULL dereference 2024-07-02 06:47:51 +02:00
stellaris_enet.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
sungem.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
sunhme.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
trace-events hw/net: GMAC Tx Implementation 2024-02-02 13:51:59 +00:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
tulip.c hw/net/tulip: add chip status register values 2024-02-11 13:20:23 +01:00
tulip.h Use OBJECT_DECLARE_SIMPLE_TYPE when possible 2020-09-18 14:12:32 -04:00
vhost_net-stub.c virtio-net: add support for configure interrupt 2023-01-08 01:54:22 -05:00
vhost_net.c vhost,vhost-user: Add VIRTIO_F_IN_ORDER to vhost feature bits 2024-07-21 14:45:56 -04:00
virtio-net.c virtio-net: Fix network stall at the host side waiting for kick 2024-08-02 11:09:52 +08:00
vmware_utils.h hw/net/vmxnet3: Fix code to work on big endian hosts, too 2017-11-20 11:08:00 +08:00
vmxnet3_defs.h include/hw/pci: Split pci_device.h off pci.h 2023-01-08 01:54:22 -05:00
vmxnet3.c tap: Remove qemu_using_vnet_hdr() 2024-06-04 15:14:25 +08:00
vmxnet3.h hw/net: spelling fixes 2023-09-20 07:54:34 +03:00
vmxnet_debug.h Clean up ill-advised or unusual header guards 2016-07-12 16:20:46 +02:00
xen_nic.c hw/net/xen_nic: Fix missing ERRP_GUARD() for error_prepend() 2024-03-09 18:51:45 +01:00
xgmac.c hw/net: Constify VMState 2023-12-30 07:38:06 +11:00
xilinx_axienet.c hw/net: Fix the transmission return size 2024-06-18 14:52:05 +02:00
xilinx_ethlite.c net: Provide MemReentrancyGuard * to qemu_new_nic() 2023-11-21 15:42:34 +08:00