qemu/hw
Yuri Benditovich 2974e916df virtio-net: support RSC v4/v6 tcp traffic for Windows HCK
This commit adds implementation of RX packets
coalescing, compatible with requirements of Windows
Hardware compatibility kit.

The device enables feature VIRTIO_NET_F_RSC_EXT in
host features if it supports extended RSC functionality
as defined in the specification.
This feature requires at least one of VIRTIO_NET_F_GUEST_TSO4,
VIRTIO_NET_F_GUEST_TSO6. Windows guest driver acks
this feature only if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
is also present.

If the guest driver acks VIRTIO_NET_F_RSC_EXT feature,
the device coalesces TCPv4 and TCPv6 packets (if
respective VIRTIO_NET_F_GUEST_TSO feature is on,
populates extended RSC information in virtio header
and sets VIRTIO_NET_HDR_F_RSC_INFO bit in header flags.
The device does not recalculate checksums in the coalesced
packet, so they are not valid.

In this case:
All the data packets in a tcp connection are cached
to a single buffer in every receive interval, and will
be sent out via a timer, the 'virtio_net_rsc_timeout'
controls the interval, this value may impact the
performance and response time of tcp connection,
50000(50us) is an experience value to gain a performance
improvement, since the whql test sends packets every 100us,
so '300000(300us)' passes the test case, it is the default
value as well, tune it via the command line parameter
'rsc_interval' within 'virtio-net-pci' device, for example,
to launch a guest with interval set as '500000':

'virtio-net-pci,netdev=hostnet1,bus=pci.0,id=net1,mac=00,
guest_rsc_ext=on,rsc_interval=500000'

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets.

'NetRscChain' is used to save the segments of IPv4/6 in a
VirtIONet device.

A new segment becomes a 'Candidate' as well as it passed sanity check,
the main handler of TCP includes TCP window update, duplicated
ACK check and the real data coalescing.

An 'Candidate' segment means:
1. Segment is within current window and the sequence is the expected one.
2. 'ACK' of the segment is in the valid window.

Sanity check includes:
1. Incorrect version in IP header
2. An IP options or IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.
5. An ECN packet

Even though, there might more cases should be considered such as
ip identification other flags, while it breaks the test because
windows set it to the same even it's not a fragment.

Normally it includes 2 typical ways to handle a TCP control flag,
'bypass' and 'finalize', 'bypass' means should be sent out directly,
while 'finalize' means the packets should also be bypassed, but this
should be done after search for the same connection packets in the
pool and drain all of them out, this is to avoid out of order fragment.

All the 'SYN' packets will be bypassed since this always begin a new'
connection, other flags such 'URG/FIN/RST/CWR/ECE' will trigger a
finalization, because this normally happens upon a connection is going
to be closed, an 'URG' packet also finalize current coalescing unit.

Statistics can be used to monitor the basic coalescing status, the
'out of order' and 'out of window' means how many retransmitting packets,
thus describe the performance intuitively.

Difference between ip v4 and v6 processing:
 Fragment length in ipv4 header includes itself, while it's not
 included for ipv6, thus means ipv6 can carry a real 65535 payload.

Note that main goal of implementing this feature in software
is to create reference setup for certification tests. In such
setups guest migration is not required, so the coalesced packets
not yet delivered to the guest will be lost in case of migration.

Signed-off-by: Wei Xu <wexu@redhat.com>
Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-01-17 21:10:57 -05:00
..
9pfs xen: re-name XenDevice to XenLegacyDevice... 2019-01-14 13:45:40 +00:00
acpi pci/pcihp: perform unplug via the hotplug handler 2018-12-20 11:19:12 -05:00
adc
alpha avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
arm avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
audio hw/audio/marvell: Don't include unnecessary i2c.h header file 2019-01-10 09:51:42 +01:00
block qemu: avoid memory leak while remove disk 2019-01-14 19:31:04 -05:00
bt
char xen: re-name XenDevice to XenLegacyDevice... 2019-01-14 13:45:40 +00:00
core * HAX support for Linux hosts (Alejandro) 2019-01-11 15:46:09 +00:00
cpu hw/cpu: introduce CPU clusters 2019-01-07 15:23:45 +00:00
cris hw/cris: Use the IEC binary prefix definitions 2018-07-02 15:41:15 +02:00
display xen: re-name XenDevice to XenLegacyDevice... 2019-01-14 13:45:40 +00:00
dma avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
gpio avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
hppa hw/hppa/dino: Remove unuseful code 2018-10-24 06:44:59 -03:00
hyperv hw/hyperv: fix NULL dereference with pure-kvm SynIC 2018-11-26 14:14:38 -02:00
i2c i2c-ddc: fix oob read 2019-01-11 11:45:00 +01:00
i386 hw/misc/ivshmem: Remove deprecated "ivshmem" legacy device 2019-01-14 19:31:04 -05:00
ide avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
input avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
intc avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
ipack hw/ipack: Use the IEC binary prefix definitions 2018-07-02 15:41:12 +02:00
ipmi ipmi: Use proper struct reference for BT vmstate 2018-08-23 18:46:25 +02:00
isa configs: Add a CONFIG_SMC37C669 switch for the "smc37c669-superio" device 2018-10-24 07:33:44 +01:00
lm32 milkymist: Check for failure trying to load BIOS image 2018-11-06 11:32:14 +00:00
m68k hw/m68k: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
mem memory-device: rewrite address assignment using ranges 2019-01-09 22:09:31 -02:00
microblaze Support u-boot noload images for arm as used by, NetBSD/evbarm GENERIC kernel. 2019-01-07 15:46:20 +00:00
mips avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
misc hw/misc/ivshmem: Remove deprecated "ivshmem" legacy device 2019-01-14 19:31:04 -05:00
moxie trivial: Don't include isa.h if it is not really necessary 2019-01-09 11:24:35 +01:00
net virtio-net: support RSC v4/v6 tcp traffic for Windows HCK 2019-01-17 21:10:57 -05:00
nios2 Support u-boot noload images for arm as used by, NetBSD/evbarm GENERIC kernel. 2019-01-07 15:46:20 +00:00
nvram trivial: Don't include isa.h if it is not really necessary 2019-01-09 11:24:35 +01:00
openrisc
pci msix: make pba size math more uniform 2019-01-14 19:31:04 -05:00
pci-bridge pci/shpc: perform unplug via the hotplug handler 2018-12-20 11:19:12 -05:00
pci-host pam: wrap MemoryRegion initialization in a transaction 2019-01-11 13:57:23 +01:00
pcmcia
ppc * HAX support for Linux hosts (Alejandro) 2019-01-11 15:46:09 +00:00
rdma pvrdma: check return value from pvrdma_idx_ring_has_ routines 2018-12-22 11:09:57 +02:00
riscv sifive_uart: Implement interrupt pending register 2018-12-20 12:08:43 -08:00
s390x machine: Use shorter format for GlobalProperty arrays 2019-01-09 22:10:00 -02:00
scsi qemu: avoid memory leak while remove disk 2019-01-14 19:31:04 -05:00
sd hw/sd/sdhci: Don't leak memory region in sdhci_sysbus_realize() 2018-12-14 13:30:54 +00:00
sh4 avoid TABs in files that only contain a few 2019-01-11 15:46:56 +01:00
smbios hw/smbios: Move to the hw/firmware/ subdirectory 2018-12-19 16:48:16 -05:00
sparc trivial: Don't include isa.h if it is not really necessary 2019-01-09 11:24:35 +01:00
sparc64 hw/sparc64/niagara: Model the I/O Bridge with the 'unimplemented_device' 2018-10-24 06:44:59 -03:00
ssi hw/ssi/xilinx_spi: Use DeviceState::realize rather than SysBusDevice::init 2018-10-24 06:44:59 -03:00
timer trivial: Don't include isa.h if it is not really necessary 2019-01-09 11:24:35 +01:00
tpm tpm: Make sure the locality received from backend is valid 2018-12-04 10:21:25 -05:00
tricore hw/tricore: Use the IEC binary prefix definitions 2018-07-02 15:41:14 +02:00
unicore32
usb xen: re-name XenDevice to XenLegacyDevice... 2019-01-14 13:45:40 +00:00
vfio qemu/queue.h: typedef QTAILQ heads 2019-01-11 15:46:55 +01:00
virtio vhost-user: fix ioeventfd_enabled 2019-01-14 19:31:04 -05:00
watchdog hw/watchdog/wdt_i6300esb: remove a unnecessary comment 2019-01-11 15:46:55 +01:00
xen xen: automatically create XenBlockDevice-s 2019-01-14 13:45:40 +00:00
xenpv xen: Replace few mentions of xend by libxl 2019-01-14 13:45:40 +00:00
xtensa target/xtensa: xtfpga: provide default memory sizes 2018-11-21 10:53:21 -08:00
Makefile.objs memory-device: introduce separate config option 2018-10-24 06:44:59 -03:00