qemu/hw
Klaus Jensen 9529aa6bb4 hw/nvme: fix handling of over-committed queues
If a host chooses to use the SQHD "hint" in the CQE to know if there is
room in the submission queue for additional commands, it may result in a
situation where there are not enough internal resources (struct
NvmeRequest) available to process the command. For lack of a better
term, the host may "over-commit" the device (i.e., it may have more
inflight commands than the queue size).

For example, assume a queue with N entries. The host submits N commands
and all are picked up for processing, advancing the head and emptying
the queue. Regardless of which of these N commands complete first, the
SQHD field of that CQE will indicate to the host that the queue is
empty, which allows the host to issue N commands again. However, if the
device has not posted CQEs for all the previous commands yet, the device
will have fewer than N resources available to process the commands, so
queue processing is suspended.
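
The scenario above can be replayed with a small toy model. This is only a sketch under stated assumptions: `ToySQ`, `toy_process_sq`, `toy_complete_one`, `toy_overcommit_stranded` and the value of `QSIZE` are hypothetical names chosen for illustration, head and tail are free-running counters rather than ring indices, and none of it reproduces the real hw/nvme structures.

```c
#include <assert.h>
#include <stdint.h>

#define QSIZE 4u /* N: queue entries and request structures (hypothetical value) */

/* Toy device-side view of one submission queue; not QEMU's real layout. */
typedef struct {
    uint32_t head;      /* commands fetched by the device (free-running count) */
    uint32_t tail;      /* commands submitted by the host (free-running count) */
    uint32_t free_reqs; /* request structures not tied to an in-flight command */
} ToySQ;

/* Fetch commands while there is both work and a free request structure. */
static uint32_t toy_process_sq(ToySQ *sq)
{
    uint32_t fetched = 0;

    while (sq->head != sq->tail && sq->free_reqs > 0) {
        sq->head++;
        sq->free_reqs--;
        fetched++;
    }
    return fetched;
}

/* Post one CQE: release its request and report the current head as SQHD. */
static uint32_t toy_complete_one(ToySQ *sq)
{
    assert(sq->free_reqs < QSIZE); /* something must be in flight */
    sq->free_reqs++;
    return sq->head;
}

/* Replay the scenario; returns how many commands are left unfetched. */
static uint32_t toy_overcommit_stranded(void)
{
    ToySQ sq = { .head = 0, .tail = 0, .free_reqs = QSIZE };

    sq.tail += QSIZE;                         /* host submits N commands */
    toy_process_sq(&sq);                      /* all N fetched, no requests left */

    uint32_t sqhd = toy_complete_one(&sq);    /* first CQE: SQHD equals the tail */
    uint32_t room = QSIZE - (sq.tail - sqhd); /* host concludes N slots are free */

    sq.tail += room;                          /* host submits N more commands */
    toy_process_sq(&sq);                      /* only one request is available */

    return sq.tail - sq.head;                 /* commands waiting for a resource */
}
```

In this model the host has more commands outstanding than the queue size, and the commands that were not fetched stay in the queue until something triggers queue processing again.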

And here lies an 11-year-old latent bug. In the absence of any additional
tail updates on the submission queue, we never schedule the processing
bottom-half again unless we observe a head update on an associated full
completion queue. This has been sufficient to handle N-to-1 SQ/CQ setups
(in the absence of over-commit, of course). Incidentally, that "kick all
associated SQs" mechanism can now be killed since we now just schedule
queue processing when we return a processing resource to a non-empty
submission queue, which happens to cover both edge cases. However, we
must retain kicking the CQ if it was previously full.
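
The effect of scheduling queue processing whenever a resource is returned to a non-empty submission queue can be illustrated with another toy model. Again, this is a sketch and not the actual patch: `ToyQueue`, `toy_release_req`, `toy_run_bh`, `toy_drain` and the `kick_on_release` switch are hypothetical, the "bottom half" is modelled as a plain flag, and the completion-queue-full case is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QSIZE 4u /* N: queue entries and request structures (hypothetical value) */

/* Toy submission queue with a flag standing in for a scheduled bottom half. */
typedef struct {
    uint32_t head;      /* commands fetched by the device (free-running count) */
    uint32_t tail;      /* commands submitted by the host (free-running count) */
    uint32_t free_reqs; /* request structures currently available */
    bool bh_scheduled;  /* true when queue processing has been requested */
} ToyQueue;

/* Return a request to the queue; optionally kick processing if work is pending. */
static void toy_release_req(ToyQueue *q, bool kick_on_release)
{
    assert(q->free_reqs < QSIZE);
    q->free_reqs++;
    if (kick_on_release && q->head != q->tail) {
        q->bh_scheduled = true;
    }
}

/* Run queue processing only if it was scheduled. */
static void toy_run_bh(ToyQueue *q)
{
    if (!q->bh_scheduled) {
        return;
    }
    q->bh_scheduled = false;
    while (q->head != q->tail && q->free_reqs > 0) {
        q->head++;
        q->free_reqs--;
    }
}

/* Complete everything in flight; returns the commands left unfetched. */
static uint32_t toy_drain(bool kick_on_release)
{
    /* Over-committed start: N in flight, N waiting, no further tail updates. */
    ToyQueue q = {
        .head = QSIZE, .tail = 2 * QSIZE, .free_reqs = 0, .bh_scheduled = false,
    };

    while (q.free_reqs < QSIZE) { /* while any command is still in flight */
        toy_release_req(&q, kick_on_release);
        toy_run_bh(&q);
    }
    return q.tail - q.head;
}
```

Calling `toy_drain(false)` leaves the N waiting commands stranded once the in-flight ones complete, while `toy_drain(true)` fetches one waiting command per returned resource and drains the queue.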

So, apparently, no previous driver tested with hw/nvme has ever used
SQHD (e.g., neither the Linux NVMe driver nor SPDK uses it). But then OSv
shows up with the driver that actually does. I salute you.

Fixes: f3c507adcd ("NVMe: Initial commit for new storage interface")
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2388
Reported-by: Waldemar Kozaczuk <jwkozaczuk@gmail.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2024-11-08 09:14:30 +01:00
..
9pfs 9p: remove 'proxy' filesystem backend driver 2024-10-03 19:33:25 +02:00
acpi hw/acpi: Update GED with vCPU Hotplug VMSD for migration 2024-11-04 16:03:25 -05:00
adc hw/adc: Remove MAX111X device 2024-10-15 15:16:17 +01:00
alpha alpha: switch boards to "default y" 2024-05-03 15:47:47 +02:00
arm virtio,pc,pci: features, fixes, cleanups 2024-11-05 15:47:52 +00:00
audio replace error_setg(&error_fatal, ...) with error_report() 2024-10-21 22:40:47 +03:00
avr avr: switch boards to "default y" 2024-05-03 15:47:47 +02:00
block Misc HW patch queue 2024-11-06 17:28:45 +00:00
char hw/char/sifive_uart: Fix broken UART on big endian hosts 2024-11-07 08:16:53 +10:00
core * rust: cleanups 2024-11-06 21:27:47 +00:00
cpu hw: Add a Kconfig switch for the TYPE_CPU_CLUSTER device 2024-04-25 12:48:12 +02:00
cxl hw/cxl: Ensure there is enough data to read the input header in cmd_get_physical_port_state() 2024-11-04 16:03:25 -05:00
display virtio-gpu: Support Venus context 2024-10-28 16:56:36 +00:00
dma hw/dma: Remove omap_dma4 device 2024-10-01 14:58:07 +01:00
fsi hw: Use device_class_set_legacy_reset() instead of opencoding 2024-09-13 15:31:44 +01:00
gpio hw/gpio/mpc8xxx: Prefer DEFINE_TYPES() macro 2024-11-05 23:32:25 +00:00
hppa hw/char: Extract serial-mm 2024-10-03 19:33:23 +02:00
hyperv hw/hyperv: remove return after g_assert_not_reached() 2024-09-24 13:53:35 +02:00
i2c hw/i2c/smbus_eeprom: Prefer DEFINE_TYPES() macro 2024-11-05 23:32:25 +00:00
i386 Misc HW patch queue 2024-11-06 17:28:45 +00:00
ide hw/ide: Remove DSCM-1XXXX microdrive device model 2024-10-15 15:16:17 +01:00
input hw/input: Remove lm832x device 2024-10-01 14:41:10 +01:00
intc pnv/xive2: TIMA CI ops using alternative offsets or byte lengths 2024-11-04 09:14:54 +10:00
ipack hw/ipack: Constify VMState 2023-12-29 11:17:30 +11:00
ipmi hw/ipmi: Constify VMState 2023-12-29 11:17:30 +11:00
isa hw/char/serial.h: Extract serial-isa.h 2024-10-03 19:33:23 +02:00
loongarch hw/loongarch/boot: Use warn_report when no kernel filename 2024-11-02 15:20:41 +08:00
m68k next-cube: remove cpu parameter from next_scsi_init() 2024-11-04 14:16:11 +01:00
mem hw/cxl/cxl-mailbox-utils: Fix for device DDR5 ECS control feature tables 2024-11-04 16:03:24 -05:00
microblaze hw/microblaze/s3adsp1800: Declare machine type using DEFINE_TYPES macro 2024-11-05 23:32:13 +00:00
mips hw/mips: Have mips_cpu_create_with_clock() take an endianness argument 2024-10-15 12:21:06 -03:00
misc hw/misc/aspeed_hace: Fix SG Accumulative hashing 2024-10-24 07:57:47 +02:00
net Misc HW patch queue 2024-11-06 17:28:45 +00:00
nubus hw/nubus/nubus-device: Range check 'slot' property 2024-09-08 11:49:49 +02:00
nvme hw/nvme: fix handling of over-committed queues 2024-11-08 09:14:30 +01:00
nvram hw: Remove unused fw_cfg_init_io 2024-10-03 17:26:06 +03:00
openrisc hw/char: Extract serial-mm 2024-10-03 19:33:23 +02:00
pci pcie: enable Extended tag field support 2024-11-04 16:03:25 -05:00
pci-bridge hw/pci-bridge: Make pxb_dev_realize_common() return if it succeeded 2024-11-04 16:03:25 -05:00
pci-host Misc HW patch queue 2024-11-06 17:28:45 +00:00
ppc hw/ppc/mpc8544_guts: Prefer DEFINE_TYPES() macro 2024-11-05 23:32:25 +00:00
remote remote: Remove unused remote_iohub_finalize 2024-10-03 17:26:06 +03:00
riscv hw/riscv/riscv-iommu: fix riscv_iommu_validate_process_ctx() check 2024-11-07 08:19:39 +10:00
rtc Misc HW patch queue 2024-11-06 17:28:45 +00:00
rx kconfig: express dependency of individual boards on libfdt 2024-05-10 15:45:15 +02:00
s390x hw/s390x: Re-enable the pci-bridge device on s390x 2024-11-04 14:16:11 +01:00
scsi hw/vhost-scsi: fix -Werror=maybe-uninitialized 2024-10-02 16:14:29 +04:00
sd hw/sd/sdhci: Prefer DEFINE_TYPES() macro 2024-11-05 23:32:25 +00:00
sensor hw/sensor/tmp105: Convert printf() to trace event, add tracing for read/write access 2024-11-05 10:10:00 +00:00
sh4 Revert "hw/sh4/r2d: Realize IDE controller before accessing it" 2024-10-21 16:40:11 +02:00
smbios smbios: make memory device size configurable per Machine 2024-07-22 20:15:41 -04:00
sparc hw: Use device_class_set_legacy_reset() instead of opencoding 2024-09-13 15:31:44 +01:00
sparc64 hw/char: Extract serial-mm 2024-10-03 19:33:23 +02:00
ssi hw/ssi/pnv_spi: Fixes Coverity CID 1558831 2024-11-04 09:09:15 +10:00
timer target-arm queue: 2024-11-05 21:27:18 +00:00
tpm hw/tpm: remove break after g_assert_not_reached() 2024-09-24 13:53:35 +02:00
tricore hw: Use device_class_set_legacy_reset() instead of opencoding 2024-09-13 15:31:44 +01:00
ufs hw/ufs: minor bug fixes related to ufs-test 2024-09-06 18:04:16 +09:00
usb hw/usb/hcd-ehci-sysbus: Prefer DEFINE_TYPES() macro 2024-11-05 23:32:25 +00:00
vfio vfio/migration: Add vfio_save_block_precopy_empty_hit trace event 2024-11-05 15:51:14 +01:00
virtio virtio,pc,pci: features, fixes, cleanups 2024-11-05 15:47:52 +00:00
watchdog hw/watchdog/wdt_imx2: Remove redundant assignment 2024-11-05 10:10:00 +00:00
xen hw/xen: Avoid use of uninitialized bufioreq_evtchn 2024-10-21 07:53:21 +02:00
xenpv hw/xen: Register framebuffer backend via xen_backend_init() 2024-06-04 11:53:43 +02:00
xtensa hw/xtensa/xtfpga: Remove TARGET_BIG_ENDIAN #ifdef'ry 2024-10-15 12:13:59 -03:00
Kconfig hw: Remove PCMCIA subsystem 2024-10-15 15:16:17 +01:00
meson.build hw: Remove PCMCIA subsystem 2024-10-15 15:16:17 +01:00