qemu/include/hw
Michael Roth 9d7bd0826f virtio-pci: disable vring processing when bus-mastering is disabled
Currently the SLOF firmware for pseries guests will disable/re-enable
a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it tends to
work with a single device at a time at various stages like probing
and running block/network bootloaders without doing a full reset
in-between.

In QEMU, when PCI_COMMAND_MASTER is disabled we disable the
corresponding IOMMU memory region, so DMA accesses (including to vring
fields like idx/flags) will no longer undergo the necessary
translation. Normally we wouldn't expect this to happen since it would
be misbehavior on the driver side to continue driving DMA requests.

However, in the case of pseries, with iommu_platform=on, we trigger the
following sequence when tearing down the virtio-blk dataplane ioeventfd
in response to the guest unsetting PCI_COMMAND_MASTER:

  #2  0x0000555555922651 in virtqueue_map_desc (vdev=vdev@entry=0x555556dbcfb0, p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, is_write=is_write@entry=false, pa=0, sz=0)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757
  #3  0x0000555555922a89 in virtqueue_pop (vq=vq@entry=0x555556dc8660, sz=sz@entry=184)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950
  #4  0x00005555558d3eca in virtio_blk_get_request (vq=0x555556dc8660, s=0x555556dbcfb0)
      at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255
  #5  0x00005555558d3eca in virtio_blk_handle_vq (s=0x555556dbcfb0, vq=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776
  #6  0x000055555591dd66 in virtio_queue_notify_aio_vq (vq=vq@entry=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550
  #7  0x000055555591ecef in virtio_queue_notify_aio_vq (vq=0x555556dc8660)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546
  #8  0x000055555591ecef in virtio_queue_host_notifier_aio_poll (opaque=0x555556dc86c8)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527
  #9  0x0000555555d02164 in run_poll_handlers_once (ctx=ctx@entry=0x55555688bfc0, timeout=timeout@entry=0x7fffe65844a8)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:520
  #10 0x0000555555d02d1b in try_poll_mode (timeout=0x7fffe65844a8, ctx=0x55555688bfc0)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:607
  #11 0x0000555555d02d1b in aio_poll (ctx=ctx@entry=0x55555688bfc0, blocking=blocking@entry=true)
      at /home/mdroth/w/qemu.git/util/aio-posix.c:639
  #12 0x0000555555d0004d in aio_wait_bh_oneshot (ctx=0x55555688bfc0, cb=cb@entry=0x5555558d5130 <virtio_blk_data_plane_stop_bh>, opaque=opaque@entry=0x555556de86f0)
      at /home/mdroth/w/qemu.git/util/aio-wait.c:71
  #13 0x00005555558d59bf in virtio_blk_data_plane_stop (vdev=<optimized out>)
      at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288
  #14 0x0000555555b906a1 in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245
  #15 0x0000555555b90dbb in virtio_bus_stop_ioeventfd (bus=bus@entry=0x555556dbcf38)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237
  #16 0x0000555555b92a8e in virtio_pci_stop_ioeventfd (proxy=0x555556db4e40)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292
  #17 0x0000555555b92a8e in virtio_write_config (pci_dev=0x555556db4e40, address=<optimized out>, val=1048832, len=<optimized out>)
      at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613

I.e. the calling code is only scheduling a one-shot BH for
virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
an additional virtqueue entry before we get there. This is likely due
to the following check in virtio_queue_host_notifier_aio_poll:

  static bool virtio_queue_host_notifier_aio_poll(void *opaque)
  {
      EventNotifier *n = opaque;
      VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
      bool progress;

      if (!vq->vring.desc || virtio_queue_empty(vq)) {
          return false;
      }

      progress = virtio_queue_notify_aio_vq(vq);

namely the call to virtio_queue_empty(). In this case, since no new
requests have actually been issued, shadow_avail_idx == last_avail_idx,
so we actually try to access the vring via vring_avail_idx() to get
the latest non-shadowed idx:

  int virtio_queue_empty(VirtQueue *vq)
  {
      bool empty;
      ...

      if (vq->shadow_avail_idx != vq->last_avail_idx) {
          return 0;
      }

      rcu_read_lock();
      empty = vring_avail_idx(vq) == vq->last_avail_idx;
      rcu_read_unlock();
      return empty;

but since the IOMMU region has been disabled we get a bogus value (0
usually), which causes virtio_queue_empty() to falsely report that
there are entries to be processed, which causes errors such as:

  "virtio: zero sized buffers are not allowed"

or

  "virtio-blk missing headers"

and puts the device in an error state.

This patch works around the issue by introducing virtio_set_disabled(),
which sets a 'disabled' flag to bypass checks like virtio_queue_empty()
when bus-mastering is disabled. Since we'd check this flag at all the
same sites as vdev->broken, we replace those checks with an inline
function which checks for either vdev->broken or vdev->disabled.

The 'disabled' flag is only migrated when set, which should be fairly
rare, but to maintain migration compatibility we disable it's use for
older machine types. Users requiring the use of the flag in conjunction
with older machine types can set it explicitly as a virtio-device
option.

NOTES:

 - This leaves some other oddities in play, like the fact that
   DRIVER_OK also gets unset in response to bus-mastering being
   disabled, but not restored (however the device seems to continue
   working)
 - Similarly, we disable the host notifier via
   virtio_bus_stop_ioeventfd(), which seems to move the handling out
   of virtio-blk dataplane and back into the main IO thread, and it
   ends up staying there till a reset (but otherwise continues working
   normally)

Cc: David Gibson <david@gibson.dropbear.id.au>,
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Message-Id: <20191120005003.27035-1-mdroth@linux.vnet.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-01-05 07:03:03 -05:00
..
acpi piix4: Add a MC146818 RTC Controller as specified in datasheet 2019-11-05 23:33:12 +01:00
adc include: Make headers more self-contained 2019-08-16 13:31:51 +02:00
arm hw/arm/virt: Simplify by moving the gic in the machine state 2019-12-16 10:46:35 +00:00
audio Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
block block: Support providing LCHS from user 2019-10-31 11:47:11 -04:00
char escc: introduce a selector for the register bit 2019-09-07 08:32:12 +02:00
core Remove unassigned_access CPU hook 2019-11-11 13:44:16 +00:00
cpu Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
cris Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
display hw/m68k: add Nubus macfb video card 2019-10-28 19:06:49 +01:00
dma Include hw/hw.h exactly where needed 2019-08-16 13:31:52 +02:00
firmware machine: Refactor smp-related call chains to pass MachineState 2019-07-05 17:07:36 -03:00
gpio hw/gpio: Add basic Aspeed GPIO model for AST2400 and AST2500 2019-09-13 16:05:00 +01:00
hyperv hyperv: process POST_MESSAGE hypercall 2018-10-19 13:44:14 +02:00
i2c aspeed/i2c: Add support for DMA transfers 2019-12-16 10:46:34 +00:00
i386 hw/i386/pc: Extract the port92 device 2019-12-17 19:33:51 +01:00
ide sysemu: Move the VMChangeStateEntry typedef to qemu/typedefs.h 2019-08-16 13:31:53 +02:00
input Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
intc hw: replace hw/i386/pc.h with a header just for the i8259 2019-12-17 19:33:49 +01:00
ipack Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
ipmi ipmi: Add support to customize OEM functions 2019-12-17 10:39:47 +11:00
isa hw/isa/isa-bus: cleanup irq functions 2019-12-17 19:33:51 +01:00
kvm Supply missing header guards 2019-06-12 13:20:21 +02:00
lm32 Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
m68k m68k: Add NeXTcube machine 2019-09-07 08:31:51 +02:00
mem Include sysemu/hostmem.h less 2019-08-16 13:31:53 +02:00
mips Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
misc hw/m68k: implement ADB bus support for via 2019-10-28 19:06:45 +01:00
net aspeed: add support for the Aspeed MII controller of the AST2600 2019-10-15 18:09:05 +01:00
nubus hw/m68k: add Nubus support 2019-10-28 19:06:47 +01:00
nvram fw_cfg: add "modify" functions for all types 2019-10-22 09:39:54 +02:00
pci hw/pci: Remove the "command_serr_enable" property 2019-12-18 02:34:12 +01:00
pci-bridge Supply missing header guards 2019-06-12 13:20:21 +02:00
pci-host hw/pci-host/i440fx: Extract PCII440FXState to "hw/pci-host/i440fx.h" 2019-12-18 02:34:11 +01:00
ppc ppc/pnv: Drop PnvChipClass::type 2019-12-17 10:59:11 +11:00
rdma {hmp, hw/pvrdma}: Expose device internals via monitor interface 2019-03-16 15:52:44 +02:00
riscv hw/riscv: Add optional symbol callback ptr to riscv_load_kernel() 2019-11-25 12:34:52 -08:00
rtc Merge commit 'df84f17' into HEAD 2019-10-26 15:38:02 +02:00
s390x Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
scsi scsi: Propagate unrealize() callback to scsi-hd 2019-10-31 11:47:25 -04:00
sd hw/sd/sdhci: Add dummy Samsung SDHCI controller 2019-10-22 17:44:00 +01:00
semihosting include: Make headers more self-contained 2019-08-16 13:31:51 +02:00
sh4 Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
southbridge hw/pci-host/piix: Extract PIIX3 functions to hw/isa/piix3.c 2019-11-05 23:33:12 +01:00
sparc Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
ssi aspeed/smc: Add AST2600 timings registers 2019-12-16 10:46:34 +00:00
timer Fix typos and docs, trivial changes and RTC devices split 2019-10-25 14:17:08 +01:00
tricore Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
unicore32 hw/unicore32: restrict hw addr defines to source file 2017-12-18 17:07:02 +03:00
usb usb: Add basic code to emulate Chipidea USB IP 2018-02-09 10:40:30 +00:00
vfio vfio: Turn the container error into an Error handle 2019-10-04 18:49:18 +02:00
virtio virtio-pci: disable vring processing when bus-mastering is disabled 2020-01-05 07:03:03 -05:00
watchdog watchdog/aspeed: Fix AST2600 frequency behaviour 2019-12-16 10:46:34 +00:00
xen global: Squash 'the the' 2019-11-06 17:19:40 +01:00
xtensa Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
boards.h kvm: convert "-machine kernel_irqchip" to an accelerator property 2019-12-17 19:32:46 +01:00
elf_ops.h elf-ops.h: fix int overflow in load_elf() 2019-09-16 12:32:21 +02:00
empty_slot.h include: Make headers more self-contained 2019-08-16 13:31:51 +02:00
fw-path-provider.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
hotplug.h qom: make interface types abstract 2018-12-11 15:45:22 -02:00
hw.h Include hw/hw.h exactly where needed 2019-08-16 13:31:52 +02:00
ide.h ide/via: Rename functions to match device name 2019-01-25 14:52:12 -05:00
irq.h Revert "irq: introduce qemu_irq_proxy()" 2019-11-05 23:33:12 +01:00
loader-fit.h Use #include "..." for our own headers, <...> for others 2018-02-09 05:05:11 +01:00
loader.h elf-ops.h: fix int overflow in load_elf() 2019-09-16 12:32:21 +02:00
nmi.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
or-irq.h Include hw/irq.h a lot less 2019-08-16 13:31:52 +02:00
pcmcia.h Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
platform-bus.h platform-bus-device: use device plug callback instead of machine_done notifier 2018-05-10 18:10:56 +01:00
ptimer.h ptimer: Remove old ptimer_init_with_bh() API 2019-11-11 13:44:16 +00:00
qdev-core.h migration: allow unplug during migration for failover devices 2019-10-29 18:55:26 -04:00
qdev-dma.h Supply missing header guards 2019-06-12 13:20:21 +02:00
qdev-properties.h qdev: Add a no default uuid property 2019-09-20 14:09:02 -05:00
register.h hw: register: Run post_write hook on reset 2018-03-01 11:05:43 +00:00
registerfields.h Use #include "..." for our own headers, <...> for others 2018-02-09 05:05:11 +01:00
stream.h Include qemu-common.h exactly where needed 2019-06-12 13:20:20 +02:00
sysbus.h Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00
usb.h Include hw/qdev-properties.h less 2019-08-16 13:31:53 +02:00