qemu/hw
David Woodhouse 54ad31fb0a hw/intc/ioapic: Update KVM routes before redelivering IRQ, on RTE update
A Linux guest will perform IRQ migration after the IRQ has happened,
updating the RTE to point to the new destination CPU and then unmasking
the interrupt.

However, when the guest updates the RTE, ioapic_mem_write() calls
ioapic_service(), which redelivers the pending level interrupt via
kvm_set_irq(), *before* calling ioapic_update_kvm_routes() which sets
the new target CPU.

Thus, the IRQ which is supposed to go to the new target CPU is instead
misdelivered to the previous target. An example where the guest kernel
is attempting to migrate from CPU#2 to CPU#0 shows:

xenstore_read tx 0 path control/platform-feature-xs_reset_watches
ioapic_set_irq vector: 11 level: 1
ioapic_set_remote_irr set remote irr for pin 11
ioapic_service: trigger KVM IRQ 11
[    0.523627] The affinity mask was 0-3 and the handler is on 2
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x27 size 0x4 val 0x26
ioapic_update_kvm_routes: update KVM route for IRQ 11: fee02000 8021
ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x18021
xenstore_reset_watches
ioapic_set_irq vector: 11 level: 1
ioapic_mem_read ioapic mem read addr 0x10 regsel: 0x26 size 0x4 retval 0x1c021
[    0.524569] ioapic_ack_level IRQ 11 moveit = 1
ioapic_eoi_broadcast EOI broadcast for vector 33
ioapic_clear_remote_irr clear remote irr for pin 11 vector 33
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x26
ioapic_mem_read ioapic mem read addr 0x10 regsel: 0x26 size 0x4 retval 0x18021
[    0.525235] ioapic_finish_move IRQ 11 calls irq_move_masked_irq()
[    0.526147] irq_do_set_affinity for IRQ 11, 0
[    0.526732] ioapic_set_affinity for IRQ 11, 0
[    0.527330] ioapic_setup_msg_from_msi for IRQ11 target 0
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x27
ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x27 size 0x4 val 0x0
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x27 size 0x4 val 0x26
ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x18021
[    0.527623] ioapic_set_affinity returns 0
[    0.527623] ioapic_finish_move IRQ 11 calls unmask_ioapic_irq()
ioapic_mem_write ioapic mem write addr 0x0 regsel: 0x26 size 0x4 val 0x26
ioapic_mem_write ioapic mem write addr 0x10 regsel: 0x26 size 0x4 val 0x8021
ioapic_set_remote_irr set remote irr for pin 11
ioapic_service: trigger KVM IRQ 11
ioapic_update_kvm_routes: update KVM route for IRQ 11: fee00000 8021
[    0.529571] The affinity mask was 0 and the handler is on 2
[    xenstore_watch path memory/target token FFFFFFFF92847D40

There are no other code paths in ioapic_mem_write() which need the KVM
IRQ routing table to be updated, so just shift the call from the end
of the function to happen right before the call to ioapic_service()
and thus deliver the re-enabled IRQ to the right place.

Alternative fixes might have been just to remove the part in
ioapic_service() which delivers the IRQ via kvm_set_irq() because
surely delivering as MSI ought to work just fine anyway in all cases?
That code lacks a comment justifying its existence.

Or maybe in the specific case shown in the above log, it would have
sufficed for ioapic_update_kvm_routes() to update the route *even*
when the IRQ is masked. It's not like it's actually going to get
triggered unless QEMU deliberately does so, anyway? But that only
works because the target CPU happens to be in the high word of the
RTE; if something in the *low* word (vector, perhaps) was changed
at the same time as the unmask, we'd still trigger with stale data.

Fixes: 15eafc2e60 "kvm: x86: add support for KVM_CAP_SPLIT_IRQCHIP"
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20230308111952.2728440-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-15 11:52:25 +01:00
..
9pfs hw/xen: Build PV backend drivers for CONFIG_XEN_BUS 2023-03-07 17:04:30 +00:00
acpi pcihp: add ACPI PCI hotplug specific is_hotpluggable_bus() callback 2023-03-07 12:39:00 -05:00
adc
alpha Drop duplicate #include 2023-02-08 07:28:05 +01:00
arm aspeed queue: 2023-03-09 10:23:05 +00:00
audio hw/audio/via-ac97: Basic implementation of audio playback 2023-03-08 00:37:48 +01:00
avr
block Enable PV backends with Xen/KVM emulation 2023-03-09 13:22:05 +00:00
char hw/xen: Build PV backend drivers for CONFIG_XEN_BUS 2023-03-07 17:04:30 +00:00
core e1000e: Implement system clock 2023-03-10 15:35:38 +08:00
cpu hw/cpu: Mark arm11 and realview mpcore as target-independent code 2023-01-16 17:51:20 +01:00
cris
cxl hw/pxb-cxl: Support passthrough HDM Decoders unless overridden 2023-03-07 19:51:07 -05:00
display ui: rename cursor_{put->unref} 2023-03-13 22:57:39 +04:00
dma hw/isa: Rename isa_get_dma() -> isa_bus_get_dma() 2023-02-27 22:29:02 +01:00
gpio hw/gpio/max7310: Simplify max7310_realize() 2023-02-27 13:27:04 +00:00
hppa hw/isa: Rename isa_bus_irqs() -> isa_bus_register_input_irqs() 2023-02-27 22:29:02 +01:00
hyperv win32: replace closesocket() with close() wrapper 2023-03-13 15:39:31 +04:00
i2c hw: allwinner-i2c: Fix TWI_CNTR_INT_FLAG on SUN6i SoCs 2023-03-06 14:08:12 +00:00
i386 virtio,pc,pci: features, fixes 2023-03-10 14:31:37 +00:00
ide hw/ide/via: Replace magic 2 value by ARRAY_SIZE / MAX_IDE_DEVS 2023-02-27 22:29:02 +01:00
input hw/input: Clean up includes 2023-02-08 07:16:23 +01:00
intc hw/intc/ioapic: Update KVM routes before redelivering IRQ, on RTE update 2023-03-15 11:52:25 +01:00
ipack include/hw/pci: Split pci_device.h off pci.h 2023-01-08 01:54:22 -05:00
ipmi include/hw/pci: Split pci_device.h off pci.h 2023-01-08 01:54:22 -05:00
isa virtio,pc,pci: features, fixes 2023-03-10 14:31:37 +00:00
loongarch hw/loongarch/virt: add system_powerdown hmp command support 2023-03-03 09:37:30 +08:00
m68k hw: Add compat machines for 8.0 2022-12-21 06:35:28 -05:00
mem hw/mem/cxl_type3: Add CXL RAS Error Injection Support. 2023-03-07 12:39:00 -05:00
microblaze hw/char/xilinx_uartlite: Open-code xilinx_uartlite_create() 2023-02-27 13:27:05 +00:00
mips hw/mips/itu: Pass SAAR using QOM link property 2023-03-08 00:37:48 +01:00
misc MIPS (and few misc) patches 2023-03-09 10:22:50 +00:00
net Intrdocue igb device emulation 2023-03-10 15:35:38 +08:00
nios2
nubus hw/nubus/nubus-device: Fix memory leak in nubus_device_realize 2023-02-27 22:29:01 +01:00
nvme hw/nvme: flexible data placement emulation 2023-03-06 15:28:02 +01:00
nvram aspeed queue: 2023-03-03 17:11:22 +00:00
openrisc
pci -----BEGIN PGP SIGNATURE----- 2023-03-11 17:17:18 +00:00
pci-bridge hw/pxb-cxl: Support passthrough HDM Decoders unless overridden 2023-03-07 19:51:07 -05:00
pci-host hw/ppc/pegasos2: Fix PCI interrupt routing 2023-03-08 00:37:48 +01:00
pcmcia
ppc gdbstub refactor: 2023-03-09 16:54:51 +00:00
rdma Drop duplicate #include 2023-02-08 07:28:05 +01:00
remote Drop duplicate #include 2023-02-08 07:28:05 +01:00
riscv hw/riscv/virt.c: Initialize the ACPI tables 2023-03-06 11:35:07 -08:00
rtc hw/rtc: Rename rtc_[get|set]_memory -> mc146818rtc_[get|set]_cmos_data 2023-02-27 22:29:02 +01:00
rx rx: re-randomize rng-seed on reboot 2022-10-27 11:34:31 +01:00
s390x s390x/pv: Add support for asynchronous teardown for reboot 2023-02-27 09:15:39 +01:00
scsi Updated the FSF address to <https://www.gnu.org/licenses/> 2023-02-27 09:15:39 +01:00
sd hw/arm/omap: Drop useless casts from void * to pointer 2023-01-12 17:15:09 +00:00
sensor Do not include hw/hw.h if it is not necessary 2023-02-27 09:15:38 +01:00
sh4 hw/ide/mmio: Extract TYPE_MMIO_IDE declarations to 'hw/ide/mmio.h' 2023-02-27 22:29:02 +01:00
smbios hw/smbios: fix field corruption in type 4 table 2023-03-02 03:10:46 -05:00
sparc
sparc64 hw/ide: Un-inline ide_set_irq() 2023-02-27 22:29:02 +01:00
ssi aspeed/smc: Replace SysBus IRQs with GPIO lines 2023-03-02 13:57:50 +01:00
timer hw/timer/hpet: Fix expiration time overflow 2023-03-02 03:10:47 -05:00
tpm hw/tpm: Move tpm_ppi.c out of target-specific source set 2023-01-16 17:51:20 +01:00
tricore
usb Enable PV backends with Xen/KVM emulation 2023-03-09 13:22:05 +00:00
vfio vfio: Fix vfio_get_dev_region() trace event 2023-03-07 11:19:07 -07:00
virtio virtio: fix reachable assertion due to stale value of cached region size 2023-03-07 19:51:07 -05:00
watchdog hw/watchdog/wdt_aspeed: Log unimplemented registers as UNIMP level 2023-02-07 09:02:05 +01:00
xen hw/xen: Avoid crash when backend watch fires too early 2023-03-07 17:04:30 +00:00
xenpv hw/xen: Subsume xen_be_register_common() into xen_be_init() 2023-03-01 09:09:22 +00:00
xtensa
Kconfig xen: add CONFIG_XEN_BUS and CONFIG_XEN_EMU options for Xen emulation 2023-03-01 08:22:49 +00:00
meson.build