qemu/hw/i386/kvm/xen_evtchn.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

88 lines
3.3 KiB
C
Raw Normal View History

/*
* QEMU Xen emulation: Event channel support
*
* Copyright © 2022 Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Authors: David Woodhouse <dwmw2@infradead.org>
*
* This work is licensed under the terms of the GNU GPL, version 2 or later.
* See the COPYING file in the top-level directory.
*/
#ifndef QEMU_XEN_EVTCHN_H
#define QEMU_XEN_EVTCHN_H
hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback The GSI callback (and later PCI_INTX) is a level triggered interrupt. It is asserted when an event channel is delivered to vCPU0, and is supposed to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0 is cleared again. Thankfully, Xen does *not* assert the GSI if the guest sets its own evtchn_upcall_pending field; we only need to assert the GSI when we have delivered an event for ourselves. So that's the easy part, kind of. There's a slight complexity in that we need to hold the BQL before we can call qemu_set_irq(), and we definitely can't do that while holding our own port_lock (because we'll need to take that from the qemu-side functions that the PV backend drivers will call). So if we end up wanting to set the IRQ in a context where we *don't* already hold the BQL, defer to a BH. However, we *do* need to poll for the evtchn_upcall_pending flag being cleared. In an ideal world we would poll that when the EOI happens on the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd pairs — one is used to trigger the interrupt, and the other works in the other direction to 'resample' on EOI, and trigger the first eventfd again if the line is still active. However, QEMU doesn't seem to do that. Even VFIO level interrupts seem to be supported by temporarily unmapping the device's BARs from the guest when an interrupt happens, then trapping *all* MMIO to the device and sending the 'resample' event on *every* MMIO access until the IRQ is cleared! Maybe in future we'll plumb the 'resample' concept through QEMU's irq framework but for now we'll do what Xen itself does: just check the flag on every vmexit if the upcall GSI is known to be asserted. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2022-12-15 23:35:24 +03:00
#include "hw/sysbus.h"
typedef uint32_t evtchn_port_t;
hw/xen: Simplify emulated Xen platform init I initially put the basic platform init (overlay pages, grant tables, event channels) into mc->kvm_type because that was the earliest place that could sensibly test for xen_mode==XEN_EMULATE. The intent was to do this early enough that we could then initialise the XenBus and other parts which would have depended on them, from a generic location for both Xen and KVM/Xen in the PC-specific code, as seen in https://lore.kernel.org/qemu-devel/20230116221919.1124201-16-dwmw2@infradead.org/ However, then the Xen on Arm patches came along, and *they* wanted to do the XenBus init from a 'generic' Xen-specific location instead: https://lore.kernel.org/qemu-devel/20230210222729.957168-4-sstabellini@kernel.org/ Since there's no generic location that covers all three, I conceded to do it for XEN_EMULATE mode in pc_basic_devices_init(). And now there's absolutely no point in having some of the platform init done from pc_machine_kvm_type(); we can move it all up to live in a single place in pc_basic_devices_init(). This has the added benefit that we can drop the separate xen_evtchn_connect_gsis() function completely, and pass just the system GSIs in directly to xen_evtchn_create(). While I'm at it, it does no harm to explicitly pass in the *number* of said GSIs, because it does make me twitch a bit to pass an array of impicit size. During the lifetime of the KVM/Xen patchset, that had already changed (albeit just cosmetically) from GSI_NUM_PINS to IOAPIC_NUM_PINS. And document a bit better that this is for the *output* GSI for raising CPU0's events when the per-CPU vector isn't available. The fact that we create a whole set of them and then only waggle the one we're told to, instead of having a single output and only *connecting* it to the GSI that it should be connected to, is still non-intuitive for me. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Message-Id: <20230412185102.441523-2-dwmw2@infradead.org> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2023-04-12 21:50:58 +03:00
void xen_evtchn_create(unsigned int nr_gsis, qemu_irq *system_gsis);
int xen_evtchn_soft_reset(void);
int xen_evtchn_set_callback_param(uint64_t param);
hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback The GSI callback (and later PCI_INTX) is a level triggered interrupt. It is asserted when an event channel is delivered to vCPU0, and is supposed to be cleared when the vcpu_info->evtchn_upcall_pending field for vCPU0 is cleared again. Thankfully, Xen does *not* assert the GSI if the guest sets its own evtchn_upcall_pending field; we only need to assert the GSI when we have delivered an event for ourselves. So that's the easy part, kind of. There's a slight complexity in that we need to hold the BQL before we can call qemu_set_irq(), and we definitely can't do that while holding our own port_lock (because we'll need to take that from the qemu-side functions that the PV backend drivers will call). So if we end up wanting to set the IRQ in a context where we *don't* already hold the BQL, defer to a BH. However, we *do* need to poll for the evtchn_upcall_pending flag being cleared. In an ideal world we would poll that when the EOI happens on the PIC/IOAPIC. That's how it works in the kernel with the VFIO eventfd pairs — one is used to trigger the interrupt, and the other works in the other direction to 'resample' on EOI, and trigger the first eventfd again if the line is still active. However, QEMU doesn't seem to do that. Even VFIO level interrupts seem to be supported by temporarily unmapping the device's BARs from the guest when an interrupt happens, then trapping *all* MMIO to the device and sending the 'resample' event on *every* MMIO access until the IRQ is cleared! Maybe in future we'll plumb the 'resample' concept through QEMU's irq framework but for now we'll do what Xen itself does: just check the flag on every vmexit if the upcall GSI is known to be asserted. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2022-12-15 23:35:24 +03:00
void xen_evtchn_set_callback_level(int level);
int xen_evtchn_set_port(uint16_t port);
bool xen_evtchn_set_gsi(int gsi, int level);
hw/xen: Support MSI mapping to PIRQ The way that Xen handles MSI PIRQs is kind of awful. There is a special MSI message which targets a PIRQ. The vector in the low bits of data must be zero. The low 8 bits of the PIRQ# are in the destination ID field, the extended destination ID field is unused, and instead the high bits of the PIRQ# are in the high 32 bits of the address. Using the high bits of the address means that we can't intercept and translate these messages in kvm_send_msi(), because they won't be caught by the APIC — addresses like 0x1000fee46000 aren't in the APIC's range. So we catch them in pci_msi_trigger() instead, and deliver the event channel directly. That isn't even the worst part. The worst part is that Xen snoops on writes to devices' MSI vectors while they are *masked*. When a MSI message is written which looks like it targets a PIRQ, it remembers the device and vector for later. When the guest makes a hypercall to bind that PIRQ# (snooped from a marked MSI vector) to an event channel port, Xen *unmasks* that MSI vector on the device. Xen guests using PIRQ delivery of MSI don't ever actually unmask the MSI for themselves. Now that this is working we can finally enable XENFEAT_hvm_pirqs and let the guest use it all. Tested with passthrough igb and emulated e1000e + AHCI. CPU0 CPU1 0: 65 0 IO-APIC 2-edge timer 1: 0 14 xen-pirq 1-ioapic-edge i8042 4: 0 846 xen-pirq 4-ioapic-edge ttyS0 8: 1 0 xen-pirq 8-ioapic-edge rtc0 9: 0 0 xen-pirq 9-ioapic-level acpi 12: 257 0 xen-pirq 12-ioapic-edge i8042 24: 9600 0 xen-percpu -virq timer0 25: 2758 0 xen-percpu -ipi resched0 26: 0 0 xen-percpu -ipi callfunc0 27: 0 0 xen-percpu -virq debug0 28: 1526 0 xen-percpu -ipi callfuncsingle0 29: 0 0 xen-percpu -ipi spinlock0 30: 0 8608 xen-percpu -virq timer1 31: 0 874 xen-percpu -ipi resched1 32: 0 0 xen-percpu -ipi callfunc1 33: 0 0 xen-percpu -virq debug1 34: 0 1617 xen-percpu -ipi callfuncsingle1 35: 0 0 xen-percpu -ipi spinlock1 36: 8 0 xen-dyn -event xenbus 37: 0 6046 xen-pirq -msi ahci[0000:00:03.0] 38: 1 0 xen-pirq -msi-x ens4 39: 0 73 xen-pirq -msi-x ens4-rx-0 40: 14 0 xen-pirq -msi-x ens4-rx-1 41: 0 32 xen-pirq -msi-x ens4-tx-0 42: 47 0 xen-pirq -msi-x ens4-tx-1 Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-01-14 02:35:46 +03:00
void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
uint64_t addr, uint32_t data, bool is_masked);
void xen_evtchn_remove_pci_device(PCIDevice *dev);
struct kvm_irq_routing_entry;
int xen_evtchn_translate_pirq_msi(struct kvm_irq_routing_entry *route,
uint64_t address, uint32_t data);
bool xen_evtchn_deliver_pirq_msi(uint64_t address, uint32_t data);
/*
* These functions mirror the libxenevtchn library API, providing the QEMU
* backend side of "interdomain" event channels.
*/
struct xenevtchn_handle;
struct xenevtchn_handle *xen_be_evtchn_open(void);
int xen_be_evtchn_bind_interdomain(struct xenevtchn_handle *xc, uint32_t domid,
evtchn_port_t guest_port);
int xen_be_evtchn_unbind(struct xenevtchn_handle *xc, evtchn_port_t port);
int xen_be_evtchn_close(struct xenevtchn_handle *xc);
int xen_be_evtchn_fd(struct xenevtchn_handle *xc);
int xen_be_evtchn_notify(struct xenevtchn_handle *xc, evtchn_port_t port);
int xen_be_evtchn_unmask(struct xenevtchn_handle *xc, evtchn_port_t port);
int xen_be_evtchn_pending(struct xenevtchn_handle *xc);
/* Apart from this which is a local addition */
int xen_be_evtchn_get_guest_port(struct xenevtchn_handle *xc);
struct evtchn_status;
struct evtchn_close;
struct evtchn_unmask;
struct evtchn_bind_virq;
struct evtchn_bind_pirq;
struct evtchn_bind_ipi;
struct evtchn_send;
struct evtchn_alloc_unbound;
struct evtchn_bind_interdomain;
struct evtchn_bind_vcpu;
struct evtchn_reset;
int xen_evtchn_status_op(struct evtchn_status *status);
int xen_evtchn_close_op(struct evtchn_close *close);
int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
int xen_evtchn_bind_pirq_op(struct evtchn_bind_pirq *pirq);
int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
int xen_evtchn_send_op(struct evtchn_send *send);
int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain);
int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
int xen_evtchn_reset_op(struct evtchn_reset *reset);
struct physdev_map_pirq;
struct physdev_unmap_pirq;
struct physdev_eoi;
struct physdev_irq_status_query;
struct physdev_get_free_pirq;
int xen_physdev_map_pirq(struct physdev_map_pirq *map);
int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap);
int xen_physdev_eoi_pirq(struct physdev_eoi *eoi);
int xen_physdev_query_pirq(struct physdev_irq_status_query *query);
int xen_physdev_get_free_pirq(struct physdev_get_free_pirq *get);
#endif /* QEMU_XEN_EVTCHN_H */