mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Cédric Le Goater	eab0a2d06e	spapr/xive: Allocate vCPU IPIs from the vCPU contexts When QEMU switches to the XIVE interrupt mode, it creates all the guest interrupts at the level of the KVM device. These interrupts are backed by real HW interrupts from the IPI interrupt pool of the XIVE controller. Currently, this is done from the QEMU main thread, which results in allocating all interrupts from the chip on which QEMU is running. IPIs are not distributed across the system and the load is not well balanced across the interrupt controllers. Change the vCPU IPI allocation to run from the vCPU context. The associated XIVE IPI interrupt will be allocated on the chip on which the vCPU is running and improve distribution of the IPIs in the system. When the vCPUs are pinned, this will make the IPI local to the chip of the vCPU. It will reduce rerouting between interrupt controllers and gives better performance. Device interrupts are still treated the same. To improve placement, we would need some information on the chip owning the virtual source or the HW source in case of a passthrough device but this requires changes in PAPR. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200820134547.2355743-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-09-08 10:08:42 +10:00
Cédric Le Goater	acbdb9956f	spapr/xive: Allocate IPIs independently from the other sources The vCPU IPIs are now allocated in kvmppc_xive_cpu_connect() when the vCPU connects to the KVM device and not when all the sources are reset in kvmppc_xive_source_reset() This requires extra care for hotplug vCPUs and VM restore. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200820134547.2355743-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-09-08 10:08:42 +10:00
Cédric Le Goater	fa94447a2c	spapr/xive: Use kvmppc_xive_source_reset() in post_load This is doing an extra loop but should be equivalent. It also differentiates the reset of the sources from the restore of the sources configuration. This will help in allocating the vCPU IPIs independently. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200820134547.2355743-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-09-08 10:08:42 +10:00
Cédric Le Goater	235d3b1162	spapr/xive: Modify kvm_cpu_is_enabled() interface We will use it to check if a vCPU IPI has been created. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200820134547.2355743-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-09-08 10:08:42 +10:00
Greg Kurz	3110f0ee19	spapr/xive: Use xive_source_esb_len() static inline size_t xive_source_esb_len(XiveSource *xsrc) { return (1ull << xsrc->esb_shift) * xsrc->nr_irqs; } Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159733969034.320580.6571451425779179477.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-14 13:35:45 +10:00
Greg Kurz	1118b6b727	spapr/xive: Simplify error handling of kvmppc_xive_cpu_synchronize_state() Now that kvmppc_xive_cpu_get_state() returns negative on error, use that and get rid of the temporary Error object and error_propagate(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707852916.1489912.8376334685349668124.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:09:38 +10:00
Greg Kurz	6cdc0e2063	spapr/xive: Simplify error handling in kvmppc_xive_connect() Now that all these functions return a negative errno on failure, check that and get rid of the local_err boilerplate. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707851537.1489912.1030839306195472651.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:06:44 +10:00
Greg Kurz	a845a54cbe	spapr/xive: Fix error handling in kvmppc_xive_post_load() Now that all these functions return a negative errno on failure, check that because it is preferred to local_err. And most of all, propagate it because vmstate expects negative errnos. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707850148.1489912.18355118622296682631.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:04:05 +10:00
Greg Kurz	42a92d925d	spapr/kvm: Fix error handling in kvmppc_xive_pre_save() Now that kvmppc_xive_get_queues() returns a negative errno on failure, check with that because it is preferred to local_err. And most of all, propagate it because vmstate expects negative errnos. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707849455.1489912.6034461176847728064.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:03:35 +10:00
Greg Kurz	d55daadcb8	spapr/xive: Rework error handling of kvmppc_xive_set_source_config() Since kvm_device_access() returns a negative errno on failure, convert kvmppc_xive_set_source_config() to use it for error checking. This allows to get rid of the local_err boilerplate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707848764.1489912.17078842252160674523.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:00:52 +10:00
Greg Kurz	d53482a73b	spapr/xive: Rework error handling in kvmppc_xive_get_queues() Since kvmppc_xive_get_queue_config() has a return value, convert kvmppc_xive_get_queues() to use it for error checking. This allows to get rid of the local_err boiler plate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707848069.1489912.14879208798696134531.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:00:52 +10:00
Greg Kurz	f9a548edf2	spapr/xive: Rework error handling of kvmppc_xive_[gs]et_queue_config() Since kvm_device_access() returns a negative errno on failure, convert kvmppc_xive_get_queue_config() and kvmppc_xive_set_queue_config() to use it for error checking. This allows to get rid of the local_err boilerplate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707847357.1489912.2032291280645236480.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:00:52 +10:00
Greg Kurz	5fa36b7ffb	spapr/xive: Rework error handling of kvmppc_xive_cpu_[gs]et_state() kvm_set_one_reg() returns a negative errno on failure, use that instead of errno. Also propagate it to callers so they can use it to check for failures and hopefully get rid of their local_err boilerplate. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707846665.1489912.14267225652103441921.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:00:52 +10:00
Greg Kurz	b14adb4a27	spapr/xive: Rework error handling of kvmppc_xive_mmap() Callers currently check failures of kvmppc_xive_mmap() through the @errp argument, which isn't a recommended practice. It is preferred to use a return value when possible. Since NULL isn't an invalid address in theory, it seems better to return MAP_FAILED and to teach callers to handle it. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707845972.1489912.719896767746375765.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:00:52 +10:00
Greg Kurz	46407a2531	spapr/xive: Rework error handling of kvmppc_xive_source_reset() Since kvmppc_xive_source_reset_one() has a return value, convert kvmppc_xive_source_reset() to use it for error checking. This allows to get rid of the local_err boiler plate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707845245.1489912.9151822670764690034.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:00:52 +10:00
Greg Kurz	3885ca6688	spapr/xive: Rework error handling of kvmppc_xive_cpu_connect() Use error_setg_errno() instead of error_setg(strerror()). While here, use -ret instead of errno since kvm_vcpu_enable_cap() returns a negative errno on failure. Use ERRP_GUARD() to ensure that errp can be passed to error_append_hint(), and get rid of the local_err boilerplate. Propagate the return value so that callers may use it as well to check failures. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159707844549.1489912.4862921680328017645.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 21:00:52 +10:00
Greg Kurz	a490711934	spapr/xive: Convert KVM device fd checks to assert() All callers guard these functions with an xive_in_kernel() helper. Make it clear that they are only to be called when the KVM XIVE device exists. Note that the check on xive is dropped in kvmppc_xive_disconnect(). It really cannot be NULL since it comes from set_active_intc() which only passes pointers to allocated objects. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <159679994169.876294.11026653581505077112.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 20:58:52 +10:00
Greg Kurz	cf36e5b376	ppc/xive: Rework setup of XiveSource::esb_mmio Depending on whether XIVE is emulated or backed with a KVM XIVE device, the ESB MMIOs of a XIVE source point to an I/O memory region or a mapped memory region. This is currently handled by checking kvm_irqchip_in_kernel() returns false in xive_source_realize(). This is a bit awkward as we usually need to do extra things when we're using the in-kernel backend, not less. But most important, we can do better: turn the existing "xive.esb" memory region into a plain container, introduce an "xive.esb-emulated" I/O subregion and rename the existing "xive.esb" subregion in the KVM code to "xive.esb-kvm". Since "xive.esb-kvm" is added with overlap and a higher priority, it prevails over "xive.esb-emulated" (ie. a guest using KVM XIVE will interact with "xive.esb-kvm" instead of the default "xive.esb-emulated" region). While here, consolidate the computation of the MMIO region size in a common helper. Suggested-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159679992680.876294.7520540158586170894.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-13 20:50:17 +10:00
Greg Kurz	e781139539	spapr/xive: Simplify kvmppc_xive_disconnect() Since this function begins with: /* The KVM XIVE device is not in use */ if (!xive || xive->fd == -1) { return; } we obviously don't need to check xive->fd again. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159673297296.766512.14780055521619233656.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-12 13:16:27 +10:00
Greg Kurz	82f086b5e7	spapr/xive: Fix xive->fd if kvm_create_device() fails If the creation of the KVM XIVE device fails for some reasons, the negative errno ends up in xive->fd, but the rest of the code assumes that xive->fd either contains an open fd, ie. positive value, or -1. This doesn't cause any misbehavior except kvmppc_xive_disconnect() that will try to close(xive->fd) during rollback and likely be rewarded with an EBADF. Only set xive->fd with a open fd. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <159673296585.766512.15404407281299745442.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-08-12 13:16:27 +10:00
Greg Kurz	74e51a38f7	spapr/xive: Deduce the SpaprXive pointer from XiveTCTX::xptr And use it instead of reaching out to the machine. This allows to get rid of a call to qdev_get_machine() and to reduce the scope of another one so that it is only used within the argument list of error_append_hint(). This is an acceptable tradeoff compared to all it would require to know about the maximum number of CPUs here without calling qdev_get_machine(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200106145645.4539-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Greg Kurz	74f23d4332	spapr/xive: Configure number of servers in KVM The XIVE KVM devices now has an attribute to configure the number of interrupt servers. This allows to greatly optimize the usage of the VP space in the XIVE HW, and thus to start a lot more VMs. Only set this attribute if available in order to support older POWER9 KVM. The XIVE KVM device now reports the exhaustion of VPs upon the connection of the first VCPU. Check that in order to have a chance to provide a hint to the user. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157478679392.67101.7843580591407950866.stgit@bahia.tlslab.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Greg Kurz	4ffb749688	spapr: Pass the maximum number of vCPUs to the KVM interrupt controller The XIVE and XICS-on-XIVE KVM devices on POWER9 hosts can greatly reduce their consumption of some scarce HW resources, namely Virtual Presenter identifiers, if they know the maximum number of vCPUs that may run in the VM. Prepare ground for this by passing the value down to xics_kvm_connect() and kvmppc_xive_connect(). This is purely mechanical, no functional change. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157478678301.67101.2717368060417156338.stgit@bahia.tlslab.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Greg Kurz	58246041d3	xive/kvm: Trigger interrupts from userspace When using the XIVE KVM device, the trigger page is directly accessible in QEMU. Unlike with XICS, no need to ask KVM to fire the interrupt. A simple store on the trigger page does the job. Just call xive_esb_trigger(). This may improve performance of emulated devices that go through qemu_set_irq(), eg. virtio devices created with ioeventfd=off or configured by the guest to use LSI interrupts, which aren't really recommended setups. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157408992731.494439.3405812941731584740.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
David Gibson	98a39a7927	spapr, xics, xive: Match signatures for XICS and XIVE KVM connect routines Both XICS and XIVE have routines to connect and disconnect KVM with similar but not identical signatures. This adjusts them to match exactly, which will be useful for further cleanups later. While we're there, we add an explicit return value to the connect path to streamline error reporting in the callers. We remove error reporting the disconnect path. In the XICS case this wasn't used at all. In the XIVE case the only error case was if the KVM device was set up, but KVM didn't have the capability to do so which is pretty obviously impossible. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	e594c2ad1c	xive: Improve irq claim/free path spapr_xive_irq_claim() returns a bool to indicate if it succeeded. But most of the callers and one callee use int return values and/or an Error * with more information instead. In any case, ints are a more common idiom for success/failure states than bools (one never knows what sense they'll be in). So instead change to an int return value to indicate presence of error + an Error * to describe the details through that call chain. It also didn't actually check if the irq was already claimed, which is one of the primary purposes of the claim path, so do that. spapr_xive_irq_free() also returned a bool... which no callers checked and was always true, so just drop it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
Cédric Le Goater	4c3539d491	spapr/irq: Only claim VALID interrupts at the KVM level A typical pseries VM with 16 vCPUs, one disk, one network adapter uses less than 100 interrupts but the whole IRQ number space of the QEMU machine is allocated at reset time and it is 8K wide. This is wasting a considerable amount of interrupt numbers in the global IRQ space which has 1M interrupts per socket on a POWER9. To optimise the HW resources, only request at the KVM level interrupts which have been claimed by the guest. This will help to increase the maximum number of VMs per system and also help supporting nested guests using the XIVE interrupt mode. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190911133937.2716-3-clg@kaod.org> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <156942766014.1274533.10792048853177121231.stgit@bahia.lan> [dwg: Folded in fix up from Greg Kurz] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-04 10:25:23 +10:00
Markus Armbruster	54d31236b9	sysemu: Split sysemu/runstate.h off sysemu/sysemu.h sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence: * It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits). * It pulls in more than a dozen additional headers. Split stuff related to run state management into its own header sysemu/runstate.h. Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects. Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-30-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> [Unbreak OS-X build]	2019-08-16 13:37:36 +02:00
Cédric Le Goater	310cda5b5e	spapr/xive: Fix migration of hot-plugged CPUs The migration sequence of a guest using the XIVE exploitation mode relies on the fact that the states of all devices are restored before the machine is. This is not true for hot-plug devices such as CPUs whose state comes after the machine. This breaks migration because the thread interrupt context registers are not correctly set. Fix migration of hotplugged CPUs by restoring their context in the 'post_load' handler of the XiveTCTX model. Fixes: `277dd3d771` ("spapr/xive: add migration support for KVM") Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190813064853.29310-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-08-13 16:50:30 +10:00
Greg Kurz	1c3d4a8f4b	spapr/xive: Add proper rollback to kvmppc_xive_connect() Make kvmppc_xive_disconnect() able to undo the changes of a partial execution of kvmppc_xive_connect() and use it to perform rollback. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Message-Id: <156198735673.293938.7313195993600841641.stgit@bahia> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-07-02 10:11:44 +10:00
Cédric Le Goater	981b1c6266	spapr/xive: rework the mapping the KVM memory regions Today, the interrupt device is fully initialized at reset when the CAS negotiation process has completed. Depending on the KVM capabilities, the SpaprXive memory regions (ESB, TIMA) are initialized with a host MMIO backend or a QEMU emulated backend. This results in a complex initialization sequence partially done at realize and later at reset, and some memory region leaks. To simplify this sequence and to remove the late initialization of the emulated device which is required to be done only once, we introduce new memory regions specific for KVM. These regions are mapped as overlaps on top of the emulated device to make use of the host MMIOs. Also provide proper cleanups of these regions when the XIVE KVM device is destroyed to fix the leaks. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190614165920.12670-2-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-07-02 09:43:58 +10:00
Cédric Le Goater	cdd71c8e9d	spapr/xive: fix multiple resets when using the 'dual' interrupt mode Today, when a reset occurs on a pseries machine using the 'dual' interrupt mode, the KVM devices are released and recreated depending on the interrupt mode selected by CAS. If XIVE is selected, the SysBus memory regions of the SpaprXive model are initialized by the KVM backend initialization routine each time a reset occurs. This leads to a crash after a couple of resets because the machine reaches the QDEV_MAX_MMIO limit of SysBusDevice : qemu-system-ppc64: hw/core/sysbus.c:193: sysbus_init_mmio: Assertion `dev->num_mmio < QDEV_MAX_MMIO' failed. To fix, initialize the SysBus memory regions in spapr_xive_realize() called only once and remove the same inits from the QEMU and KVM backend initialization routines which are called at each reset. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190522074016.10521-2-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:47 +10:00
Cédric Le Goater	3f777abc71	spapr/irq: add KVM support to the 'dual' machine The interrupt mode is chosen by the CAS negotiation process and activated after a reset to take into account the required changes in the machine. This brings new constraints on how the associated KVM IRQ device is initialized. Currently, each model takes care of the initialization of the KVM device in their realize method but this is not possible anymore as the initialization needs to be done globally when the interrupt mode is known, i.e. when machine is reset. It also means that we need a way to delete a KVM device when another mode is chosen. Also, to support migration, the QEMU objects holding the state to transfer should always be available but not necessarily activated. The overall approach of this proposal is to initialize both interrupt modes at the QEMU level to keep the IRQ number space in sync and to allow switching from one mode to another. For the KVM side of things, the whole initialization of the KVM device, sources and presenters, is grouped in a single routine. The XICS and XIVE sPAPR IRQ reset handlers are modified accordingly to handle the init and the delete sequences of the KVM device. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-15-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:46 +10:00
Cédric Le Goater	3bf84e99c8	spapr: check for the activation of the KVM IRQ device The activation of the KVM IRQ device depends on the interrupt mode chosen at CAS time by the machine and some methods used at reset or by the migration need to be protected. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190513084245.25755-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:46 +10:00
Cédric Le Goater	56b11587df	spapr: introduce routines to delete the KVM IRQ device If a new interrupt mode is chosen by CAS, the machine generates a reset to reconfigure. At this point, the connection with the previous KVM device needs to be closed and a new connection needs to opened with the KVM device operating the chosen interrupt mode. New routines are introduced to destroy the XICS and the XIVE KVM devices. They make use of a new KVM device ioctl which destroys the device and also disconnects the IRQ presenters from the vCPUs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:46 +10:00
Cédric Le Goater	277dd3d771	spapr/xive: add migration support for KVM When the VM is stopped, the VM state handler stabilizes the XIVE IC and marks the EQ pages dirty. These are then transferred to destination before the transfer of the device vmstates starts. The SpaprXive interrupt controller model captures the XIVE internal tables, EAT and ENDT and the XiveTCTX model does the same for the thread interrupt context registers. At restart, the SpaprXive 'post_load' method restores all the XIVE states. It is called by the sPAPR machine 'post_load' method, when all XIVE states have been transferred and loaded. Finally, the source states are restored in the VM change state handler when the machine reaches the running state. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:46 +10:00
Cédric Le Goater	9b88cd7673	spapr/xive: introduce a VM state change handler This handler is in charge of stabilizing the flow of event notifications in the XIVE controller before migrating a guest. This is a requirement before transferring the guest EQ pages to a destination. When the VM is stopped, the handler sets the source PQs to PENDING to stop the flow of events and to possibly catch a triggered interrupt occurring while the VM is stopped. Their previous state is saved. The XIVE controller is then synced through KVM to flush any in-flight event notification and to stabilize the EQs. At this stage, the EQ pages are marked dirty to make sure the EQ pages are transferred if a migration sequence is in progress. The previous configuration of the sources is restored when the VM resumes, after a migration or a stop. If an interrupt was queued while the VM was stopped, the handler simply generates the missing trigger. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-6-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:46 +10:00
Cédric Le Goater	7bfc759c02	spapr/xive: add state synchronization with KVM This extends the KVM XIVE device backend with 'synchronize_state' methods used to retrieve the state from KVM. The HW state of the sources, the KVM device and the thread interrupt contexts are collected for the monitor usage and also migration. These get operations rely on their KVM counterpart in the host kernel which acts as a proxy for OPAL, the host firmware. The set operations will be added for migration support later. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20190513084245.25755-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:46 +10:00
Cédric Le Goater	0c575703e4	spapr/xive: add hcall support when under KVM XIVE hcalls are all redirected to QEMU as none are on a fast path. When necessary, QEMU invokes KVM through specific ioctls to perform host operations. QEMU should have done the necessary checks before calling KVM and, in case of failure, H_HARDWARE is simply returned. H_INT_ESB is a special case that could have been handled under KVM but the impact on performance was low when under QEMU. Here are some figures, in Gbits/sec, for kernel irqchip OFF / ON with H_INT_ESB in KVM / ON with H_INT_ESB in QEMU: rtl8139 (LSI) 1.19 / 1.24 / 1.23; virtio 31.80 / 42.30 / --. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:45 +10:00
Cédric Le Goater	38afd772f8	spapr/xive: add KVM support This introduces a set of helpers when KVM is in use, which create the KVM XIVE device, initialize the interrupt sources at a KVM level and connect the interrupt presenters to the vCPU. They also handle the initialization of the TIMA and the source ESB memory regions of the controller. These have a different type under KVM. They are 'ram device' memory mappings, similarly to VFIO, exposed to the guest and the associated VMAs on the host are populated dynamically with the appropriate pages using a fault handler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20190513084245.25755-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-05-29 11:39:45 +10:00
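
Commit 3110f0ee19 above quotes the helper that computes the size of a source's ESB MMIO region. As a rough illustration of that arithmetic outside QEMU, the sketch below uses a simplified stand-in struct. `DemoXiveSource` and `demo_xive_source_esb_len()` are hypothetical names and are not QEMU's real `XiveSource` type or API. Only the two fields the formula needs are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's XiveSource: only the two fields used here. */
typedef struct DemoXiveSource {
    uint32_t esb_shift; /* log2 of the ESB MMIO size reserved per interrupt */
    uint32_t nr_irqs;   /* number of interrupts handled by the source */
} DemoXiveSource;

/* Total ESB MMIO length: per-interrupt size times the number of interrupts. */
static inline size_t demo_xive_source_esb_len(const DemoXiveSource *xsrc)
{
    assert(xsrc != NULL);
    return (size_t)((1ull << xsrc->esb_shift) * xsrc->nr_irqs);
}
```

With `esb_shift = 16` (64 KiB per interrupt, a value assumed here for illustration) and `nr_irqs = 4096`, the helper yields 256 MiB.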

40 Commits
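
Several of the August 2020 commits above (for example 82f086b5e7 and 6cdc0e2063) rely on two conventions: functions return a negative errno on failure, and `xive->fd` holds either an open fd or -1. The sketch below shows that shape in isolation, assuming nothing about QEMU's actual code. `DemoDevice`, `demo_create_device()` and `demo_connect()` are hypothetical names, and the creation call is simulated rather than issued to KVM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical device state: fd is -1 whenever no device is open. */
typedef struct DemoDevice {
    int fd;
} DemoDevice;

/* Hypothetical stand-in for a creation call: returns an fd or a negative errno. */
static int demo_create_device(int simulate_failure)
{
    return simulate_failure ? -ENODEV : 42;
}

/* Returns 0 on success or a negative errno; dev->fd is only set to a valid fd. */
static int demo_connect(DemoDevice *dev, int simulate_failure)
{
    int fd;

    assert(dev != NULL);

    fd = demo_create_device(simulate_failure);
    if (fd < 0) {
        /* Report and propagate the errno; dev->fd keeps its -1 sentinel. */
        fprintf(stderr, "device creation failed: %s\n", strerror(-fd));
        return fd;
    }

    dev->fd = fd;
    return 0;
}
```

Keeping the result in a local variable until it is known to be valid means a later cleanup path can test `dev->fd == -1` without ever seeing a stray negative errno.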