mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Alexey Kardashevskiy	81b205cecf	ppc/spapr: Implement H_WATCHDOG The new PAPR 2.12 defines a watchdog facility managed via the new H_WATCHDOG hypercall. This adds H_WATCHDOG support which a proposed driver for pseries uses: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=303120 This was tested by running QEMU with a debug kernel and command line: -append \ "pseries-wdt.timeout=60 pseries-wdt.nowayout=1 pseries-wdt.action=2" and running "echo V > /dev/watchdog0" inside the VM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20220622051008.1067464-1-aik@ozlabs.ru> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-07-06 10:22:38 -03:00
Alexey Kardashevskiy	31cc81f728	spapr/ddw: Reset DMA when the last non-default window is removed PAPR+/LoPAPR says: === The platform must restore the default DMA window for the PE on a call to the ibm,remove-pe-dma-window RTAS call when all of the following are true: a. The call removes the last DMA window remaining for the PE. b. The DMA window being removed is not the default window === This resets DMA as PAPR mandates. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20220622052955.1069903-1-aik@ozlabs.ru> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-07-06 10:22:37 -03:00
Daniel Henrique Barboza	792e8bb629	ppc/pnv: assign pnv-phb-root-port chassis/slot earlier It is not advisable to execute an object_dynamic_cast() to poke into bus->qbus.parent and follow it up with a C cast into the PnvPHB type we think we got. In fact this is not needed. There is nothing sophisticated being done with the PHB object retrieved during root_port_realize() for both PHB3 and PHB4. We're retrieving a PHB reference just to access phb->chip_id and phb->phb_id and use them to define the chassis/slot of the root port. phb->phb_id is already being passed to pnv_phb_attach_root_port() via the 'index' parameter. Let's also add a 'chip_id' parameter to this function and assign chassis and slot right there. This will spare us from the hassle of accessing the PHB object inside realize(). Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Message-Id: <20220621173436.165912-4-danielhb413@gmail.com>	2022-07-06 10:22:37 -03:00
Daniel Henrique Barboza	8625164a38	ppc/pnv: attach phb3/phb4 root ports in QOM tree At this moment we leave the pnv-phb3(4)-root-port unattached in QOM: /unattached (container) (...) /device[2] (pnv-phb3-root-port) /bus master container[0] (memory-region) /bus master[0] (memory-region) /pci_bridge_io[0] (memory-region) /pci_bridge_io[1] (memory-region) /pci_bridge_mem[0] (memory-region) /pci_bridge_pci[0] (memory-region) /pci_bridge_pref_mem[0] (memory-region) /pci_bridge_vga_io_hi[0] (memory-region) /pci_bridge_vga_io_lo[0] (memory-region) /pci_bridge_vga_mem[0] (memory-region) /pcie.0 (PCIE) Let's make changes in pnv_phb_attach_root_port() to attach the created root ports to its corresponding PHB. This is the result afterwards: /pnv-phb3[0] (pnv-phb3) /lsi (ics) /msi (phb3-msi) /msi32[0] (memory-region) /msi64[0] (memory-region) /pbcq (pnv-pbcq) (...) /phb3_iommu[0] (pnv-phb3-iommu-memory-region) /pnv-phb3-root.0 (pnv-phb3-root) /pnv-phb3-root-port[0] (pnv-phb3-root-port) /bus master container[0] (memory-region) /bus master[0] (memory-region) /pci_bridge_io[0] (memory-region) /pci_bridge_io[1] (memory-region) /pci_bridge_mem[0] (memory-region) /pci_bridge_pci[0] (memory-region) /pci_bridge_pref_mem[0] (memory-region) /pci_bridge_vga_io_hi[0] (memory-region) /pci_bridge_vga_io_lo[0] (memory-region) /pci_bridge_vga_mem[0] (memory-region) /pcie.0 (PCIE) Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20220621173436.165912-3-danielhb413@gmail.com>	2022-07-06 10:22:37 -03:00
Paolo Bonzini	f73eb9484b	pseries: allow setting stdout-path even on machines with a VGA -machine graphics=off is the usual way to tell the firmware or the OS that the user wants a serial console. The pseries machine however does not support this, and never adds the stdout-path node to the device tree if a VGA device is provided. This is in addition to the other magic behavior of VGA devices, which is to add a keyboard and mouse to the default USB bus. Split spapr->has_graphics in two variables so that the two behaviors can be separated: the USB devices remains the same, but the stdout-path is added even with "-device VGA -machine graphics=off". Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220507054826.124936-1-pbonzini@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-05-26 17:11:32 -03:00
Markus Armbruster	9c0928045c	Clean up ill-advised or unusual header guards Leading underscores are ill-advised because such identifiers are reserved. Trailing underscores are merely ugly. Strip both. Our header guards commonly end in _H. Normalize the exceptions. Macros should be ALL_CAPS. Normalize the exception. Done with scripts/clean-header-guards.pl. include/hw/xen/interface/ and tools/virtiofsd/ left alone, because these were imported from Xen and libfuse respectively. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-3-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2022-05-11 16:50:01 +02:00
Frederic Barrat	f657721187	ppc/xive: Update the state of the External interrupt signal When pulling or pushing an OS context from/to a CPU, we should re-evaluate the state of the External interrupt signal. Otherwise, we can end up catching the External interrupt exception in hypervisor mode, which is unexpected. The problem is best illustrated with the following scenario: 1. an External interrupt is raised while the guest is on the CPU. 2. before the guest can ack the External interrupt, an hypervisor interrupt is raised, for example the Hypervisor Decrementer or Hypervisor Virtualization interrupt. The hypervisor interrupt forces the guest to exit while the External interrupt is still pending. 3. the hypervisor handles the hypervisor interrupt. At this point, the External interrupt is still pending. So it's very likely to be delivered while the hypervisor is running. That's unexpected and can result in an infinite loop where the hypervisor catches the External interrupt, looks for an interrupt in its hypervisor queue, doesn't find any, exits the interrupt handler with the External interrupt still raised, repeat... The fix is simply to always lower the External interrupt signal when pulling an OS context. It means it needs to be raised again when re-pushing the OS context. Fortunately, it's already the case, as we now always call xive_tctx_ipb_update(), which will raise the signal if needed. Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Frederic Barrat <fbarrat@linux.ibm.com> Message-Id: <20220429071620.177142-3-fbarrat@linux.ibm.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-05-05 15:36:17 -03:00
Guo Zhi	2d94af4b16	hw/ppc: change indentation to spaces from TABs There are still some files in the QEMU PPC code base that use TABs for indentation instead of using spaces. The TABs should be replaced so that we have a consistent coding style. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/374 Signed-off-by: Guo Zhi <qtxuning1999@sjtu.edu.cn> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20220412021240.2080218-1-qtxuning1999@sjtu.edu.cn> [danielhb: trimmed commit msg to 72 chars per line] Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-04-20 18:00:30 -03:00
Cédric Le Goater	dcf4ca4514	ppc/pnv: Remove PnvPsiClas::irq_set All devices raising PSI interrupts are now converted to use GPIO lines and the pnv_psi_irq_set() routines have become useless. Drop them. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220323072846.1780212-5-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-04-20 18:00:30 -03:00
Cédric Le Goater	b0ae5c69e1	ppc/pnv: Remove PnvOCC::psi link Use an anonymous output GPIO line to connect the OCC device with the PSIHB device and raise the appropriate PSI IRQ line depending on the processor model. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220323072846.1780212-4-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-04-20 18:00:30 -03:00
Cédric Le Goater	c05aa1406b	ppc/pnv: Remove PnvLpcController::psi link Create an anonymous output GPIO line to connect the LPC device with the PSIHB device and raise the appropriate PSI IRQ line depending on the processor model. A temporary __pnv_psi_irq_set() routine is introduced to handle the transition. It will be removed when all devices raising PSI interrupts are converted to use GPIOs. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220323072846.1780212-3-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-04-20 18:00:30 -03:00
Cédric Le Goater	58858759c1	ppc/pnv: Fix PSI IRQ definition On HW, the PSI and FSP interrupt levels are muxed under the same interrupt number. For coding reasons, an extra IRQ number was introduced to index register values in an array. It increased the count of IRQs which do not fit in the PSI IRQ range anymore. The PSI and FSP interrupts should be modeled with an extra level of GPIO lines but since QEMU does not support them, simply drop the extra number to stay within the IRQ range. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220323072846.1780212-2-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-04-20 18:00:30 -03:00
Alexey Kardashevskiy	4c7daca302	ppc/spapr/ddw: Add 2M pagesize Recently the LoPAPR spec got a new 2MB pagesize to support in Dynamic DMA Windows API (DDW), this adds the new flag. Linux supports it since https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=38727311871 Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20220321071945.918669-1-aik@ozlabs.ru> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>	2022-04-20 18:00:30 -03:00
Daniel Henrique Barboza	ef95a24494	hw/ppc: free env->tb_env in spapr_unrealize_vcpu() The timebase is allocated during spapr_realize_vcpu() and it's not freed. This results in memory leaks when doing vcpu unplugs: ==636935== ==636935== 144 (96 direct, 48 indirect) bytes in 1 blocks are definitely lost in loss record 6 ,461 of 8,135 ==636935== at 0x4897468: calloc (vg_replace_malloc.c:760) ==636935== by 0x5077213: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6400.4) ==636935== by 0x507757F: g_malloc0_n (in /usr/lib64/libglib-2.0.so.0.6400.4) ==636935== by 0x93C3FB: cpu_ppc_tb_init (ppc.c:1066) ==636935== by 0x97BC2B: spapr_realize_vcpu (spapr_cpu_core.c:268) ==636935== by 0x97C01F: spapr_cpu_core_realize (spapr_cpu_core.c:337) ==636935== by 0xD4626F: device_set_realized (qdev.c:531) ==636935== by 0xD55273: property_set_bool (object.c:2273) ==636935== by 0xD523DF: object_property_set (object.c:1408) ==636935== by 0xD588B7: object_property_set_qobject (qom-qobject.c:28) ==636935== by 0xD52897: object_property_set_bool (object.c:1477) ==636935== by 0xD4579B: qdev_realize (qdev.c:333) ==636935== This patch adds a cpu_ppc_tb_free() helper in hw/ppc/ppc.c to allow us to free the timebase. This leak is then solved by calling cpu_ppc_tb_free() in spapr_unrealize_vcpu(). Fixes: `6f4b5c3ec5` ("spapr: CPU hot unplug support") Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20220329124545.529145-2-danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-04-04 08:49:06 +02:00
Cédric Le Goater	9c10d86fee	ppc/pnv: Remove user-created PHB{3,4,5} devices On a real system with POWER{8,9,10} processors, PHBs are sub-units of the processor, they can be deactivated by firmware but not plugged in or out like a PCI adapter on a slot. Nevertheless, having user-created PHBs in QEMU seemed to be a good idea for testing purposes : 1. having a limited set of PHBs speedups boot time. 2. it is useful to be able to mimic a partially broken topology you some time have to deal with during bring-up. PowerNV is also used for distro install tests and having libvirt support eases these tasks. libvirt prefers to run the machine with -nodefaults to be sure not to drag unexpected devices which would need to be defined in the domain file without being specified on the QEMU command line. For this reason : 3. -nodefaults should not include default PHBs User-created PHB{3,4,5} devices satisfied all these needs but reality proves to be a bit more complex, internally when modeling such devices, and externally when dealing with the user interface. Req 1. and 2. can be simply addressed differently with a machine option: "phb-mask=<uint>", which QEMU would use to enable/disable PHB device nodes when creating the device tree. For Req 3., we need to make sure we are taking the right approach. It seems that we should expose a new type of user-created PHB device, a generic virtualized one, that libvirt would use and not one depending on the processor revision. This needs more thinking. For now, remove user-created PHB{3,4,5} devices. All the cleanups we did are not lost and they will be useful for the next steps. Fixes: `5bc67b052b` ("ppc/pnv: Introduce user creatable pnv-phb4 devices") Fixes: `1f6a88fffc` ("ppc/pnv: Introduce support for user created PHB3 devices") Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220314130514.529931-1-clg@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-14 15:57:17 +01:00
Cédric Le Goater	09a7e60c64	pnv/xive2: Add support for 8bits thread id Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	835806f1f9	pnv/xive2: Add support for automatic save&restore The XIVE interrupt controller on P10 can automatically save and restore the state of the interrupt registers under the internal NVP structure representing the VCPU. This saves a costly store/load in guest entries and exits. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	e16032b8dc	xive2: Add a get_config() handler for the router configuration Add GEN1 config even if we don't use it yet in the core framework. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	95d729e2bc	ppc/pnv: add XIVE Gen2 TIMA support Only the CAM line updates done by the hypervisor are specific to POWER10. Instead of duplicating the TM ops table, we handle these commands locally under the PowerNV XIVE2 model. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	c6b8cc370d	ppc/pnv: Add support for PQ offload on PHB5 The PQ_disable configuration bit disables the check done on the PQ state bits when processing new MSI interrupts. When bit 9 is enabled, the PHB forwards any MSI trigger to the XIVE interrupt controller without checking the PQ state bits. The XIVE IC knows from the trigger message that the PQ bits have not been checked and performs the check locally. This configuration bit only applies to MSIs and LSIs are still checked on the PHB to handle the assertion level. PQ_disable enablement is a requirement for StoreEOI. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	0aa2612a01	ppc/xive: Add support for PQ state bits offload The trigger message coming from a HW source contains a special bit informing the XIVE interrupt controller that the PQ bits have been checked at the source or not. Depending on the value, the IC can perform the check and the state transition locally using its own PQ state bits. The following changes add new accessors to the XiveRouter required to query and update the PQ state bits. This only applies to the PowerNV machine. sPAPR accessors are provided but the pSeries machine should not be concerned by such complex configuration for the moment. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	aadf13abaa	ppc/xive2: Add support for notification injection on ESB pages This is an internal offset used to inject triggers when the PQ state bits are not controlled locally. Such as for LSIs when the PHB5 are using the Address-Based Interrupt Trigger mode and on the END. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	924996766b	ppc/pnv: Add a HOMER model to POWER10 Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	623575e16c	ppc/pnv: Add model for POWER10 PHB5 PCIe Host bridge PHB4 and PHB5 are very similar. Use the PHB4 models with some minor adjustements in a subclass for P10. Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	ae4c68e366	ppc/pnv: Add POWER10 quads and use a pnv_chip_power10_quad_realize() helper to avoid code duplication with P9. This still needs some refinements on the XSCOM registers handling in PnvQuad. Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	8bf682a349	ppc/pnv: Add a OCC model for POWER10 Our OCC model is very mininal and POWER10 can simply reuse the OCC model we introduced for POWER9. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:39 +01:00
Cédric Le Goater	da71b7e3ed	ppc/pnv: Add a XIVE2 controller to the POWER10 chip The XIVE2 interrupt controller of the POWER10 processor follows the same logic than on POWER9 but the HW interface has been largely reviewed. It has a new register interface, different BARs, extra VSDs, new layout for the XIVE2 structures, and a set of new features which are described below. This is a model of the POWER10 XIVE2 interrupt controller for the PowerNV machine. It focuses primarily on the needs of the skiboot firmware but some initial hypervisor support is implemented for KVM use (escalation). Support for new features will be implemented in time and will require new support from the OS. * XIVE2 BARS The interrupt controller BARs have a different layout outlined below. Each sub-engine has now own its range and the indirect TIMA access was replaced with a set of pages, one per CPU, under the IC BAR: - IC BAR (Interrupt Controller) . 4 pages, one per sub-engine . 128 indirect TIMA pages - TM BAR (Thread Interrupt Management Area) . 4 pages - ESB BAR (ESB pages for IPIs) . up to 1TB - END BAR (ESB pages for ENDs) . up to 2TB - NVC BAR (Notification Virtual Crowd) . up to 128 - NVPG BAR (Notification Virtual Process and Group) . up to 1TB - Direct mapped Thread Context Area (reads & writes) OPAL does not use the grouping and crowd capability. * Virtual Structure Tables XIVE2 adds new tables types and also changes the field layout of the END and NVP Virtualization Structure Descriptors. - EAS - END new layout - NVT was splitted in : . NVP (Processor), 32B . NVG (Group), 32B . NVC (Crowd == P9 block group) 32B - IC for remote configuration - SYNC for cache injection - ERQ for event input queue The setup is slighly different on XIVE2 because the indexing has changed for some of the tables, block ID or the chip topology ID can be used. * XIVE2 features SCOM and MMIO registers have a new layout and XIVE2 adds a new global capability and configuration registers. The lowlevel hardware offers a set of new features among which : - a configurable number of priorities : 1 - 8 - StoreEOI with load-after-store ordering is activated by default - Gen2 TIMA layout - A P9-compat mode, or Gen1, TIMA toggle bit for SW compatibility - increase to 24bit for VP number Other features will have some impact on the Hypervisor and guest OS when activated, but this is not required for initial support of the controller. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:38 +01:00
Cédric Le Goater	09a67f3d0e	ppc/xive2: Introduce a presenter matching routine The VP space is larger in XIVE2 (P10), 24 bits instead of 19bits on XIVE (P9), and the CAM line can use a 7bits or 8bits thread id. For now, we only use 7bits thread ids, same as P9, but because of the change of the size of the VP space, the CAM matching routine is different between P9 and P10. It is easier to duplicate the whole routine than to add extra handlers in xive_presenter_tctx_match() used for P9. We might come with a better solution later on, after we have added some more support for the XIVE2 controller. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:38 +01:00
Cédric Le Goater	f8a233dedf	ppc/xive2: Introduce a XIVE2 core framework The XIVE2 interrupt controller of the POWER10 processor as the same logic as on POWER9 but its SW interface has been largely reworked. The interrupt controller has a new register interface, different BARs, extra VSDs. These will be described when we add the device model for the baremetal machine. The XIVE internal structures for the EAS, END, NVT have different layouts which is a problem for the current core XIVE framework. To avoid adding too much complexity in the XIVE models, a new XIVE2 core framework is introduced. It duplicates the models which are closely linked to the XIVE internal structures : Xive2Router and Xive2ENDSource and reuses the XiveSource, XivePresenter, XiveTCTX models, as they are more generic. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-03-02 06:51:38 +01:00
Nicholas Piggin	120f738a46	spapr: implement nested-hv capability for the virtual hypervisor This implements the Nested KVM HV hcall API for spapr under TCG. The L2 is switched in when the H_ENTER_NESTED hcall is made, and the L1 is switched back in returned from the hcall when a HV exception is sent to the vhyp. Register state is copied in and out according to the nested KVM HV hcall API specification. The hdecr timer is started when the L2 is switched in, and it provides the HDEC / 0x980 return to L1. The MMU re-uses the bare metal radix 2-level page table walker by using the get_pate method to point the MMU to the nested partition table entry. MMU faults due to partition scope errors raise HV exceptions and accordingly are routed back to the L1. The MMU does not tag translations for the L1 (direct) vs L2 (nested) guests, so the TLB is flushed on any L1<->L2 transition (hcall entry and exit). Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> [ clg: checkpatch fixes ] Message-Id: <20220216102545.1808018-10-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-02-18 08:34:14 +01:00
Nicholas Piggin	93aeb70210	ppc: allow the hdecr timer to be created/destroyed Machines which don't emulate the HDEC facility are able to use the timer for something else. Provide functions to start and stop the hdecr timer. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [ clg: checkpatch fixes ] Message-Id: <20220216102545.1808018-4-npiggin@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-02-18 08:34:14 +01:00
Shivaprasad G Bhat	b5513584a0	spapr: nvdimm: Implement H_SCM_FLUSH hcall The patch adds support for the SCM flush hcall for the nvdimm devices. To be available for exploitation by guest through the next patch. The hcall is applicable only for new SPAPR specific device class which is also introduced in this patch. The hcall expects the semantics such that the flush to return with H_LONG_BUSY_ORDER_10_MSEC when the operation is expected to take longer time along with a continue_token. The hcall to be called again by providing the continue_token to get the status. So, all fresh requests are put into a 'pending' list and flush worker is submitted to the thread pool. The thread pool completion callbacks move the requests to 'completed' list, which are cleaned up after collecting the return status for the guest in subsequent hcall from the guest. The semantics makes it necessary to preserve the continue_tokens and their return status across migrations. So, the completed flush states are forwarded to the destination and the pending ones are restarted at the destination in post_load. The necessary nvdimm flush specific vmstate structures are also introduced in this patch which are to be saved in the new SPAPR specific nvdimm device to be introduced in the following patch. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <164396254862.109112.16675611182159105748.stgit@ltczzess4.aus.stglabs.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-02-18 08:34:14 +01:00
Philippe Mathieu-Daudé	dc10da64e1	hw/ppc/vof: Add missing includes vof.h requires "qom/object.h" for DECLARE_CLASS_CHECKERS(), "exec/memory.h" for address_space_read/write(), "exec/address-spaces.h" for address_space_memory and more importantly "cpu.h" for target_ulong. vof.c doesn't need "exec/ram_addr.h". Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220122003104.84391-1-f4bug@amsat.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-01-28 13:15:03 +01:00
Cédric Le Goater	eb93c82888	ppc/pnv: Move num_phbs under Pnv8Chip It is not used elsewhere so that's where it belongs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220105212338.49899-10-danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-01-12 11:28:27 +01:00
Cédric Le Goater	c29dd0034d	ppc/pnv: Reparent user created PHB3 devices to the PnvChip The powernv machine uses the object hierarchy to populate the device tree and each device should be parented to the chip it belongs to. This is not the case for user created devices which are parented to the container "/unattached". Make sure a PHB3 device is parented to its chip by reparenting the object if necessary. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220105212338.49899-8-danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-01-12 11:28:27 +01:00
Cédric Le Goater	1f6a88fffc	ppc/pnv: Introduce support for user created PHB3 devices PHB3 devices and PCI devices can now be added to the powernv8 machine using : -device pnv-phb3,chip-id=0,index=1 \ -device nec-usb-xhci,bus=pci.1,addr=0x0 The 'index' property identifies the PHB3 in the chip. In case of user created devices, a lookup on 'chip-id' is required to assign the owning chip. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20220105212338.49899-7-danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-01-12 11:28:27 +01:00
Cédric Le Goater	a71cd51e2a	ppc/pnv: Attach PHB3 root port device when defaults are enabled This cleanups the PHB3 model a bit more since the root port is an independent device and it will ease our task when adding user created PHB3s. pnv_phb_attach_root_port() is made public in pnv.c so it can be reused with the pnv_phb4 root port later. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20220105212338.49899-4-danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2022-01-12 11:28:27 +01:00
Philippe Mathieu-Daudé	cd1db8df74	dma: Let ld*_dma() propagate MemTxResult dma_memory_read() returns a MemTxResult type. Do not discard it, return it to the caller. Update the few callers. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211223115554.3155328-19-philmd@redhat.com>	2021-12-31 01:05:27 +01:00
Philippe Mathieu-Daudé	34cdea1db6	dma: Let ld_dma() take MemTxAttrs argument Let devices specify transaction attributes when calling ld_dma(). Keep the default MEMTXATTRS_UNSPECIFIED in the few callers. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211223115554.3155328-17-philmd@redhat.com>	2021-12-31 01:05:27 +01:00
Philippe Mathieu-Daudé	2280c27afc	dma: Let st_dma() take MemTxAttrs argument Let devices specify transaction attributes when calling st_dma(). Keep the default MEMTXATTRS_UNSPECIFIED in the few callers. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211223115554.3155328-16-philmd@redhat.com>	2021-12-31 01:05:27 +01:00
Philippe Mathieu-Daudé	ba06fe8add	dma: Let dma_memory_read/write() take MemTxAttrs argument Let devices specify transaction attributes when calling dma_memory_read() or dma_memory_write(). Patch created mechanically using spatch with this script: @@ expression E1, E2, E3, E4; @@ ( - dma_memory_read(E1, E2, E3, E4) + dma_memory_read(E1, E2, E3, E4, MEMTXATTRS_UNSPECIFIED) \| - dma_memory_write(E1, E2, E3, E4) + dma_memory_write(E1, E2, E3, E4, MEMTXATTRS_UNSPECIFIED) ) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211223115554.3155328-6-philmd@redhat.com>	2021-12-30 17:16:32 +01:00
Philippe Mathieu-Daudé	7a36e42d91	dma: Let dma_memory_set() take MemTxAttrs argument Let devices specify transaction attributes when calling dma_memory_set(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211223115554.3155328-3-philmd@redhat.com>	2021-12-30 17:16:32 +01:00
Philippe Mathieu-Daudé	7ccb391ccd	dma: Let dma_memory_valid() take MemTxAttrs argument Let devices specify transaction attributes when calling dma_memory_valid(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211223115554.3155328-2-philmd@redhat.com>	2021-12-30 17:16:32 +01:00
Cédric Le Goater	422fd92e61	ppc/pnv: Introduce a num_pecs class attribute for PHB4 PEC devices POWER9 processor comes with 3 PHB4 PEC (PCI Express Controller) and each PEC can have several PHBs : * PEC0 provides 1 PHB (PHB0) * PEC1 provides 2 PHBs (PHB1 and PHB2) * PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5) A num_pecs class attribute represents better the logic units of the POWER9 chip. Use that instead of num_phbs which fits POWER8 chips. This will ease adding support for user created devices. Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20211213132830.108372-8-clg@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org>	2021-12-17 17:57:19 +01:00
Cédric Le Goater	621f70d210	spapr/xive: Add source status helpers and use them to set and test the ASSERTED bit of LSI sources. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20211004212141.432954-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-10-21 11:42:47 +11:00
Bin Meng	06caae8af0	hw/intc: openpic: Clean up the styles Correct the multi-line comment format. No functional changes. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20210918032653.646370-3-bin.meng@windriver.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Bin Meng	86229b68a2	hw/intc: openpic: Drop Raven related codes There is no machine that uses Motorola MCP750 (aka Raven) model. Drop the related codes. While we are here, drop the mentioning of Intel GW80314 I/O companion chip in the comments as it has been obsolete for years, and correct a typo too. Signed-off-by: Bin Meng <bin.meng@windriver.com> Message-Id: <20210918032653.646370-2-bin.meng@windriver.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza	e0eb84d4f5	spapr_numa.c: FORM2 NUMA affinity support The main feature of FORM2 affinity support is the separation of NUMA distances from ibm,associativity information. This allows for a more flexible and straightforward NUMA distance assignment without relying on complex associations between several levels of NUMA via ibm,associativity matches. Another feature is its extensibility. This base support contains the facilities for NUMA distance assignment, but in the future more facilities will be added for latency, performance, bandwidth and so on. This patch implements the base FORM2 affinity support as follows: - the use of FORM2 associativity is indicated by using bit 2 of byte 5 of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1 or FORM2 affinity. Setting both forms will default to FORM2. We're not advertising FORM2 for pseries-6.1 and older machine versions to prevent guest visible changes in those; - ibm,associativity-reference-points has a new semantic. Instead of being used to calculate distances via NUMA levels, it's now used to indicate the primary domain index in the ibm,associativity domain of each resource. In our case it's set to {0x4}, matching the position where we already place logical_domain_id; - two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table and ibm,numa-distance-table. The index table is used to list all the NUMA logical domains of the platform, in ascending order, and allows for spartial NUMA configurations (although QEMU ATM doesn't support that). ibm,numa-distance-table is an array that contains all the distances from the first NUMA node to all other nodes, then the second NUMA node distances to all other nodes and so on; - get_max_dist_ref_points(), get_numa_assoc_size() and get_associativity() now checks for OV5_FORM2_AFFINITY and returns FORM2 values if the guest selected FORM2 affinity during CAS. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-7-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza	5dab5abe62	spapr: move FORM1 verifications to post CAS FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less) NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM device that has a different latency than the original NUMA node from the regular memory. FORM2 is also able to deal with asymmetric NUMA distances gracefully, something that our FORM1 implementation doesn't do. Move these FORM1 verifications to a new function and wait until after CAS, when we're sure that we're sticking with FORM1, to enforce them. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-6-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza	a165ac67c3	spapr_numa.c: rename numa_assoc_array to FORM1_assoc_array Introducing a new NUMA affinity, FORM2, requires a new mechanism to switch between affinity modes after CAS. Also, we want FORM2 data structures and functions to be completely separated from the existing FORM1 code, allowing us to avoid adding new code that inherits the existing complexity of FORM1. The idea of switching values used by the write_dt() functions in spapr_numa.c was already introduced in the previous patch, and the same approach will be used when dealing with the FORM1 and FORM2 arrays. We can accomplish that by that by renaming the existing numa_assoc_array to FORM1_assoc_array, which now is used exclusively to handle FORM1 affinity data. A new helper get_associativity() is then introduced to be used by the write_dt() functions to retrieve the current ibm,associativity array of a given node, after considering affinity selection that might have been done during CAS. All code that was using numa_assoc_array now needs to retrieve the array by calling this function. This will allow for an easier plug of FORM2 data later on. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Daniel Henrique Barboza	3a6e4ce684	spapr_numa.c: parametrize FORM1 macros The next preliminary step to introduce NUMA FORM2 affinity is to make the existing code independent of FORM1 macros and values, i.e. MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch accomplishes that by doing the following: - move the NUMA related macros from spapr.h to spapr_numa.c where they are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used to refer to the maximum number of NUMA nodes, including GPU nodes, that the machine can support; - MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific macros are used in FORM1 init functions; - code that uses MAX_DISTANCE_REF_POINTS now retrieves the max_dist_ref_points value using get_max_dist_ref_points(). NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE is replaced by get_vcpu_assoc_size(). These functions are used by the generic device tree functions and h_home_node_associativity() and will allow them to switch between FORM1 and FORM2 without changing their core logic. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210920174947.556324-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-30 12:26:06 +10:00
Cédric Le Goater	92612f1550	ppc/pnv: Rename "id" to "quad-id" in PnvQuad This to avoid possible conflicts with the "id" property of QOM objects. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-9-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-29 19:37:38 +10:00
Cédric Le Goater	daf115cf9a	ppc/xive: Export xive_tctx_word2() helper Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-29 19:37:38 +10:00
Cédric Le Goater	89d2468d96	ppc/xive: Export priority_to_ipb() helper Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210901094153.227671-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-09-29 19:37:38 +10:00
Cédric Le Goater	dd4e4d1296	ppc/xive: Export xive_presenter_notify() It's generic enough to be used from the XIVE2 router and avoid more duplication. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210809134547.689560-9-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-08-27 12:41:13 +10:00
Cédric Le Goater	fb8dc327f4	ppc/xive: Export PQ get/set routines These will be shared with the XIVE2 router. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210809134547.689560-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-08-27 12:41:13 +10:00
Cédric Le Goater	ab17a3fe74	ppc/pnv: Use a simple incrementing index for the chip-id When the QEMU PowerNV machine was introduced, multi chip support modeled a two socket system with dual chip modules as found on some P8 Tuleta systems (8286-42A). But this is hardly used and not relevant for QEMU. Use a simple index instead. With this change, we can now increase the max socket number to 16 as found on high end systems. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210809134547.689560-5-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-08-27 12:41:13 +10:00
Cédric Le Goater	6bc8c04648	ppc/pnv: Change the POWER10 machine to support DD2 only There is no need to keep the DD1 chip model as it will never be publicly available. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210809134547.689560-3-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-08-27 12:41:13 +10:00
Alexey Kardashevskiy	14c7e06e72	ppc/vof: Fix Coverity issues Coverity reported issues which are caused by mixing of signed return codes from DTC and unsigned return codes of the client interface. This introduces PROM_ERROR and makes distinction between the error types. This fixes NEGATIVE_RETURNS, OVERRUN issues reported by Coverity. This adds a comment about the return parameters number in the VOF hcall. The reason for such counting is to keep the numbers look the same in vof_client_handle() and the Linux (an OF client). vmc->client_architecture_support() returns target_ulong and we want to propagate this to the client (for example H_MULTI_THREADS_ACTIVE). The VOF path to do_client_architecture_support() needs chopping off the top 32bit but SLOF's H_CAS does not; and either way the return values are either 0 or 32bit negative error code. For now this chops the top 32bits. This makes "claim" fail if the allocated address is above 4GB as the client interface is 32bit. This still allows claiming memory above 4GB as potentially initrd can be put there and the client can read the address from the FDT's "available" property. Fixes: CID 1458139, 1458138, 1458137, 1458133, 1458132 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210720050726.2737405-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-29 10:59:49 +10:00
Bharata B Rao	82123b756a	target/ppc: Support for H_RPT_INVALIDATE hcall If KVM_CAP_RPT_INVALIDATE KVM capability is enabled, then - indicate the availability of H_RPT_INVALIDATE hcall to the guest via ibm,hypertas-functions property. - Enable the hcall Both the above are done only if the new sPAPR machine capability cap-rpt-invalidate is set. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Message-Id: <20210706112440.1449562-3-bharata@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 11:01:06 +10:00
Alexey Kardashevskiy	21bde1ecb6	spapr: Fix implementation of Open Firmware client interface This addresses the comments from v22. The functional changes are (the VOF ones need retesting with Pegasos2): (VOF) setprop will start failing if the machine class callback did not handle it; (VOF) unit addresses are lowered in path_offset(); (SPAPR) /chosen/bootargs is initialized from kernel_cmdline if the client did not change it. Fixes: 5c991e5d4378 ("spapr: Implement Open Firmware client interface") Cc: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210708065625.548396-1-aik@ozlabs.ru> Tested-by: BALATON Zoltan <balaton@eik.bme.hu> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:55:11 +10:00
Nicholas Piggin	17fd09c021	target/ppc/spapr: Update H_GET_CPU_CHARACTERISTICS L1D cache flush bits There are several new L1D cache flush bits added to the hcall which reflect hardware security features for speculative cache access issues. These behaviours are now being specified as negative in order to simplify patched kernel compatibility with older firmware (a new problem found in existing systems would automatically be vulnerable). [dwg: Technically this changes behaviour for existing machine types. After discussion with Nick, we've determined this is safe, because the worst that will happen if a guest gets the wrong information due to a migration is that it will perform some unnecessary workarounds, but will remain correct and secure (well, as secure as it was going to be anyway). In addition the change only affects cap-cfpc=safe which is not enabled by default, and in fact is not possible to set on any current hardware (though it's expected it will be possible on POWER10)] Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20210615044107.1481608-1-npiggin@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Alexey Kardashevskiy	fc8c745d50	spapr: Implement Open Firmware client interface The PAPR platform describes an OS environment that's presented by a combination of a hypervisor and firmware. The features it specifies require collaboration between the firmware and the hypervisor. Since the beginning, the runtime component of the firmware (RTAS) has been implemented as a 20 byte shim which simply forwards it to a hypercall implemented in qemu. The boot time firmware component is SLOF - but a build that's specific to qemu, and has always needed to be updated in sync with it. Even though we've managed to limit the amount of runtime communication we need between qemu and SLOF, there's some, and it has become increasingly awkward to handle as we've implemented new features. This implements a boot time OF client interface (CI) which is enabled by a new "x-vof" pseries machine option (stands for "Virtual Open Firmware). When enabled, QEMU implements the custom H_OF_CLIENT hcall which implements Open Firmware Client Interface (OF CI). This allows using a smaller stateless firmware which does not have to manage the device tree. The new "vof.bin" firmware image is included with source code under pc-bios/. It also includes RTAS blob. This implements a handful of CI methods just to get -kernel/-initrd working. In particular, this implements the device tree fetching and simple memory allocator - "claim" (an OF CI memory allocator) and updates "/memory@0/available" to report the client about available memory. This implements changing some device tree properties which we know how to deal with, the rest is ignored. To allow changes, this skips fdt_pack() when x-vof=on as not packing the blob leaves some room for appending. In absence of SLOF, this assigns phandles to device tree nodes to make device tree traversing work. When x-vof=on, this adds "/chosen" every time QEMU (re)builds a tree. This adds basic instances support which are managed by a hash map ihandle -> [phandle]. Before the guest started, the used memory is: 0..e60 - the initial firmware 8000..10000 - stack 400000.. - kernel 3ea0000.. - initramdisk This OF CI does not implement "interpret". Unlike SLOF, this does not format uninitialized nvram. Instead, this includes a disk image with pre-formatted nvram. With this basic support, this can only boot into kernel directly. However this is just enough for the petitboot kernel and initradmdisk to boot from any possible source. Note this requires reasonably recent guest kernel with: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=df5be5be8735 The immediate benefit is much faster booting time which especially crucial with fully emulated early CPU bring up environments. Also this may come handy when/if GRUB-in-the-userspace sees light of the day. This separates VOF and sPAPR in a hope that VOF bits may be reused by other POWERPC boards which do not support pSeries. This assumes potential support for booting from QEMU backends such as blockdev or netdev without devices/drivers used. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210625055155.2252896-1-aik@ozlabs.ru> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu> [dwg: Adjusted some includes which broke compile in some more obscure compilation setups] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:19 +10:00
Alexey Kardashevskiy	7381c5d11f	spapr: tune rtas-size QEMU reserves space for RTAS via /rtas/rtas-size which tells the client how much space the RTAS requires to work which includes the RTAS binary blob implementing RTAS runtime. Because pseries supports FWNMI which requires plenty of space, QEMU reserves more than 2KB which is enough for the RTAS blob as it is just 20 bytes (under QEMU). Since FWNMI reset delivery was added, RTAS_SIZE macro is not used anymore. This replaces RTAS_SIZE with RTAS_MIN_SIZE and uses it in the /rtas/rtas-size calculation to account for the RTAS blob. Fixes: `0e236d3477` ("ppc/spapr: Implement FWNMI System Reset delivery") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210622070336.1463250-1-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-07-09 10:38:18 +10:00
Shivaprasad G Bhat	f93c8f148c	spapr: nvdimm: Forward declare and move the definitions The subsequent patches add definitions which tend to get the compilation to cyclic dependency. So, prepare with forward declarations, move the definitions and clean up. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Message-Id: <162133925415.610.11584121797866216417.stgit@4f1e6f2bd33e> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-06-03 13:22:06 +10:00
Greg Kurz	3bf0844f3b	spapr: Don't hijack current_machine->boot_order QEMU 6.0 moved all the -boot variables to the machine. Especially, the removal of the boot_order static changed the handling of '-boot once' from: if (boot_once) { qemu_boot_set(boot_once, &error_fatal); qemu_register_reset(restore_boot_order, g_strdup(boot_order)); } to if (current_machine->boot_once) { qemu_boot_set(current_machine->boot_once, &error_fatal); qemu_register_reset(restore_boot_order, g_strdup(current_machine->boot_order)); } This means that we now register as subsequent boot order a copy of current_machine->boot_once that was just set with the previous call to qemu_boot_set(), i.e. we never transition away from the once boot order. It is certainly fragile^Wwrong for the spapr code to hijack a field of the base machine type object like that. The boot order rework simply turned this software boundary violation into an actual bug. Have the spapr code to handle that with its own field in SpaprMachineState. Also kfree() the initial boot device string when "once" was used. Fixes: `4b7acd2ac8` ("vl: clean up -boot variables") Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1960119 Cc: pbonzini@redhat.com Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20210521160735.1901914-1-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-06-03 13:22:06 +10:00
Lucas Mateus Castro (alqotel)	962104f044	hw/ppc: moved hcalls that depend on softmmu The hypercalls h_enter, h_remove, h_bulk_remove, h_protect, and h_read, have been moved to spapr_softmmu.c with the functions they depend on. The functions is_ram_address and push_sregs_to_kvm_pr are not static anymore as functions on both spapr_hcall.c and spapr_softmmu.c depend on them. The hypercalls h_resize_hpt_prepare and h_resize_hpt_commit have been divided, the KVM part stayed in spapr_hcall.c while the softmmu part was moved to spapr_softmmu.c Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br> Message-Id: <20210506163941.106984-2-lucas.araujo@eldorado.org.br> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-19 10:30:28 +10:00
Fabiano Rosas	068479e1e1	hw/ppc/spapr.c: Extract MMU mode error reporting into a function A following patch will make use of it. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Message-Id: <20210505001130.3999968-2-farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-19 10:30:28 +10:00
Daniel Henrique Barboza	b7573092ab	spapr.h: increase FDT_MAX_SIZE Certain SMP topologies stress, e.g. 1 thread/core, 2048 cores and 1 socket, stress the current maximum size of the pSeries FDT: Calling ibm,client-architecture-support...qemu-system-ppc64: error creating device tree: (fdt_setprop(fdt, offset, "ibm,processor-segment-sizes", segs, sizeof(segs))): FDT_ERR_NOSPACE 2048 is the default NR_CPUS value for the pSeries kernel. It's expected that users will want QEMU to be able to handle this kind of configuration. Bumping FDT_MAX_SIZE to 2MB is enough for these setups to be created. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210408204049.221802-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-04 11:41:25 +10:00
Ravi Bangoria	a7913d5e3f	ppc: Rename current DAWR macros and variables Power10 is introducing second DAWR. Use real register names (with suffix 0) from ISA for current macros and variables used by Qemu. One exception to this is KVM_REG_PPC_DAWR[X]. This is from kernel uapi header and thus not changed in kernel as well as Qemu. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20210412114433.129702-3-ravi.bangoria@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-04 11:41:25 +10:00
Vaibhav Jain	53d7d7e2b1	ppc/spapr: Add support for implement support for H_SCM_HEALTH Add support for H_SCM_HEALTH hcall described at [1] for spapr nvdimms. This enables guest to detect the 'unarmed' status of a specific spapr nvdimm identified by its DRC and if its unarmed, mark the region backed by the nvdimm as read-only. The patch adds h_scm_health() to handle the H_SCM_HEALTH hcall which returns two 64-bit bitmaps (health bitmap, health bitmap mask) derived from 'struct nvdimm->unarmed' member. Linux kernel side changes to enable handling of 'unarmed' nvdimms for ppc64 are proposed at [2]. References: [1] "Hypercall Op-codes (hcalls)" https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/powerpc/papr_hcalls.rst#n220 [2] "powerpc/papr_scm: Mark nvdimm as unarmed if needed during probe" https://lore.kernel.org/linux-nvdimm/20210329113103.476760-1-vaibhav@linux.ibm.com/ Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Message-Id: <20210402102128.213943-1-vaibhav@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-05-04 11:41:25 +10:00
Daniel Henrique Barboza	d522cb52e6	spapr: rollback 'unplug timeout' for CPU hotunplugs The pseries machines introduced the concept of 'unplug timeout' for CPU hotunplugs. The idea was to circunvent a deficiency in the pSeries specification (PAPR), that currently does not define a proper way for the hotunplug to fail. If the guest refuses to release the CPU (see [1] for an example) there is no way for QEMU to detect the failure. Further discussions about how to send a QAPI event to inform about the hotunplug timeout [2] exposed problems that weren't predicted back when the idea was developed. Other QEMU machines don't have any type of hotunplug timeout mechanism for any device, e.g. ACPI based machines have a way to make hotunplug errors visible to the hypervisor. This would make this timeout mechanism exclusive to pSeries, which is not ideal. The real problem is that a QAPI event that reports hotunplug timeouts puts the management layer (namely Libvirt) in a weird spot. We're not telling that the hotunplug failed, because we can't be 100% sure of that, and yet we're resetting the unplug state back, preventing any DEVICE_DEL events to reach out in case the guest decides to release the device. Libvirt would need to inspect the guest itself to see if the device was released or not, otherwise the internal domain states will be inconsistent. Moreover, Libvirt already has an 'unplug timeout' concept, and a QEMU side timeout would need to be juggled together with the existing Libvirt timeout. All this considered, this solution ended up creating more trouble than it solved. This patch reverts the 3 commits that introduced the timeout mechanism for CPU hotplugs in pSeries machines. This reverts commit `4515a5f786` "qemu_timer.c: add timer_deadline_ms() helper" This reverts commit `d1c2e3ce3d` "spapr_drc.c: add hotunplug timeout for CPUs" This reverts commit `51254ffb32` "spapr_drc.c: introduce unplug_timeout_timer" [1] https://bugzilla.redhat.com/show_bug.cgi?id=1911414 [2] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg04682.html CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210401000437.131140-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-04-12 12:27:14 +10:00
Alexey Kardashevskiy	a40888bad6	spapr: Fix typo in the patb_entry comment There is no H_REGISTER_PROCESS_TABLE, it is H_REGISTER_PROC_TBL handler for which is still called h_register_process_table() though. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20210225032335.64245-1-aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-03-31 11:10:50 +11:00
Peter Maydell	1941858448	ppc patch queue for 2021-03-10 Next batch of patches for the ppc target and machine types. Includes: * Several cleanups for sm501 from Peter Maydell * An update to the SLOF guest firmware * Improved handling of hotplug failures in spapr, associated cleanups to the hotplug handling code * Several etsec fixes and cleanups from Bin Meng * Assorted other fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAmBIRlUACgkQbDjKyiDZ s5JGlxAApWKpxdtMwrxvQ7EX95XtDWY0v2Jpl3ZKLhYgWJ28pt1SfsDUlA9KhlDd syXITpyspECe9kjOAKEim4J0y5sMVlTw8KjzIVPMik4uyoLTOBwE+nRmwPnmnWEy 9ZH0J+QOonQYh3jCp7JbTGU2ZW5pJ9s/sv8bPbzXfrR07HbAJ2+MjUkTVxkSVJAq QUvo/jMntu+a1HFU8Eiw8VyyIcIOAQyS469xzUiHHzKFlR8XodE56Vj+oh6ZFtaA cB2h4U51uzGfpz+GISm3lZUHSVnWQSFwLAc4x66aRsnLiQ66iAu8N0jRh8lsoW0y FHF+uGp3AFUARHOiCRk0r7+s29gbu+lX2jogfddj+qj7mGIZXd2tMfrrG3eWsB2C HvNby4xzyyDaguHK7N0/C42B8OX5dy2pxOP5lvdzL20ip97AKRGXngyM7LhYH8yw 4uzdebYVFu0KkLri4Qzxjm/GxgzrCbWIe5ImsDIlnmY1cJ7NKQYPzFX56xqq147y 6USFQu7RM9E03vj3c9UIkmK0KhL8GQvYxX4dMWIUjtjeLGJuN5seKBkl5mH2OSEJ D9svKOanXmsZYS0A25VX9FRX263zbJ1HIkDmGzpLi7HULdRy78e89rJk6490WNDr mnLogO+ttBvhEaLUsIVrWwLd21JW/A2NHuEz0+KELr9ZOQMYRj8= =/uyx -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dg-gitlab/tags/ppc-for-6.0-20210310' into staging ppc patch queue for 2021-03-10 Next batch of patches for the ppc target and machine types. Includes: * Several cleanups for sm501 from Peter Maydell * An update to the SLOF guest firmware * Improved handling of hotplug failures in spapr, associated cleanups to the hotplug handling code * Several etsec fixes and cleanups from Bin Meng * Assorted other fixes and cleanups # gpg: Signature made Wed 10 Mar 2021 04:08:53 GMT # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dg-gitlab/tags/ppc-for-6.0-20210310: spapr.c: send QAPI event when memory hotunplug fails spapr.c: remove duplicated assert in spapr_memory_unplug_request() target/ppc: fix icount support on Book-e vms accessing SPRs qemu_timer.c: add timer_deadline_ms() helper spapr_pci.c: add 'unplug already in progress' message for PCI unplug spapr.c: add 'unplug already in progress' message for PHB unplug hw/ppc: e500: Add missing <ranges> in the eTSEC node hw/net: fsl_etsec: Fix build error when HEX_DUMP is on spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state spapr_drc.c: add hotunplug timeout for CPUs spapr_drc.c: introduce unplug_timeout_timer target/ppc: Fix bcdsub. emulation when result overflows docs/system: Extend PPC section spapr: rename spapr_drc_detach() to spapr_drc_unplug_request() spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable pseries: Update SLOF firmware image spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical() hw/display/sm501: Inline template header into C file hw/display/sm501: Expand out macros in template header hw/display/sm501: Remove dead code for non-32-bit RGB surfaces Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2021-03-12 11:30:55 +00:00
Daniel Henrique Barboza	eb7f80fd26	spapr.c: send QAPI event when memory hotunplug fails Recent changes allowed the pSeries machine to rollback the hotunplug process for the DIMM when the guest kernel signals, via a reconfiguration of the DR connector, that it's not going to release the LMBs. Let's also warn QAPI listerners about it. One place to do it would be right after the unplug state is cleaned up, spapr_clear_pending_dimm_unplug_state(). This would mean that the function is now doing more than cleaning up the pending dimm state though. This patch does the following changes in spapr.c: - send a QAPI event to inform that we experienced a failure in the hotunplug of the DIMM; - rename spapr_clear_pending_dimm_unplug_state() to spapr_memory_unplug_rollback(). This is a better fit for what the function is now doing, and it makes callers care more about what the function goal is and less about spapr.c internals such as clearing the pending dimm unplug state. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210302141019.153729-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza	fe1831eff8	spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state Handling errors in memory hotunplug in the pSeries machine is more complex than any other device type, because there are all the complications that other devices has, and more. For instance, determining a timeout for a DIMM hotunplug must consider if it's a Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug DIMMs. The size of the DIMM is also a factor, given that longer DIMMs naturally takes longer to be hotunplugged from the kernel. And there's also the guest memory usage to be considered: if there's a process that is consuming memory that would be lost by the DIMM unplug, the kernel will postpone the unplug process until the process finishes, and then initiate the regular hotunplug process. The first two considerations are manageable, but the last one is a deal breaker. There is no sane way for the pSeries machine to determine the memory load in the guest when attempting a DIMM hotunplug - and even if there was a way, the guest can start using all the RAM in the middle of the unplug process and invalidate our previous assumptions - and in result we can't even begin to calculate a timeout for the operation. This means that we can't implement a viable timeout mechanism for memory unplug in pSeries. Going back to why we would consider an unplug timeout, the reason is that we can't know if the kernel is giving up the unplug. Turns out that, sometimes, we can. Consider a failed memory hotunplug attempt where the kernel will error out with the following message: 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs' This happens when there is a LMB that the kernel gave up in removing, and the LMBs previously marked for removal are now being added back. This happens in the pseries kernel in [1], dlpar_memory_remove_by_ic() into dlpar_add_lmb(), and after that update_lmb_associativity_index(). In this function, the kernel is configuring the LMB DRC connector again. Note that this is a valid usage in LOPAR, as stated in section "ibm,configure-connector RTAS Call": 'A subsequent sequence of calls to ibm,configure-connector with the same entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of devices which were not completely configured.' We can use this kernel behavior in our favor. If a DRC connector reconfiguration for a LMB that we marked as unplug pending happens, this indicates that the kernel changed its mind about the unplug and is reasserting that it will keep using all the LMBs of the DIMM. In this case, it's safe to assume that the whole DIMM device unplug was cancelled. This patch hops into rtas_ibm_configure_connector() and, in the scenario described above, clear the unplug state for the DIMM device. This will not solve all the problems we still have with memory unplug, but it will cover this case where the kernel reconfigures LMBs after a failed unplug. We are a bit more resilient, without using an unreliable timeout, and we didn't make the remaining error cases any worse. [1] arch/powerpc/platforms/pseries/hotplug-memory.c Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-6-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza	d1c2e3ce3d	spapr_drc.c: add hotunplug timeout for CPUs There is a reliable way to make a CPU hotunplug fail in the pseries machine. Hotplug a CPU A, then offline all other CPUs inside the guest but A. When trying to hotunplug A the guest kernel will refuse to do it, because A is now the last online CPU of the guest. PAPR has no 'error callback' in this situation to report back to the platform, so the guest kernel will deny the unplug in silent and QEMU will never know what happened. The unplug pending state of A will remain until the guest is shutdown or rebooted. Previous attempts of fixing it (see [1] and [2]) were aimed at trying to mitigate the effects of the problem. In [1] we were trying to guess which guest CPUs were online to forbid hotunplug of the last online CPU in the QEMU layer, avoiding the scenario described above because QEMU is now failing in behalf of the guest. This is not robust because the last online CPU of the guest can change while we're in the middle of the unplug process, and our initial assumptions are now invalid. In [2] we were accepting that our unplug process is uncertain and the user should be allowed to spam the IRQ hotunplug queue of the guest in case the CPU hotunplug fails. This patch presents another alternative, using the timeout infrastructure introduced in the previous patch. CPU hotunplugs in the pSeries machine will now timeout after 15 seconds. This is a long time for a single CPU unplug to occur, regardless of guest load - although the user is strongly encouraged to not hotunplug devices from a guest under high load - and we can be sure that something went wrong if it takes longer than that for the guest to release the CPU (the same can't be said about memory hotunplug - more on that in the next patch). Timing out the unplug operation will reset the unplug state of the CPU and allow the user to try it again, regardless of the error situation that prevented the hotunplug to occur. Of all the not so pretty fixes/mitigations for CPU hotunplug errors in pSeries, timing out the operation is an admission that we have no control in the process, and must assume the worst case if the operation doesn't succeed in a sensible time frame. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html [2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html Reported-by: Xujun Ma <xuma@redhat.com> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414 Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-5-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza	51254ffb32	spapr_drc.c: introduce unplug_timeout_timer The LoPAR spec provides no way for the guest kernel to report failure of hotplug/hotunplug events. This wouldn't be bad if those operations were granted to always succeed, but that's far for the reality. What ends up happening is that, in the case of a failed hotunplug, regardless of whether it was a QEMU error or a guest misbehavior, the pSeries machine is retaining the unplug state of the device in the running guest. This state is cleanup in machine reset, where it is assumed that this state represents a device that is pending unplug, and the device is hotunpluged from the board. Until the reset occurs, any hotunplug operation of the same device is forbid because there is a pending unplug state. This behavior has at least one undesirable side effect. A long standing pending unplug state is, more often than not, the result of a hotunplug error. The user had to dealt with it, since retrying to unplug the device is noy allowed, and then in the machine reset we're removing the device from the guest. This means that we're failing the user twice - failed to hotunplug when asked, then hotunplugged without notice. Solutions to this problem range between trying to predict when the hotunplug will fail and forbid the operation from the QEMU layer, from opening up the IRQ queue to allow for multiple hotunplug attempts, from telling the users to 'reboot the machine if something goes wrong'. The first solution is flawed because we can't fully predict guest behavior from QEMU, the second solution is a trial and error remediation that counts on a hope that the unplug will eventually succeed, and the third is ... well. This patch introduces a crude, but effective solution to hotunplug errors in the pSeries machine. For each unplug done, we'll timeout after some time. If a certain amount of time passes, we'll cleanup the hotunplug state from the machine. During the timeout period, any unplug operations in the same device will still be blocked. After that, we'll assume that the guest failed the operation, and allow the user to try again. If the timeout is too short we'll prevent legitimate hotunplug situations to occur, so we'll need to overestimate the regular time an unplug operation takes to succeed to account that. The true solution for the hotunplug errors in the pSeries machines is a PAPR change to allow for the guest to warn the platform about it. For now, the work done in this timeout design can be used for the new PAPR 'abort hcall' in the future, given that for both cases we'll need code to cleanup the existing unplug states of the DRCs. At this moment we're adding the basic wiring of the timer into the DRC. Next patch will use the timer to timeout failed CPU hotunplugs. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-4-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-03-10 09:07:09 +11:00
Daniel Henrique Barboza	a03509cd2b	spapr: rename spapr_drc_detach() to spapr_drc_unplug_request() spapr_drc_detach() is not the best name for what the function does. The function does not detach the DRC, it makes an uncommited attempt to do it. It'll mark the DRC as pending unplug, via the 'unplug_request' flag, and only if the DRC state is drck->empty_state it will detach the DRC, via spapr_drc_release(). This is a contrast with its pair spapr_drc_attach(), where the function is indeed creating the DRC QOM object. If you know what spapr_drc_attach() does, you can be misled into thinking that spapr_drc_detach() is removing the DRC from QEMU internal state, which isn't true. The current role of this function is better described as a request for detach, since there's no guarantee that we're going to detach the DRC in the end. Rename the function to spapr_drc_unplug_request to reflect what is is doing. The initial idea was to change the name to spapr_drc_detach_request(), and later on change the unplug_request flag to detach_request. However, unplug_request is a migratable boolean for a long time now and renaming it is not worth the trouble. spapr_drc_unplug_request() setting drc->unplug_request is more natural than spapr_drc_detach_request setting drc->unplug_request. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210222194531.62717-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-03-10 09:07:08 +11:00
Philippe Mathieu-Daudé	d32335e8ed	exec/memory: Use struct Object typedef We forward-declare Object typedef in "qemu/typedefs.h" since commit `ca27b5eb7c` ("qom/object: Move Object typedef to 'qemu/typedefs.h'"). Use it everywhere to make the code simpler. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20210225182003.3629342-1-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2021-03-09 21:53:57 +01:00
Daniel Henrique Barboza	6640706972	spapr_numa.c: create spapr_numa_initial_nvgpu_numa_id() helper We'll need to check the initial value given to spapr->gpu_numa_id when building the rtas DT, so put it in a helper for easier access and to avoid repetition. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Daniel Henrique Barboza	3b880445e6	spapr: move spapr_machine_using_legacy_numa() to spapr_numa.c This function is used only in spapr_numa.c. Tested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210128174213.1349181-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	032c226bc6	ppc/pnv: Introduce a LPC FW memory region attribute to map the PNOR This to map the PNOR from the machine init handler directly and finish the cleanup of the LPC model. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-8-clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
Cédric Le Goater	cb9428642e	ppc/xive: Add firmware bit when dumping the ENDs ENDs allocated by OPAL for the HW thread VPs are tagged as owned by FW. Dump the state in 'info pic'. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20210126171059.307867-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-02-10 10:43:50 +11:00
David Gibson	6c8ebe30ea	spapr: Add PEF based confidential guest support Some upcoming POWER machines have a system called PEF (Protected Execution Facility) which uses a small ultravisor to allow guests to run in a way that they can't be eavesdropped by the hypervisor. The effect is roughly similar to AMD SEV, although the mechanisms are quite different. Most of the work of this is done between the guest, KVM and the ultravisor, with little need for involvement by qemu. However qemu does need to tell KVM to allow secure VMs. Because the availability of secure mode is a guest visible difference which depends on having the right hardware and firmware, we don't enable this by default. In order to run a secure guest you need to create a "pef-guest" object and set the confidential-guest-support property to point to it. Note that this just allows secure guests, the architecture of PEF is such that the guest still needs to talk to the ultravisor to enter secure mode. Qemu has no direct way of knowing if the guest is in secure mode, and certainly can't know until well after machine creation time. To start a PEF-capable guest, use the command line options: -object pef-guest,id=pef0 -machine confidential-guest-support=pef0 Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org>	2021-02-08 16:57:38 +11:00
Daniel Henrique Barboza	eb72b63988	spapr_hcall.c: make do_client_architecture_support static The function is called only inside spapr_hcall.c. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210114180628.1675603-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-19 10:20:29 +11:00
Daniel Henrique Barboza	bb51f2fae7	spapr.h: fix trailing whitespace in phb_placement This whitespace was messing with lots of diffs if you happen to use an editor that eliminates trailing whitespaces on file save. Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20210114180628.1675603-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-19 10:20:29 +11:00
Greg Kurz	73598c75df	spapr: Improve handling of memory unplug with old guests Since commit `1e8b5b1aa1` ("spapr: Allow memory unplug to always succeed") trying to unplug memory from a guest that doesn't support it (eg. rhel6) no longer generates an error like it used to. Instead, it leaves the memory around : only a subsequent reboot or manual use of drmgr within the guest can complete the hot-unplug sequence. A flag was added to SpaprMachineClass so that this new behavior only applies to the default machine type. We can do better. CAS processes all pending hot-unplug requests. This means that we don't really care about what the guest supports if the hot-unplug request happens before CAS. All guests that we care for, even old ones, set enough bits in OV5 that lead to a non-empty bitmap in spapr->ov5_cas. Use that as a heuristic to decide if CAS has already occured or not. Always accept unplug requests that happen before CAS since CAS will process them. Restore the previous behavior of rejecting them after CAS when we know that the guest doesn't support memory hot-unplug. This behavior is suitable for all machine types : this allows to drop the pre_6_0_memory_unplug flag. Fixes: `1e8b5b1aa1` ("spapr: Allow memory unplug to always succeed") Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <161012708715.801107.11418801796987916516.stgit@bahia.lan> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-19 10:20:29 +11:00
Peter Maydell	f7c4acf572	hw/ppc: Remove unused ppcuic_init() Now we've converted all the callsites to directly create the QOM UIC device themselves, the ppcuic_init() function is unused and can be removed. The enum defining PPCUIC symbolic constants can be moved to the ppc-uic.h header where it more naturally belongs. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Message-Id: <20210108171212.16500-5-peter.maydell@linaro.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-19 10:20:29 +11:00
Greg Kurz	babb819f94	spapr: Introduce spapr_drc_reset_all() No need to expose the way DRCs are traversed outside of spapr_drc.c. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-4-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Greg Kurz	930ef3b5c2	spapr: Fix reset of transient DR connectors Documentation of object_property_iter_init() clearly stipulates that "it is forbidden to modify the property list while iterating". But this is exactly what we do when resetting transient DR connectors during CAS. The call to spapr_drc_reset() can finalize the hot-unplug sequence of a PHB or a PCI bridge, both of which will then in turn destroy their PCI DRCs. This could potentially invalidate the iterator. It is pure luck that this haven't caused any issues so far. Change spapr_drc_reset() to return true if it caused a device to be removed. Restart from scratch in this case. This can potentially increase the overall DRC reset time, especially with a high maxmem which generates a lot of LMB DRCs. But this kind of setup is rare, and so is the use case of rebooting a guest while doing hot-unplug. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-3-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Greg Kurz	cd725bd748	spapr: Call spapr_drc_reset() for all DRCs at CAS Non-transient DRCs are either in the empty or the ready state, which means spapr_drc_reset() doesn't change their state. It is thus not needed to do any checking. Call spapr_drc_reset() unconditionally and squash spapr_drc_transient() into its only user, spapr_drc_needed(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201218103400.689660-2-groug@kaod.org> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Greg Kurz	30499fdd98	spapr: Fix buffer overflow in spapr_numa_associativity_init() Running a guest with 128 NUMA nodes crashes QEMU: ../../util/error.c:59: error_setv: Assertion `errp == NULL' failed. The crash happens when setting the FWNMI migration blocker: 2861 if (spapr_get_cap(spapr, SPAPR_CAP_FWNMI) == SPAPR_CAP_ON) { 2862 / Create the error string for live migration blocker / 2863 error_setg(&spapr->fwnmi_migration_blocker, 2864 "A machine check is being handled during migration. The handler" 2865 "may run and log hardware error on the destination"); 2866 } Inspection reveals that papr->fwnmi_migration_blocker isn't NULL: (gdb) p spapr->fwnmi_migration_blocker $1 = (Error ) 0x8000000004000000 Since this is the only place where papr->fwnmi_migration_blocker is set, this means someone wrote there in our back. Further analysis points to spapr_numa_associativity_init(), especially the part that initializes the associative arrays for NVLink GPUs: max_nodes_with_gpus = nb_numa_nodes + NVGPU_MAX_NUM; ie. max_nodes_with_gpus = 128 + 6, but the array isn't sized to accommodate the 6 extra nodes: struct SpaprMachineState { . . . uint32_t numa_assoc_array[MAX_NODES][NUMA_ASSOC_SIZE]; Error *fwnmi_migration_blocker; }; and the following loops happily overwrite spapr->fwnmi_migration_blocker, and probably more: for (i = nb_numa_nodes; i < max_nodes_with_gpus; i++) { spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS); for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) { uint32_t gpu_assoc = smc->pre_5_1_assoc_refpoints ? SPAPR_GPU_NUMA_ID : cpu_to_be32(i); spapr->numa_assoc_array[i][j] = gpu_assoc; } spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i); } Fix the size of the array. This requires "hw/ppc/spapr.h" to see NVGPU_MAX_NUM. Including "hw/pci-host/spapr.h" introduces a circular dependency that breaks the build, so this moves the definition of NVGPU_MAX_NUM to "hw/ppc/spapr.h" instead. Reported-by: Min Deng <mdeng@redhat.com> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1908693 Fixes: `dd7e1d7ae4` ("spapr_numa: move NVLink2 associativity handling to spapr_numa.c") Cc: danielhb413@gmail.com Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160829960428.734871.12634150161215429514.stgit@bahia.lan> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Greg Kurz	1e8b5b1aa1	spapr: Allow memory unplug to always succeed It is currently impossible to hot-unplug a memory device between machine reset and CAS. (qemu) device_del dimm1 Error: Memory hot unplug not supported for this guest This limitation was introduced in order to provide an explicit error path for older guests that didn't support hot-plug event sources (and thus memory hot-unplug). The linux kernel has been supporting these since 4.11. All recent enough guests are thus capable of handling the removal of a memory device at all time, including during early boot. Lift the limitation for the latest machine type. This means that trying to unplug memory from a guest that doesn't support it will likely just do nothing and the memory will only get removed at next reboot. Such older guests can still get the existing behavior by using an older machine type. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160794035064.23292.17560963281911312439.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Cédric Le Goater	ab9c93c25c	spapr/xive: Make spapr_xive_pic_print_info() static Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20201215174025.2636824-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2021-01-06 11:09:59 +11:00
Greg Kurz	c4c81d7d51	spapr: Pass sPAPR machine state down to spapr_pci_switch_vga() This allows to drop a user of qdev_get_machine(). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201209170052.1431440-4-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-14 15:54:12 +11:00
Greg Kurz	bc370a659a	spapr: spapr_drc_attach() cannot fail All users are passing &error_abort already. Document the fact that spapr_drc_attach() should only be passed a free DRC, which is supposedly the case if appropriate checking is done earlier. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201201113728.885700-5-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-14 15:54:12 +11:00
Greg Kurz	f5598c92b8	spapr: Make PHB placement functions and spapr_pre_plug_phb() return status Read documentation in "qapi/error.h" and changelog of commit `e3fe3988d7` ("error: Document Error API usage rules") for rationale. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201120234208.683521-7-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-14 15:50:55 +11:00
Greg Kurz	ea042c53f4	spapr: Do NVDIMM/PC-DIMM device hotplug sanity checks at pre-plug only Pre-plug of a memory device, be it an NVDIMM or a PC-DIMM, ensures that the memory slot is available and that addresses don't overlap with existing memory regions. The corresponding DRCs in the LMB and PMEM namespaces are thus necessarily attachable at plug time. Pass &error_abort to spapr_drc_attach() in spapr_add_lmbs() and spapr_add_nvdimm(). This allows to greatly simplify error handling on the plug path. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201120234208.683521-3-groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-14 15:50:55 +11:00
Greg Kurz	0b66209d9f	spapr/xics: Drop unused argument to xics_kvm_has_broken_disconnect() Never used from the start. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <20201120174646.619395-6-groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-12-14 15:50:55 +11:00

1 2 3 4 5 ...

815 Commits