mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Yuval Shaia	028c3f93d6	hw/rdma: Initialize node_guid from vmxnet3 mac address node_guid should be set once device is load. Make node_guid be GID format (32 bit) of PCI function 0 vmxnet3 device's MAC. A new function was added to do the conversion. So for example the MAC 56:b6:44:e9:62:dc will be converted to GID 54b6:44ff:fee9:62dc. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	d961ead16e	hw/pvrdma: Make sure PCI function 0 is vmxnet3 Guest driver enforces it, we should also. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	317639aafd	vmxnet3: Move some definitions to header file pvrdma setup requires vmxnet3 device on PCI function 0 and PVRDMA device on PCI function 1. pvrdma device needs to access vmxnet3 device object for several reasons: 1. Make sure PCI function 0 is vmxnet3. 2. To monitor vmxnet3 device state. 3. To configure node_guid accoring to vmxnet3 device's MAC address. To be able to access vmxnet3 device the definition of VMXNET3State is moved to a new header file. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Dmitry Fleytman <dmitry.fleytman@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	2b05705dc8	hw/pvrdma: Add support to allow guest to configure GID table The control over the RDMA device's GID table is done by updating the device's Ethernet function addresses. Usually the first GID entry is determined by the MAC address, the second by the first IPv6 address and the third by the IPv4 address. Other entries can be added by adding more IP addresses. The opposite is the same, i.e. whenever an address is removed, the corresponding GID entry is removed. The process is done by the network and RDMA stacks. Whenever an address is added the ib_core driver is notified and calls the device driver add_gid function which in turn update the device. To support this in pvrdma device we need to hook into the create_bind and destroy_bind HW commands triggered by pvrdma driver in guest. Whenever a change is made to the pvrdma port's GID table a special QMP message is sent to be processed by libvirt to update the address of the backend Ethernet device. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	4a5c9903f3	qapi: Define new QMP message for pvrdma pvrdma requires that the same GID attached to it will be attached to the backend device in the host. A new QMP messages is defined so pvrdma device can broadcast any change made to its GID table. This event is captured by libvirt which in turn will update the GID table in the backend device. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	1625bb13da	hw/pvrdma: Set the correct opcode for send completion opcode for WC should be set by the device and not taken from work element. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	2bff59e699	hw/pvrdma: Set the correct opcode for recv completion The function pvrdma_post_cqe populates CQE entry with opcode from the given completion element. For receive operation value was not set. Fix it by setting it to IBV_WC_RECV. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	536692ca1d	hw/pvrdma: Make default pkey 0xFFFF Commit `6e7dba23af` ("hw/pvrdma: Make default pkey 0xFFFF") exports default pkey as external definition but omit the change from 0x7FFF to 0xFFFF. Fixes: `6e7dba23af` ("hw/pvrdma: Make default pkey 0xFFFF") Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	f00c48caab	hw/pvrdma: Make function reset_device return void This function cannot fail - fix it to return void Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	605ec1663b	hw/rdma: Add support for MAD packets MAD (Management Datagram) packets are widely used by various modules both in kernel and in user space for example the rdma_* API which is used to create and maintain "connection" layer on top of RDMA uses several types of MAD packets. For more information please refer to chapter 13.4 in Volume 1 Architecture Specification, Release 1.1 available here: https://www.infinibandta.org/ibta-specifications-download/ To support MAD packets the device uses an external utility (contrib/rdmacm-mux) to relay packets from and to the guest driver. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	305bdd7a57	hw/rdma: Abort send-op if fail to create addr handler Function create_ah might return NULL, let's exit with an error. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	46462cb161	hw/rdma: Return qpn 1 if ibqp is NULL Device is not supporting QP0, only QP1. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	4082e533f6	hw/rdma: Add ability to force notification without re-arm Upon completion of incoming packet the device pushes CQE to driver's RX ring and notify the driver (msix). While for data-path incoming packets the driver needs the ability to control whether it wished to receive interrupts or not, for control-path packets such as incoming MAD the driver needs to be notified anyway, it even do not need to re-arm the notification bit. Enhance the notification field to support this. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	a5d2f6f877	contrib/rdmacm-mux: Add implementation of RDMA User MAD multiplexer RDMA MAD kernel module (ibcm) disallow more than one MAD-agent for a given MAD class. This does not go hand-by-hand with qemu pvrdma device's requirements where each VM is MAD agent. Fix it by adding implementation of RDMA MAD multiplexer service which on one hand register as a sole MAD agent with the kernel module and on the other hand gives service to more than one VM. Design Overview: Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> ---------------- A server process is registered to UMAD framework (for this to work the rdma_cm kernel module needs to be unloaded) and creates a unix socket to listen to incoming request from clients. A client process (such as QEMU) connects to this unix socket and registers with its own GID. TX: ---- When client needs to send rdma_cm MAD message it construct it the same way as without this multiplexer, i.e. creates a umad packet but this time it writes its content to the socket instead of calling umad_send(). The server, upon receiving such a message fetch local_comm_id from it so a context for this session can be maintain and relay the message to UMAD layer by calling umad_send(). RX: ---- The server creates a worker thread to process incoming rdma_cm MAD messages. When an incoming message arrived (umad_recv()) the server, depending on the message type (attr_id) looks for target client by either searching in gid->fd table or in local_comm_id->fd table. With the extracted fd the server relays to incoming message to the client. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Yuval Shaia	dee2e53c86	hw/pvrdma: Check the correct return value Return value of 0 means ok, we want to free the memory only in case of error. Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com> Message-Id: <20181025061700.17050-1-yuval.shaia@oracle.com> Reviewed-by: Marcel Apfelbaum<marcel.apfelbaum@gmail.com> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>	2018-12-22 11:09:56 +02:00
Peter Maydell	891ff9f4a3	ppc patch queue 2018-12-21 This pull request supersedes the one from 2018-12-13. This is a revised first ppc pull request for qemu-4.0. Highlights are: * Most of the code for the POWER9 "XIVE" interrupt controller (not complete yet, but we're getting there) * A number of g_new vs. g_malloc cleanups * Some IRQ wiring cleanups * A fix for how we advertise NUMA nodes to the guest for pseries -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAlwce1QACgkQbDjKyiDZ s5JqlhAAhES3+UoNHa/eTf/o2OOgIgZ3A2FVP1kGQbb3Q415hxx1blpYkBDk6sUg POEgwbj8QMvJ8npOICb2NnHNsAhKRfgGxx/lqVgxLPTDdwGBq7Jr1lfhyX4D99WD C2oLtJWvQrA7yIsDzurMjJpFvw8SYSogppom4lqE5667pm6U0j7JggFJkwIo+VAj jzl706vvB6/EL3PHZ8eCzsxT2oRpxxMStE3lJ1JPpKc60mFb5gkXMk3hura1L8Ez t4NEN9I4ePivXlh6YYDp7Pv5l9JSzKV7Uu8xrYeMdz33e0jUWyoERrdhM51mI4s1 WGoQm6eL6p0jngAUPYtAdIGC6ZGaCMT5rkoDZ4K+us94kVvdqzWjQyRNp84GpQq0 Z/sxJaTSK2DZMnQL3LE19upk7XkB5uBgnjs5T5FcFia7bIDG3p8MY4VwIM4dRum9 WuirEUJRKg28eTTnuK9NQX2+MEnrRWc/FSNaBLjxrijD4C4jHogXTpssNprYnkV7 HgkQ2MaidcnNLftfOUeBr0aTx+rGqtUB56Xas1UK+WqykKVfRdZ6hnbRg30FJvQ/ X4SIc/QZcLcA78C/SvCuXa1uqWqlrZMhN2e5r+eiEXaFYyriWgMYe9w+mXKbrTsb ZPkbax6xz1esqOZ15ytCTneQONyhMXy5iFDfb0khO4DO0uXq4dA= =IN1s -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20181221' into staging ppc patch queue 2018-12-21 This pull request supersedes the one from 2018-12-13. This is a revised first ppc pull request for qemu-4.0. Highlights are: * Most of the code for the POWER9 "XIVE" interrupt controller (not complete yet, but we're getting there) * A number of g_new vs. g_malloc cleanups * Some IRQ wiring cleanups * A fix for how we advertise NUMA nodes to the guest for pseries # gpg: Signature made Fri 21 Dec 2018 05:34:12 GMT # gpg: using RSA key 6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.0-20181221: (40 commits) MAINTAINERS: PPC: add a XIVE section spapr: change default CPU type to POWER9 spapr: introduce an 'ic-mode' machine option spapr: add an extra OV5 field to the sPAPR IRQ backend spapr: add a 'reset' method to the sPAPR IRQ backend spapr: extend the sPAPR IRQ backend for XICS migration spapr: allocate the interrupt thread context under the CPU core spapr: add device tree support for the XIVE exploitation mode spapr: add hcalls support for the XIVE exploitation interrupt mode spapr: introduce a new machine IRQ backend for XIVE spapr-iommu: Always advertise the maximum possible DMA window size spapr/xive: use the VCPU id as a NVT identifier spapr/xive: introduce a XIVE interrupt controller ppc/xive: notify the CPU when the interrupt priority is more privileged ppc/xive: introduce a simplified XIVE presenter ppc/xive: introduce the XIVE interrupt thread context ppc/xive: add support for the END Event State Buffers Changes requirement for "vsubsbs" instruction spapr: export and rename the xics_max_server_number() routine spapr: introduce a spapr_irq_init() routine ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-12-21 15:49:59 +00:00
Peter Maydell	15763776bf	pci, pc, virtio: fixes, features VTD fixes IR and split irqchip are now the default for Q35 ACPI refactoring hotplug refactoring new names for virtio devices multiple pcie link width/speeds PCI fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJcG967AAoJECgfDbjSjVRpUlMH/1wy8WW7Br/4JxlWUPsfZTqZ 0Lg2n59wuFzRVS+gLotp6bUaJGR9xn9fKjI1wfD59oVrDTKyauuw82v0OityEs33 ZquFecuJvP6k5N40DkqA9YJjKSw7nUmLrsyrD0t2H43npikP2aD9f6yootrr3oVV nPwBvyT9ykIBQc7IzsHDiLw3EPdIf+9IR7+l+PLptzV0zK43Jfwi/nzHIQq00UMz eLM/ejQLIx4BZJnYGS0Cy6v3K7cS3k45LHDY0cGc0id2jHFN2vdQyHCF9I1ps72q pSlhMaLEwvQSYyre6iFTG5uuvyIPWv3LOkaBEwMSA5B/HXuEb2RKHThYzS9dc68= =OwW7 -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging pci, pc, virtio: fixes, features VTD fixes IR and split irqchip are now the default for Q35 ACPI refactoring hotplug refactoring new names for virtio devices multiple pcie link width/speeds PCI fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Thu 20 Dec 2018 18:26:03 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (44 commits) x86-iommu: turn on IR by default if proper x86-iommu: switch intr_supported to OnOffAuto type q35: set split kernel irqchip as default pci: Adjust PCI config limit based on bus topology spapr_pci: perform unplug via the hotplug handler pci/shpc: perform unplug via the hotplug handler pci: Reuse pci-bridge hotplug handler handlers for pcie-pci-bridge pci/pcie: perform unplug via the hotplug handler pci/pcihp: perform unplug via the hotplug handler pci/pcihp: overwrite hotplug handler recursively from the start pci/pcihp: perform check for bus capability in pre_plug handler s390x/pci: rename hotplug handler callbacks pci/shpc: rename hotplug handler callbacks pci/pcie: rename hotplug handler callbacks hw/i386: Remove deprecated machines pc-0.10 and pc-0.11 hw: acpi: Remove AcpiRsdpDescriptor and fix tests hw: acpi: Export and share the ARM RSDP build hw: arm: Support both legacy and current RSDP build hw: arm: Convert the RSDP build to the buid_append_foo() API hw: arm: Carry RSDP specific data through AcpiRsdpData ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2018-12-21 14:06:01 +00:00
Cédric Le Goater	b62c6e1237	MAINTAINERS: PPC: add a XIVE section Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:43 +11:00
Cédric Le Goater	34a6b015a9	spapr: change default CPU type to POWER9 Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:43 +11:00
Cédric Le Goater	3ba3d0bc33	spapr: introduce an 'ic-mode' machine option This option is used to select the interrupt controller mode (XICS or XIVE) with which the machine will operate. XICS being the default mode for now. When running a machine with the XIVE interrupt mode backend, the guest OS is required to have support for the XIVE exploitation mode. In the case of legacy OS, the mode selected by CAS should be XICS and the OS should fail to boot. However, QEMU could possibly detect it, terminate the boot process and reset to stop in the SLOF firmware. This is not yet handled. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:43 +11:00
Cédric Le Goater	db592b5b16	spapr: add an extra OV5 field to the sPAPR IRQ backend The interrupt modes supported by the hypervisor are advertised to the guest with new bits definitions of the option vector 5 of property "ibm,arch-vec-5-platform-support. The byte 23 bits 0-1 of the OV5 are defined as follow : 0b00 PAPR 2.7 and earlier (Legacy systems) 0b01 XIVE Exploitation mode only 0b10 Either available If the client/guest selects the XIVE interrupt mode, it informs the hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00 value indicates the use of the XICS interrupt mode (Legacy systems). The sPAPR IRQ backend is extended with these definitions and the values are directly used to populate the "ibm,arch-vec-5-platform-support" property. The interrupt mode is advertised under TCG and under KVM. Although a KVM XIVE device is not yet available, the machine can still operate with kernel_irqchip=off. However, we apply a restriction on the CPU which is required to be a POWER9 when a XIVE interrupt controller is in use. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:43 +11:00
Cédric Le Goater	b2e2247716	spapr: add a 'reset' method to the sPAPR IRQ backend For the time being, the XIVE reset handler updates the OS CAM line of the vCPU as it is done under a real hypervisor when a vCPU is scheduled to run on a HW thread. This will let the XIVE presenter engine find a match among the NVTs dispatched on the HW threads. This handler will become even more useful when we introduce the machine supporting both interrupt modes, XIVE and XICS. In this machine, the interrupt mode is chosen by the CAS negotiation process and activated after a reset. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:40:35 +11:00
Cédric Le Goater	1c53b06c03	spapr: extend the sPAPR IRQ backend for XICS migration Introduce a new sPAPR IRQ handler to handle resend after migration when the machine is using a KVM XICS interrupt controller model. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:39:13 +11:00
Cédric Le Goater	1a937ad7e7	spapr: allocate the interrupt thread context under the CPU core Each interrupt mode has its own specific interrupt presenter object, that we store under the CPU object, one for XICS and one for XIVE. Extend the sPAPR IRQ backend with a new handler to support them both. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:39:13 +11:00
Cédric Le Goater	6e21de4a50	spapr: add device tree support for the XIVE exploitation mode The XIVE interface for the guest is described in the device tree under the "interrupt-controller" node. A couple of new properties are specific to XIVE : - "reg" contains the base address and size of the thread interrupt managnement areas (TIMA), for the User level and for the Guest OS level. Only the Guest OS level is taken into account today. - "ibm,xive-eq-sizes" the size of the event queues. One cell per size supported, contains log2 of size, in ascending order. - "ibm,xive-lisn-ranges" the IRQ interrupt number ranges assigned to the guest for the IPIs. and also under the root node : - "ibm,plat-res-int-priorities" contains a list of priorities that the hypervisor has reserved for its own use. OPAL uses the priority 7 queue to automatically escalate interrupts for all other queues (DD2.X POWER9). So only priorities [0..6] are allowed for the guest. Extend the sPAPR IRQ backend with a new handler to populate the DT with the appropriate "interrupt-controller" node. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:39:07 +11:00
Cédric Le Goater	23bcd5eb9a	spapr: add hcalls support for the XIVE exploitation interrupt mode The different XIVE virtualization structures (sources and event queues) are configured with a set of Hypervisor calls : - H_INT_GET_SOURCE_INFO used to obtain the address of the MMIO page of the Event State Buffer (ESB) entry associated with the source. - H_INT_SET_SOURCE_CONFIG assigns a source to a "target". - H_INT_GET_SOURCE_CONFIG determines which "target" and "priority" is assigned to a source - H_INT_GET_QUEUE_INFO returns the address of the notification management page associated with the specified "target" and "priority". - H_INT_SET_QUEUE_CONFIG sets or resets the event queue for a given "target" and "priority". It is also used to set the notification configuration associated with the queue, only unconditional notification is supported for the moment. Reset is performed with a queue size of 0 and queueing is disabled in that case. - H_INT_GET_QUEUE_CONFIG returns the queue settings for a given "target" and "priority". - H_INT_RESET resets all of the guest's internal interrupt structures to their initial state, losing all configuration set via the hcalls H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG. - H_INT_SYNC issue a synchronisation on a source to make sure all notifications have reached their queue. Calls that still need to be addressed : H_INT_SET_OS_REPORTING_LINE H_INT_GET_OS_REPORTING_LINE See the code for more documentation on each hcall. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Folded in fix for field accessors] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:37:38 +11:00
Cédric Le Goater	dcc345b61e	spapr: introduce a new machine IRQ backend for XIVE The XIVE IRQ backend uses the same layout as the new XICS backend but covers the full range of the IRQ number space. The IRQ numbers for the CPU IPIs are allocated at the bottom of this space, below 4K, to preserve compatibility with XICS which does not use that range. This should be enough given that the maximum number of CPUs is 1024 for the sPAPR machine under QEMU. For the record, the biggest POWER8 or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192 cores, SMT8). Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:37:38 +11:00
Alexey Kardashevskiy	8994e91e96	spapr-iommu: Always advertise the maximum possible DMA window size When deciding about the huge DMA window, the typical Linux pseries guest uses the maximum allowed RAM size as the upper limit. We did the same on QEMU side to match that logic. Now we are going to support a GPU RAM pass through which is not available at the guest boot time as it requires the guest driver interaction. As the result, the guest requests a smaller window than it should. Therefore the guest needs to be patched to understand this new memory and so does QEMU. Instead of reimplementing here whatever solution we choose for the guest, this advertises the biggest possible window size limited by 32 bit (as defined by LoPAPR). Since the window size has to be power-of-two (the create rtas call receives a window shift, not a size), this uses 0x8000.0000 as the maximum number of TCEs possible (rather than 32bit maximum of 0xffff.ffff). This is safe as: 1. The guest visible emulated table is allocated in KVM (actual pages are allocated in page fault handler) and QEMU (actual pages are allocated when updated); 2. The hardware table (and corresponding userspace address table) supports sparse allocation and also checks for locked_vm limit so it is unable to cause the host any damage. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:37:38 +11:00
Cédric Le Goater	0cddee8d48	spapr/xive: use the VCPU id as a NVT identifier The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts to find a matching Notification Virtual Target (NVT) among the NVTs dispatched on the HW processor threads. On a real system, the thread interrupt contexts are updated by the hypervisor when a Virtual Processor is scheduled to run on a HW thread. Under QEMU, the model will emulate the same behavior by hardwiring the NVT identifier in the thread context registers at reset. The NVT identifier used by the sPAPRXive model is the VCPU id. The END identifier is also derived from the VCPU id. A set of helpers doing the conversion between identifiers are provided for the hcalls configuring the sources and the ENDs. The model does not need a NVT table but the XiveRouter NVT operations are provided to perform some extra checks in the routing algorithm. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:37:38 +11:00
Cédric Le Goater	3aa597f650	spapr/xive: introduce a XIVE interrupt controller sPAPRXive models the XIVE interrupt controller of the sPAPR machine. It inherits from the XiveRouter and provisions storage for the routing tables : - Event Assignment Structure (EAS) - Event Notification Descriptor (END) The sPAPRXive model incorporates an internal XiveSource for the IPIs and for the interrupts of the virtual devices of the guest. This model is consistent with XIVE architecture which also incorporates an internal IVSE for IPIs and accelerator interrupts in the IVRE sub-engine. The sPAPRXive model exports two memory regions, one for the ESB trigger and management pages used to control the sources and one for the TIMA pages. They are mapped by default at the addresses found on chip 0 of a baremetal system. This is also consistent with the XIVE architecture which defines a Virtualization Controller BAR for the internal IVSE ESB pages and a Thread Managment BAR for the TIMA. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Fold in field accessor fixes] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:37:38 +11:00
Cédric Le Goater	cdd4de68ed	ppc/xive: notify the CPU when the interrupt priority is more privileged After the event data was enqueued in the O/S Event Queue, the IVPE raises the bit corresponding to the priority of the pending interrupt in the register IBP (Interrupt Pending Buffer) to indicate there is an event pending in one of the 8 priority queues. The Pending Interrupt Priority Register (PIPR) is also updated using the IPB. This register represent the priority of the most favored pending notification. The PIPR is then compared to the the Current Processor Priority Register (CPPR). If it is more favored (numerically less than), the CPU interrupt line is raised and the EO bit of the Notification Source Register (NSR) is updated to notify the presence of an exception for the O/S. The check needs to be done whenever the PIPR or the CPPR are changed. The O/S acknowledges the interrupt with a special load in the Thread Interrupt Management Area. If the EO bit of the NSR is set, the CPPR takes the value of PIPR. The bit number in the IBP corresponding to the priority of the pending interrupt is reseted and so is the EO bit of the NSR. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:37:33 +11:00
Cédric Le Goater	af53dbf622	ppc/xive: introduce a simplified XIVE presenter The last sub-engine of the XIVE architecture is the Interrupt Virtualization Presentation Engine (IVPE). On HW, the IVRE and the IVPE share elements, the Power Bus interface (CQ), the routing table descriptors, and they can be combined in the same HW logic. We do the same in QEMU and combine both engines in the XiveRouter for simplicity. When the IVRE has completed its job of matching an event source with a Notification Virtual Target (NVT) to notify, it forwards the event notification to the IVPE sub-engine. The IVPE scans the thread interrupt contexts of the Notification Virtual Targets (NVT) dispatched on the HW processor threads and if a match is found, it signals the thread. If not, the IVPE escalates the notification to some other targets and records the notification in a backlog queue. The IVPE maintains the thread interrupt context state for each of its NVTs not dispatched on HW processor threads in the Notification Virtual Target table (NVTT). The model currently only supports single NVT notifications. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Folded in fix for field accessors] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:37:04 +11:00
Cédric Le Goater	207d9fe985	ppc/xive: introduce the XIVE interrupt thread context Each POWER9 processor chip has a XIVE presenter that can generate four different exceptions to its threads: - hypervisor exception, - O/S exception - Event-Based Branch (EBB) - msgsnd (doorbell). Each exception has a state independent from the others called a Thread Interrupt Management context. This context is a set of registers which lets the thread handle priority management and interrupt acknowledgment among other things. The most important ones being : - Interrupt Priority Register (PIPR) - Interrupt Pending Buffer (IPB) - Current Processor Priority (CPPR) - Notification Source Register (NSR) These registers are accessible through a specific MMIO region, called the Thread Interrupt Management Area (TIMA), four aligned pages, each exposing a different view of the registers. First page (page address ending in 0b00) gives access to the entire context and is reserved for the ring 0 view for the physical thread context. The second (page address ending in 0b01) is for the hypervisor, ring 1 view. The third (page address ending in 0b10) is for the operating system, ring 2 view. The fourth (page address ending in 0b11) is for user level, ring 3 view. The thread interrupt context is modeled with a XiveTCTX object containing the values of the different exception registers. The TIMA region is mapped at the same address for each CPU. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:29:12 +11:00
Cédric Le Goater	002686be42	ppc/xive: add support for the END Event State Buffers The Event Notification Descriptor (END) XIVE structure also contains two Event State Buffers providing further coalescing of interrupts, one for the notification event (ESn) and one for the escalation events (ESe). A MMIO page is assigned for each to control the EOI through loads only. Stores are not allowed. The END ESBs are modeled through an object resembling the 'XiveSource' It is stateless as the END state bits are backed into the XiveEND structure under the XiveRouter and the MMIO accesses follow the same rules as for the XiveSource ESBs. END ESBs are not supported by the Linux drivers neither on OPAL nor on sPAPR. Nevetherless, it provides a mean to study the question in the future and validates a bit more the XIVE model. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fold in a later fix for field access] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:29:12 +11:00
Paul A. Clarke	fcfbc18d00	Changes requirement for "vsubsbs" instruction Changes requirement for "vsubsbs" instruction, which has been supported since ISA 2.03. (Please see section 5.9.1.2 of ISA 2.03) Reported-by: Paul A. Clarke <pc@us.ibm.com> Signed-off-by: Paul A. Clarke <pc@us.ibm.com> Signed-off-by: Leonardo Bras <leonardo@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:29:12 +11:00
Cédric Le Goater	1a518e7693	spapr: export and rename the xics_max_server_number() routine The XIVE sPAPR IRQ backend will use it to define the number of ENDs of the IC controller. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:29:10 +11:00
Cédric Le Goater	fab397d84a	spapr: introduce a spapr_irq_init() routine Initialize the MSI bitmap from it as this will be necessary for the sPAPR IRQ backend for XIVE. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:28:47 +11:00
Cédric Le Goater	482969d680	spapr: initialize VSMT before initializing the IRQ backend We will need to use xics_max_server_number() to create the sPAPRXive object modeling the interrupt controller of the machine which is created before the CPUs. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> [dwg: Fix style nit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:28:39 +11:00
Cédric Le Goater	e4ddaac67f	ppc/xive: introduce the XIVE Event Notification Descriptors To complete the event routing, the IVRE sub-engine uses a second table containing Event Notification Descriptor (END) structures. An END specifies on which Event Queue (EQ) the event notification data, defined in the associated EAS, should be posted when an exception occurs. It also defines which Notification Virtual Target (NVT) should be notified. The Event Queue is a memory page provided by the O/S defining a circular buffer, one per server and priority couple, containing Event Queue entries. These are 4 bytes long, the first bit being a 'generation' bit and the 31 following bits the END Data field. They are pulled by the O/S when the exception occurs. The END Data field is a way to set an invariant logical event source number for an IRQ. On sPAPR machines, it is set with the H_INT_SET_SOURCE_CONFIG hcall when the EISN flag is used. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fold in a later fix from Cédric fixing field accessors] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:26:42 +11:00
Cédric Le Goater	7ff7ea9280	ppc/xive: introduce the XiveRouter model The XiveRouter models the second sub-engine of the XIVE architecture : the Interrupt Virtualization Routing Engine (IVRE). The IVRE handles event notifications of the IVSE and performs the interrupt routing process. For this purpose, it uses a set of tables stored in system memory, the first of which being the Event Assignment Structure (EAS) table. The EAT associates an interrupt source number with an Event Notification Descriptor (END) which will be used in a second phase of the routing process to identify a Notification Virtual Target. The XiveRouter is an abstract class which needs to be inherited from to define a storage for the EAT, and other upcoming tables. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Folded in parts of a later fix by Cédric fixing field access] [dwg: Fix style nits] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:26:31 +11:00
Cédric Le Goater	5e79b155a8	ppc/xive: introduce the XiveNotifier interface The XiveNotifier offers a simple interface, between the XiveSource object and the main interrupt controller of the machine. It will forward event notifications to the XIVE Interrupt Virtualization Routing Engine (IVRE). Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Adjust type name string for XiveNotifier] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Cédric Le Goater	5fd9ef18a9	ppc/xive: add support for the LSI interrupt sources The 'sent' status of the LSI interrupt source is modeled with the 'P' bit of the ESB and the assertion status of the source is maintained with an extra bit under the main XiveSource object. The type of the source is stored in the same array for practical reasons. Signed-off-by: Cédric Le Goater <clg@kaod.org> [dwg: Fix style nit] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Cédric Le Goater	02e3ff548d	ppc/xive: introduce a XIVE interrupt source model The first sub-engine of the overall XIVE architecture is the Interrupt Virtualization Source Engine (IVSE). An IVSE can be integrated into another logic, like in a PCI PHB or in the main interrupt controller to manage IPIs. Each IVSE instance is associated with an Event State Buffer (ESB) that contains a two bit state entry for each possible event source. When an event is signaled to the IVSE, by MMIO or some other means, the associated interrupt state bits are fetched from the ESB and modified. Depending on the resulting ESB state, the event is forwarded to the IVRE sub-engine of the controller doing the routing. Each supported ESB entry is associated with either a single or a even/odd pair of pages which provides commands to manage the source: to EOI, to turn off the source for instance. On a sPAPR machine, the O/S will obtain the page address of the ESB entry associated with a source and its characteristic using the H_INT_GET_SOURCE_INFO hcall. On PowerNV, a similar OPAL call is used. The xive_source_notify() routine is in charge forwarding the source event notification to the routing engine. It will be filled later on. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Greg Kurz	2104d4f5bc	e500: simplify IRQ wiring The OpenPIC have 5 outputs per connected CPU. The machine init code hence needs a bi-dimensional array (smp_cpu lines, 5 columns) to wire up the irqs between the PIC and the CPUs. The current code first allocates an array of smp_cpus pointers to qemu_irq type, then it allocates another array of smp_cpus * 5 qemu_irq and fills the first array with pointers to each line of the second array. This is rather convoluted. Simplify the logic by introducing a structured type that describes all the OpenPIC outputs for a single CPU, ie, fixed size of 5 qemu_irq, and only allocate a smp_cpu sized array of those. This also allows to use g_new(T, n) instead of g_malloc(sizeof(T) * n) as recommended in HACKING. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Greg Kurz	9929301ee1	mac_newworld: simplify IRQ wiring The OpenPIC have 5 outputs per connected CPU. The machine init code hence needs a bi-dimensional array (smp_cpu lines, 5 columns) to wire up the irqs between the PIC and the CPUs. The current code first allocates an array of smp_cpus pointers to qemu_irq type, then it allocates another array of smp_cpus * 5 qemu_irq and fills the first array with pointers to each line of the second array. This is rather convoluted. Simplify the logic by introducing a structured type that describes all the OpenPIC outputs for a single CPU, ie, fixed size of 5 qemu_irq, and only allocate a smp_cpu sized array of those. This also allows to use g_new(T, n) instead of g_malloc(sizeof(T) * n) as recommended in HACKING. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Greg Kurz	57aa218818	virtex_ml507: use g_new(T, n) instead of g_malloc(sizeof(T) * n) Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Greg Kurz	0989e6d1f2	sam460ex: use g_new(T, n) instead of g_malloc(sizeof(T) * n) Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Greg Kurz	30f8ec7630	ppc440_bamboo: use g_new(T, n) instead of g_malloc(sizeof(T) * n) Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Greg Kurz	c4f46986fc	ppc405_uc: use g_new(T, n) instead of g_malloc(sizeof(T) * n) Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00
Greg Kurz	779db4c7ca	ppc405_boards: use g_new(T, n) instead of g_malloc(sizeof(T) * n) Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2018-12-21 09:24:23 +11:00

1 2 3 4 5 ...

65648 Commits