This adds support for Dynamic DMA Windows (DDW) option defined by
the SPAPR specification which allows to have additional DMA window(s)
The "ddw" property is enabled by default on a PHB but for compatibility
the pseries-2.6 machine and older disable it.
This also creates a single DMA window for the older machines to
maintain backward migration.
This implements DDW for PHB with emulated and VFIO devices. The host
kernel support is required. The advertised IOMMU page sizes are 4K and
64K; 16M pages are supported but not advertised by default, in order to
enable them, the user has to specify "pgsz" property for PHB and
enable huge pages for RAM.
The existing linux guests try creating one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
and not waste time on map/unmap later. This adds a "dma64_win_addr"
property which is a bus address for the 64bit window and by default
set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
uses and this allows having emulated and VFIO devices on the same bus.
This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.
These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.
This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
and updates all references to dma_liobn. However this does not add
64bit LIOBN to the migration stream as in fact even 32bit LIOBN is
rather pointless there (as it is a PHB property and the management
software can/should pass LIOBNs via CLI) but we keep it for the backward
migration support.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Add sPAPR specific abastract CPU core device that is based on generic
CPU core device. Use this as base type to create sPAPR CPU specific core
devices.
TODO:
- Add core types for other remaining CPU types
- Handle CPU model alias correctly
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PAPR interface defines a hypercall to pass high-quality
hardware generated random numbers to guests. Recent kernels can
already provide this hypercall to the guest if the right hardware
random number generator is available. But in case the user wants
to use another source like EGD, or QEMU is running with an older
kernel, we should also have this call in QEMU, so that guests that
do not support virtio-rng yet can get good random numbers, too.
This patch now adds a new pseudo-device to QEMU that either
directly provides this hypercall to the guest or is able to
enable the in-kernel hypercall if available. The in-kernel
hypercall can be enabled with the use-kvm property, e.g.:
qemu-system-ppc64 -device spapr-rng,use-kvm=true
For handling the hypercall in QEMU instead, a "RngBackend" is
required since the hypercall should provide "good" random data
instead of pseudo-random (like from a "simple" library function
like rand() or g_random_int()). Since there are multiple RngBackends
available, the user must select an appropriate back-end via the
"rng" property of the device, e.g.:
qemu-system-ppc64 -object rng-random,filename=/dev/hwrng,id=gid0 \
-device spapr-rng,rng=gid0 ...
See http://wiki.qemu-project.org/Features-Done/VirtIORNG for
other example of specifying RngBackends.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This device emulates a firmware abstraction used by pSeries guests to
manage hotplug/dynamic-reconfiguration of host-bridges, PCI devices,
memory, and CPUs. It is conceptually similar to an SHPC device,
complete with LED indicators to identify individual slots to physical
physical users and indicate when it is safe to remove a device. In
some cases it is also used to manage virtualized resources, such a
memory, CPUs, and physical-host bridges, which in the case of pSeries
guests are virtualized resources where the physical components are
managed by the host.
Guests communicate with these DR Connectors using RTAS calls,
generally by addressing the unique DRC index associated with a
particular connector for a particular resource. For introspection
purposes we expose this state initially as QOM properties, and
in subsequent patches will introduce the RTAS calls that make use of
it. This constitutes to the 'guest' interface.
On the QEMU side we provide an attach/detach interface to associate
or cleanup a DeviceState with a particular sPAPRDRConnector in
response to hotplug/unplug, respectively. This constitutes the
'physical' interface to the DR Connector.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment the RTAS (firmware/hypervisor) time of day functions are
implemented in spapr_rtas.c along with a bunch of other things. Since
we're going to be expanding these a bit, move the RTAS RTC related code
out into new file spapr_rtc.c. Also add its own initialization function,
spapr_rtc_init() called from the main machine init routine.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The virtex-ml507 is a Xilinx CPU based system, and requires several sub
devices which are only included with CONFIG_XILINX. Therefore, it should
only be compiled if CONFIG_XILINX is set.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The patch adds a spapr-pci-vfio-host-bridge device type
which is a PCI Host Bridge with VFIO support. The new device
inherits from the spapr-pci-host-bridge device and adds an "iommu"
property which is an IOMMU id. This ID represents a minimal entity
for which IOMMU isolation can be guaranteed. In SPAPR architecture IOMMU
group is called a Partitionable Endpoint (PE).
Current implementation supports one IOMMU id per QEMU VFIO PHB. Since
SPAPR allows multiple PHB for no extra cost, this does not seem to
be a problem. This limitation may change in the future though.
Example of use:
Configure and Add 3 functions of a multifunctional device to QEMU:
(the NEC PCI USB card is used as an example here):
-device spapr-pci-vfio-host-bridge,id=USB,iommu=4,index=7 \
-device vfio-pci,host=4:0:1.0,addr=1.0,bus=USB,multifunction=true
-device vfio-pci,host=4:0:1.1,addr=1.1,bus=USB
-device vfio-pci,host=4:0:1.2,addr=1.2,bus=USB
where:
* index=7 is a QEMU PHB index (used as source for MMIO/MSI/IO windows
offset);
* iommu=4 is an IOMMU id which can be found in sysfs:
[aik@vpl2 ~]$ cd /sys/bus/pci/devices/0004:00:00.0/
[aik@vpl2 0004:00:00.0]$ ls -l iommu_group
lrwxrwxrwx 1 root root 0 Jun 5 12:49 iommu_group -> ../../../kernel/iommu_groups/4
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The PAPR specification requires a certain amount of NVRAM, accessed via
RTAS, which we don't currently implement in qemu. This patch addresses
this deficiency, implementing the NVRAM as a VIO device, with some glue to
instantiate it automatically based on a machine option.
The machine option specifies a drive id, which is used to back the NVRAM,
making it persistent. If nothing is specified, the driver instead simply
allocates space for the NVRAM, which will not be persistent
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
At present, using 'system_powerdown' from the monitor or otherwise
instructing qemu to (cleanly) shut down a pseries guest will not work,
because we did not have a method of signalling the shutdown request to the
guest.
PAPR does include a usable mechanism for this, though it is rather more
involved than the equivalent on x86. This involves sending an EPOW
(Environmental and POwer Warning) event through the PAPR event and error
logging mechanism, which also has a number of other functions.
This patch implements just enough of the event/error logging functionality
to be able to send a shutdown event to the guest. At least with modern
guest kernels and a userspace that is up and running, this means that
system_powerdown from the qemu monitor should now work correctly on pseries
guests.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This gives the kernel a paravirtualized machine to target, without
requiring both sides to pretend to be targeting a specific board
that likely has little to do with the host in KVM scenarios. This
avoids the need to add new boards to QEMU, just to be able to
run KVM on new CPUs.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: conditionalize on CONFIG_FDT]
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the only mpc8544ds-ism that is factored out is
toplevel compatible and model. In the future the generic e500
code is expected to become more generic.
Signed-off-by: Scott Wood <scottwood@freescale.com>
[agraf: conditionalize on CONFIG_FDT]
Signed-off-by: Alexander Graf <agraf@suse.de>
Rename the file (with no changes other than fixing up the header paths)
in preparation for refactoring into a generic e500 platform. Also move
it into the newly created ppc/ directory.
Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
[agraf: conditionalize on CONFIG_FDT]
Signed-off-by: Alexander Graf <agraf@suse.de>
The pseries platform already contains an IOMMU implementation, since it is
essential for the platform's paravirtualized VIO devices. This IOMMU
support is currently built into the implementation of the VIO "bus" and
the various VIO devices.
This patch converts this code to make use of the new common IOMMU
infrastructure.
We don't yet handle synchronization of map/unmap callbacks vs. invalidations,
this will require some complex interaction with the kernel and is not a
major concern at this stage.
Cc: Alex Graf <agraf@suse.de>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Now that we're moving all of the device tree generation from an external
pre-execution generated blob to runtime generation using libfdt, we absolutely
must have libfdt around.
This requirement was there before already, as the only way to not require libfdt
with e500 was to not use -kernel, which was the only way to boot the mpc8544ds
machine. This patch only manifests said requirement in the build system.
Signed-off-by: Alexander Graf <agraf@suse.de>
Speeds up the build.
xilinx_ethlite uses tswap32() and is thus target-dependent.
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
The file is located in target-ppc/, not hw/.
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>