This patch adds some hooks to let PCI devices and busses use the new IOMMU
infrastructure. When IOMMU support is enabled, each PCI device now
contains a DMAContext * which is used by the pci_dma_*() wrapper functions.
By default, the contexts are initialized to NULL, assuming no IOMMU.
However the platform or host bridge code which sets up the PCI bus can use
pci_setup_iommu() to set a function which will determine the correct
DMAContext for a given PCI device.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The pseries platform already contains an IOMMU implementation, since it is
essential for the platform's paravirtualized VIO devices. This IOMMU
support is currently built into the implementation of the VIO "bus" and
the various VIO devices.
This patch converts this code to make use of the new common IOMMU
infrastructure.
We don't yet handle synchronization of map/unmap callbacks vs. invalidations,
this will require some complex interaction with the kernel and is not a
major concern at this stage.
Cc: Alex Graf <agraf@suse.de>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds the basic infrastructure necessary to emulate an IOMMU
visible to the guest. The DMAContext structure is extended with
information and a callback describing the translation, and the various
DMA functions used by devices will now perform IOMMU translation using
this callback.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The USB UHCI and EHCI drivers were converted some time ago to use the
pci_dma_*() helper functions. However, this conversion was not complete
because in some places both these drivers do DMA via the usb_packet_map()
function in usb-libhw.c. That function directly used
cpu_physical_memory_map().
Now that the sglist code uses DMA wrappers properly, we can convert the
functions in usb-libhw.c, thus conpleting the conversion of UHCI and EHCI
to use the DMA wrappers.
Note that usb_packet_map() invokes dma_memory_map() with a NULL invalidate
callback function. When IOMMU support is added, this will mean that
usb_packet_map() and the corresponding usb_packet_unmap() must be called in
close proximity without dropping the qemu device lock - otherwise the guest
might invalidate IOMMU mappings while they are still in use by the device
code.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The AHCI device can provide both PCI and SysBus AHCI device
emulations. For this reason, it wasn't previously converted to use
the pci_dma_*() helper functions. Now that we have universal DMA
helper functions, this converts AHCI to use them.
The DMAContext is obtained from pci_dma_context() in the PCI case and
set to NULL in the SysBus case (i.e. we assume for now that a SysBus
AHCI has no IOMMU translation).
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
dma-helpers.c contains a number of helper functions for doing
scatter/gather DMA, and various block device related DMA. Currently,
these directly access guest memory using cpu_physical_memory_*(),
assuming no IOMMU translation.
This patch updates this code to use the new universal DMA helper
functions. qemu_sglist_init() now takes a DMAContext * to describe
the DMA address space in which the scatter/gather will take place.
We minimally update the callers qemu_sglist_init() to pass NULL
(i.e. no translation, same as current behaviour). Some of those
callers should pass something else in some cases to allow proper IOMMU
translation in future, but that will be fixed in later patches.
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The OHCI device emulation can provide both PCI and SysBus OHCI
implementations. Because of this, it was not previously converted to
use the PCI DMA helper functions.
This patch converts it to use the new universal DMA helper functions.
In the PCI case, it obtains its DMAContext from pci_dma_context(), in
the SysBus case, it uses NULL - i.e. assumes for now that there will
be no IOMMU translation for a SysBus OHCI.
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Not that long ago, every device implementation using DMA directly
accessed guest memory using cpu_physical_memory_*(). This meant that
adding support for a guest visible IOMMU would require changing every
one of these devices to go through IOMMU translation.
Shortly before qemu 1.0, I made a start on fixing this by providing
helper functions for PCI DMA. These are currently just stubs which
call the direct access functions, but mean that an IOMMU can be
implemented in one place, rather than for every PCI device.
Clearly, this doesn't help for non PCI devices, which could also be
IOMMU translated on some platforms. It is also problematic for the
devices which have both PCI and non-PCI version (e.g. OHCI, AHCI) - we
cannot use the the pci_dma_*() functions, because they assume the
presence of a PCIDevice, but we don't want to have to check between
pci_dma_*() and cpu_physical_memory_*() every time we do a DMA in the
device code.
This patch makes the first step on addressing both these problems, by
introducing new (stub) dma helper functions which can be used for any
DMA capable device.
These dma functions take a DMAContext *, a new (currently empty)
variable describing the DMA address space in which the operation is to
take place. NULL indicates untranslated DMA directly into guest
physical address space. The intention is that in future non-NULL
values will given information about any necessary IOMMU translation.
DMA using devices must obtain a DMAContext (or, potentially, contexts)
from their bus or platform. For now this patch just converts the PCI
wrappers to be implemented in terms of the universal wrappers,
converting other drivers can take place over time.
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
A while back, we introduced the dma_addr_t type, which is supposed to
be used for bus visible memory addresses. At present, this is an
alias for target_phys_addr_t, but this will change when we eventually
add support for guest visible IOMMUs.
There are some instances of target_phys_addr_t in the code now which
should really be dma_addr_t, but can't be trivially converted due to
missing features which this patch corrects.
* We add DMA_ADDR_BITS analagous to TARGET_PHYS_ADDR_BITS. This is
important where we need to make a compile-time (#if) based on the
size of dma_addr_t.
* We add a new helper macro to create device properties which take a
dma_addr_t, currently an alias to DEFINE_PROP_TADDR().
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Commit ff71f2e8ca prevent the possible
crash during initialization of linux driver by checking the operating
mode.This seems too strict as:
- the real card could still work in mode other than normal
- some buggy driver who does not set correct opmode after eeprom
access
So, considering rx ring address were reset to zero (which could be
safely trated as an address not intened to DMA to), in order to
both letting old guest work and preventing the unexpected DMA to
guest, we can forbid packet receiving when rx ring address is zero.
Tested-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
From Markus:
Before:
$ qemu-system-x86_64 -display none -drive if=ide
qemu-system-x86_64: Device needs media, but drive is empty
qemu-system-x86_64: Initialization of device ide-hd failed
[Exit 1 ]
After:
$ qemu-system-x86_64 -display none -drive if=ide
qemu-system-x86_64: Device needs media, but drive is empty
Segmentation fault (core dumped)
[Exit 139 (SIGSEGV)]
This error always existed as qdev_init() frees the object. But QOM
goes a bit further and purposefully sets the class pointer to NULL to
help find use-after-free. It worked :-)
Cc: Andreas Faerber <afaerber@suse.de>
Reported-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* stefanha/trivial-patches:
tci: Support INDEX_op_bswap64_i64
target-i386: Use QEMU instead of Qemu
Makefile.hw: avoid overly large 'make clean' rm command
configure: Fix typo
arm_gic: Send dbg msgs to stderr not stdout
checkpatch: Add QEMU specific rule
qemu-config: Use QEMU instead of Qemu
libqtest: Fix socket_accept() to pass address_len
Makefile.user: Define CONFIG_USER_ONLY for libuser/
Makefile: Remove macro qapi-dir
Makefile: Remove BUILD_DIR from qapi-dir
Install 'bepo' keymap already included in Qemu source
* kraxel/usb.54:
uhci: fix uhci_async_cancel_all
usb-host: live migration support
usb-host: attach only to running guest
ehci: tracing improvements
usb: restore USBDevice->attached on vmload
ehci: add live migration support
* 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf: (72 commits)
PPC: BookE206: Bump MAS2 to 64bit
PPC: BookE: Support 32 and 64 bit wide MAS2
PPC: Extract SPR dump generation into its own function
PPC: Add e5500 CPU target
PPC: BookE: Make ivpr selectable by CPU type
PPC: BookE: Implement EPR SPR
PPC: Add support for MSR_CM
PPC: Add some booke SPR defines
uImage: increase the gzip load size
PPC: e500: allow users to set the /compatible property via -machine
dt: make setprop argument static
PPC: e500: Refactor serial dt generation
dt: Add global option to set phandle start offset
PPC: e500: Extend address/size of / to 64bit
PPC: e500: Define addresses as always 64bit
PPC: e500: Use new SOC dt format
PPC: e500: Use new MPIC dt format
Revert "dt: temporarily disable subtree creation failure check"
PPC: e500: enable manual loading of dtb blob
PPC: e500: dt: use target_phys_addr_t for ramsize
...
* 's390-for-upstream' of git://repo.or.cz/qemu/agraf:
s390: stop target cpu on sigp initial reset
s390: make kvm_stat work on s390
kvm: Update kernel headers
s390x: fix s390 virtio aliases
* 'arm-devs.for-upstream' of git://git.linaro.org/people/pmaydell/qemu-arm:
arm_boot: Conditionalised DTB command line update
cadence_ttc: changed master clock frequency
cadence_gem: avoid stack-writing buffer-overrun
hw/a9mpcore: Fix compilation failure if physaddrs are 64 bit
hw/omap.h: Drop broken MEM_VERBOSE tracing
hw/armv7m_nvic: Make the NVIC a freestanding class
hw/arm_gic: Move CPU interface memory region setup into arm_gic_init
hw/arm_gic.c: Make NVIC interrupt numbering a runtime setting
hw/arm_gic: Make CPU target registers RAZ/WI on uniprocessor
hw/arm_gic: Add qdev property for GIC revision
hw/armv7m_nvic: Use MemoryRegions for NVIC specific registers
hw/arm_gic: Move NVIC specific reset to armv7m_nvic_reset
hw/arm_gic: Remove the special casing of NCPU for the NVIC
hw/arm_gic: Remove NVIC ifdefs from gic_state struct
arm_boot: Fix typos in comment
ARM: Exynos4210 IRQ: Introduce new IRQ gate functionality.
On the e500 series, accessing SPR_EPR magically turns into an access at
that CPU's IACK register on the MPIC. Implement that logic to get kernels
that make use of that feature work.
Signed-off-by: Alexander Graf <agraf@suse.de>
Recent u-boot has different defines for its gzip extract buffer, but the
common ground seems to be 64MB. So let's bump it up to that, enabling me
to load my test image again ;).
Signed-off-by: Alexander Graf <agraf@suse.de>
Device trees usually have a node /compatible, which indicate which machine
type we're looking at. For quick prototyping, it can be very useful to change
the contents of that node via the command line.
Thus, introduce a new option to -machine called dt_compatible, which when
set changes the /compatible contents to its value.
Signed-off-by: Alexander Graf <agraf@suse.de>
When generating serial port device tree nodes, we duplicate quite a bit
of code, because there are 2 of them in the mpc8544ds board we emulate.
Shove the generating code into a function, so we duplicate less code.
Signed-off-by: Alexander Graf <agraf@suse.de>
We want to be able to support >= 4GB of RAM. To do so, we need to be able
to tell the guest OS how much RAM it has.
However, that information today is capped to 32bit. So let's extend the
offset and size fields to 64bit, so we can fit in big addresses and even
one day - if we wish to do so - map devices above 32bit.
Signed-off-by: Alexander Graf <agraf@suse.de>
Every time we use an address constant, it needs to potentially fit into
a 64bit physical address space. So let's define things accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>
Due to popular demand, let's clean up the soc node a bit and use
more recent dt notions.
Requested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Due to popular demand, we're updating the way we generate the MPIC
node and interrupt lines based on what the current state of art is.
Requested-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
We want to be able to override the automatically created device tree
by using the -dtb option. Implement this for the mpc8544ds machine.
Signed-off-by: Alexander Graf <agraf@suse.de>
We're passing the ram size as uint32_t, capping it to 32 bits atm.
Change to target_phys_addr_t (uint64_t) to make sure we have all
the bits.
Signed-off-by: Alexander Graf <agraf@suse.de>
We have a nice 64bit helper to ease the device tree generation and
make the code more readable when creating 64bit 2-cell parameters.
Use it when generating the device tree.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we are dynamically creating the dtb, it's really useful to
be able to dump the created blob for debugging.
This patch implements a -machine dumpdtb=<file> option for e500 that
dumps the dtb exactly in the form the guest would get it to disk. It
can then be analyzed by dtc to get information about the guest
configuration.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that all of the device tree bits are generated during runtime, we
can get rid of the device tree blob and instead start from scratch with
an empty device tree.
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we're moving all of the device tree generation from an external
pre-execution generated blob to runtime generation using libfdt, we absolutely
must have libfdt around.
This requirement was there before already, as the only way to not require libfdt
with e500 was to not use -kernel, which was the only way to boot the mpc8544ds
machine. This patch only manifests said requirement in the build system.
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds a qemu-specific hypervisor call to the pseries machine
which allows to do what amounts to memmove, memcpy and xor over
regions of physical memory such as the framebuffer.
This is the simplest way to get usable framebuffer speed from
SLOF since the framebuffer isn't mapped in the VRMA and so would
otherwise require an hcall per 8 bytes access.
The performance is still not great but usable, and can be improved
with a more complex implementation of the hcall itself if needed.
This also adds some documentation for the qemu-specific hypercalls
that we add to PAPR along with a new qemu,hypertas-functions property
that mirrors ibm,hypertas-functions and provides some discoverability
for the new calls.
Note: I chose note to advertise H_RTAS to the guest via that mechanism.
This is done on purpose, the guest uses the normal RTAS interfaces
provided by qemu (including SLOF) which internally calls H_RTAS.
We might in the future implement part (or even all) of RTAS inside the
guest like IBM's firmware does and replace H_RTAS with some finer grained
set of private hypercalls.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
We were incorrectly g_free'ing an object that isn't allocated
in one error path and failed to release it completely in another
This fixes qemu crashes with some cases of IO errors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
The core tcg/kvm code for ppc64 now has at least the outline
capability to support pagesizes beyond the standard 4k and 16MB. The
CPUState is initialized with information advertising the available
pagesizes and their correct encodings, and under the right KVM setup
this will be populated with page sizes beyond the standard.
Obviously guests can't use the extra page sizes unless they know
they're present. For the pseries machine, at least, there is a
defined method for conveying exactly this information, the
"ibm-segment-page-sizes" property in the guest device tree.
This patch generates this property using the supported page size
information that's already in the CPUState.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
The initial TLB entry is supposed to help us run the guest -kernel payload.
This means the guest needs to be able to access its own memory, the initrd
memory and the device tree.
So far we only statically reserved a TLB entry from [0;256M[. This patch
fixes it to span from [0;dt_end[, allowing the guest payload to access
everything initially.
Reported-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Old size: 8 MB (traditional upstream qemu value).
New size: 16 MB (traditional qemu-kvm value).
Also adds compat properties so old machine types
keep the old default values.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>