The MPIC source irq handler suddenly became identical to the standard
OpenPIC source irq handler. Combine them into the same function.
Signed-off-by: Alexander Graf <agraf@suse.de>
The openpic code was still using the old mmio memory api. Convert it to
be a generic memory api user and clean up some code that becomes redundant
that way.
Signed-off-by: Alexander Graf <agraf@suse.de>
MPIC interrupt numbers in Linux (device tree) and in QEMU are different,
because QEMU takes the sparseness of the IRQ number space into account.
Remove that cleverness and instead assume a flat number space. This makes
the code easier to understand, because we are actually aligned with Linux
on the view of our worlds.
Signed-off-by: Alexander Graf <agraf@suse.de>
The openpic code had a few WIP bits left that nobody reanimated within
the last few years. Remove that code.
Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Hervé Poussineau <hpoussin@reactos.org>
The PAPR specification requires that every bus or device mediated by the
IOMMU have a unique Logical IO Bus Number (LIOBN). This patch adds a check
to enforce this, which will help catch errors in configuration earlier.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
PCI Root complex have TYPE-1 configuration header while PCI endpoint
have type-0 configuration header. The type-1 configuration header have
a BAR (BAR0). In Freescale PCI controller BAR0 is used for mapping pci
address space to CCSR address space. This can used for 2 purposes: 1)
for MSI interrupt generation 2) Allow CCSR registers access when configured
as PCI endpoint, which I am not sure is a use case with QEMU-KVM guest.
What I observed is that when guest read the size of BAR0 of host controller
configuration header (TYPE1 header) then it always reads it as 0. When
looking into the QEMU hw/ppce500_pci.c, I do not find the PCI controller
device registering BAR0. I do not find any other controller also doing so
may they do not use BAR0.
There are two issues when BAR0 is not there (which I can think of):
1) There should be BAR0 emulated for PCI Root complex (TYPE1 header) and
when reading the size of BAR0, it should give size as per real h/w.
2) Do we need this BAR0 inbound address translation?
When BAR0 is of non-zero size then it will be configured for PCI
address space to local address(CCSR) space translation on inbound access.
The primary use case is for MSI interrupt generation. The device is
configured with an address offsets in PCI address space, which will be
translated to MSI interrupt generation MPIC registers. Currently I do
not understand the MSI interrupt generation mechanism in QEMU and also
IIRC we do not use QEMU MSI interrupt mechanism on e500 guest machines.
But this BAR0 will be used when using MSI on e500.
I can see one more issue, There are ATMUs emulated in hw/ppce500_pci.c,
but i do not see these being used for address translation.
So far that works because pci address space and local address space are 1:1
mapped. BAR0 inbound translation + ATMU translation will complete the address
translation of inbound traffic.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
[agraf: fix double variable assignment w/o read]
Signed-off-by: Alexander Graf <agraf@suse.de>
All devices are also placed under CCSR memory region.
The CCSR memory region is exported to pci device. The MSI interrupt
generation is the main reason to export the CCSR region to PCI device.
This put the requirement to move mpic under CCSR region, but logically
all devices should be under CCSR. So this patch places all emulated
devices under ccsr region.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Now that we have implemented PAPR compatible NVRAM interfaces in qemu, this
updates the SLOF firmware to actually initialize and use the NVRAM as a
PAPR guest firmware is expected to do.
This SLOF update also includes an ugly but useful workaround for a bug in
the SLES11 installer which caused it to fail under KVM.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The PAPR specification requires a certain amount of NVRAM, accessed via
RTAS, which we don't currently implement in qemu. This patch addresses
this deficiency, implementing the NVRAM as a VIO device, with some glue to
instantiate it automatically based on a machine option.
The machine option specifies a drive id, which is used to back the NVRAM,
making it persistent. If nothing is specified, the driver instead simply
allocates space for the NVRAM, which will not be persistent
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the XICS irq controller code has a per-irq state structure which
amongst other things includes whether the interrupt is level or message
triggered - this is configured by the platform code, and is not directly
visible to the guest. This leads to a slightly awkward construct at reset
time where we need to reset everything in the state structure _except_ the
lsi/msi flag, which needs to retain the information given at platform init
time.
More importantly this flag will make matching the qemu state to the KVM
state for the upcoming in-kernel XICS implementation more awkward. This
patch, therefore, removes this flag from the per-irq state structure,
instead adding a parallel array giving the lsi/msi configuration per irq.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch adds tracing / debugging calls to the XICS interrupt controller
implementation used on the pseries machine.
Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Kernel-based RTAS calls will not have a qemu handler, but will
still be registered in qemu in order to be assigned a token
number and appear in the device-tree.
Let's test for the name being NULL rather than the handler
when deciding to skip an entry while building the device-tree
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
The kernel will soon be able to service some RTAS calls. However the
choice of tokens will still be up to userspace. To support this have
spapr_rtas_register() return the token that is allocated for an
RTAS call, that allows the calling code to tell the kernel what the
token value is.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the lowest "real" irq number for the XICS irq controller (as
opposed to numbers reserved for IPIs and other special purposes) is
hard coded as 16 in two places - in xics_system_init() and in spapr.c.
As well as being generally bad practice, we're going to need to change this
number soon to fit in with the in-kernel XICS implementation. This patch
adds a #define for this number to avoid future breakage.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently in the reset code for the XICS interrupt controller, we
initialize the pending_priority field to 0 (most favored, by XICS
convention). This is incorrect, since there is no pending interrupt, it
should be set to least favored - 0xff. At the moment our XICS
implementation doesn't get hurt by this edge case, but it does confuse the
upcoming kernel XICS implementation.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
* pmaydell/arm-devs.next:
hw/ds1338.c: Fix handling of DAY (wday) register.
hw/ds1338.c: Implement support for the control register.
hw/ds1338.c: Ensure state is properly initialized.
hw/ds1338.c: Fix handling of HOURS register.
hw/ds1338.c: Add definitions for various flags in the RTC registers.
hw/ds1338.c: Correct bug in conversion to BCD.
exynos4210/mct: Avoid infinite loop on non incremental timers
hw/arm_gic: fix target CPUs affected by set enable/pending ops
xilinx_zynq: Add one variable to avoid overwriting QSPI bus
hw/arm_gic_common: Correct GICC_PMR reset value for newer GICs
hw/arm_gic: Fix comparison with priority mask register
hw/arm_boot, exynos4210, highbank: Fix secondary boot GIC init
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
There's no reason for run_dependent_requests() to hold s->lock, and a
later patch will require that in fact the lock is not held.
Also, before this patch, run_dependent_requests() not only does what its
name suggests, but also removes the l2meta from the list of in-flight
requests. When changing this, it becomes an one-liner, so just inline it
completely.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is closer to where the dirty flag is really needed, and it avoids
having checks for special cases related to cluster allocation directly
in the writev loop.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Even for writes to already allocated clusters, an l2meta is allocated,
though it stays effectively unused. After this patch, only allocating
requests still have one. Each l2meta now describes an in-flight request
that writes to clusters that are not yet hooked up in the L2 table.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
There's no real reason to have an l2meta for normal requests that don't
allocate anything. Before we can get rid of it, we must return the host
cluster offset in a different way.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As soon as delayed COW is introduced, the l2meta struct is needed even
after completion of the request, so it can't live on the stack.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This makes it easier to address the areas for which a COW must be
performed. As a nice side effect, the COW code in
qcow2_alloc_cluster_link_l2 becomes really trivial.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The offset within the cluster is already present as n_start and this is
what the code uses. QCowL2Meta.offset is only needed at a cluster
granularity.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Per the datasheet, the DAY (wday) register is user defined. Implement this.
Signed-off-by: Antoine Mathys <barsamin@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Per the datasheet, the mapping between 12 and 24 hours modes is:
0 <-> 12 PM
1-12 <-> 1-12 AM
13-23 <-> 1-11 PM
Signed-off-by: Antoine Mathys <barsamin@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tray statuses should be also reseted. Some guests may lock the tray
and after reset before any kernel is loaded the tray should be unlocked.
Also if you reset the real computer the tray is closed. We should
do the same in qemu.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
To make it easier to move code around without breaking
build at intermedite steps, tweak makefiles
to look in pci/ and hw/ for include files, automatically.
This will be reverted at the end of the reorganization.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For tap, we currently assume the vnet header size is 10
(the default value) but that might not be the case
if tap is persistent and has been used by qemu previously.
To fix, set vnet header size correctly on open.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cleanup the q35/ich9 license headers.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Isaku Yamahata <yamahata@valinux.co.jp>
madvise(DONTNEED) will throw away the contents of the whole page at the
given address, even if the given length is less than the page size. One
can argue about whether that's the correct behaviour, but that's what it's
done for a long time in Linux at least.
That means that the madvise() in ram_load(), on a setup where
TARGET_PAGE_SIZE is smaller than the host page size, can throw away data
in guest pages adjacent to the one it's actually processing right now,
leading to guest memory corruption on an incoming migration.
This patch therefore, disables the madvise() if the host page size is
larger than TARGET_PAGE_SIZE. This means we don't get the benefits of that
madvise() in this case, but a more complete fix is more difficult to
accomplish. This at least fixes the guest memory corruption.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The code for migrating (or savevm-ing) memory pages starts off by creating
a dirty bitmap and filling it with 1s. Except, actually, because bit
addresses are 0-based it fills every bit except bit 0 with 1s and puts an
extra 1 beyond the end of the bitmap, potentially corrupting unrelated
memory. Oops. This patch fixes it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This adds some first tests for qcow2's dependency handling when two
parallel write requests access the same cluster.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We want to use these events to suspend requests for testing concurrent
AIO requests. Suspending requests while they are holding the CoMutex is
rather boring for this purpose.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This makes the blkdebug suspend/resume functionality available in
qemu-io. Use it like this:
$ ./qemu-io blkdebug::/tmp/test.qcow2
qemu-io> break write_aio req_a
qemu-io> aio_write 0 4k
qemu-io> blkdebug: Suspended request 'req_a'
qemu-io> resume req_a
blkdebug: Resuming request 'req_a'
qemu-io> wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0:00:30.71 (133.359788 bytes/sec and 0.0326 ops/sec)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This allows more systematic AIO testing. The patch adds three new
operations to blkdebug:
* Setting a "breakpoint" on a blkdebug event. The next request that
triggers this breakpoint is suspended and is tagged with a name.
The breakpoint is removed after a request has triggered it.
* A suspended request (identified by it's tag) can be resumed
* It's possible to check whether a suspended request with a given
tag exists. This can be used for waiting for an event.
Ideally, we would instead tag requests right when they are created and
set breakpoints for individual requests. However, at this point the
block layer doesn't allow this easily, and breakpoints that trigger for
any request already allow a lot of useful testing.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The cleanup work to remove a rule depends on the type of the rule. It's
easy for the existing rules as there is no data that must be cleaned up
and is specific to a type yet, but the next patch will change this.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
As soon as new rules can be set during runtime, as introduced by the
next patch, blkdebug makes sense even without a config file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We will use qemu_opts_create_nofail function, it can make code
more readable.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
While id is NULL, qemu_opts_create can not fail, so ignore
errors is fine.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>