The permission of TCE entry should exclude physical base address.
Otherwise, unmapping TCE entry can be interpreted to mapping TCE
entry wrongly for VFIO devices.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
0b183fc87 "memory: move mem_path handling to
memory_region_allocate_system_memory" disabled -mempath use for all
machines that do not use memory_region_allocate_system_memory() to
register RAM. Since SPAPR uses memory_region_init_ram(), the huge pages
support was disabled for it.
This replaces memory_region_init_ram()+vmstate_register_ram_global() with
memory_region_allocate_system_memory() to get huge pages back.
This changes RAM size from (ram_limit - rma_alloc_size) to ram_limit as
the previous patch moved RMA memory region allocation after RAM allocation
and therefore this change does not have immediate effect but simplifies
the code.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
PPC970 does not support VRMA (virtual RMA) so real memory required
for SLOF to execute must be allocated by the KVM_ALLOCATE_RMA ioctl.
Later this memory is used as a part of the guest RAM area.
The RMA allocating code also registers a memory region for this piece
of RAM.
We are going to simplify memory regions layout: RMA memory region
will be a subregion in the RAM memory region, both starting from zero.
This way we will not have to take care of start address alignment for
the piece of RAM next to the RMA.
This moves memory region business closer to the RAM memory region
creation/allocation code.
As this is a mechanical patch, no change in behaviour is expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: fix compilation on non-kvm systems]
Signed-off-by: Alexander Graf <agraf@suse.de>
Commit 0b183fc871:"memory: move mem_path handling to
memory_region_allocate_system_memory" split memory_region_init_ram and
memory_region_init_ram_from_file. Also it moved mem-path handling a step
up from memory_region_init_ram to memory_region_allocate_system_memory.
Therefore for any board that uses memory_region_init_ram directly,
-mem-path is not supported.
Fix this by replacing memory_region_init_ram with
memory_region_allocate_system_memory.
Signed-off-by: Shreyas B. Prabhu <shreyas@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
During KVMPPC_H_CAS processing, the cpu-version updated value is stored
without taking care of the current endianess. As a consequence, the guest
may not switch to the right CPU model, leading to unexpected results.
If needed, the value is now converted.
Fixes: 6d9412ea81 ("target-ppc: Implement "compat" CPU option")
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
When the user specifies -nodefaults he can tell us that he doesn't want any
serial ports spawned by default. While we do honor that wish, we still create
device tree entries for those non-existent devices.
Make device tree generation depend on whether the device is actually available.
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently SPAPR PHB keeps track of all allocated MSI (here and below
MSI stands for both MSI and MSIX) interrupt because
XICS used to be unable to reuse interrupts. This is a problem for
dynamic MSI reconfiguration which happens when guest reloads a driver
or performs PCI hotplug. Another problem is that the existing
implementation can enable MSI on 32 devices maximum
(SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
This makes use of new XICS ability to reuse interrupts.
This reorganizes MSI information storage in sPAPRPHBState. Instead of
static array of 32 descriptors (one per a PCI function), this patch adds
a GHashTable when @config_addr is a key and (first_irq, num) pair is
a value. GHashTable can dynamically grow and shrink so the initial limit
of 32 devices is gone.
This changes migration stream as @msi_table was a static array while new
@msi_devs is a dynamic hash table. This adds temporary array which is
used for migration, it is populated in "spapr_pci"::pre_save() callback
and expanded into the hash table in post_load() callback. Since
the destination side does not know the number of MSI-enabled devices
in advance and cannot pre-allocate the temporary array to receive
migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
which allocates the array automatically.
This resets the MSI configuration space when interrupts are released by
the ibm,change-msi RTAS call.
This fixed traces to be more informative.
This changes vmstate_spapr_pci_msi name from "...lsi" to "...msi" which
was incorrect by accident. As the internal representation changed,
thus bumps migration version number.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: drop g_malloc_n usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
This removes @next_irq from sPAPREnvironment which was used in old
IRQ allocator as XICS is now responsible for IRQs and keeps track of
allocated IRQs.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The current allocator returns IRQ numbers from a pool and does not
support IRQs reuse in any form as it did not keep track of what it
previously returned, it only keeps the last returned IRQ. Some use
cases such as PCI hot(un)plug may require IRQ release and reallocation.
This moves an allocator from SPAPR to XICS.
This switches IRQ users to use new API.
This uses LSI/MSI flags to know if interrupt is allocated.
The interrupt release function will be posted as a separate patch.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Add support for the SPLPAR Characteristics parameter to the emulated
RTAS call ibm,get-system-parameter.
The support provides just enough information to allow "cat
/proc/powerpc/lparcfg" to succeed without generating a kernel error
message.
Without this patch the above command will produce the following kernel
message: arch/powerpc/platforms/pseries/lparcfg.c \
parse_system_parameter_string Error calling get-system-parameter \
(0xfffffffd)
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Add support for the UUID parameter to the emulated RTAS call
ibm,get-system-parameter.
Return the guest's UUID as the value for the RTAS UUID system
parameter, or null (a zero length result) if it is not set.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This allows the ibm,get-system-parameter RTAS call to succeed for the
DIAGNOSTICS_RUN_MODE system parameter.
The problem can be seen with "ppc64_cpu --run-mode" from the
powerpc-utils package which fails before this patch with "Machine does
not support diagnostic run mode".
This is corrected by using the rtas_st_buffer() function to write to
the buffer.
The RTAS constants are also moved out into a header file, some new
constants added and the surrounding code slightly simplified.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[agraf: remove some commentary]
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds a v2.1 machine to support backward compatibility
for newer macines in the case if they ever be implemented.
This adds a "pseries-2.1" machine as a child of the "pseries"
machine and only changes visible machine name.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Every single sPAPR QOM object has small first "s".
Most (not all yet) QOM objects have "State" suffix.
This replaces SPAPRMachine with sPAPRMachineState to conform with QEMU
code style and removes redundant empty line.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Change the order of creating devices for New World Mac emulation so
that devices on the motherboard are added first and PCI cards (VGA and
NIC) come later. As a side effect, this also causes OpenBIOS to map
the motherboard devices into the MMIO space to the same addresses as
on real hardware and allow clients that hardcode these addresses (e.g.
MorphOS) to find and use them until OpenBIOS is tought to map devices
to specific addresses. (On real hardware the graphics and network
cards are really on separate buses but we don't model that yet.) This
brings the memory map closer to what is found on PowerMac3,1.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Alexander Graf <agraf@suse.de>
The patch adds a spapr-pci-vfio-host-bridge device type
which is a PCI Host Bridge with VFIO support. The new device
inherits from the spapr-pci-host-bridge device and adds an "iommu"
property which is an IOMMU id. This ID represents a minimal entity
for which IOMMU isolation can be guaranteed. In SPAPR architecture IOMMU
group is called a Partitionable Endpoint (PE).
Current implementation supports one IOMMU id per QEMU VFIO PHB. Since
SPAPR allows multiple PHB for no extra cost, this does not seem to
be a problem. This limitation may change in the future though.
Example of use:
Configure and Add 3 functions of a multifunctional device to QEMU:
(the NEC PCI USB card is used as an example here):
-device spapr-pci-vfio-host-bridge,id=USB,iommu=4,index=7 \
-device vfio-pci,host=4:0:1.0,addr=1.0,bus=USB,multifunction=true
-device vfio-pci,host=4:0:1.1,addr=1.1,bus=USB
-device vfio-pci,host=4:0:1.2,addr=1.2,bus=USB
where:
* index=7 is a QEMU PHB index (used as source for MMIO/MSI/IO windows
offset);
* iommu=4 is an IOMMU id which can be found in sysfs:
[aik@vpl2 ~]$ cd /sys/bus/pci/devices/0004:00:00.0/
[aik@vpl2 0004:00:00.0]$ ls -l iommu_group
lrwxrwxrwx 1 root root 0 Jun 5 12:49 iommu_group -> ../../../kernel/iommu_groups/4
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
POWER KVM supports an KVM_CAP_SPAPR_TCE capability which allows allocating
TCE tables in the host kernel memory and handle H_PUT_TCE requests
targeted to specific LIOBN (logical bus number) right in the host without
switching to QEMU. At the moment this is used for emulated devices only
and the handler only puts TCE to the table. If the in-kernel H_PUT_TCE
handler finds a LIOBN and corresponding table, it will put a TCE to
the table and complete hypercall execution. The user space will not be
notified.
Upcoming VFIO support is going to use the same sPAPRTCETable device class
so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE
tables for VFIO are going to be allocated in the host as well.
However VFIO operates with real IOMMU tables and simple copying of
a TCE to the real hardware TCE table will not work as guest physical
to host physical address translation is requited.
So until the host kernel gets VFIO support for H_PUT_TCE, we better not
to register VFIO's TCE in the host.
This adds a place holder for KVM_CAP_SPAPR_TCE_VFIO capability. It is not
in upstream yet and being discussed so now it is always false which means
that in-kernel VFIO acceleration is not supported.
This adds a bool @vfio_accel flag to the sPAPRTCETable device telling
that sPAPRTCETable should not try allocating TCE table in the host kernel
for VFIO. The flag is false now as at the moment there is no VFIO.
This adds an vfio_accel parameter to spapr_tce_new_table(), the semantic
is the same. Since there is only emulated PCI and VIO now, the flag is set
to false. Upcoming VFIO support will set it to true.
This is a preparation patch so no change in behaviour is expected
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment spapr_rtas_register() allocates a new token number for every
new RTAS callback so numbers are not fixed and depend on the number of
supported RTAS handlers and the exact order of spapr_rtas_register() calls.
These tokens are copied into the device tree and remain the same during
the guest lifetime.
When we start another guest to receive a migration, it calls
spapr_rtas_register() as well. If the number of RTAS handlers or their
order is different in QEMU on source and destination sides, the "/rtas"
node in the device tree will differ. Since migration overwrites the device
tree (as it overwrites the entire RAM), the actual RTAS config on
the destination side gets broken.
This defines global contant values for every RTAS token which QEMU
is using today.
This changes spapr_rtas_register() to accept a token number instead of
allocating one. This changes all users of spapr_rtas_register().
This changes XICS-KVM not to cache tokens registered with KVM as they
constant now.
This makes TOKEN_BASE global as RTAS_XXX use TOKEN_BASE as
a base. TOKEN_MAX is moved and renamed too and its value is changed
to the last token + 1. Boundary checks for token values are adjusted.
This reserves token numbers for "os-term" handlers and PCI hotplug
which we are working on.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This is required to enable boot menu display during booting
Signed-off-by: Avik Sil <aviksil@linux.vnet.ibm.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch also eliminates build time warning caused by no caller
of monitor_qapi_event_throttle().
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Add the numa_info structure to contain the numa nodes memory,
VCPUs information and the future added numa nodes host memory
policies.
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
[Fix hw/ppc/spapr.c - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Hotplug of multiple disks fails due to MSI vector quota check.
Number of MSI vectors default to 8 allowing only 4 devices.
This happens on RHEL6.5 guest. RHEL7 and SLES11 guests fallback
to INTX.
One way to workaround the issue is to increase total MSIs,
so that MSI quota check allows us to hotplug multiple disks.
This sets the quota to the maximum number of interupts XICS has
which is 1024 now (XICS_IRQS). This moves XICS_IRQS from spapr.c
to xics.h for wider visibility.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
[aik: put XICS_IRQS=1024 instead of 64i, fixed endianness and size]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The kvm-type machine option was left out when MachineState was
introduced, preventing the kvm-type option from being used. Add the
missing property to the sPAPR machine class, so it can be used.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from
the H_SET_MODE, for POWER8 (PowerISA 2.07) only.
This defines AIL flags for LPCR special register.
This changes @excp_prefix according to the mode, takes effect in TCG.
This turns support of a new capability PPC2_ISA207S flag for TCG.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This moves H_SET_MODE_RESOURCE_LE handler to a separate function
as there are other "resources" coming and this is going to become ugly.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
PR KVM supports an ePAPR compliant hypercall interface in parallel to the
normal sPAPR one. Expose the ePAPR /hypervisor node and properties to the
guest so it can use it.
This enables magic page sharing on PR KVM with -M pseries.
However we had a few nasty bugs in the magic page implementation on vcpus
newer than 970 (p7, p8) that KVM now has workarounds for. It indicates that
it does have these workarounds through the PPC_FIXUP_HCALL capability.
To not expose broken guest kernels to issues on host kernels that don't
have the fixups in place, we don't expose working hypercall instructions
when the fixups are not available so that the guest can never active the
magic page.
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds @bus_offset into sPAPRTCETable to tell where TCE table starts
from. It is set to 0 for emulated devices. Dynamic DMA windows will use
other offset.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment only 4K pages are supported by sPAPRTCETable. Since sPAPR
spec allows other page sizes and we are going to implement them, we need
page size to be configrable.
This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
with it where it is possible.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This removes window_size as it is basically a copy of nb_table
shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are
going to support windows as big as the entire RAM and this number
will be bigger that 32 capacity, we will have to do something
about @window_size anyway and removal seems to be the right way to go.
This removes dma_window_start/dma_window_size from sPAPRPHBState as
they are no longer used.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
qdev_init_nofail() was replaced by object_property_set_bool("realized")
all over the QEMU so do we.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment sPAPRPHBState contains a @tcet pointer to the only
TCE table. However sPAPR spec allows having more than one DMA window.
Since the TCE object is already a child of SPAPR PHB object, there is
no need to keep an additional pointer to it in sPAPRPHBState so remove it.
This changes the way sPAPRPHBState::reset performs reset of sPAPRTCETable
objects.
This changes the default DMA window properties calculation.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the default DMA window is represented by a single MemoryRegion.
However there can be more than just one window so we need
a "root" memory region to be separated from the actual DMA window(s).
This introduces a "root" IOMMU memory region and adds a subregion for
the default DMA 32bit window. Following patches will add other
subregion(s).
This initializes a default DMA window subregion size to the guest RAM
size as this window can be switched into "bypass" mode which implements
direct DMA mapping.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The spapr-pci PHB initializes IOMMU for emulated devices only.
The upcoming VFIO support will do it different. However both emulated
and VFIO PHB types share most of the initialization code.
For the type specific things a new finish_realize() callback is
introduced.
This introduces sPAPRPHBClass derived from PCIHostBridgeClass and
adds the callback pointer.
This implements finish_realize() for emulated devices.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: Fix compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently only single TCE entry per request is supported (H_PUT_TCE).
However PAPR+ specification allows multiple entry requests such as
H_PUT_TCE_INDIRECT and H_STUFF_TCE. Having less transitions to the host
kernel via ioctls, support of these calls can accelerate IOMMU operations.
This implements H_STUFF_TCE and H_PUT_TCE_INDIRECT.
This advertises "multi-tce" capability to the guest if the host kernel
supports it (KVM_CAP_SPAPR_MULTITCE) or guest is running in TCG mode.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment the "ibm,hypertas-functions" list is fixed. However some
calls should be listed there if they are supported by QEMU or the host
kernel.
This enables hyperrtas_prop to grow on stack by adding
a SPAPR_HYPERRTAS_ADD macro. "qemu,hypertas-functions" is converted as well.
The first user of this is going to be a "multi-tce" property.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
SPAPR IOMMU is a bus-less device and therefore its only ID in
migration stream is an instance id which is not reliable ID
as it depends on the command line parameters order. Since
libvirt may change the order, we need something better than that.
This removes VMSD descriptor from the class definitiion and
registers it with @liobn as an intance ID to let the destination
side find the right device to receive migration data.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Modern Linux kernels support last POWERPC CPUs so when a kernel boots,
in most cases it can find a matching cpu_spec in the kernel's cpu_specs
list. However if the kernel is quite old, it may be missing a definition
of the actual CPU. To provide an ability for old kernels to work on modern
hardware, a Processor Compatibility Mode has been introduced
by the PowerISA specification.
>From the hardware prospective, it is supported by the Processor
Compatibility Register (PCR) which is defined in PowerISA. The register
enables one of the compatibility modes (2.05/2.06/2.07).
Since PCR is a hypervisor privileged register and cannot be
directly accessed from the guest, the mode selection is done via
ibm,client-architecture-support (CAS) RTAS call using which the guest
specifies what "raw" and "architected" CPU versions it supports.
QEMU works out the best match, changes a "cpu-version" property of
every CPU and notifies the guest about the change by setting these
properties in the buffer passed as a response on a custom H_CAS hypercall.
This implements ibm,client-architecture-support parameters parsing
(now only for PVRs) and cooks the device tree diff with new values for
"cpu-version", "ibm,ppc-interrupt-server#s" and
"ibm,ppc-interrupt-server#s" properties.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This puts a limit to the number of threads per core based on the current
compatibility mode. Although PowerISA specs do not specify the maximum
threads per core number, the linux guest still expects that
PowerISA2.05-compatible CPU supports only 2 threads per core as this
is what POWER6 (2.05 compliant CPU) implements, the same is for
POWER7 (2.06, 4 threads) and POWER8 (2.07, 8 threads).
This calls spapr_fixup_cpu_smt_dt() with the maximum allowed number of
threads which affects ibm,ppc-interrupt-server#s and
ibm,ppc-interrupt-gserver#s properties.
The number of CPU nodesremains unchanged.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
In PPC code we usually use the "cs" name for a CPUState* variables
and "cpu" for PowerPCCPU. So let's change spapr_fixup_cpu_dt() to
use same rules as spapr_create_fdt_skel() does.
This adds missing nodes creation if they do not already exist in
the current device tree, this is going to be used from
the client-architecture-support handler.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The PAPR+ specification defines a ibm,client-architecture-support (CAS)
RTAS call which purpose is to provide a negotiation mechanism for
the guest and the hypervisor to work out the best compatibility parameters.
During the negotiation process, the guest provides an array of various
options and capabilities which it supports, the hypervisor adjusts
the device tree and (optionally) reboots the guest.
At the moment the Linux guest calls CAS method at early boot so SLOF
gets called. SLOF allocates a memory buffer for the device tree changes
and calls a custom KVMPPC_H_CAS hypercall. QEMU parses the options,
composes a diff for the device tree, copies it to the buffer provided
by SLOF and returns to SLOF. SLOF updates the device tree and returns
control to the guest kernel. Only then the Linux guest parses the device
tree so it is possible to avoid unnecessary reboot in most cases.
The device tree diff is a header with an update format version
(defined as 1 in this patch) followed by a device tree with the properties
which require update.
If QEMU detects that it has to reboot the guest, it silently does so
as the guest expects reboot to happen because this is usual pHyp firmware
behavior.
This defines custom KVMPPC_H_CAS hypercall. The current SLOF already
has support for it.
This implements stub which returns very basic tree (root node,
no properties) to the guest.
As the return buffer does not contain any change, no change in behavior is
expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds basic support for the "compat" CPU option. By specifying
the compat property, the user can manually switch guest CPU mode from
"raw" to "architected".
This defines feature disable bits which are not used yet as, for example,
PowerISA 2.07 says if 2.06 mode is selected, the TM bit does not matter -
transactional memory (TM) will be disabled because 2.06 does not define
it at all. The same is true for VSX and 2.05 mode. So just setting a mode
must be ok.
This does not change the existing behavior as the actual compatibility
mode support is coming in next patches.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: fix compilation on 32bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>
The upcoming support of the "ibm,client-architecture-support"
reconfiguration call will be able to change dynamically the number
of threads per core (SMT mode). From the device tree prospective
this does not change the number of CPU nodes (as it is one node per
a CPU core) but affects content and size of the ibm,ppc-interrupt-server#s
and ibm,ppc-interrupt-gserver#s properties.
This moves ibm,ppc-interrupt-server#s and ibm,ppc-interrupt-gserver#s
out of the device tree skeleton.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds a "ibm,chip-id" property for CPU nodes which should be the same
for all cores in the same CPU socket. The recent guest kernels use this
information to associate threads with sockets.
Refer to the kernel commit 256f2d4b463d3030ebc8d2b54f427543814a2bdc
for more details.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.
This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
970.
This adds kvm_access_one_reg() to access a special register which is not
in env->spr. This requires kvm_set_one_reg/kvm_get_one_reg patch.
The feature must be present in the host kernel.
This bumps vmstate_spapr::version_id and enables new vmstate_ppc_timebase
only for it. Since the vmstate_spapr::minimum_version_id remains
unchanged, migration from older QEMU is supported but without
vmstate_ppc_timebase.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Almost all platforms QEMU emulates have some sort of firmware they can load
to expose a guest environment that closely resembles the way it would look
like on real hardware.
This patch introduces such a firmware on our e500 platforms. U-boot is the
default firmware for most of these systems and as such our preferred choice.
For backwards compatibility reasons (and speed and simplicity) we skip u-boot
when you use -kernel and don't pass in -bios. For all other combinations like
-kernel and -bios or no -kernel you get u-boot as firmware.
This allows you to modify the boot environment, execute a networked boot through
the e1000 emulation and execute u-boot payloads.
Signed-off-by: Alexander Graf <agraf@suse.de>
We want to move to a model where firmware loads our kernel. To achieve
this we need to be able to tell firmware where the kernel lies.
Let's copy the mechanism we already use for -M pseries and expose the
kernel load address and size through the device tree.
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch adds pci pin to irq_num routing callback.
This callback is called from pci_device_route_intx_to_irq to
find which pci device maps to which irq.
This fix is required for pci-device passthrough using vfio.
Also without this patch we gets below prints
"
PCI: Bug - unimplemented PCI INTx routing (e500-pcihost)
qemu-system-ppc64: PCI: Bug - unimplemented PCI INTx routing (e500-pcihost) "
and Legacy interrupt does not work with pci device passthrough.
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
[agraf: remove double semicolon]
Signed-off-by: Alexander Graf <agraf@suse.de>
- Use PCI_NUM_PINS rather than hardcoding
- use "pin" wherever possible
Signed-off-by: Bharat Bhushan <Bharat.Bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment XICS does not support interrupts reuse so sPAPR PHB
implements this. sPAPRPHBState holds array of 32 spapr_pci_msi to
describe PCI config address, first MSI and number of MSIs. Once
allocated for a device, QEMU tries reusing this config until the number
of MSIs changes.
Existing SPAPR guests call ibm,change-msi in a loop until the handler
returns the requested number of vectors.
Recently introduced check for the maximum number of MSI/MSIX vectors
supported by a device only works for a device which is new for PHB's
MSI cache. If it is already there, the check is not performed which
leads to new IRQ block allocation. This happens during PCI hotplug
even when the user hot plug the same device which he just hot unplugged.
This moves the check earlier.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
While there, also moved the hard coded value for CLOCKFREQ to a #define.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Alexander Graf <agraf@suse.de>
Current guest kernels try allocating as many vectors as the quota is.
For example, in the case of virtio-net (which has just 3 vectors)
the guest requests 4 vectors (that is the quota in the test) and
the existing ibm,change-msi handler returns 4. But before it returns,
it calls msix_set_message() in a loop and corrupts memory behind
the end of msix_table.
This limits the number of vectors returned by ibm,change-msi to
the maximum supported by the actual device.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: qemu-stable@nongnu.org
[agraf: squash in bugfix from aik]
Signed-off-by: Alexander Graf <agraf@suse.de>
In the past, IO space could not be mapped into the memory address space
so we introduced a workaround for that. Nowadays it does not look
necessary so we can remove the workaround and make sPAPR PCI
configuration simplier.
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
After previous Peter patch, they are redundant. This way we don't
assign them except when needed. Once there, there were lots of case
where the ".fields" indentation was wrong:
.fields = (VMStateField []) {
and
.fields = (VMStateField []) {
Change all the combinations to:
.fields = (VMStateField[]){
The biggest problem (appart from aesthetics) was that checkpatch complained
when we copy&pasted the code from one place to another.
Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Total removal of QEMUMachineInitArgs struct. QEMUMachineInitArgs's fields
are copied into MachineState. Removed duplicated fields from MachineState.
All the other changes are only mechanical refactoring, no semantic changes.
Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> (s390)
Reviewed-by: Michael S. Tsirkin <mst@redhat.com> (PC)
[AF: Renamed ms -> machine, use MACHINE_GET_CLASS()]
Signed-off-by: Andreas Färber <afaerber@suse.de>
This fixes warnings from the static code analysis (smatch).
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
PortioList is an abstraction used for construction of MemoryRegionPortioList
from MemoryRegionPortio. It can be used later to unmap created memory regions.
It also requires proper cleanup because some of the memory inside is allocated
dynamically.
By moving PortioList ot device state we make it possible to cleanup later and
avoid leaking memory.
This change spans several target platforms. The following testcases cover all
changed lines:
qemu-system-ppc -M prep
qemu-system-i386 -vga qxl
qemu-system-i386 -M isapc -soundhw adlib -device ib700,id=watchdog0,bus=isa.0
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
No need to go through qemu_machine field. Use
MachineClass fields directly.
Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
In order to eliminate the QEMUMachine indirection,
add its fields directly to MachineClass.
Do not yet remove qemu_machine field because it is
still in use by sPAPR.
Signed-off-by: Marcel Apfelbaum <marcel.a@redhat.com>
[AF: Copied fields for sPAPR, too]
Signed-off-by: Andreas Färber <afaerber@suse.de>
The spinning struct is in guest endianness, so we need to initialize
its variables in guest endianness too.
This fixes booting e500 guests with SMP on x86 for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
There are 3 different variants of the decrementor for BookE and BookS.
The BookE variant sets TSR[DIS] to 1 when the DEC value becomes 1 or 0. TSR[DIS]
is then the indicator whether the decrementor interrupt line is asserted or not.
The old BookS variant treats DEC as an edge interrupt that gets triggered when
the DEC value's top bit turns 1 from 0.
The new BookS variant maintains the assertion bit inside DEC itself. Whenever
the DEC value becomes negative (top bit set) the DEC interrupt line is asserted.
So far we implemented mostly the old BookS variant. Let's do them all properly.
This fixes booting pseries ppc64 guest images in TCG mode for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
We now reset SPRs to their reset values on CPU reset. So if we want
to have an SPR persistently changed, we need to change its default
reset value rather than the value itself manually.
Do this for SPR_BOOKE_PIR, fixing e500v2 SMP boot.
Reported-by: Frederic Konrad <fred.konrad@greensocs.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Tested-by: KONRAD Frederic <fred.konrad@greensocs.com>
Add U suffix to various places where we were doing "1 << 31",
which is undefined behaviour, and also to other constant
definitions in the same groups, for consistency.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
This extends the pseries machine type with the interface to fix firmware
pathnames for devices which have @bootindex property.
This fixes SCSI disks' device node names (which are wildcard nodes in
the device-tree), for spapr-vscsi, virtio-scsi and usb-storage.
This fixes PHB name from "pci" to "pci@XXXX" where XXXX is a BUID as
there is no bus on top of sPAPRPHBState where PHB firmware name could
be fixed using the BusClass::get_fw_dev_path() mechanism.
This stores the boot list in the /chosen/qemu,boot-list property of
the device tree. "\n" are replaced by spaces to support OF1275.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This changes VIO bridge fw name from spapr-vio-bridge to vdevice and
vscsi/veth node names from QEMU object names to VIO specific device tree
names.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This changes resource code definitions to ones used in the host kernel.
This fixes H_SET_MODE_RESOURCE_LE (switch between big endian and
little endian) to sync registers from KVM before changing LPCR value.
This adds a set_spr() helper to update an SPR in a CPU's context to avoid
possible races and makes use of it to change LPCR.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
We wanted to loop till index is 8. On 8 we return with H_PTEG_FULL. If we
are successful in loading hpte with any other index, we continue with that
index value.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
PCI memory region is 0x3f000000 bytes starting at 0xc0000000.
However, keep compatibility with Open Hack'Ware expectations
by adding a hack for Open Hack'Ware display.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Remove now duplicated code from prep board.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Before spapr_vga_init will returned false if the vga is specified by
the command '-device VGA' because vga_interface_type was evaluated to
VGA_NONE. With the change in previous patch of this series,
spapr_vga_init should return true if it's told that the vga will be
initialized in flow of the generic devices initialization.
To keep '-nodefaults' have the semantics of bare minimum, it adds a
check of 'has_defaults' in usb_enabled() to avoid that a USB controller
is added by '-nodefautls, -device VGA' implicitly.
This patch also makes two cleanups:
1. skip initialization for VGA_NONE
2. remove the useless 'break'
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mark Wu <wudxw@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Raven datasheet explains where firmware lives in system memory, so do
it there instead of in board code. Other boards using the same PCI
host will not have to copy the firmware loading code.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
[AF: Drop BIOS size workaround in favor of replacing our firmware blob]
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Commits fdfba1a298,
ab1da85791,
f606604f1c and
2c17449b30 added usages of ENV_GET_CPU()
macro in target-specific code.
Use ppc_env_get_cpu() instead.
Cc: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This converts the old-style SysBusDevice::init() callback to a new-style
DeviceClass::realize() callback.
As a part of conversion, this replaces fprintf(stderr) with error_setg()
as realize() does not "return" any value, instead it puts the extended
error into **errp.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Previously libvirt required the first/default PCI bus to have name "pci".
Since QEMU can support multiple buses now, libvirt wants "pci.0" now.
This removes custom bus name and lets QEMU make up default names.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This makes use of @cpu_dt_id and related API in:
1. emulated XICS hypercall handlers as they receive fixed CPU indexes;
2. XICS-KVM to enable in-kernel XICS on right CPU;
3. device-tree renderer.
This removes @cpu_index fixup as @cpu_dt_id is used instead so QEMU monitor
can accept command-line CPU indexes again.
This changes kvm_arch_vcpu_id() to use ppc_get_vcpu_dt_id() as at the moment
KVM CPU id and device tree ID are calculated using the same algorithm.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
Normally CPUState::cpu_index is used to pick the right CPU for various
operations. However default consecutive numbering does not always work
for POWERPC.
These indexes are reflected in /proc/device-tree/cpus/PowerPC,POWER7@XX
and used to call KVM VCPU's ioctls. In order to achieve this,
kvmppc_fixup_cpu() was introduced. Roughly speaking, it multiplies
cpu_index by the number of threads per core.
This approach has disadvantages such as:
1. NUMA configuration stays broken after the fixup;
2. CPU-targeted commands from the QEMU Monitor do not work properly as
CPU indexes have been fixed and there is no clear way for the user to
know what the new CPU indexes are.
This introduces a @cpu_dt_id field in the CPUPPCState struct which
is initialized from @cpu_index by default and can be fixed later
to meet the device tree requirements.
This adds an API to handle @cpu_dt_id.
This removes kvmppc_fixup_cpu() as it is not more needed, @cpu_dt_id
is calculated in ppc_cpu_realize().
This will be used later in machine code.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Mike Day <ncmike@ncultra.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
This patch introduces the hypervisor call H_GET_TCE which is basically the
reverse of H_PUT_TCE, as defined in the Power Architecture Platform
Requirements (PAPR).
The hcall H_GET_TCE is required by the kdump kernel which is calling it to
retrieve the TCE set up by the panicing kernel.
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
For updating in kernel htab we need to provide both pte0 and pte1, hence update
the interface to take pte0 and pte1 together
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[ ldq_phys() API change, Greg Kurz <gkurz@linux.vnet.ibm.com> ]
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
With kvm enabled, we store the hash page table information in the hypervisor.
Use ioctl to read the htab contents. Without this we get the below error when
trying to read the guest address
(gdb) x/10 do_fork
0xc000000000098660 <do_fork>: Cannot access memory at address 0xc000000000098660
(gdb)
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[ fixes for 32 bit build (casts!), ldq_phys() API change,
Greg Kurz <gkurz@linux.vnet.ibm.com ]
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Correctly update the htab_mask using the return value of
KVM_PPC_ALLOCATE_HTAB ioctl. Also we don't update sdr1
on GET_SREGS for HV. We check for external htab and if
found true, we don't need to update sdr1
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[ fixed pte group offset computation in ppc_hash64_htab_lookup() that
caused TCG to fail, Greg Kurz <gkurz@linux.vnet.ibm.com> ]
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
We currently size the msi window trap page according to the host's page
size so that we poke a working hole into a memory slot in case we overlap.
However, this is only ever necessary with KVM active. Without KVM, we should
rather try to be host platform agnostic and use a constant size: 4k.
This fixes a build breakage on win32 hosts.
Signed-off-by: Alexander Graf <agraf@suse.de>
We will use this in later patches to make sure we use the right load
functions when copying hpte entries.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This makes use of new error codes which load_elf() can return and
prints more informative error message.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently everybody uses ELF kernel images with "-kernel" option on
pseries machine but QEMU still tries to boot from an image even it
fails to recognize it is ELF. This produces undefined behaviour if
the user tries a kernel image compiled for another architecture.
This removes support of raw kernel images.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
[agraf: fix up stray quotes and newlines in strings]
Signed-off-by: Alexander Graf <agraf@suse.de>
Recent changes introduced cannot_instantiate_with_device_add_yet
and removed capability of adding yet another PCI host bridge via
command line for SPAPR platform (POWERPC64 server).
This brings the capability back and puts SPAPR PHB into "bridge"
category.
This is not much use for emulated PHB but it is absolutely required
for VFIO as we put an IOMMU group onto a separate PHB on SPAPR.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Targets like ppc64 support different types of KVM, one which use
hypervisor mode and the other which doesn't. Add a new machine
option kvm-type that helps in selecting the respective ones
We also add a new QEMUMachine callback get_vm_type that helps
in mapping the string representation of kvm type specified.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[agraf: spelling fixes, use error_report(), use qemumachine.h]
Signed-off-by: Alexander Graf <agraf@suse.de>
This is now obsolete - remove the header and all its inclusions.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Inline these usages. Converts these init to at least a semi-recent QOM
styling.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Inline these usages. Converts these init to at least a semi-recent QOM
styling.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Define macros for the interrupt and memory maps for the sake of self
documentation.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Replace them with uint8/32/64.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
While ISA address space in prep machine is currently the one returned
by get_system_io(), this depends of the implementation of i82378/raven
devices, and this may not be the case forever.
Use the right ISA address space when adding some more ports to it.
We can use whatever ISA device on the right ISA bus, as all ISA devices
on the same ISA bus share the same ISA address space.
Signed-off-by: Hervé Poussineau <hpoussin@reactos.org>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Many PCI host bridges consist of a sysbus device and a PCI device.
You need both for the thing to work. Arguably, these bridges should
be modelled as a single, composite devices instead of pairs of
seemingly independent devices you can only use together, but we're not
there, yet.
Since the sysbus part can't be instantiated with device_add, yet,
permitting it with the PCI part is useless. We shouldn't offer
useless options to the user, so let's set
cannot_instantiate_with_device_add_yet for them.
It's already set for Bonito, Grackle, i440FX and Raven. Document why.
Set it for the others: dec-21154, e500-host-bridge, gt64120_pci, mch,
pbm-pci, ppc4xx-host-bridge, sh_pci_host, u3-agp, uni-north-agp,
uni-north-internal-pci, uni-north-pci, and versatile_pci_host.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
device_add plugs devices into suitable bus. For "real" buses, that
actually connects the device. For sysbus, the connections need to be
made separately, and device_add can't do that. The device would be
left unconnected, and could not possibly work.
Quite a few, but not all sysbus devices already set
cannot_instantiate_with_device_add_yet in their class init function.
Set it in their abstract base's class init function
sysbus_device_class_init(), and remove the now redundant assignments
from device class init functions.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
In an ideal world, machines can be built by wiring devices together
with configuration, not code. Unfortunately, that's not the world we
live in right now. We still have quite a few devices that need to be
wired up by code. If you try to device_add such a device, it'll fail
in sometimes mysterious ways. If you're lucky, you get an
unmysterious immediate crash.
To protect users from such badness, DeviceClass member no_user used to
make device models unavailable with -device / device_add, but that
regressed in commit 18b6dad. The device model is still omitted from
help, but is available anyway.
Attempts to fix the regression have been rejected with the argument
that the purpose of no_user isn't clear, and it's prone to misuse.
This commit clarifies no_user's purpose. Anthony suggested to rename
it cannot_instantiate_with_device_add_yet_due_to_internal_bugs, which
I shorten somewhat to keep checkpatch happy. While there, make it
bool.
Every use of cannot_instantiate_with_device_add_yet gets a FIXME
comment asking for rationale. The next few commits will clean them
all up, either by providing a rationale, or by getting rid of the use.
With that done, the regression fix is hopefully acceptable.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.a@redhat.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This makes sure that all NUMA memory blocks reside within RAM or
have zero length.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The SPAPR specification says that the RMA starts at the LPAR's logical
address 0 and is the first logical memory block reported in
the LPAR’s device tree.
So SLOF only maps the first block and that block needs to span
the full RMA.
This makes sure that the RMA area is where SLOF expects it.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The qemu_devtree API is a wrapper around the fdt_ set of APIs.
Rename accordingly.
Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
[agraf: also convert hw/arm/virt.c]
Signed-off-by: Alexander Graf <agraf@suse.de>
spapr-nvram's drive property is currently connected to a non-existent
"-machine nvram=<drivename>" option. Instead, tie it to -pflash like
other non-volatile RAM devices. This provides the following possibilities
for adding a backend for the sPAPR non-volatile RAM:
* -pflash filename
* -drive if=pflash,file=filename,format=raw,...
* -drive if=none,file=filename,format=raw,id=foo,... -global spapr-nvram.drive=foo
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds very basic handlers for ibm,get-system-parameter and
ibm,set-system-parameter RTAS calls.
The only parameter handled at the moment is
"platform-processor-diagnostics-run-mode" which is always disabled and
does not support changing. This is expected to make
"ppc64_cpu --run-mode=1" happy.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: s/papameter/parameter/g]
Signed-off-by: Alexander Graf <agraf@suse.de>
It doesn't make sense for a region to be INT64_MAX in size:
memory core uses UINT64_MAX as a special value meaning
"all 64 bit" this is what was meant here.
While this should never affect the spapr system which at the moment always
has < 63 bit size, this makes us hit all kind of corner case bugs with
sub-pages, so users are probably better off if we just use UINT64_MAX
instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Alexander Graf <agraf@suse.de>
Most code already used QEMUTimer without the redundant 'struct' keyword.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
The default granularity for the FIT timer on 440 is on every 0x1000th
transition of TB from 0 to 1. Translated that means 48828 times a second.
Since interrupts are quite expensive for 440 and we don't really care
about the accuracy of the FIT to that significance, let's force FIT and
WDT to at best millisecond granularity.
This basically restores behavior as it was in QEMU 1.6, where timers
could only deal with millisecond granularities at all.
This patch greatly improves performance with the 440 target and restores
roughly the same performance level that QEMU 1.6 had for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1385416015-22775-3-git-send-email-agraf@suse.de
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
Today we fire FIT and WDT timer events every time the respective bit
position in TB flips from 0 -> 1.
However, there is no need to do this if the end result would be that
we're changing a TSR bit that is set to 1 to 1 again. No guest visible
change would have occured.
So whenever we see that the TSR bit to our timer is already set, don't
even bother to update the timer that would potentially fire it off.
However, we do need to make sure that we update our timer that notifies
us of the TB flip when the respective TSR bit gets unset. In that case
we do care about the flip and need to notify the guest again. So add
a callback into our timer handlers when TSR bits get unset.
This improves performance for me when the guest is busy processing things.
Signed-off-by: Alexander Graf <agraf@suse.de>
Message-id: 1385416015-22775-2-git-send-email-agraf@suse.de
Signed-off-by: Anthony Liguori <aliguori@amazon.com>
rom_add_blob never fails, and neither does rom_add_blob_fixed,
so there's no need to return value from it.
In fact, rom_add_blob_fixed was erroneously returning -1 unconditionally
which made the only system that checked the return value -M bamboo fail
to start.
Drop the return value and drop checks from ppc440_bamboo to
fix this failure.
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
compatiblity -> compatibility
continously -> continuously
existance -> existence
usefull -> useful
shoudl -> should
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Instead of relying on cpu_model, obtain the device tree node label
per CPU. Use DeviceClass::fw_name as source.
Whenever DeviceClass::fw_name is unknown, default to "PowerPC,UNKNOWN".
As a consequence, spapr_fixup_cpu_dt() can operate on each CPU's fw_name,
obsoleting sPAPREnvironment::cpu_model, and spapr_create_fdt_skel() can
drop its cpu_model argument.
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
This enables IRQFD for LSI (level triggered INTx interrupts) by adding
a spapr_route_intx_pin_to_irq() callback to the sPAPR PCI host bus. This
callback is called to know the global interrupt number to link resampling fd
with IRQFD's fd in KVM.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
controller system within KVM. This patch allows qemu to initialize and
configure the in-kernel XICS, and keep its state in sync with qemu's XICS
state as necessary.
This should give considerable performance improvements. e.g. on a simple
IPI ping-pong test between hardware threads, using qemu XICS gives us
around 5,000 irqs/second, whereas the in-kernel XICS gives us around
70,000 irqs/s on the same hardware configuration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[Mike Qiu <qiudayu@linux.vnet.ibm.com>: fixed mistype which caused ics_set_kvm_state() to fail]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
The upcoming XICS-KVM support will use bits of emulated XICS code.
So this introduces new level of hierarchy - "xics-common" class. Both
emulated XICS and XICS-KVM will inherit from it and override class
callbacks when required.
The new "xics-common" class implements:
1. replaces static "nr_irqs" and "nr_servers" properties with
the dynamic ones and adds callbacks to be executed when properties
are set.
2. xics_cpu_setup() callback renamed to xics_common_cpu_setup() as
it is a common part for both XICS'es
3. xics_reset() renamed to xics_common_reset() for the same reason.
The emulated XICS changes:
1. the part of xics_realize() which creates ICPs is moved to
the "nr_servers" property callback as realize() is too late to
create/initialize devices and instance_init() is too early to create
devices as the number of child devices comes via the "nr_servers"
property.
2. added ics_initfn() which does a little part of what xics_realize() did.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
This moves the xics_cpu_setup() call after kvmppc_set_papr()
in order to get VCPUs initialized as this is required by upcoming
XICS-KVM.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>
On the real hardware, RTAS is called in real mode and therefore
top 4 bits of the address passed in the call are ignored.
So does the patch.
This converts h_rtas() to use existing rtas_ld() handlers.
This fixed rtas_ld()/rtas_st() to ignore top 4 bits.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
PAPR+ says that no "ibm,purr" tells the guest that H_PURR is not
supported. However some guests still try calling H_PURR on POWER7 unless
the property is present and equal to 0. This adds the property for CPUs
supporting the PURR special register.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment the size of the buffer is set to 64K which is
enough for approximately 150 VCPUs which is not the limit.
This increases the buffer up to 256K which allows having
a tree for approximately 600 VCPUs which is way beyond the real
number we need.
As only the real size of the tree is copied to the guest, there
will be no impact on existing configurations.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Try loading the kernel as little endian if it fails big endian.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
* Conversion of global CPU list to QTAILQ - preparing for CPU hot-unplug
* Document X86CPU magic numbers for CPUID cache info
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJSJgdaAAoJEPou0S0+fgE/WqAQAJ6pcTymZO86NLKwcY4dD5Dr
Es2aTs4XFs9V3+gpbH9vOA71n9HanFQp1s4ZUskQ2BVQU8cZeRUKlGhKJfqcEbPF
H5wkxskqgV2Sw8+XWjQk80J/X/W6k10Fit64CUpQqxzd3HwXXzT/QHXzM8t6p79i
KdEAsjaQYqR8/qa7+pd437lLcTiRb51FqB5u3ClbCbIKjnnjswr/ZypKr+CUc9WY
1AzP9UKg0qSxz1yCkgzYHt3eWjfuGhsqn8KXVQfc+37xFRZp0uYQYkCahhwrPRUO
jTg0eJKxoyH76t+2jIsnNHfd6r5zaTmVThGnun/SzJTGj8AFNrz81EfT1niJdp2/
6RdykpWdqqeA3usKoSzBgTEAXGL50tCL0xiREk7hPwflxJqjbjFuVuttkazEcHZf
Q2OS0tUFhYi3yUojms/YJYFUaNUhA033wJSjKGbFfSDdtJdjnxmB2r+LhsH4ByfS
4SPU5zr4up1Yr1dnmIlNUA5W/KMgZseT3shasLhFmODR7wGvrQ7DuEHRs87UQbbM
pedvN92VmWzByEvLNkICJGuaVer+mHznig9f1eOkxXlK4RdNBmAf5QYMU+oxbkUG
fwXu0w7/aUJKpcYl6aYUmkhgn9dB3Oe/WTVLkvfg54MUFKpo4b72AR01+fWT91XO
r8DQQYwP94htozAC6F9n
=/bSY
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'afaerber/tags/qom-cpu-for-anthony' into staging
QOM CPUState refactorings / X86CPU
* Conversion of global CPU list to QTAILQ - preparing for CPU hot-unplug
* Document X86CPU magic numbers for CPUID cache info
# gpg: Signature made Tue 03 Sep 2013 10:59:22 AM CDT using RSA key ID 3E7E013F
# gpg: Can't check signature: public key not found
# By Andreas Färber (3) and Eduardo Habkost (1)
# Via Andreas Färber
* afaerber/tags/qom-cpu-for-anthony:
target-i386: Use #defines instead of magic numbers for CPUID cache info
cpu: Replace qemu_for_each_cpu()
cpu: Use QTAILQ for CPU list
a15mpcore: Use qemu_get_cpu() for generic timers
This includes pc and pci cleanups and enhancements,
and a virtio bugfix for level interrupts.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSIveoAAoJECgfDbjSjVRp2C8IAL7DE0oM0jfEB5DAd8jlULHx
hA8RP21rFzyU8PwtHB+72+C1ImldBge4hvhI+qbsm6PoW3RCeV/lbESIRTiv8dCO
pGUOFmv8MfJAH+WWFsle5mRisoTksYQWWBMHCOqvmaY4JL9pBQOhCLHVhV1XfjtL
hO7uGrWmlijeILv5CxYyPMYuOEdVvRSZKzE+Fp2YKfNstiQrS5fJIlqmwCHrlneW
l2atnt2d9ZV1K8QYiGg4GRVbSAMJvA1wum+0F4gnXIz9yAeOt+Ht1s8cNKQDMouJ
r2OyVgPM9aS/XaO6ejct1Sjo7Vgh/Ublrpw3lFqV/qHix6rEHwy2I3JHFEJPjvk=
=SytJ
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'mst/tags/for_anthony' into staging
pc,pci,virtio fixes and cleanups
This includes pc and pci cleanups and enhancements,
and a virtio bugfix for level interrupts.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Sun 01 Sep 2013 03:15:36 AM CDT using RSA key ID D28D5469
# gpg: Can't check signature: public key not found
# By Michael S. Tsirkin (3) and others
# Via Michael S. Tsirkin
* mst/tags/for_anthony:
virtio_pci: fix level interrupts with irqfd
pc: reduce duplication, fix PIIX descriptions
hw: Clean up bogus default boot order
pci: add config space access traces
pc: fix regression for 64 bit PCI memory
pci: Introduce helper to retrieve a PCI device's DMA address space
Message-id: 1378023590-11109-1-git-send-email-mst@redhat.com
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
This converts old style fprintf to traces.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: change patch subject]
Signed-off-by: Alexander Graf <agraf@suse.de>
PAPR+ requires two RTAS calls to be supported by the hypervisor in
order to allow hotplugging VCPUs from the guest. The "start-cpu" RTAS
call was already there but "stop-self" was not.
This adds the "stop-self" RTAS call.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
H_SET_MODE is used for controlling various partition settings. One
of these settings is the endianness a guest takes its exceptions in.
Signed-off-by: Anton Blanchard <anton@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
On the sPAPR platform a guest allocates MSI/MSIX vectors via RTAS
hypercalls which return global IRQ numbers to a guest so it only
operates with those and never touches MSIMessage.
Therefore MSIMessage handling is completely hidden in QEMU.
Previously every sPAPR PCI host bridge implemented its own MSI window
to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
or vfio) and route them to the guest via qemu_pulse_irq().
MSIMessage used to be encoded as:
.addr - address within the PHB MSI window;
.data - the device index on PHB plus vector number.
The MSI MR write function translated this MSIMessage to a global IRQ
number and called qemu_pulse_irq().
However the total number of IRQs is not really big (at the moment it is
1024 IRQs starting from 4096) and even 16bit data field of MSIMessage
seems to be enough to store an IRQ number there.
This simplifies MSI handling in sPAPR PHB. Specifically, this does:
1. remove a MSI window from a PHB;
2. add a single memory region for all MSIs to sPAPREnvironment
and spapr_pci_msi_init() to initialize it;
3. encode MSIMessage as:
* .addr - a fixed address of SPAPR_PCI_MSI_WINDOW==0x40000000000ULL;
* .data as an IRQ number.
4. change IRQ allocator to align first IRQ number in a block for MSI.
MSI uses lower bits to specify the vector number so the first IRQ has to
be aligned. MSIX does not need any special allocator though.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
spapr-pci config space accessors use find_dev() to find a PCI device.
However find_dev() only searched on a primary bus and did not do
recursive search through secondary buses so config space access was not
possible for devices other that on a primary bus.
This fixed find_dev() by using the PCI API pci_find_device() function.
This effectively enabled pci bridges on spapr.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>