If a guest reboots during a running migration, changes to the
hash page table are not necessarily updated on the destination.
Opening a new file descriptor to the HTAB forces the migration
handler to resend the entire table.
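A minimal sketch of the idea, assuming a ppc64 KVM host whose <linux/kvm.h>
provides KVM_PPC_GET_HTAB_FD and struct kvm_get_htab_fd; the helper name is
illustrative:

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Open a fresh read-mode HTAB fd starting at index 0, so the next
     * migration pass streams out every HPTE again instead of only those
     * dirtied since the previous fd was opened. */
    static int htab_reopen_fd(int vm_fd)
    {
        struct kvm_get_htab_fd ghf = {
            .flags = 0,          /* read mode, not KVM_GET_HTAB_WRITE */
            .start_index = 0,    /* restart from the first HPTE group */
        };

        return ioctl(vm_fd, KVM_PPC_GET_HTAB_FD, &ghf);
    }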
Signed-off-by: Samuel Mendoza-Jonas <sam.mj@au1.ibm.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
On sPAPR, virtio devices are connected to the PCI bus and use MSI-X.
Commit cc943c36fa has modified MSI-X
so that writes are made using the bus master address space and follow
the IOMMU path.
Unfortunately, the IOMMU address space does not have an
MSI window: the notification is silently dropped in unassigned_mem_write
instead of reaching the guest... The most visible effect is that all
virtio devices have been non-functional on sPAPR since then. :(
This patch does the following:
1) map the MSI window into the IOMMU address space for each PHB
   (a sketch follows this list)
   - since each PHB instantiates its own IOMMU address space, we
     can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW)
   - there is no real need to keep the MSI window setup in a separate
     function, so the spapr_pci_msi_init() code moves to spapr_phb_realize()
2) kill the global MSI window as it is no longer needed
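A rough sketch of step 1, meant to live in spapr_phb_realize(); the field
names (iommu_root, msiwindow) and the spapr_msi_ops callbacks are assumptions
here, only the two memory region calls are known QEMU API:

    /* Map a per-PHB MSI window at a fixed offset inside this PHB's own
     * IOMMU address space, so MSI notifications written through the bus
     * master address space hit the window instead of being dropped. */
    memory_region_init_io(&sphb->msiwindow, OBJECT(sphb), &spapr_msi_ops,
                          spapr, "msi", getpagesize());
    memory_region_add_subregion(&sphb->iommu_root, SPAPR_PCI_MSI_WINDOW,
                                &sphb->msiwindow);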
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
We currently calculate the final RTAS and FDT location based on
the early estimate of the RMA size, cropped to 256M on KVM since
we only know the real RMA size at reset time, which happens much
later in the boot process.
This means the FDT and RTAS end up right below 256M while they
could be much higher, using precious RMA space and limiting
what the OS bootloader can put there, which has proved to be
a problem with some OSes (such as when using very large initrds).
Fortunately, we do the actual copy of the device tree into guest
memory much later, during reset, late enough to be able to do it
using the final RMA value; we just need to move the calculation
to the right place.
However, RTAS is still loaded too early, so we change the code to
load the tiny blob into qemu memory early on, and then copy it into
guest memory at reset time. It's small enough that the memory usage
doesn't matter.
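A hedged sketch of the new flow; the variable names and RTAS_MAX_SIZE are
illustrative, with load_image_size() and cpu_physical_memory_write() as the
assumed QEMU helpers:

    /* At machine init: read the RTAS blob into host memory only. */
    rtas_size = load_image_size(filename, rtas_blob, RTAS_MAX_SIZE);

    /* At reset, once the real RMA size and hence the final rtas_addr are
     * known: copy the blob into guest memory. */
    cpu_physical_memory_write(rtas_addr, rtas_blob, rtas_size);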
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[aik: fixed errors from checkpatch.pl, defined RTAS_MAX_ADDR]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: fix compilation on 32bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>
When running KVM we have to adhere to host page boundaries for memory slots.
Unfortunately the NVRAM on mac99 is a 4k RAM hole inside of an MMIO flash
area.
So if our host is configured with a 64k page size, we can't use the mac99 target
with KVM. This is a real shame, as this limitation is not really an issue - we
can easily map NVRAM somewhere else, and at least Linux and Mac OS X use it
at its new location.
So in that emergency case, when the choice is between failing to run at all
and moving NVRAM to a place it shouldn't be, choose the latter.
This patch enables -M mac99 with KVM on 64k page size hosts.
Signed-off-by: Alexander Graf <agraf@suse.de>
A PAPR compliant guest calls this in the absence of kdump. This finally
reaches user space and can be handled according to the policies set by
higher level tools (such as taking a dump) for further analysis by tools
like crash.
The Linux kernel calls ibm,os-term when the extended os-term property is
set. This makes sure that a return to the Linux kernel is guaranteed.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[agraf: reduce RTAS_TOKEN_MAX]
Signed-off-by: Alexander Graf <agraf@suse.de>
Currently the SPAPR PHB keeps track of every allocated MSI (here and below
MSI stands for both MSI and MSIX) interrupt because
XICS used to be unable to reuse interrupts. This is a problem for
dynamic MSI reconfiguration, which happens when the guest reloads a driver
or performs PCI hotplug. Another problem is that the existing
implementation can enable MSI on at most 32 devices
(SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that limit.
This makes use of the new XICS ability to reuse interrupts.
This reorganizes MSI information storage in sPAPRPHBState. Instead of
a static array of 32 descriptors (one per PCI function), this patch adds
a GHashTable where @config_addr is the key and a (first_irq, num) pair is
the value. A GHashTable can dynamically grow and shrink, so the initial
limit of 32 devices is gone.
This changes the migration stream, as @msi_table was a static array while
the new @msi_devs is a dynamic hash table. This adds a temporary array
which is used for migration; it is populated in the "spapr_pci"::pre_save()
callback and expanded into the hash table in the post_load() callback.
Since the destination side does not know the number of MSI-enabled devices
in advance and cannot pre-allocate the temporary array to receive
the migration state, this makes use of the new VMSTATE_STRUCT_VARRAY_ALLOC
macro which allocates the array automatically.
This resets the MSI configuration space when interrupts are released by
the ibm,change-msi RTAS call.
This fixes traces to be more informative.
This changes the vmstate_spapr_pci_msi name from "...lsi" to "...msi",
which was incorrect by accident. As the internal representation has
changed, the migration version number is bumped.
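A standalone sketch of the storage scheme using plain GLib; the struct and
helper names are illustrative rather than the exact ones in spapr_pci.c:

    #include <glib.h>
    #include <stdint.h>

    typedef struct {
        uint32_t first_irq;
        uint32_t num;
    } MsiRange;

    /* One table per PHB, keyed by the device's config address. */
    static GHashTable *msi_devs_new(void)
    {
        return g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free);
    }

    static void msi_devs_set(GHashTable *msi_devs, uint32_t config_addr,
                             uint32_t first_irq, uint32_t num)
    {
        gint *key = g_new(gint, 1);
        MsiRange *val = g_new(MsiRange, 1);

        *key = config_addr;
        val->first_irq = first_irq;
        val->num = num;
        /* Re-inserting for the same config_addr replaces the old entry,
         * which is what dynamic MSI reconfiguration needs. */
        g_hash_table_insert(msi_devs, key, val);
    }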
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: drop g_malloc_n usage]
Signed-off-by: Alexander Graf <agraf@suse.de>
This implements interrupt release function so IRQs can be returned back
to the pool for reuse in cases such as PCI hot plug.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The current allocator returns IRQ numbers from a pool and does not
support IRQ reuse in any form, as it does not keep track of what it
previously returned; it only keeps the last returned IRQ. Some use
cases such as PCI hot(un)plug may require IRQ release and reallocation.
This moves the allocator from SPAPR to XICS.
This switches IRQ users to the new API.
This uses the LSI/MSI flags to know whether an interrupt is allocated.
The interrupt release function will be posted as a separate patch.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The existing interrupt allocation scheme in SPAPR assumes that
interrupts are allocated at start time, contiguously, and that the config
will not change. However, there are cases where this is not going to work,
such as:
1. migration - we will need the ability to choose interrupt
numbers for devices on the command line, and this will create gaps in
the interrupt space.
2. PCI hotplug - interrupts from an unplugged device need to be returned
to the interrupt pool, otherwise we will quickly run out of interrupts.
This replaces the separate lslsi[] array with a flags byte in the
ICSIRQState struct and defines "LSI" and "MSI" flags. If neither flag is
set, the descriptor is not allocated and not in use.
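A minimal sketch of the flags byte and the "free descriptor" test; the macro
names follow the description and may not match the patch exactly:

    #include <stdbool.h>
    #include <stdint.h>

    #define XICS_FLAGS_IRQ_LSI  0x1
    #define XICS_FLAGS_IRQ_MSI  0x2

    /* With neither flag set the descriptor is unallocated and unused. */
    static inline bool ics_irq_is_free(uint8_t flags)
    {
        return !(flags & (XICS_FLAGS_IRQ_LSI | XICS_FLAGS_IRQ_MSI));
    }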
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Add support for the SPLPAR Characteristics parameter to the emulated
RTAS call ibm,get-system-parameter.
The support provides just enough information to allow "cat
/proc/powerpc/lparcfg" to succeed without generating a kernel error
message.
Without this patch the above command produces the following kernel
message:
arch/powerpc/platforms/pseries/lparcfg.c parse_system_parameter_string
Error calling get-system-parameter (0xfffffffd)
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Add support for the UUID parameter to the emulated RTAS call
ibm,get-system-parameter.
Return the guest's UUID as the value for the RTAS UUID system
parameter, or null (a zero length result) if it is not set.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This allows the ibm,get-system-parameter RTAS call to succeed for the
DIAGNOSTICS_RUN_MODE system parameter.
The problem can be seen with "ppc64_cpu --run-mode" from the
powerpc-utils package which fails before this patch with "Machine does
not support diagnostic run mode".
This is corrected by using the rtas_st_buffer() function to write to
the buffer.
The RTAS constants are also moved out into a header file, some new
constants added and the surrounding code slightly simplified.
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
[agraf: remove some commentary]
Signed-off-by: Alexander Graf <agraf@suse.de>
Add a function to write length + data into a buffer, as required for the
emulation of the RTAS ibm,get-system-parameter call.
If the destination is smaller than the source, the write is truncated
and success is returned. This matches the behaviour of pHyp.
This will be used in following patches.
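A standalone sketch of the layout and truncation rule, operating on plain
host memory rather than guest memory and storing the (possibly truncated)
length as a 16-bit big-endian value; that framing is an assumption for
illustration only:

    #include <stdint.h>
    #include <string.h>

    static void st_buffer(uint8_t *dst, size_t dst_len,
                          const uint8_t *src, uint16_t src_len)
    {
        if (dst_len < 2) {
            return;                   /* no room even for the length field */
        }
        if (src_len > dst_len - 2) {
            src_len = dst_len - 2;    /* truncate; still reported as success */
        }
        dst[0] = src_len >> 8;        /* big-endian 16-bit length */
        dst[1] = src_len & 0xff;
        memcpy(dst + 2, src, src_len);
    }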
Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
POWER KVM supports a KVM_CAP_SPAPR_TCE capability which allows allocating
TCE tables in host kernel memory and handling H_PUT_TCE requests
targeted at a specific LIOBN (logical bus number) right in the host without
switching to QEMU. At the moment this is used for emulated devices only
and the handler only puts the TCE into the table. If the in-kernel H_PUT_TCE
handler finds a LIOBN and the corresponding table, it will put the TCE into
the table and complete hypercall execution. The user space will not be
notified.
Upcoming VFIO support is going to use the same sPAPRTCETable device class
so KVM_CAP_SPAPR_TCE is going to be used as well. That means that TCE
tables for VFIO are going to be allocated in the host as well.
However VFIO operates with real IOMMU tables, and simply copying
a TCE into the real hardware TCE table will not work, as guest physical
to host physical address translation is required.
So until the host kernel gets VFIO support for H_PUT_TCE, it is better not
to register VFIO's TCE tables in the host.
This adds a placeholder for the KVM_CAP_SPAPR_TCE_VFIO capability. It is
not upstream yet and is still being discussed, so for now it is always
false, which means that in-kernel VFIO acceleration is not supported.
This adds a bool @vfio_accel flag to the sPAPRTCETable device telling it
not to try allocating the TCE table in the host kernel for VFIO. The flag
is false for now as there is no VFIO yet.
This adds a vfio_accel parameter to spapr_tce_new_table() with the same
semantics. Since there is only emulated PCI and VIO now, the flag is set
to false. Upcoming VFIO support will set it to true.
This is a preparation patch, so no change in behaviour is expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment spapr_rtas_register() allocates a new token number for every
new RTAS callback so numbers are not fixed and depend on the number of
supported RTAS handlers and the exact order of spapr_rtas_register() calls.
These tokens are copied into the device tree and remain the same during
the guest lifetime.
When we start another guest to receive a migration, it calls
spapr_rtas_register() as well. If the number of RTAS handlers or their
order is different in QEMU on source and destination sides, the "/rtas"
node in the device tree will differ. Since migration overwrites the device
tree (as it overwrites the entire RAM), the actual RTAS config on
the destination side gets broken.
This defines global constant values for every RTAS token which QEMU
uses today.
This changes spapr_rtas_register() to accept a token number instead of
allocating one. This changes all users of spapr_rtas_register().
This changes XICS-KVM not to cache tokens registered with KVM as they
are constant now.
This makes TOKEN_BASE global as the RTAS_XXX constants use TOKEN_BASE as
a base. TOKEN_MAX is moved and renamed too, and its value is changed
to the last token + 1. Boundary checks for token values are adjusted.
This reserves token numbers for "os-term" handlers and PCI hotplug
which we are working on.
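The scheme boils down to fixed offsets from a global base; the values and
ordering below are examples, not the ones the patch defines:

    /* Every token is a fixed offset from a global base, so source and
     * destination QEMUs always agree on the numbers written into /rtas. */
    #define RTAS_TOKEN_BASE          0x2000
    #define RTAS_DISPLAY_CHARACTER   (RTAS_TOKEN_BASE + 0x00)
    #define RTAS_GET_TIME_OF_DAY     (RTAS_TOKEN_BASE + 0x01)
    #define RTAS_SET_TIME_OF_DAY     (RTAS_TOKEN_BASE + 0x02)
    /* ... one fixed value per supported call ... */
    #define RTAS_TOKEN_MAX           (RTAS_TOKEN_BASE + 0x80)  /* last + 1 */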
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Hotplug of multiple disks fails due to the MSI vector quota check.
The number of MSI vectors defaults to 8, allowing only 4 devices.
This happens on a RHEL6.5 guest; RHEL7 and SLES11 guests fall back
to INTx.
One way to work around the issue is to increase the total number of MSIs,
so that the MSI quota check allows us to hotplug multiple disks.
This sets the quota to the maximum number of interrupts XICS has,
which is 1024 now (XICS_IRQS). This moves XICS_IRQS from spapr.c
to xics.h for wider visibility.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
[aik: put XICS_IRQS=1024 instead of 64i, fixed endianness and size]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds handling of the RESOURCE_ADDR_TRANS_MODE resource from
the H_SET_MODE, for POWER8 (PowerISA 2.07) only.
This defines AIL flags for LPCR special register.
This changes @excp_prefix according to the mode; this takes effect in TCG.
This turns on support for the new PPC2_ISA207S capability flag for TCG.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Tom Musta <tommusta@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds @bus_offset into sPAPRTCETable to tell where the TCE table
starts. It is set to 0 for emulated devices. Dynamic DMA windows will use
other offsets.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
At the moment only 4K pages are supported by sPAPRTCETable. Since the sPAPR
spec allows other page sizes and we are going to implement them, we need
the page size to be configurable.
This adds @page_shift into sPAPRTCETable and replaces SPAPR_TCE_PAGE_SHIFT
with it where possible.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This removes window_size as it is basically a copy of nb_table
shifted by SPAPR_TCE_PAGE_SHIFT. As the new dynamic DMA windows are
going to support windows as big as the entire RAM, and this number
will be bigger than 32 bits can hold, we would have to do something
about @window_size anyway, and removal seems to be the right way to go.
This removes dma_window_start/dma_window_size from sPAPRPHBState as
they are no longer used.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The macio IDE controller has some pretty nasty magic in its implementation to
allow for unaligned sector accesses. We used to handle these accesses
synchronously inside the IO callback handler.
However, the block infrastructure changed under our feet and now it's
impossible to call a synchronous block read/write from the aio callback
handler of a previous block access.
Work around that limitation by making the unaligned handling bits also go
through our asynchronous handler.
This fixes booting Mac OS X for me.
Reported-by: John Arbuckle <programmingkidx@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
The PAPR+ specification defines an ibm,client-architecture-support (CAS)
RTAS call whose purpose is to provide a negotiation mechanism for
the guest and the hypervisor to work out the best compatibility parameters.
During the negotiation process, the guest provides an array of the various
options and capabilities which it supports; the hypervisor adjusts
the device tree and (optionally) reboots the guest.
At the moment the Linux guest calls the CAS method early in boot, so SLOF
gets called. SLOF allocates a memory buffer for the device tree changes
and calls a custom KVMPPC_H_CAS hypercall. QEMU parses the options,
composes a diff for the device tree, copies it to the buffer provided
by SLOF and returns to SLOF. SLOF updates the device tree and returns
control to the guest kernel. Only then does the Linux guest parse the device
tree, so an unnecessary reboot can be avoided in most cases.
The device tree diff is a header with an update format version
(defined as 1 in this patch) followed by a device tree with the properties
which require updating.
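A hedged sketch of that buffer layout; the struct is illustrative, the real
format is whatever the patch and SLOF agree on:

    #include <stdint.h>

    /* Placed at the start of the SLOF-provided buffer. */
    struct cas_dt_diff {
        uint32_t update_format_version;   /* 1 in this patch */
        /* ...followed by a flattened device tree containing only the
         * properties that need updating. */
    };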
If QEMU detects that it has to reboot the guest, it silently does so
as the guest expects reboot to happen because this is usual pHyp firmware
behavior.
This defines the custom KVMPPC_H_CAS hypercall. The current SLOF already
has support for it.
This implements a stub which returns a very basic tree (root node,
no properties) to the guest.
As the return buffer does not contain any change, no change in behavior is
expected.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
This allows guests to have a different timebase origin from the host.
This is needed for migration, where a guest can migrate from one host
to another and the two hosts might have a different timebase origin.
However, the timebase seen by the guest must not go backwards, and
should go forwards only by a small amount corresponding to the time
taken for the migration.
This is only supported for recent POWER hardware which has the TBU40
(timebase upper 40 bits) register. That includes POWER6, 7, 8 but not
970.
This adds kvm_access_one_reg() to access a special register which is not
in env->spr. This requires the kvm_set_one_reg/kvm_get_one_reg patch.
The feature must be present in the host kernel.
This bumps vmstate_spapr::version_id and enables new vmstate_ppc_timebase
only for it. Since the vmstate_spapr::minimum_version_id remains
unchanged, migration from older QEMU is supported but without
vmstate_ppc_timebase.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
While there, also moved the hard-coded value for CLOCKFREQ to a #define.
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: Alexander Graf <agraf@suse.de>
There are 3 different variants of the decrementer for BookE and BookS.
The BookE variant sets TSR[DIS] to 1 when the DEC value becomes 1 or 0.
TSR[DIS] is then the indicator of whether the decrementer interrupt line is
asserted or not.
The old BookS variant treats DEC as an edge interrupt that gets triggered
when the DEC value's top bit turns from 0 to 1.
The new BookS variant maintains the assertion bit inside DEC itself. Whenever
the DEC value becomes negative (top bit set) the DEC interrupt line is
asserted.
So far we implemented mostly the old BookS variant. Let's do them all properly.
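A compact restatement of the three behaviours in plain C; the helper names
are illustrative:

    #include <stdbool.h>
    #include <stdint.h>

    /* BookE: TSR[DIS] is latched when DEC becomes 1 or 0; the latched bit,
     * not DEC itself, drives the interrupt line (level). */
    static bool booke_dec_latches(uint32_t dec)
    {
        return dec <= 1;
    }

    /* Old BookS: edge-triggered when the top bit of DEC flips from 0 to 1. */
    static bool old_books_dec_edge(uint32_t old_dec, uint32_t new_dec)
    {
        return !(old_dec >> 31) && (new_dec >> 31);
    }

    /* New BookS: the line is asserted for as long as DEC is negative. */
    static bool new_books_dec_level(uint32_t dec)
    {
        return dec >> 31;
    }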
This fixes booting pseries ppc64 guest images in TCG mode for me.
Signed-off-by: Alexander Graf <agraf@suse.de>
This changes resource code definitions to ones used in the host kernel.
This fixes H_SET_MODE_RESOURCE_LE (switch between big endian and
little endian) to sync registers from KVM before changing LPCR value.
This adds a set_spr() helper to update an SPR in a CPU's context to avoid
possible races and makes use of it to change LPCR.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
This adds very basic handlers for ibm,get-system-parameter and
ibm,set-system-parameter RTAS calls.
The only parameter handled at the moment is
"platform-processor-diagnostics-run-mode" which is always disabled and
does not support changing. This is expected to make
"ppc64_cpu --run-mode=1" happy.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[agraf: s/papameter/parameter/g]
Signed-off-by: Alexander Graf <agraf@suse.de>
Most code already used QEMUTimer without the redundant 'struct' keyword.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Instead of relying on cpu_model, obtain the device tree node label
per CPU. Use DeviceClass::fw_name as source.
Whenever DeviceClass::fw_name is unknown, default to "PowerPC,UNKNOWN".
As a consequence, spapr_fixup_cpu_dt() can operate on each CPU's fw_name,
obsoleting sPAPREnvironment::cpu_model, and spapr_create_fdt_skel() can
drop its cpu_model argument.
Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
This implements the H_XIRR_X hypercall in addition to H_XIRR, as it is
mandatory for PAPR+ and there is no way for the guest to detect whether it
is supported or not, so just add it.
As the Partition Adjunct Option is not supported at the moment,
the CPPR parameter of the hypercall is ignored.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
controller system within KVM. This patch allows qemu to initialize and
configure the in-kernel XICS, and keep its state in sync with qemu's XICS
state as necessary.
This should give considerable performance improvements, e.g. on a simple
IPI ping-pong test between hardware threads, using qemu XICS gives us
around 5,000 irqs/second, whereas the in-kernel XICS gives us around
70,000 irqs/s on the same hardware configuration.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[Mike Qiu <qiudayu@linux.vnet.ibm.com>: fixed mistype which caused ics_set_kvm_state() to fail]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
This adds a cpu_setup callback to the XICS device class (as XICS-KVM
will do it differently); xics_cpu_setup() will call it if it is set.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
The upcoming XICS-KVM support will use bits of the emulated XICS code.
So this introduces a new level of hierarchy - the "xics-common" class. Both
the emulated XICS and XICS-KVM will inherit from it and override class
callbacks when required (see the type sketch after the lists below).
The new "xics-common" class implements:
1. replaces static "nr_irqs" and "nr_servers" properties with
the dynamic ones and adds callbacks to be executed when properties
are set.
2. xics_cpu_setup() callback renamed to xics_common_cpu_setup() as
it is a common part for both XICS'es
3. xics_reset() renamed to xics_common_reset() for the same reason.
The emulated XICS changes:
1. the part of xics_realize() which creates ICPs is moved to
the "nr_servers" property callback, as realize() is too late to
create/initialize devices and instance_init() is too early, since
the number of child devices comes via the "nr_servers" property;
2. ics_initfn() is added, which does a small part of what xics_realize()
did.
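Roughly, in QOM terms (the parent type and the type names other than
"xics-common" are assumptions):

    static const TypeInfo xics_common_info = {
        .name          = "xics-common",
        .parent        = TYPE_SYS_BUS_DEVICE,
        .instance_size = sizeof(XICSState),
        .abstract      = true,                /* never instantiated directly */
        .class_init    = xics_common_class_init,
    };

    static const TypeInfo xics_info = {        /* emulated XICS */
        .name       = "xics",
        .parent     = "xics-common",
        .class_init = xics_class_init,
    };

    /* XICS-KVM will later register a third type with .parent = "xics-common". */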
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>
The upcoming support of in-kernel XICS will redefine migration callbacks
for both ICS and ICP so classes and callback pointers are added.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
On real hardware, RTAS is called in real mode and therefore the
top 4 bits of the address passed in the call are ignored.
This patch does the same.
This converts h_rtas() to use the existing rtas_ld() handlers.
This fixes rtas_ld()/rtas_st() to ignore the top 4 bits.
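What "ignore the top 4 bits" amounts to, as a one-line helper (the name is
illustrative):

    #include <stdint.h>

    /* RTAS runs in real mode, so bits 63..60 of the passed address carry
     * no meaning and are dropped before accessing guest memory. */
    static inline uint64_t rtas_real_mode_addr(uint64_t addr)
    {
        return addr & ~0xF000000000000000ULL;
    }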
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alexander Graf <agraf@suse.de>
H_SET_MODE is used for controlling various partition settings. One
of these settings is the endianness a guest takes its exceptions in.
Signed-off-by: Anton Blanchard <anton@samba.org>
[agraf: fix whitespace]
Signed-off-by: Alexander Graf <agraf@suse.de>
On the sPAPR platform a guest allocates MSI/MSIX vectors via RTAS
hypercalls which return global IRQ numbers to a guest so it only
operates with those and never touches MSIMessage.
Therefore MSIMessage handling is completely hidden in QEMU.
Previously every sPAPR PCI host bridge implemented its own MSI window
to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
or vfio) and route them to the guest via qemu_irq_pulse().
MSIMessage used to be encoded as:
.addr - address within the PHB MSI window;
.data - the device index on the PHB plus the vector number.
The MSI MR write function translated this MSIMessage to a global IRQ
number and called qemu_irq_pulse().
However the total number of IRQs is not really big (at the moment it is
1024 IRQs starting from 4096), and even the 16-bit data field of MSIMessage
is enough to store an IRQ number.
This simplifies MSI handling in sPAPR PHB. Specifically, this does:
1. remove a MSI window from a PHB;
2. add a single memory region for all MSIs to sPAPREnvironment
and spapr_pci_msi_init() to initialize it;
3. encode MSIMessage as (see the sketch below):
* .addr - the fixed address SPAPR_PCI_MSI_WINDOW == 0x40000000000ULL;
* .data - the IRQ number.
4. change the IRQ allocator to align the first IRQ number in a block for MSI.
MSI uses the lower bits to specify the vector number, so the first IRQ has
to be aligned. MSIX does not need any special allocator though.
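A standalone sketch of the new encoding, using a stand-in struct instead of
QEMU's MSIMessage:

    #include <stdint.h>

    #define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL

    typedef struct {
        uint64_t address;
        uint32_t data;
    } MsiMsg;                        /* stand-in for MSIMessage */

    static MsiMsg spapr_msi_encode(uint32_t irq)
    {
        /* The address is always the fixed window; the payload is simply
         * the global IRQ number, which the window's write handler pulses. */
        return (MsiMsg){ .address = SPAPR_PCI_MSI_WINDOW, .data = irq };
    }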
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Basically, in HW the layout of the interrupt network is:
- One ICP per processor thread (the "presenter"). This contains the
registers to fetch a pending interrupt (ack), EOI, and control the
processor priority.
- One ICS per logical source of interrupts (ie, one per PCI host
bridge, and a few others here or there). This contains the per-interrupt
source configuration (target processor(s), priority, mask) and the
per-interrupt internal state.
Under PAPR, there is a single "virtual" ICS ... somewhat (it's a bit
oddball what pHyp does here, arguably there are two but we can ignore
that distinction). There is no register level access. A pair of firmware
(RTAS) calls is used to configure each virtual interrupt.
So our model here is somewhat the same. We have one ICS in the emulated
XICS which arguably *is* the emulated XICS (there's no point making it a
separate "device", that would just be gross), and each VCPU has an
associated ICP.
Yet we call the "XICS" struct icp_state and then the ICPs
'struct icp_server_state'. It's particularly confusing when all of the
functions have xics_ prefixes yet take *icp arguments.
Rename:
struct icp_state -> XICSState
struct icp_server_state -> ICPState
struct ics_state -> ICSState
struct ics_irq_state -> ICSIRQState
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-id: 1374175984-8930-12-git-send-email-aliguori@us.ibm.com
[aik: added ics_resend() on post_load]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
At present, the savevm / migration support for the pseries machine will not
work when KVM is enabled. That's because KVM manages the guest's hash page
table in the host kernel, so qemu has no visibility of it. This patch
fixes this by using new kernel interfaces to extract and reinsert the
guest's hash table during the migration process.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Message-id: 1374175984-8930-11-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This adds the necessary pieces to implement savevm / migration for the
pseries machine. The most complex part here is migrating the hash
table - for the paravirtualized pseries machine the guest's hash page
table is not stored within guest memory, but externally and the guest
accesses it via hypercalls.
This patch uses a hypervisor reserved bit of the HPTE as a dirty bit
(tracking changes to the HPTE itself, not the page it references).
This is used to implement a live migration style incremental save and
restore of the hash table contents.
Normally a hash table is 16MB but it can get bigger depending on how
much RAM the guest has. Due to its nature, updates to it are random so
the live migration style is used for it.
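A hedged sketch of the dirty-bit bookkeeping on the first HPTE doubleword;
the bit chosen below is purely illustrative, the patch uses whichever bit
the architecture reserves for the hypervisor:

    #include <stdbool.h>
    #include <stdint.h>

    #define HPTE_DIRTY  (1ULL << 62)   /* illustrative HV-reserved bit */

    /* While scanning the table: send an entry only if it changed since the
     * last pass, and clear the mark once it has been sent. */
    static bool hpte_test_and_clear_dirty(uint64_t *hpte0)
    {
        bool dirty = (*hpte0 & HPTE_DIRTY) != 0;

        *hpte0 &= ~HPTE_DIRTY;
        return dirty;
    }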
In addition it adds VMStateDescription information to save and restore
the (few) remaining pieces of state information needed by the pseries
machine.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Message-id: 1374175984-8930-9-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Model TCE tables as a device that's hooked up as a child object to
the owner. Besides the code cleanup, we get a few nice benefits:
1) free actually works now (it was dead code before)
2) the TCE information is visible in the device tree
3) we can expose table information as properties such that if we
change the window_size, we can use globals to keep migration
working.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Message-id: 1374175984-8930-6-git-send-email-aliguori@us.ibm.com
[dwg: pseries: savevm support for PAPR TCE tables]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[alexey: ppc kvm: fix to compile]
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
This patch adds helpers to allow PAPR VIO devices to save state common
to all VIO devices during savevm.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
Message-id: 1374175984-8930-3-git-send-email-aliguori@us.ibm.com
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The DBDMA engine really just reads bytes from a producing device (IDE
in our case) and shoves these bytes into memory. It doesn't care whether
any alignment takes place or not.
Our code today however assumes that block accesses always happen on
sector (512 byte) boundaries. This is a fair assumption for most cases.
However, Mac OS X really likes to do unaligned, incomplete accesses
that it finishes with the next DMA request.
So we need to read/write the unaligned bits independently of the actual
asynchronous request, because that one can only handle 512-byte-aligned
data. We also need to cache these unaligned sectors until the next DMA
request, at which point the data might be successfully flushed from the
pipe.
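The split described above boils down to simple arithmetic; a standalone
sketch with illustrative names, assuming 512-byte sectors:

    #include <stdint.h>

    /* Break a byte-granular request into an unaligned head, a run of whole
     * 512-byte sectors for the asynchronous path, and a tail that must be
     * cached until the next DMA request completes it. */
    static void split_access(uint64_t offset, uint64_t len,
                             uint64_t *head, uint64_t *aligned, uint64_t *tail)
    {
        uint64_t head_len = (512 - (offset & 511)) & 511;

        if (head_len > len) {
            head_len = len;
        }
        *head = head_len;
        *aligned = (len - head_len) & ~511ULL;
        *tail = len - head_len - *aligned;
    }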
Signed-off-by: Alexander Graf <agraf@suse.de>
Soon we will introduce intermediate processing pauses which will
allow the bottom half to restart a DMA request that couldn't be
fulfilled yet.
For that to work, move the processing variable into the io struct,
which is what DMA providers work with.
While touching it, also change it into a bool.
Signed-off-by: Alexander Graf <agraf@suse.de>
The DBDMA controller has a bottom half to asynchronously process DMA
request queues.
This bh was stored as a gross static variable. Move it into the device
struct instead.
While at it, move all users of it to the new generic kick function.
Signed-off-by: Alexander Graf <agraf@suse.de>
The DBDMA engine really is running all the time, waiting for input. However
we don't want to waste cycles constantly polling.
So introduce a kick function that data providers can call to notify the
DBDMA controller of new input.
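In essence the kick just schedules the controller's bottom half; a sketch in
which the struct and field names are assumptions and qemu_bh_schedule() is
the assumed QEMU primitive:

    /* Data providers (e.g. the IDE model) call this after queueing new
     * input so the DBDMA bottom half re-runs the channel processing. */
    void DBDMA_kick(DBDMAState *dbdma)
    {
        qemu_bh_schedule(dbdma->bh);
    }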
Signed-off-by: Alexander Graf <agraf@suse.de>
We usually keep struct and constant definitions in header files. Move
them there to stay consistent and to make access to fields easier.
Signed-off-by: Alexander Graf <agraf@suse.de>