mirrors/qemu - qemu - SynapseOS git

Author	SHA1	Message	Date
Alexey Kardashevskiy	4dba872219	spapr/rtas: Reserve space for RTAS blob and log At the moment SLOF reserves space for RTAS and instantiates the RTAS blob which is 20 bytes binary blob calling an hypercall. The rest of the RTAS area is a log which SLOF has no idea about but QEMU does. This moves RTAS sizing to QEMU and this overrides the size from SLOF. The only remaining problem is that SLOF copies the number of bytes it reserved (2KB for now) so QEMU needs to reserve at least this much; SLOF will be fixed separately to check that rtas-size from QEMU is enough for those 20 bytes for the H_RTAS hcall. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200316011841.99970-1-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 15:08:50 +11:00
Alexey Kardashevskiy	395a20d3cc	ppc/spapr: Move GPRs setup to one place At the moment "pseries" starts in SLOF which only expects the FDT blob pointer in r3. As we are going to introduce a OpenFirmware support in QEMU, we will be booting OF clients directly and these expect a stack pointer in r1, Linux looks at r3/r4 for the initramdisk location (although vmlinux can find this from the device tree but zImage from distro kernels cannot). This extends spapr_cpu_set_entry_state() to take more registers. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200310050733.29805-2-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 15:08:50 +11:00
David Gibson	425f0b7adb	spapr: Clean up RMA size calculation Move the calculation of the Real Mode Area (RMA) size into a helper function. While we're there clean it up and correct it in a few ways: * Add comments making it clearer where the various constraints come from * Remove a pointless check that the RMA fits within Node 0 (we've just clamped it so that it does) Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-17 15:08:47 +11:00
David Gibson	1052ab67f4	spapr: Don't clamp RMA to 16GiB on new machine types In spapr_machine_init() we clamp the size of the RMA to 16GiB and the comment saying why doesn't make a whole lot of sense. In fact, this was done because the real mode handling code elsewhere limited the RMA in TCG mode to the maximum value configurable in LPCR[RMLS], 16GiB. But, * Actually LPCR[RMLS] has been able to encode a 256GiB size for a very long time, we just didn't implement it properly in the softmmu * LPCR[RMLS] shouldn't really be relevant anyway, it only was because we used to abuse the RMOR based translation mode in order to handle the fact that we're not modelling the hypervisor parts of the cpu We've now removed those limitations in the modelling so the 16GiB clamp no longer serves a function. However, we can't just remove the limit universally: that would break migration to earlier qemu versions, where the 16GiB RMLS limit still applies, no matter how bad the reasons for it are. So, we replace the 16GiB clamp, with a clamp to a limit defined in the machine type class. We set it to 16 GiB for machine types 4.2 and earlier, but set it to 0 meaning unlimited for the new 5.0 machine type. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-17 09:41:15 +11:00
David Gibson	8897ea5a9f	spapr: Don't attempt to clamp RMA to VRMA constraint The Real Mode Area (RMA) is the part of memory which a guest can access when in real (MMU off) mode. Of course, for a guest under KVM, the MMU isn't really turned off, it's just in a special translation mode - Virtual Real Mode Area (VRMA) - which looks like real mode in guest mode. The mechanics of how this works when using the hash MMU (HPT) put a constraint on the size of the RMA, which depends on the size of the HPT. So, the latter part of spapr_setup_hpt_and_vrma() clamps the RMA we advertise to the guest based on this VRMA limit. There are several things wrong with this: 1) spapr_setup_hpt_and_vrma() doesn't actually clamp, it takes the minimum of Node 0 memory size and the VRMA limit. That will often work the same as clamping, but there can be other constraints on RMA size which supersede Node 0 memory size. We have real bugs caused by this (currently worked around in the guest kernel) 2) Some callers of spapr_setup_hpt_and_vrma() are in a situation where we're past the point that we can actually advertise an RMA limit to the guest 3) But most fundamentally, the VRMA limit depends on host configuration (page size) which shouldn't be visible to the guest, but this partially exposes it. This can cause problems with migration in certain edge cases, although we will mostly get away with it. In practice, this clamping is almost never applied anyway. With 64kiB pages and the normal rules for sizing of the HPT, the theoretical VRMA limit will be 4x(guest memory size) and so never hit. It will hit with 4kiB pages, where it will be (guest memory size)/4. However all mainstream distro kernels for POWER have used a 64kiB page size for at least 10 years. So, simply replace this logic with a check that the RMA we've calculated based only on guest visible configuration will fit within the host implied VRMA limit. This can break if running HPT guests on a host kernel with 4kiB page size. As noted that's very rare. There also exist several possible workarounds: * Change the host kernel to use 64kiB pages * Use radix MMU (RPT) guests instead of HPT * Use 64kiB hugepages on the host to back guest memory * Increase the guest memory size so that the RMA hits one of the fixed limits before the RMA limit. This is relatively easy on POWER8 which has a 16GiB limit, harder on POWER9 which has a 1TiB limit. * Use a guest NUMA configuration which artificially constrains the RMA within the VRMA limit (the RMA must always fit within Node 0). Previously, on KVM, we also temporarily reduced the rma_size to 256M so that the we'd load the kernel and initrd safely, regardless of the VRMA limit. This was a) confusing, b) could significantly limit the size of images we could load and c) introduced a behavioural difference between KVM and TCG. So we remove that as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org>	2020-03-17 09:41:15 +11:00
David Gibson	6a84737c80	spapr,ppc: Simplify signature of kvmppc_rma_size() This function calculates the maximum size of the RMA as implied by the host's page size of structure of the VRMA (there are a number of other constraints on the RMA size which will supersede this one in many circumstances). The current interface takes the current RMA size estimate, and clamps it to the VRMA derived size. The only current caller passes in an arguably wrong value (it will match the current RMA estimate in some but not all cases). We want to fix that, but for now just keep concerns separated by having the KVM helper function just return the VRMA derived limit, and let the caller combine it with other constraints. We call the new function kvmppc_vrma_limit() to more clearly indicate its limited responsibility. The helper should only ever be called in the KVM enabled case, so replace its !CONFIG_KVM stub with an assert() rather than a dummy value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-17 09:41:15 +11:00
David Gibson	9943266ec3	spapr: Don't use weird units for MIN_RMA_SLOF MIN_RMA_SLOF records the minimum about of RMA that the SLOF firmware requires. It lets us give a meaningful error if the RMA ends up too small, rather than just letting SLOF crash. It's currently stored as a number of megabytes, which is strange for global constants. Move that megabyte scaling into the definition of the constant like most other things use. Change from M to MiB in the associated message while we're at it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-03-17 09:41:15 +11:00
David Gibson	e8b1144e73	spapr, ppc: Remove VPM0/RMLS hacks for POWER9 For the "pseries" machine, we use "virtual hypervisor" mode where we only model the CPU in non-hypervisor privileged mode. This means that we need guest physical addresses within the modelled cpu to be treated as absolute physical addresses. We used to do that by clearing LPCR[VPM0] and setting LPCR[RMLS] to a high limit so that the old offset based translation for guest mode applied, which does what we need. However, POWER9 has removed support for that translation mode, which meant we had some ugly hacks to keep it working. We now explicitly handle this sort of translation for virtual hypervisor mode, so the hacks aren't necessary. We don't need to set VPM0 and RMLS from the machine type code - they're now ignored in vhyp mode. On the cpu side we don't need to allow LPCR[RMLS] to be set on POWER9 in vhyp mode - that was only there to allow the hack on the machine side. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2020-03-17 09:41:15 +11:00
Philippe Mathieu-Daudé	f42274cff3	hw/ppc/pnv: Fix typo in comment Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200228123303.14540-1-philmd@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 09:41:14 +11:00
Shivaprasad G Bhat	af7084e72b	spapr: Fix Coverity warning while validating nvdimm options Fixes Coverity issue, CID 1419883: Error handling issues (CHECKED_RETURN) Calling "qemu_uuid_parse" without checking return value nvdimm_set_uuid() already verifies if the user provided uuid is valid or not. So, need to check for the validity during pre-plug validation again. As this a false positive in this case, assert if not valid to be safe. Also, error_abort if QOM accessor encounters error while fetching the uuid property. Reported-by: Coverity (CID 1419883) Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Message-Id: <158281096564.89540.4507375445765515529.stgit@lep8c.aus.stglabs.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 09:41:14 +11:00
Greg Kurz	ad334d89a6	spapr: Handle pending hot plug/unplug requests at CAS If a hot plug or unplug request is pending at CAS, we currently trigger a CAS reboot, which severely increases the guest boot time. This is because SLOF doesn't handle hot plug events and we had no way to fix the FDT that gets presented to the guest. We can do better thanks to recent changes in QEMU and SLOF: - we now return a full FDT to SLOF during CAS - SLOF was fixed to correctly detect any device that was either added or removed since boot time and to update its internal DT accordingly. The right solution is to process all pending hot plug/unplug requests during CAS: convert hot plugged devices to cold plugged devices and remove the hot unplugged ones, which is exactly what spapr_drc_reset() does. Also clear all hot plug events that are currently queued since they're no longer relevant. Note that SLOF cannot currently populate hot plugged PCI bridges or PHBs at CAS. Until this limitation is lifted, SLOF will reset the machine when this scenario occurs : this will allow the FDT to be fully processed when SLOF is started again (ie. the same effect as the CAS reboot that would occur anyway without this patch). Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158257222352.4102917.8984214333937947307.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-03-17 09:41:14 +11:00
Felipe Franciosi	64a7b8de42	qom/object: Use common get/set uint helpers Several objects implemented their own uint property getters and setters, despite them being straightforward (without any checks/validations on the values themselves) and identical across objects. This makes use of an enhanced API for object_property_add_uintXX_ptr() which offers default setters. Some of these setters used to update the value even if the type visit failed (eg. because the value being set overflowed over the given type). The new setter introduces a check for these errors, not updating the value if an error occurred. The error is propagated. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 23:02:24 +01:00
Felipe Franciosi	836e1b3813	qom/object: enable setter for uint types Traditionally, the uint-specific property helpers only offer getters. When adding object (or class) uint types, one must therefore use the generic property helper if a setter is needed (and probably duplicate some code writing their own getters/setters). This enhances the uint-specific property helper APIs by adding a bitwise-or'd 'flags' field and modifying all clients of that API to set this paramater to OBJ_PROP_FLAG_READ. This maintains the current behaviour whilst allowing others to also set OBJ_PROP_FLAG_WRITE (or use the more convenient OBJ_PROP_FLAG_READWRITE) in the future (which will automatically install a setter). Other flags may be added later. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 23:02:23 +01:00
Philippe Mathieu-Daudé	ea0ac7f6f8	hw: Make MachineClass::is_default a boolean type There's no good reason for it to be type int, change it to bool. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200207161948.15972-3-philmd@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2020-02-28 14:57:19 -05:00
Paolo Bonzini	9e264985ff	Merge branch 'exec_rw_const_v4' of https://github.com/philmd/qemu into HEAD	2020-02-25 13:41:48 +01:00
Paolo Bonzini	ca6155c0f2	Merge tag 'patchew/20200219160953.13771-1-imammedo@redhat.com' of https://github.com/patchew-project/qemu into HEAD This series removes ad hoc RAM allocation API (memory_region_allocate_system_memory) and consolidates it around hostmem backend. It allows to * resolve conflicts between global -mem-prealloc and hostmem's "policy" option, fixing premature allocation before binding policy is applied * simplify complicated memory allocation routines which had to deal with 2 ways to allocate RAM. * reuse hostmem backends of a choice for main RAM without adding extra CLI options to duplicate hostmem features. A recent case was -mem-shared, to enable vhost-user on targets that don't support hostmem backends [1] (ex: s390) * move RAM allocation from individual boards into generic machine code and provide them with prepared MemoryRegion. * clean up deprecated NUMA features which were tied to the old API (see patches) - "numa: remove deprecated -mem-path fallback to anonymous RAM" - (POSTPONED, waiting on libvirt side) "forbid '-numa node,mem' for 5.0 and newer machine types" - (POSTPONED) "numa: remove deprecated implicit RAM distribution between nodes" Introduce a new machine.memory-backend property and wrapper code that aliases global -mem-path and -mem-alloc into automatically created hostmem backend properties (provided memory-backend was not set explicitly given by user). A bulk of trivial patches then follow to incrementally convert individual boards to using machine.memory-backend provided MemoryRegion. Board conversion typically involves: * providing MachineClass::default_ram_size and MachineClass::default_ram_id so generic code could create default backend if user didn't explicitly provide memory-backend or -m options * dropping memory_region_allocate_system_memory() call * using convenience MachineState::ram MemoryRegion, which points to MemoryRegion allocated by ram-memdev On top of that for some boards: * missing ram_size checks are added (typically it were boards with fixed ram size) * ram_size fixups are replaced by checks and hard errors, forcing user to provide correct "-m" values instead of ignoring it and continuing running. After all boards are converted, the old API is removed and memory allocation routines are cleaned up.	2020-02-25 09:19:00 +01:00
Chen Qun	438bafcac5	hw/ppc/virtex_ml507:fix leak of fdevice tree blob The device tree blob returned by load_device_tree is malloced. We should free it after cpu_physical_memory_write(). Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Message-Id: <20200218091154.21696-3-kuhn.chenqun@huawei.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Greg Kurz	ab8584349c	spapr: Fix handling of unplugged devices during CAS and migration We already detect if a device is being hot plugged before CAS to trigger a CAS reboot and during migration to migrate the state of the associated DRC. But hot unplugging a device is also an asynchronous operation that requires the guest to take action. This means that if the guest is migrated after the hot unplug event was sent but before it could release the device with RTAS, the destination QEMU doesn't know about the pending unplug operation and doesn't actually remove the device when the guest finally releases it. Similarly, if the unplug request is fired before CAS, the guest isn't notified of the change, just like with hotplug. It ends up booting with the device still present in the DT and configures it, just like it was never removed. Even weirder, since the event is still queued, it will be eventually processed when some other unrelated event is posted to the guest. Enhance spapr_drc_transient() to also return true if an unplug request is pending. This fixes the issue at CAS with a CAS reboot request and causes the DRC state to be migrated. Some extra care is still needed to inform the destination that an unplug request is pending : migrate the unplug_requested field of the DRC in an optional subsection. This might break backwards migration, but this is still better than ending with an inconsistent guest. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158169248798.3465937.1108351365840514270.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Greg Kurz	4b63db1289	spapr: Don't use spapr_drc_needed() in CAS code We currently don't support hotplug of devices between boot and CAS. If this happens a CAS reboot is triggered. We detect this during CAS using the spapr_drc_needed() function which is essentially a VMStateDescription .needed callback. Even if the condition for CAS reboot happens to be the same as for DRC migration, it looks wrong to piggyback a migration helper for this. Introduce a helper with slightly more explicit name and use it in both CAS and DRC migration code. Since a subsequent patch will enhance this helper to cover the case of hot unplug, let's go for spapr_drc_transient(). While here convert spapr_hotplugged_dev_before_cas() to the "transient" wording as well. This doesn't change any behaviour. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <158169248180.3465937.9531405453362718771.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Pan Nengyuan	b2fb7a4368	ppc: free 'fdt' after reset the machine 'fdt' forgot to clean both e500 and pnv when we call 'system_reset' on ppc, this patch fix it. The leak stacks are as follow: Direct leak of 4194304 byte(s) in 4 object(s) allocated from: #0 0x7fafe37dd970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) #1 0x7fafe2e3149d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) #2 0x561876f7f80d in create_device_tree /mnt/sdb/qemu-new/qemu/device_tree.c:40 #3 0x561876b7ac29 in ppce500_load_device_tree /mnt/sdb/qemu-new/qemu/hw/ppc/e500.c:364 #4 0x561876b7f437 in ppce500_reset_device_tree /mnt/sdb/qemu-new/qemu/hw/ppc/e500.c:617 #5 0x56187718b1ae in qemu_devices_reset /mnt/sdb/qemu-new/qemu/hw/core/reset.c:69 #6 0x561876f6938d in qemu_system_reset /mnt/sdb/qemu-new/qemu/vl.c:1412 #7 0x561876f6a25b in main_loop_should_exit /mnt/sdb/qemu-new/qemu/vl.c:1645 #8 0x561876f6a398 in main_loop /mnt/sdb/qemu-new/qemu/vl.c:1679 #9 0x561876f7da8e in main /mnt/sdb/qemu-new/qemu/vl.c:4438 #10 0x7fafde16b812 in __libc_start_main ../csu/libc-start.c:308 #11 0x5618765c055d in _start (/mnt/sdb/qemu-new/qemu/build/ppc64-softmmu/qemu-system-ppc64+0x2b1555d) Direct leak of 1048576 byte(s) in 1 object(s) allocated from: #0 0x7fc0a6f1b970 in __interceptor_calloc (/lib64/libasan.so.5+0xef970) #1 0x7fc0a656f49d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5249d) #2 0x55eb05acd2ca in pnv_dt_create /mnt/sdb/qemu-new/qemu/hw/ppc/pnv.c:507 #3 0x55eb05ace5bf in pnv_reset /mnt/sdb/qemu-new/qemu/hw/ppc/pnv.c:578 #4 0x55eb05f2f395 in qemu_system_reset /mnt/sdb/qemu-new/qemu/vl.c:1410 #5 0x55eb05f43850 in main /mnt/sdb/qemu-new/qemu/vl.c:4403 #6 0x7fc0a18a9812 in __libc_start_main ../csu/libc-start.c:308 #7 0x55eb0558655d in _start (/mnt/sdb/qemu-new/qemu/build/ppc64-softmmu/qemu-system-ppc64+0x2b1555d) Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Message-Id: <20200214033206.4395-1-pannengyuan@huawei.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Alexey Kardashevskiy	87262806cb	spapr: Allow changing offset for -kernel image This allows moving the kernel in the guest memory. The option is useful for step debugging (as Linux is linked at 0x0); it also allows loading grub which is normally linked to run at 0x20000. This uses the existing kernel address by default. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200203032943.121178-6-aik@ozlabs.ru> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Shivaprasad G Bhat	b5fca656f7	spapr: Add Hcalls to support PAPR NVDIMM device This patch implements few of the necessary hcalls for the nvdimm support. PAPR semantics is such that each NVDIMM device is comprising of multiple SCM(Storage Class Memory) blocks. The guest requests the hypervisor to bind each of the SCM blocks of the NVDIMM device using hcalls. There can be SCM block unbind requests in case of driver errors or unplug(not supported now) use cases. The NVDIMM label read/writes are done through hcalls. Since each virtual NVDIMM device is divided into multiple SCM blocks, the bind, unbind, and queries using hcalls on those blocks can come independently. This doesn't fit well into the qemu device semantics, where the map/unmap are done at the (whole)device/object level granularity. The patch doesnt actually bind/unbind on hcalls but let it happen at the device_add/del phase itself instead. The guest kernel makes bind/unbind requests for the virtual NVDIMM device at the region level granularity. Without interleaving, each virtual NVDIMM device is presented as a separate guest physical address range. So, there is no way a partial bind/unbind request can come for the vNVDIMM in a hcall for a subset of SCM blocks of a virtual NVDIMM. Hence it is safe to do bind/unbind everything during the device_add/del. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Message-Id: <158131059899.2897.11515211602702956854.stgit@lep8c.aus.stglabs.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Shivaprasad G Bhat	ee3a71e366	spapr: Add NVDIMM device support Add support for NVDIMM devices for sPAPR. Piggyback on existing nvdimm device interface in QEMU to support virtual NVDIMM devices for Power. Create the required DT entries for the device (some entries have dummy values right now). The patch creates the required DT node and sends a hotplug interrupt to the guest. Guest is expected to undertake the normal DR resource add path in response and start issuing PAPR SCM hcalls. The device support is verified based on the machine version unlike x86. This is how it can be used .. Ex : For coldplug, the device to be added in qemu command line as shown below -object memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896 -device nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0 For hotplug, the device to be added from monitor as below object_add memory-backend-file,id=memnvdimm0,prealloc=yes,mem-path=/tmp/nvdimm0,share=yes,size=1073872896 device_add nvdimm,label-size=128k,uuid=75a3cdd7-6a2f-4791-8d15-fe0a920e8e9e,memdev=memnvdimm0,id=nvdimm0,slot=0 Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com> Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [Early implementation] Message-Id: <158131058078.2897.12767731856697459923.stgit@lep8c.aus.stglabs.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:04 +11:00
Michael S. Tsirkin	a784926819	ppc: function to setup latest class options We are going to add more init for the latest machine, so move the setup to a function so we don't have to change the DEFINE_SPAPR_MACHINE macro each time. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20200207064628.1196095-1-mst@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:03 +11:00
Laurent Vivier	90118a657c	ppc/pnv: Fix PCI_EXPRESS dependency When PHB4 bridge has been added, the dependencies to PCIE_PORT has been added to XIVE_SPAPR and indirectly to PSERIES. The build of the PowerNV machine is fine while we also build the PSERIES machine. If we disable the PSERIES machine, the PowerNV build fails because the PCI Express files are not built: /usr/bin/ld: hw/ppc/pnv.o: in function `pnv_chip_power8_pic_print_info': .../hw/ppc/pnv.c:623: undefined reference to `pnv_phb3_msi_pic_print_info' /usr/bin/ld: hw/ppc/pnv.o: in function `pnv_chip_power9_pic_print_info': .../hw/ppc/pnv.c:639: undefined reference to `pnv_phb4_pic_print_info' /usr/bin/ld: ../hw/usb/hcd-ehci-pci.o: in function `usb_ehci_pci_write_config': .../hw/usb/hcd-ehci-pci.c:129: undefined reference to `pci_default_write_config' /usr/bin/ld: ../hw/usb/hcd-ehci-pci.o: in function `usb_ehci_pci_realize': .../hw/usb/hcd-ehci-pci.c:68: undefined reference to `pci_allocate_irq' /usr/bin/ld: .../hw/usb/hcd-ehci-pci.c:72: undefined reference to `pci_register_bar' /usr/bin/ld: ../hw/usb/hcd-ehci-pci.o:(.data.rel+0x50): undefined reference to `vmstate_pci_device' This patch fixes the problem by adding needed dependencies to POWERNV. Fixes: `4f9924c4d4` ("ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge") Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20200205232016.588202-3-lvivier@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:03 +11:00
Alexey Kardashevskiy	a4c3791ae0	spapr/rtas: Print message from "ibm,os-term" The "ibm,os-term" RTAS call has a single parameter which is a pointer to a message from the guest kernel about the termination cause; this prints it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200203032044.118585-1-aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-21 09:15:03 +11:00
Philippe Mathieu-Daudé	85eb7c18ee	Let cpu_[physical]_memory() calls pass a boolean 'is_write' argument Use an explicit boolean type. This commit was produced with the included Coccinelle script scripts/coccinelle/exec_rw_const. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-02-20 14:47:08 +01:00
Peter Maydell	19f7034773	Avoid address_space_rw() with a constant is_write argument The address_space_rw() function allows either reads or writes depending on the is_write argument passed to it; this is useful when the direction of the access is determined programmatically (as for instance when handling the KVM_EXIT_MMIO exit reason). Under the hood it just calls either address_space_write() or address_space_read_full(). We also use it a lot with a constant is_write argument, though, which has two issues: * when reading "address_space_rw(..., 1)" this is less immediately clear to the reader as being a write than "address_space_write(...)" * calling address_space_rw() bypasses the optimization in address_space_read() that fast-paths reads of a fixed length This commit was produced with the included Coccinelle script scripts/coccinelle/exec_rw_const.cocci. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200218112457.22712-1-peter.maydell@linaro.org> [PMD: Update macvm_set_cr0() reported by Laurent Vivier] Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2020-02-20 14:47:08 +01:00
Igor Mammedov	9fe680ee75	ppc/virtex_ml507: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-69-imammedo@redhat.com>	2020-02-19 16:50:01 +00:00
Igor Mammedov	ab74e54311	ppc/spapr: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200219160953.13771-68-imammedo@redhat.com>	2020-02-19 16:50:01 +00:00
Igor Mammedov	b28f01880e	ppc/{ppc440_bamboo, sam460ex}: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200219160953.13771-67-imammedo@redhat.com>	2020-02-19 16:50:00 +00:00
Igor Mammedov	a0258e4afa	ppc/{ppc440_bamboo, sam460ex}: drop RAM size fixup If user provided non-sense RAM size, board will complain and continue running with max RAM size supported or sometimes crash like this: %QEMU -M bamboo -m 1 exec.c:1926: find_ram_offset: Assertion `size != 0' failed. Aborted (core dumped) Also RAM is going to be allocated by generic code, so it won't be possible for board to fix things up for user. Make it error message and exit to force user fix CLI, instead of accepting non-sense CLI values. That also fixes crash issue, since wrongly calculated size isn't used to allocate RAM Signed-off-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu> Acked-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200219160953.13771-66-imammedo@redhat.com>	2020-02-19 16:50:00 +00:00
Igor Mammedov	2dc9ce13d2	ppc/ppc405_boards: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. PS: in ref405ep alias RAM into ram_memories[] to avoid re-factoring its user ppc405ep_init(), which would be invasive and out of scope this patch. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-65-imammedo@redhat.com>	2020-02-19 16:50:00 +00:00
Igor Mammedov	4428dcf7b9	ppc/ppc405_boards: add RAM size checks If user provided non-sense RAM size, board will ignore it and continue running with fixed RAM size. Also RAM is going to be allocated by generic code, so it won't be possible for board to fix CLI. Make it error message and exit to force user fix CLI, instead of accepting non-sense CLI values. PS: move fixed RAM size into mc->default_ram_size, so that generic code will know how much to allocate. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-64-imammedo@redhat.com>	2020-02-19 16:50:00 +00:00
Igor Mammedov	173a36d8d1	ppc/pnv: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-63-imammedo@redhat.com>	2020-02-19 16:50:00 +00:00
Igor Mammedov	8ee06e4ccb	ppc/mac_oldworld: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200219160953.13771-62-imammedo@redhat.com>	2020-02-19 16:50:00 +00:00
Igor Mammedov	a5b5de02ac	ppc/mac_newworld: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200219160953.13771-61-imammedo@redhat.com>	2020-02-19 16:50:00 +00:00
Igor Mammedov	9731664559	ppc/e500: use memdev for RAM memory_region_allocate_system_memory() API is going away, so replace it with memdev allocated MemoryRegion. The later is initialized by generic code, so board only needs to opt in to memdev scheme by providing MachineClass::default_ram_id and using MachineState::ram instead of manually initializing RAM memory region. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-60-imammedo@redhat.com>	2020-02-19 16:50:00 +00:00
Igor Mammedov	3538e846cb	ppc/e500: drop RAM size fixup If user provided non-sense RAM size, board will complain and continue running with max RAM size supported. Also RAM is going to be allocated by generic code, so it won't be possible for board to fix things up for user. Make it error message and exit to force user fix CLI, instead of accepting non-sense CLI values. While at it, replace usage of global ram_size with machine->ram_size Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20200219160953.13771-59-imammedo@redhat.com>	2020-02-19 16:49:59 +00:00
Aravinda Prasad	e0aeef7a35	ppc: spapr: Activate the FWNMI functionality This patch sets the default value of SPAPR_CAP_FWNMI_MCE to SPAPR_CAP_ON for machine type 5.0. Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Message-Id: <20200130184423.20519-8-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:11 +11:00
Aravinda Prasad	2500fb423a	migration: Include migration support for machine check handling This patch includes migration support for machine check handling. Especially this patch blocks VM migration requests until the machine check error handling is complete as these errors are specific to the source hardware and is irrelevant on the target hardware. Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [Do not set FWNMI cap in post_load, now its done in .apply hook] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Message-Id: <20200130184423.20519-7-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:11 +11:00
Aravinda Prasad	f03496bc12	ppc: spapr: Handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls This patch adds support in QEMU to handle "ibm,nmi-register" and "ibm,nmi-interlock" RTAS calls. The machine check notification address is saved when the OS issues "ibm,nmi-register" RTAS call. This patch also handles the case when multiple processors experience machine check at or about the same time by handling "ibm,nmi-interlock" call. In such cases, as per PAPR, subsequent processors serialize waiting for the first processor to issue the "ibm,nmi-interlock" call. The second processor that also received a machine check error waits till the first processor is done reading the error log. The first processor issues "ibm,nmi-interlock" call when the error log is consumed. Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [Register fwnmi RTAS calls in core_rtas_register_types() where other RTAS calls are registered] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Message-Id: <20200130184423.20519-6-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:11 +11:00
Aravinda Prasad	81fe70e443	target/ppc: Build rtas error log upon an MCE Upon a machine check exception (MCE) in a guest address space, KVM causes a guest exit to enable QEMU to build and pass the error to the guest in the PAPR defined rtas error log format. This patch builds the rtas error log, copies it to the rtas_addr and then invokes the guest registered machine check handler. The handler in the guest takes suitable action(s) depending on the type and criticality of the error. For example, if an error is unrecoverable memory corruption in an application inside the guest, then the guest kernel sends a SIGBUS to the application. For recoverable errors, the guest performs recovery actions and logs the error. Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [Assume SLOF has allocated enough room for rtas error log] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200130184423.20519-5-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:10 +11:00
Aravinda Prasad	9ac703ac5f	target/ppc: Handle NMI guest exit Memory error such as bit flips that cannot be corrected by hardware are passed on to the kernel for handling. If the memory address in error belongs to guest then the guest kernel is responsible for taking suitable action. Patch [1] enhances KVM to exit guest with exit reason set to KVM_EXIT_NMI in such cases. This patch handles KVM_EXIT_NMI exit. [1] https://www.spinics.net/lists/kvm-ppc/msg12637.html (e20bbd3d and related commits) Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200130184423.20519-4-ganeshgr@linux.ibm.com> [dwg: #ifdefs to fix compile for 32-bit target] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:10 +11:00
Aravinda Prasad	9d953ce447	ppc: spapr: Introduce FWNMI capability Introduce fwnmi an spapr capability and add a helper function which tries to enable it, which would be used by following patch of the series. This patch by itself does not change the existing behavior. Signed-off-by: Aravinda Prasad <arawinda.p@gmail.com> [eliminate cap_ppc_fwnmi, add fwnmi cap to migration state and reprhase the commit message] Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200130184423.20519-3-ganeshgr@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-03 11:33:10 +11:00
David Gibson	37965dfe4d	spapr: Enable DD2.3 accelerated count cache flush in pseries-5.0 machine For POWER9 DD2.2 cpus, the best current Spectre v2 indirect branch mitigation is "count cache disabled", which is configured with: -machine cap-ibs=fixed-ccd However, this option isn't available on DD2.3 CPUs with KVM, because they don't have the count cache disabled. For POWER9 DD2.3 cpus, it is "count cache flush with assist", configured with: -machine cap-ibs=workaround,cap-ccf-assist=on However this option isn't available on DD2.2 CPUs with KVM, because they don't have the special CCF assist instruction this relies on. On current machine types, we default to "count cache flush w/o assist", that is: -machine cap-ibs=workaround,cap-ccf-assist=off This runs, with mitigation on both DD2.2 and DD2.3 host cpus, but has a fairly significant performance impact. It turns out we can do better. The special instruction that CCF assist uses to trigger a count cache flush is a no-op on earlier CPUs, rather than trapping or causing other badness. It doesn't, of itself, implement the mitigation, but if we have count-cache-disabled, then the count cache flush is unnecessary, and so using the count cache flush mitigation is harmless. Therefore for the new pseries-5.0 machine type, enable cap-ccf-assist by default. Along with that, suppress throwing an error if cap-ccf-assist is selected but KVM doesn't support it, as long as KVM is giving us count-cache-disabled. To allow TCG to work out of the box, even though it doesn't implement the ccf flush assist, downgrade the error in that case to a warning. This matches several Spectre mitigations where we allow TCG to operate for debugging, since we don't really make guarantees about TCG security properties anyway. While we're there, make the TCG warning for this case match that for other mitigations. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Michael Ellerman <mpe@ellerman.id.au>	2020-02-03 11:33:02 +11:00
Cédric Le Goater	23a782eb66	ppc/pnv: change the PowerNV machine devices to be non user creatable The PowerNV machine emulates an OpenPOWER system and the PowerNV chip devices are models of the internal logic of the POWER processor. They can not be instantiated by the user on the QEMU command line. The PHB3/PHB4 devices could be an exception in the future after some rework on how the device tree is built. For the moment, exclude them also. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200129113720.7404-1-clg@kaod.org> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Cédric Le Goater	9ae1329ee2	ppc/pnv: Add models for POWER8 PHB3 PCIe Host bridge This is a model of the PCIe Host Bridge (PHB3) found on a POWER8 processor. It includes the PowerBus logic interface (PBCQ), IOMMU support, a single PCIe Gen.3 Root Complex, and support for MSI and LSI interrupt sources as found on a POWER8 system using the XICS interrupt controller. The POWER8 processor comes in different flavors: Venice, Murano, Naple, each having a different number of PHBs. To make things simpler, the models provides 3 PHB3 per chip. Some platforms, like the Firestone, can also couple PHBs on the first chip to provide more bandwidth but this is too specific to model in QEMU. XICS requires some adjustment to support the PHB3 MSI. The changes are provided here but they could be decoupled in prereq patches. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200127144506.11132-3-clg@kaod.org> [dwg: Use device_class_set_props()] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Benjamin Herrenschmidt	4f9924c4d4	ppc/pnv: Add models for POWER9 PHB4 PCIe Host bridge These changes introduces models for the PCIe Host Bridge (PHB4) of the POWER9 processor. It includes the PowerBus logic interface (PBCQ), IOMMU support, a single PCIe Gen.4 Root Complex, and support for MSI and LSI interrupt sources as found on a POWER9 system using the XIVE interrupt controller. POWER9 processor comes with 3 PHB4 PEC (PCI Express Controller) and each PEC can have several PHBs. By default, * PEC0 provides 1 PHB (PHB0) * PEC1 provides 2 PHBs (PHB1 and PHB2) * PEC2 provides 3 PHBs (PHB3, PHB4 and PHB5) Each PEC has a set "global" registers and some "per-stack" (per-PHB) registers. Those are organized in two XSCOM ranges, the "Nest" range and the "PCI" range, each range contains both some "PEC" registers and some "per-stack" registers. No default device layout is provided and PCI devices can be added on any of the available PCIe Root Port (pcie.0 .. 2 of a Power9 chip) with address 0x0 as the firwware (skiboot) only accepts a single device per root port. To run a simple system with a network and a storage adapters, use a command line options such as : -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=hostnet0 -device megasas,id=scsi0,bus=pcie.1,addr=0x0 -drive file=$disk,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 If more are needed, include a bridge. Multi chip is supported, each chip adding its set of PHB4 controllers and its PCI busses. The model doesn't emulate the EEH error handling. This model is not ready for hotplug yet. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [ clg: - numerous cleanups - commit log - fix for broken LSI support - PHB pic printinfo - large QOM rework ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200127144506.11132-2-clg@kaod.org> [dwg: Use device_class_set_props()] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Stefan Berger	864674fa29	spapr: Implement get_dt_compatible() callback For devices that cannot be statically initialized, implement a get_dt_compatible() callback that allows us to ask the device for the 'compatible' value. Signed-off-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200121152935.649898-3-stefanb@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Cédric Le Goater	08c3f3a734	ppc/pnv: Add support for "hostboot" mode When the "hb-mode" option is activated on the powernv machine, the firmware is mapped at 0x8000000 and the HRMOR of the HW threads are set to the same address. The PNOR mapping on the FW address space of the LPC bus is left enabled to let the firmware load any other images required to boot the host. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200127144154.10170-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Cédric Le Goater	59942f0ebb	ppc/pnv: remove useless "core-pir" property alias. Commit `158e17a65e` ("ppc/pnv: Link "chip" property to PnvCore::chip pointer") introduced some cleanups of the PnvCore realize handler. Let's continue by reworking a bit the interface of the PnvCore handlers for the CPU threads. These changes make the "core-pir" property alias unused. Remove it. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200127144154.10170-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Greg Kurz	12b3868ead	spapr: Don't allow multiple active vCPUs at CAS According to the description of "ibm,client-architecture-support" that can found in LoPAPR "B.6.2.3 Root Node Methods": If multiple partition processors or threads are active at the time of the ibm,client-architecture-support method call, or an error is detected in the format of the ibm,architecture.vec structure, the err? boolean shall be TRUE; else FALSE. We certainly don't want to temper with the platform or with the PCR of the other vCPUs if they happen to be active. Ensure we have only one active vCPU and fail CAS otherwise. This is just for conformance and robustness, it doesn't fix any known bugs. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157969867170.571404.12117797348882189656.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Greg Kurz	cbd0d7f363	spapr: Fail CAS if option vector table cannot be parsed Most of the option vector helpers have assertions to check their arguments aren't null. The guest can provide an arbitrary address for the CAS structure that would result in such null arguments. Fail CAS with H_PARAMETER and print a warning instead of aborting QEMU. Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <157925255250.397143.10855183619366882459.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Thomas Huth	b2ce76a073	hw/ppc/prep: Remove the deprecated "prep" machine and the OpenHackware BIOS It's been deprecated since QEMU v3.1. The 40p machine should be used nowadays instead. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Hervé Poussineau <hpoussin@reactos.org> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20200114114617.28854-1-thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Igor Mammedov	79a8733650	ppc:virtex_ml507: remove unused arguments Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <1579100861-73692-71-git-send-email-imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Cédric Le Goater	3cf4aac0de	ppc/pnv: improve error logging when a PNOR update fails Print out the offset at which the error occured. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200108090348.21224-3-clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Cédric Le Goater	b1c8c522f4	ppc/pnv: use QEMU unit definition MiB Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200108090348.21224-2-clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-02-02 14:07:57 +11:00
Damien Hedde	f703a04ce5	add device_legacy_reset function to prepare for reset api change Provide a temporary device_legacy_reset function doing what device_reset does to prepare for the transition with Resettable API. All occurrence of device_reset in the code tree are also replaced by device_legacy_reset. The new resettable API has different prototype and semantics (resetting child buses as well as the specified device). Subsequent commits will make the changeover for each call site individually; once that is complete device_legacy_reset() will be removed. Signed-off-by: Damien Hedde <damien.hedde@greensocs.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Cornelia Huck <cohuck@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200123132823.1117486-2-damien.hedde@greensocs.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-01-30 16:02:03 +00:00
Aleksandar Markovic	6cdda0ff4b	hw/core/loader: Let load_elf() populate a field with CPU-specific flags While loading the executable, some platforms (like AVR) need to detect CPU type that executable is built for - and, with this patch, this is enabled by reading the field 'e_flags' of the ELF header of the executable in question. The change expands functionality of the following functions: - load_elf() - load_elf_as() - load_elf_ram() - load_elf_ram_sym() The argument added to these functions is called 'pflags' and is of type 'uint32_t*' (that matches 'pointer to 'elf_word'', 'elf_word' being the type of the field 'e_flags', in both 32-bit and 64-bit variants of ELF header). Callers are allowed to pass NULL as that argument, and in such case no lookup to the field 'e_flags' will happen, and no information will be returned, of course. CC: Richard Henderson <rth@twiddle.net> CC: Peter Maydell <peter.maydell@linaro.org> CC: Edgar E. Iglesias <edgar.iglesias@gmail.com> CC: Michael Walle <michael@walle.cc> CC: Thomas Huth <huth@tuxfamily.org> CC: Laurent Vivier <laurent@vivier.eu> CC: Philippe Mathieu-Daudé <f4bug@amsat.org> CC: Aleksandar Rikalo <aleksandar.rikalo@rt-rk.com> CC: Aurelien Jarno <aurelien@aurel32.net> CC: Jia Liu <proljc@gmail.com> CC: David Gibson <david@gibson.dropbear.id.au> CC: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> CC: BALATON Zoltan <balaton@eik.bme.hu> CC: Christian Borntraeger <borntraeger@de.ibm.com> CC: Thomas Huth <thuth@redhat.com> CC: Artyom Tarasenko <atar4qemu@gmail.com> CC: Fabien Chouteau <chouteau@adacore.com> CC: KONRAD Frederic <frederic.konrad@adacore.com> CC: Max Filippov <jcmvbkbc@gmail.com> Reviewed-by: Aleksandar Rikalo <aleksandar.rikalo@rt-rk.com> Signed-off-by: Michael Rolnik <mrolnik@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Aleksandar Markovic <amarkovic@wavecomp.com> Message-Id: <1580079311-20447-24-git-send-email-aleksandar.markovic@rt-rk.com>	2020-01-29 19:28:52 +01:00
Marc-André Lureau	4f67d30b5e	qdev: set properties with device_class_set_props() The following patch will need to handle properties registration during class_init time. Let's use a device_class_set_props() setter. spatch --macro-file scripts/cocci-macro-file.h --sp-file ./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place --dir . @@ typedef DeviceClass; DeviceClass *d; expression val; @@ - d->props = val + device_class_set_props(d, val) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:15 +01:00
Philippe Mathieu-Daudé	dd32e94838	hw/ppc/spapr_rtas: Remove local variable We only access this variable in the RTAS_SYSPARM_SPLPAR_CHARACTERISTICS case. Use it in place and remove the local declaration. Suggested-by: Greg Kurz <groug@kaod.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200121110349.25842-4-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:11 +01:00
Philippe Mathieu-Daudé	500c2cc5d9	hw/ppc/spapr_rtas: Access MachineState via SpaprMachineState argument We received a SpaprMachineState argument. Since SpaprMachineState inherits of MachineState, use it instead of calling qdev_get_machine. Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200121110349.25842-3-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:10 +01:00
Philippe Mathieu-Daudé	da2c8f4dcd	hw/ppc/spapr_rtas: Use local MachineState variable Since we have the MachineState already available locally, use it instead of the global current_machine. Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200121110349.25842-2-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-24 20:59:10 +01:00
Peter Xu	1df2c9a26f	migration: Define VMSTATE_INSTANCE_ID_ANY Define the new macro VMSTATE_INSTANCE_ID_ANY for callers who wants to auto-generate the vmstate instance ID. Previously it was hard coded as -1 instead of this macro. It helps to change this default value in the follow up patches. No functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>	2020-01-20 09:10:23 +01:00
Peter Maydell	b952544fe8	* Compat machines fix (Denis) * Command line parsing fixes (Michal, Peter, Xiaoyao) * Cooperlake CPU model fixes (Xiaoyao) * i386 gdb fix (mkdolata) * IOEventHandler cleanup (Philippe) * icount fix (Pavel) * RR support for random number sources (Pavel) * Kconfig fixes (Philippe) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJeFbG8AAoJEL/70l94x66DCpMIAKBwxBL+VegqI+ySKgmtIBQX LtU+ardEeZ37VfWfvuWzTFe+zQ0hsFpz/e0LHE7Ae+LVLMNWXixlmMrTIm+Xs762 hJzxBjhUhkdrMioVYTY16Kqap4Nqaxu70gDQ32Ve2sY6xYGxYLSaJooBOU5bXVgb HPspHFVpeP6ZshBd1n2LXsgURE6v3AjTwqcsPCkL/AESFdkdOsoHeXjyKWJG1oPy W7btzlUEqVsauZI8/PhhW/8hZUvUsJVHonYLTZTyy8aklU7aOILSyT2uPXFBVUVQ irkQjLtD4dWlogBKO4i/QHMuwV+Asa57WNPmqv3EcIWPUWmTY84H0g2AxRgcc2M= =48jx -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * Compat machines fix (Denis) * Command line parsing fixes (Michal, Peter, Xiaoyao) * Cooperlake CPU model fixes (Xiaoyao) * i386 gdb fix (mkdolata) * IOEventHandler cleanup (Philippe) * icount fix (Pavel) * RR support for random number sources (Pavel) * Kconfig fixes (Philippe) # gpg: Signature made Wed 08 Jan 2020 10:41:00 GMT # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (38 commits) chardev: Use QEMUChrEvent enum in IOEventHandler typedef chardev: use QEMUChrEvent instead of int chardev/char: Explicit we ignore some QEMUChrEvent in IOEventHandler monitor/hmp: Explicit we ignore a QEMUChrEvent in IOEventHandler monitor/qmp: Explicit we ignore few QEMUChrEvent in IOEventHandler virtio-console: Explicit we ignore some QEMUChrEvent in IOEventHandler vhost-user-blk: Explicit we ignore few QEMUChrEvent in IOEventHandler vhost-user-net: Explicit we ignore few QEMUChrEvent in IOEventHandler vhost-user-crypto: Explicit we ignore some QEMUChrEvent in IOEventHandler ccid-card-passthru: Explicit we ignore QEMUChrEvent in IOEventHandler hw/usb/redirect: Explicit we ignore few QEMUChrEvent in IOEventHandler hw/usb/dev-serial: Explicit we ignore few QEMUChrEvent in IOEventHandler hw/char/terminal3270: Explicit ignored QEMUChrEvent in IOEventHandler hw/ipmi: Explicit we ignore some QEMUChrEvent in IOEventHandler hw/ipmi: Remove unnecessary declarations target/i386: Add missed features to Cooperlake CPU model target/i386: Add new bit definitions of MSR_IA32_ARCH_CAPABILITIES target/i386: Fix handling of k_gs_base register in 32-bit mode in gdbstub hw/rtc/mc146818: Add missing dependency on ISA Bus hw/nvram/Kconfig: Restrict CHRP NVRAM to machines using OpenBIOS or SLOF ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-01-10 17:16:49 +00:00
Cédric Le Goater	fc2527fb02	ppc/pnv: fix check on return value of blk_getlength() blk_getlength() returns an int64_t but the result is stored in a uint32_t. Errors (negative values) won't be caught by the check in pnv_pnor_realize() and blk_blockalign() will allocate a very large buffer in such cases. Fixes Coverity issue CID 1412226. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200107171809.15556-3-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 12:01:14 +11:00
Cédric Le Goater	3a688294e2	ppc/pnv: check return value of blk_pwrite() When updating the PNOR file contents, we should check for a possible failure of blk_pwrite(). Fixes Coverity issue CID 1412228. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200107171809.15556-2-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 12:01:02 +11:00
Greg Kurz	b91cad2f07	pnv/psi: Consolidate some duplicated code in pnv_psi_realize() The proper way to do that would be to use device_class_set_parent_realize(), but defining a Pnv8PsiClass and a Pnv9PsiClass types with a parent_realize pointer adds a fair amount of code. Calling pnv_psi_realize() explicitely is fine for now. This should probably be achieved with a device realize hook in the PSI base class and device_class_set_parent_realize() in the children classes. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <157841476667.66386.13659183399113837990.stgit@bahia.tlslab.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:54:19 +11:00
Greg Kurz	fcb7e4a8f4	pnv/psi: Add device reset hook And call it from a QEMU reset handler. This allows each PNV child class to override the reset hook if needed, eg. POWER8 doesn't but POWER9 does. The proper way to do that would be to use device_class_set_parent_reset(), but defining a Pnv8PsiClass and a Pnv9PsiClass types with a parent_reset pointer adds a fair amount of code. Calling pnv_psi_reset() explicitely is fine for now. A subsequent patch will consolidate the call to qemu_register_reset() in a single place. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <157841476035.66386.17838417527621752518.stgit@bahia.tlslab.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:06:42 +11:00
Greg Kurz	806fed593d	pnv/xive: Deduce the PnvXive pointer from XiveTCTX::xptr And use it instead of reaching out to the machine. This allows to get rid of pnv_get_chip(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200106145645.4539-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Cédric Le Goater	479509463b	xive: Add a "presenter" link property to the TCTX object This will be used in subsequent patches to access the XIVE associated to a TCTX without reaching out to the machine through qdev_get_machine(). Signed-off-by: Cédric Le Goater <clg@kaod.org> [ groug: - split patch - write subject and changelog ] Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200106145645.4539-9-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Greg Kurz	d8137bb729	ppc/pnv: Add a "pnor" const link property to the BMC internal simulator This allows to get rid of a call to qdev_get_machine(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200106145645.4539-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Greg Kurz	764f9b2559	ppc/pnv: Add an "nr-threads" property to the base chip class Set it at chip creation and forward it to the cores. This allows to drop a call to qdev_get_machine(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200106145645.4539-7-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Greg Kurz	d1214b819f	spapr, pnv, xive: Add a "xive-fabric" link to the XIVE router In order to get rid of qdev_get_machine(), first add a pointer to the XIVE fabric under the XIVE router and make it configurable through a QOM link property. Configure it in the spapr and pnv machine. In the case of pnv, the XIVE routers are under the chip, so this is done with a QOM alias property of the POWER9 pnv chip. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200106145645.4539-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Cédric Le Goater	245cdb7f54	ppc/pnv: Introduce a "xics" property under the POWER8 chip POWER8 is the only chip using the XICS interface. Add a "xics" link and a XICSFabric attribute under this chip to remove the use of qdev_get_machine() Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20200106145645.4539-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Cédric Le Goater	34bdca8fae	ppc/pnv: Introduce a "xics" property alias under the PSI model This removes the need of the intermediate link under PSI to pass the XICS link to the underlying ICSState object. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20200106145645.4539-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Cédric Le Goater	baa45b1710	spapr/xive: remove redundant check in spapr_match_nvt() spapr_match_nvt() is a XIVE operation and is used by the machine to look for a matching target when an event notification is being delivered. An assert checks that spapr_match_nvt() is called only when the machine has selected the XIVE interrupt mode but it is redundant with the XIVE_PRESENTER() dynamic cast. Apply the cast to spapr->active_intc and remove the assert. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20200106163207.4608-1-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Greg Kurz	e44acde2f8	ppc/pnv: Drop "num-chips" machine property The number of CPU chips of the powernv machine is configurable through a "num-chips" property. This doesn't fit well with the CPU topology, eg. some configurations can come up with more CPUs than the maximum of CPUs set in the toplogy. This causes assertion to be hit with mttcg: -machine powernv,num-chips=2 -smp cores=2 -accel tcg,thread=multi ERROR: tcg/tcg.c:789:tcg_register_thread: assertion failed: (n < ms->smp.max_cpus) Aborted (core dumped) Mttcg mandates the CPU topology to be dimensioned to the actual number of CPUs, depending on the number of chips the user asked for. That is, '-machine num-chips=N' should always have a '-smp' companion with a topology that meats the resulting number of CPUs, typically '-smp sockets=N'. It thus seems that "num-chips" doesn't bring anything but forcing the user to specify the requested number of chips on the command line twice. Simplify the command line by computing the number of chips based on the CPU topology exclusively. The powernv machine isn't a production thing ; it is mostly used by developpers to prepare the bringup of real HW. Because of this and for simplicity, this deliberately ignores the official deprecation process and dumps "num-chips" right away : '-smp sockets=N' is now the only way to control the number of CPU chips. This is done at machine init because smp_parse() is called after instance init. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157830658266.533764.2214183961444213947.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Daniel Henrique Barboza	400431ef48	ppc440_bamboo.c: remove label from bamboo_load_device_tree() 'out' label can be replaced by 'return -1' in all cases. CC: David Gibson <david@gibson.dropbear.id.au> CC: qemu-ppc@nongnu.org Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200106182425.20312-3-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Daniel Henrique Barboza	9b6c1da5e9	spapr.c: remove 'out' label in spapr_dt_cas_updates() 'out' can be replaced by 'return' with the appropriate return value. CC: David Gibson <david@gibson.dropbear.id.au> CC: qemu-ppc@nongnu.org Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20200106182425.20312-2-danielhb413@gmail.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Cédric Le Goater	8f06e3705e	ppc/pnv: Modify the powerdown notifier to get the PowerNV machine Use container_of() instead of qdev_get_machine() Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191219181155.32530-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Bharata B Rao	905db91697	ppc/spapr: Support reboot of secure pseries guest A pseries guest can be run as a secure guest on Ultravisor-enabled POWER platforms. When such a secure guest is reset, we need to release/reset a few resources both on ultravisor and hypervisor side. This is achieved by invoking this new ioctl KVM_PPC_SVM_OFF from the machine reset path. As part of this ioctl, the secure guest is essentially transitioned back to normal mode so that it can reboot like a regular guest and become secure again. This ioctl has no effect when invoked for a normal guest. If this ioctl fails for a secure guest, the guest is terminated. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Message-Id: <20191219031445.8949-3-bharata@linux.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2020-01-08 11:01:59 +11:00
Philippe Mathieu-Daudé	7bebc358df	hw/nvram/Kconfig: Restrict CHRP NVRAM to machines using OpenBIOS or SLOF Only the OpenBIOS and SLOF firmwares use the CHRP NVRAM layout. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191231183216.6781-14-philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-07 12:08:39 +01:00
Philippe Mathieu-Daudé	b0048f7609	hw/ppc/Kconfig: Only select FDT helper for machines using it Not all machines use the ppc_create_page_sizes_prop() helper. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191231183216.6781-12-philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-07 12:08:39 +01:00
Philippe Mathieu-Daudé	032757adaa	hw/ppc/Kconfig: Only select fw_cfg with machines using OpenBIOS The fw_cfg helpers are only used by machines using OpenBIOS. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191231183216.6781-11-philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-07 12:08:39 +01:00
Philippe Mathieu-Daudé	7496975722	hw/ppc/Makefile: Simplify the sPAPR PCI objects rule The CONFIG_PSERIES already selects CONFIG_PCI. Simplify the Makefile rules. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191231183216.6781-10-philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-07 12:08:39 +01:00
Philippe Mathieu-Daudé	45b0bd1095	hw/ppc/Kconfig: Let the Xilinx Virtex5 ML507 use the PPC-440 devices When configured with --without-default-devices, the build fails: LINK ppc-softmmu/qemu-system-ppc /usr/bin/ld: hw/ppc/virtex_ml507.o: in function `ppc440_init_xilinx': hw/ppc/virtex_ml507.c:112: undefined reference to `ppcuic_init' collect2: error: ld returned 1 exit status make[1]: * [Makefile:206: qemu-system-ppc] Error 1 make: * [Makefile:483: ppc-softmmu/all] Error 2 Fix by selecting the PPC4XX config. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191231183216.6781-9-philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-07 12:08:39 +01:00
Philippe Mathieu-Daudé	def9119efe	hw/ppc/Kconfig: Let the Sam460ex board use the PowerPC 405 devices When configured with --without-default-devices, the build fails: LINK ppc-softmmu/qemu-system-ppc /usr/bin/ld: hw/ppc/sam460ex.o: in function `sam460ex_init': hw/ppc/sam460ex.c:313: undefined reference to `ppc4xx_plb_init' /usr/bin/ld: hw/ppc/sam460ex.c:353: undefined reference to `ppc405_ebc_init' collect2: error: ld returned 1 exit status make[1]: *** [Makefile:206: qemu-system-ppc] Error 1 Fix by selecting the PPC405 config. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191231183216.6781-8-philmd@redhat.com> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-07 12:08:39 +01:00
Philippe Mathieu-Daudé	a0297be4be	hw/ppc/Kconfig: Restrict the MPC I2C controller to e500-based platforms Only the PowerPC e500-based platforms use the MPC I2C controller. Do not build it for the other machines. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191231183216.6781-7-philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-01-07 12:08:39 +01:00
Marc-André Lureau	3cad405bab	vmstate: replace DeviceState with VMStateIf Replace DeviceState dependency with VMStateIf on vmstate API. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Acked-by: Halil Pasic <pasic@linux.ibm.com>	2020-01-06 18:41:32 +04:00
Peter Maydell	4800819827	* More uses of RCU_READ_LOCK_GUARD (Dave, myself) * QOM doc improvments (Greg) * Cleanups from the Meson conversion (Marc-André) * Support for multiple -accel options (myself) * Many x86 machine cleanup (Philippe, myself) * tests/migration-test cleanup (Juan) * PC machine removal and next round of deprecation (Thomas) * kernel-doc integration (Peter, myself) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJd+YJGAAoJEL/70l94x66D0YYIAIZpS6i6NYJC8KHCl49fjI7U qHDN7MiKYTU+l3i0+iGmQL6XN5ClAY0pXkY5LBFIDpsohHR5f4jdrIKjyvcHzuIM gx/NLsiA45/niHYrn/hEo0P7CwGTrrdWL+SVmScnKcwYiBzMO/uYblxlbUBKLPNn eGaKQmEkvlUBR9GS6S1+jYg8234ZRZ4+12t5dqqADBQ7Kc0wn6KC5yebIoQxCgVc 9F5Ezdkl7befrTI7El3EC6aT18bKhIBZIs1PT/hzqzlGFhBuKM7uKDb43Yx8c7XQ bk5vzHmblPAgQyK4OETQ+DM745AOk6vBiJZbR9nrDUXWvUkrEXTQZMJKU0FXdlE= =hyYX -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging * More uses of RCU_READ_LOCK_GUARD (Dave, myself) * QOM doc improvments (Greg) * Cleanups from the Meson conversion (Marc-André) * Support for multiple -accel options (myself) * Many x86 machine cleanup (Philippe, myself) * tests/migration-test cleanup (Juan) * PC machine removal and next round of deprecation (Thomas) * kernel-doc integration (Peter, myself) # gpg: Signature made Wed 18 Dec 2019 01:35:02 GMT # gpg: using RSA key BFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: (87 commits) vga: cleanup mapping of VRAM for non-PCI VGA hw/display: Remove "rombar" hack from vga-pci and vmware_vga hw/pci: Remove the "command_serr_enable" property hw/audio: Remove the "use_broken_id" hack from the AC97 device hw/i386: Remove the deprecated machines 0.12 up to 0.15 hw/pci-host: Add Kconfig entry to select the IGD Passthrough Host Bridge hw/pci-host/i440fx: Extract the IGD passthrough host bridge device hw/pci-host/i440fx: Use definitions instead of magic values hw/pci-host/i440fx: Use size_t to iterate over ARRAY_SIZE() hw/pci-host/i440fx: Extract PCII440FXState to "hw/pci-host/i440fx.h" hw/pci-host/i440fx: Correct the header description Fix some comment spelling errors. target/i386: remove unused pci-assign codes WHPX: refactor load library migration: check length directly to make sure the range is aligned memory: include MemoryListener documentation and some missing function parameters docs: add memory API reference memory.h: Silence kernel-doc complaints docs: Create bitops.rst as example of kernel-docs bitops.h: Silence kernel-doc complaints ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-12-20 11:20:25 +00:00
Vladimir Sementsov-Ogievskiy	0c115681a5	ppc: make Error errp const where it is appropriate Mostly, Error is for returning error from the function, so the callee sets it. However kvmppc_hint_smt_possible gets already filled errp parameter. It doesn't change the pointer itself, only change the internal state of referenced Error object. So we can make it Error const errp, to stress the behavior. It will also help coccinelle script (in future) to distinguish such cases from common errp usage. While there, rename the function to kvmppc_error_append_smt_possible_hint(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20191205174635.18758-8-vsementsov@virtuozzo.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Commit message replaced] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-12-18 08:43:19 +01:00
Markus Armbruster	1a639fdf96	Revert "ppc: well form kvmppc_hint_smt_possible error hint helper" This reverts commit `cdcca22aab`. Commit `cdcca22aab` is a superseded version of the next commit that crept in by accident. Revert it, so the final version applies. Signed-off-by: Markus Armbruster <armbru@redhat.com>	2019-12-18 08:40:09 +01:00
Markus Armbruster	8ca63ba8c2	error: Clean up unusual names of Error * variables Local Error * variables are conventionally named @err or @local_err, and Error ** parameters @errp. Naming local variables like parameters is confusing. Clean that up. Naming parameters like local variables is also confusing. Left for another day. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191204093625.14836-17-armbru@redhat.com>	2019-12-18 08:36:15 +01:00
Paolo Bonzini	4376c40ded	kvm: introduce kvm_kernel_irqchip_* functions The KVMState struct is opaque, so provide accessors for the fields that will be moved from current_machine to the accelerator. For now they just forward to the machine object, but this will change. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-12-17 19:32:45 +01:00
Greg Kurz	5084c8b763	ppc/pnv: Drop PnvChipClass::type It isn't used anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623844102.360005.12070225703151669294.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:59:11 +11:00
Greg Kurz	70c059e926	ppc/pnv: Introduce PnvChipClass::xscom_pcba() method The XSCOM bus is implemented with a QOM interface, which is mostly generic from a CPU type standpoint, except for the computation of addresses on the Pervasive Connect Bus (PCB) network. This is handled by the pnv_xscom_pcba() function with a switch statement based on the chip_type class level attribute of the CPU chip. This can be achieved using QOM. Also the address argument is masked with PNV_XSCOM_SIZE - 1, which is for POWER8 only. Addresses may have different sizes with other CPU types. Have each CPU chip type handle the appropriate computation with a QOM xscom_pcba() method. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623843543.360005.13996472463887521794.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:59:11 +11:00
Greg Kurz	c396c58a02	ppc/pnv: Pass content of the "compatible" property to pnv_dt_xscom() Since pnv_dt_xscom() is called from chip specific dt_populate() hooks, it shouldn't have to guess the chip type in order to populate the "compatible" property. Just pass the compat string and its size as arguments. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623842430.360005.9513965612524265862.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:59:11 +11:00
Greg Kurz	3f5b45ca4f	ppc/pnv: Pass XSCOM base address and address size to pnv_dt_xscom() Since pnv_dt_xscom() is called from chip specific dt_populate() hooks, it shouldn't have to guess the chip type in order to populate the "reg" property. Just pass the base address and address size as arguments. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623841868.360005.17577624823547136435.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:59:11 +11:00
Greg Kurz	c4b2c40c0e	ppc/pnv: Introduce PnvChipClass::xscom_core_base() method The pnv_chip_core_realize() function configures the XSCOM MMIO subregion for each core of a single chip. The base address of the subregion depends on the CPU type. Its computation is currently open-code using the pnv_chip_is_powerXX() helpers. This can be achieved with QOM. Introduce a method for this in the base chip class and implement it in child classes. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623841311.360005.4705705734873339545.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:59:11 +11:00
Greg Kurz	85913070a6	ppc/pnv: Introduce PnvChipClass::intc_print_info() method The pnv_pic_print_info() callback checks the type of the chip in order to forward to the request appropriate interrupt controller. This can be achieved with QOM. Introduce a method for this in the base chip class and implement it in child classes. This also prepares ground for the upcoming interrupt controller of POWER10 chips. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623840755.360005.5002022339473369934.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:59:10 +11:00
Greg Kurz	7a90c6a1b6	ppc/pnv: Introduce PnvMachineClass::dt_power_mgt() We add an extra node to advertise power management on some machines, namely powernv9 and powernv10. This is achieved by using the pnv_is_power9() and pnv_is_power10() helpers. This can be achieved with QOM. Add a method to the base class for powernv machines and have it implemented by machine types that support power management instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623839642.360005.9243510140436689941.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:59:10 +11:00
Greg Kurz	d76f2da7a5	ppc/pnv: Introduce PnvMachineClass and PnvMachineClass::compat The pnv_dt_create() function generates different contents for the "compatible" property of the root node in the DT, depending on the CPU type. This is open coded with multiple ifs using pnv_is_powerXX() helpers. It seems cleaner to achieve with QOM. Introduce a base class for the powernv machine and a compat attribute that each child class can use to provide the value for the "compatible" property. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623839085.360005.4046508784077843216.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> [dwg: Folded in small fix Greg spotted after posting] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:58:49 +11:00
Greg Kurz	248e4e924e	ppc/pnv: Drop PnvPsiClass::chip_type It isn't used anymore. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623838530.360005.15470128760871845396.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Greg Kurz	41c4ef7009	ppc/pnv: Introduce PnvPsiClass::compat The Processor Service Interface (PSI) model has a chip_type class level attribute, which is used to generate the content of the "compatible" DT property according to the CPU type. Since the PSI model already has specialized classes for each supported CPU type, it seems cleaner to achieve this with QOM. Provide the content of the "compatible" property with a new class level attribute. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157623837974.360005.14706607446188964477.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	3a1b70b66b	ppc/pnv: Fix OCC common area region mapping The OCC common area is mapped at a unique address on the system and each OCC is assigned a segment to expose its sensor data : ------------------------------------------------------------------------- \| Start (Offset from \| End \| Size \|Description \| \| BAR2 base address) \| \| \| \| ------------------------------------------------------------------------- \| 0x00580000 \| 0x005A57FF \|150kB \|OCC 0 Sensor Data Block\| \| 0x005A5800 \| 0x005CAFFF \|150kB \|OCC 1 Sensor Data Block\| \| : \| : \| : \| : \| \| 0x00686800 \| 0x006ABFFF \|150kB \|OCC 7 Sensor Data Block\| \| 0x006AC000 \| 0x006FFFFF \|336kB \|Reserved \| ------------------------------------------------------------------------- Maximum size is 1.5MB. We could define a "OCC common area" memory region at the machine level and sub regions for each OCC. But it adds some extra complexity to the models. Fix the current layout with a simpler model. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191211082912.2625-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	8f09231631	ppc/pnv: Introduce PBA registers The PBA bridge unit (Power Bus Access) connects the OCC (On Chip Controller) to the Power bus and System Memory. The PBA is used to gather sensor data, for power management, for sleep states, for initial boot, among other things. The PBA logic provides a set of four registers PowerBus Access Base Address Registers (PBABAR0..3) which map the OCC address space to the PowerBus space. These registers are setup by the initial FW and define the PowerBus Range of system memory that can be accessed by PBA. The current modeling of the PBABAR registers is done under the common XSCOM handlers. We introduce a specific XSCOM regions for these registers and fix : - BAR sizes and BAR masks - The mapping of the OCC common area. It is common to all chips and should be mapped once. We will address per-OCC area in the next change. - OCC common area is in BAR 3 on P8 Inspired by previous work of Balamuruhan S <bala24@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191211082912.2625-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	9e028fffaa	ppc/pnv: populate the DT with realized XSCOM devices Some devices could be initialized in the instance_init handler but not realized for configuration reasons. Nodes should not be added in the DT for such devices. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191210135845.19773-3-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	109dce3786	ppc/pnv: Loop on the whole hierarchy to populate the DT with the XSCOM nodes Some PnvXScomInterface objects lie a bit deeper (PnvPBCQState) than the first layer, so we need to loop on the whole object hierarchy to catch them. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191210135845.19773-2-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> [dwg: Corrected error in comment] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Suraj Jitindar Singh	f0ec31b1e2	target/ppc: Add SPR TBU40 The spr TBU40 is used to set the upper 40 bits of the timebase register, present on POWER5+ and later processors. This register can only be written by the hypervisor, and cannot be read. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191128134700.16091-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Suraj Jitindar Singh	5cc7e69f6d	target/ppc: Work [S]PURR implementation and add HV support The Processor Utilisation of Resources Register (PURR) and Scaled Processor Utilisation of Resources Register (SPURR) provide an estimate of the resources used by the thread, present on POWER7 and later processors. Currently the [S]PURR registers simply count at the rate of the timebase. Preserve this behaviour but rework the implementation to store an offset like the timebase rather than doing the calculation manually. Also allow hypervisor write access to the register along with the currently available read access. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [ clg: rebased on current ppc tree ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191128134700.16091-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Suraj Jitindar Singh	5d62725b2f	target/ppc: Implement the VTB for HV access The virtual timebase register (VTB) is a 64-bit register which increments at the same rate as the timebase register, present on POWER8 and later processors. The register is able to be read/written by the hypervisor and read by the supervisor. All other accesses are illegal. Currently the VTB is just an alias for the timebase (TB) register. Implement the VTB so that is can be read/written independent of the TB. Make use of the existing method for accessing timebase facilities where by the compensation is stored and used to compute the value on reads/is updated on writes. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> [ clg: rebased on current ppc tree ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191128134700.16091-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	2661f6ab2b	ppc/pnv: add a LPC Controller model for POWER10 Same a POWER9, only the MMIO window changes. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191205184454.10722-6-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	8b50ce8505	ppc/pnv: add a PSI bridge model for POWER10 The POWER10 PSIHB controller is very similar to the one on POWER9. We should probably introduce a common PnvPsiXive object. The ESB page size should be changed to 64k when P10 support is ready. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191205184454.10722-5-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	c5412b1d28	ppc/psi: cleanup definitions Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191205184454.10722-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	2b548a4255	ppc/pnv: Introduce a POWER10 PnvChip and a powernv10 machine This is an empty shell with the XSCOM bus and cores. The chip controllers will come later. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191205184454.10722-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Greg Kurz	c1ad0b892c	ppc: Don't use CPUPPCState::irq_input_state with modern Book3s CPU models The power7_set_irq() and power9_set_irq() functions set this but it is never used actually. Modern Book3s compatible CPUs are only supported by the pnv and spapr machines. They have an interrupt controller, XICS for POWER7/8 and XIVE for POWER9, whose models don't require to track IRQ input states at the CPU level. Drop these lines to avoid confusion. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157548862861.3650476.16622818876928044450.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Greg Kurz	401774387a	ppc: Deassert the external interrupt pin in KVM on reset When a CPU is reset, QEMU makes sure no interrupt is pending by clearing CPUPPCstate::pending_interrupts in ppc_cpu_reset(). In the case of a complete machine emulation, eg. a sPAPR machine, an external interrupt request could still be pending in KVM though, eg. an IPI. It will be eventually presented to the guest, which is supposed to acknowledge it at the interrupt controller. If the interrupt controller is emulated in QEMU, either XICS or XIVE, ppc_set_irq() won't deassert the external interrupt pin in KVM since it isn't pending anymore for QEMU. When the vCPU re-enters the guest, the interrupt request is still pending and the vCPU will try again to acknowledge it. This causes an infinite loop and eventually hangs the guest. The code has been broken since the beginning. The issue wasn't hit before because accel=kvm,kernel-irqchip=off is an awkward setup that never got used until recently with the LC92x IBM systems (aka, Boston). Add a ppc_irq_reset() function to do the necessary cleanup, ie. deassert the IRQ pins of the CPU in QEMU and most importantly the external interrupt pin for this vCPU in KVM. Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157548861740.3650476.16879693165328764758.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
David Gibson	d1d32d6255	spapr: Simplify ovec diff spapr_ovec_diff(ov, old, new) has somewhat complex semantics. ov is set to those bits which are in new but not old, and it returns as a boolean whether or not there are any bits in old but not new. It turns out that both callers only care about the second, not the first. This is basically equivalent to a bitmap subset operation, which is easier to understand and implement. So replace spapr_ovec_diff() with spapr_ovec_subset(). Cc: Mike Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com>	2019-12-17 10:39:48 +11:00
David Gibson	0c21e07354	spapr: Fold h_cas_compose_response() into h_client_architecture_support() spapr_h_cas_compose_response() handles the last piece of the PAPR feature negotiation process invoked via the ibm,client-architecture-support OF call. Its only caller is h_client_architecture_support() which handles most of the rest of that process. I believe it was placed in a separate file originally to handle some fiddly dependencies between functions, but mostly it's just confusing to have the CAS process split into two pieces like this. Now that compose response is simplified (by just generating the whole device tree anew), it's cleaner to just fold it into h_client_architecture_support(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
David Gibson	97b32a6afa	spapr: Improve handling of fdt buffer size Previously, spapr_build_fdt() constructed the device tree in a fixed buffer of size FDT_MAX_SIZE. This is a bit inflexible, but more importantly it's awkward for the case where we use it during CAS. In that case the guest firmware supplies a buffer and we have to awkwardly check that what we generated fits into it afterwards, after doing a lot of size checks during spapr_build_fdt(). Simplify this by having spapr_build_fdt() take a 'space' parameter. For the CAS case, we pass in the buffer size provided by SLOF, for the machine init case, we continue to pass FDT_MAX_SIZE. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
David Gibson	8deb8019d6	spapr: Don't trigger a CAS reboot for XICS/XIVE mode changeover PAPR allows the interrupt controller used on a POWER9 machine (XICS or XIVE) to be selected by the guest operating system, by using the ibm,client-architecture-support (CAS) feature negotiation call. Currently, if the guest selects an interrupt controller different from the one selected at initial boot, this causes the system to be reset with the new model and the boot starts again. This means we run through the SLOF boot process twice, as well as any other bootloader (e.g. grub) in use before the OS calls CAS. This can be confusing and/or inconvenient for users. Thanks to two fairly recent changes, we no longer need this reboot. 1) we now completely regenerate the device tree when CAS is called (meaning we don't need special case updates for all the device tree changes caused by the interrupt controller mode change), 2) we now have explicit code paths to activate and deactivate the different interrupt controllers, rather than just implicitly calling those at machine reset time. We can therefore eliminate the reboot for changing irq mode, simply by putting a call to spapr_irq_update_active_intc() before we call spapr_h_cas_compose_response() (which gives the updated device tree to the guest firmware and OS). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cedric Le Goater <clg@fr.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-12-17 10:39:48 +11:00
Vladimir Sementsov-Ogievskiy	cdcca22aab	ppc: well form kvmppc_hint_smt_possible error hint helper Make kvmppc_hint_smt_possible hint append helper well formed: rename errp to errp_in, as it is IN-parameter here (which is unusual for errp), rename function to be kvmppc_error_append_*_hint. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20191127191434.20945-1-vsementsov@virtuozzo.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	5373c61d6a	ppc/pnv: Clarify how the TIMA is accessed on a multichip system The TIMA region gives access to the thread interrupt context registers of a CPU. It is mapped at the same address on all chips and can be accessed by any CPU of the system. To identify the chip from which the access is being done, the PowerBUS uses a 'chip' field in the load/store messages. QEMU does not model these messages, instead, we extract the chip id from the CPU PIR and do a lookup at the machine level to fetch the targeted interrupt controller. Introduce pnv_get_chip() and pnv_xive_tm_get_xive() helpers to clarify this process in pnv_xive_get_tctx(). The latter will be removed in the subsequent patches but the same principle will be kept. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191125065820.927-14-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Greg Kurz	4ffb749688	spapr: Pass the maximum number of vCPUs to the KVM interrupt controller The XIVE and XICS-on-XIVE KVM devices on POWER9 hosts can greatly reduce their consumption of some scarce HW resources, namely Virtual Presenter identifiers, if they know the maximum number of vCPUs that may run in the VM. Prepare ground for this by passing the value down to xics_kvm_connect() and kvmppc_xive_connect(). This is purely mechanical, no functional change. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157478678301.67101.2717368060417156338.stgit@bahia.tlslab.ibm.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	932de7aef8	ppc/spapr: Implement the XiveFabric interface The CAM line matching sequence in the pseries machine does not change much apart from the use of the new QOM interfaces. There is an extra indirection because of the sPAPR IRQ backend of the machine. Only the XIVE backend implements the new 'match_nvt' handler. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191125065820.927-11-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	c722579e8c	ppc/pnv: Implement the XiveFabric interface The CAM line matching on the PowerNV machine now scans all chips of the system and all CPUs of a chip to find a dispatched NVT in the thread contexts. Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191125065820.927-10-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	119eaa9d11	ppc/pnv: Fix TIMA indirect access When the TIMA of a CPU needs to be accessed from the indirect page, the thread id of the target CPU is first stored in the PC_TCTXT_INDIR0 register. This thread id is relative to the chip and not to the system. Introduce a helper routine to look for a CPU of a given PIR and fix pnv_xive_get_indirect_tctx() to scan only the threads of the local chip and not the whole machine. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191125065820.927-8-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:48 +11:00
Cédric Le Goater	4a89e20458	ppc: Introduce a ppc_cpu_pir() helper Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191125065820.927-6-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	4fa28f2390	ppc/pnv: Instantiate cores separately Allocating a big void * array to store multiple objects isn't a recommended practice for various reasons: - no compile time type checking - potential dangling pointers if a reference on an individual is taken and the array is freed later on - duplicate boiler plate everywhere the array is browsed through Allocate an array of pointers and populate it instead. Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191125065820.927-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Cédric Le Goater	e2392d4395	ppc/pnv: Create BMC devices at machine init The BMC of the OpenPOWER systems monitors the machine state using sensors, controls the power and controls the access to the PNOR flash device containing the firmware image required to boot the host. QEMU models the power cycle process, access to the sensors and access to the PNOR device. But, for these features to be available, the QEMU PowerNV machine needs two extras devices on the command line, an IPMI BT device for communication and a BMC backend device: -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 The BMC properties are then defined accordingly in the device tree and OPAL self adapts. If a BMC device and an IPMI BT device are not available, OPAL does not try to communicate with the BMC in any manner. This is not how real systems behave. To be closer to the default behavior, create an IPMI BMC simulator device and an IPMI BT device at machine initialization time. We loose the ability to define an external BMC device but there are benefits: - a better match with real systems, - a better test coverage of the OPAL code, - system powerdown and reset commands that work, - a QEMU device tree compliant with the specifications (). () Still needs a MBOX device. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191121162340.11049-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Cédric Le Goater	ca661fae81	ppc/pnv: Add HIOMAP commands This activates HIOMAP support on the QEMU PowerNV machine. The PnvPnor model is used to access the flash contents. The model simply maps the contents at a fix offset and enables or disables the mapping. HIOMAP Protocol description : https://github.com/openbmc/hiomapd/blob/master/Documentation/protocol.md Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191028070027.22752-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Cédric Le Goater	95bd61c4df	ppc/pnv: Add a LPC "ranges" property And fix a typo in the MEM address space definition. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191118091908.15044-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	818a6d30e0	spapr: Abort if XICS interrupt controller cannot be initialized Failing to set any of the ICS property should really never happen: - object_property_add_child() always succeed unless the child object already has a parent, which isn't the case here obviously since the ICS has just been created with object_new() - the ICS has an "nr-irqs" property than can be set as long as the ICS isn't realized In both cases, an error indicates there is a bug in QEMU. Propagating the error, ie. exiting QEMU since spapr_irq_init() is called with &error_fatal doesn't make much sense. Abort instead. This is consistent with what is done with XIVE : both qdev_create() and qdev_prop_set_uint32() abort QEMU on error. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157403285265.409804.8683093665795248192.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	b015a98094	xics: Link ICS_PROP_XICS property to ICSState::xics pointer The ICS object has both a pointer and an ICS_PROP_XICS property pointing to the XICS fabric. Confusing bugs could arise if these ever go out of sync. Change the property definition so that it explicitely sets the pointer. The property isn't optional : not being able to set the link is a bug and QEMU should rather abort than exit in this case. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157403283596.409804.17347207690271971987.stgit@bahia.lan> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	7ae54cc3a0	ppc/pnv: Link "chip" property to PnvXive::chip pointer The XIVE object has both a pointer and a "chip" property pointing to the chip object. Confusing bugs could arise if these ever go out of sync. Change the property definition so that it explicitely sets the pointer. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157383336564.165747.10250365296928442882.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	158e17a65e	ppc/pnv: Link "chip" property to PnvCore::chip pointer The core object has both a pointer and a "chip" property pointing to the chip object. Confusing bugs could arise if these ever go out of sync. Change the property definition so that it explicitely sets the pointer. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157383336007.165747.1524120147081367440.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	f2582acf99	ppc/pnv: Link "chip" property to PnvHomer::chip pointer The homer object has both a pointer and a "chip" property pointing to the chip object. Confusing bugs could arise if these ever go out of sync. Change the property definition so that it explicitely sets the pointer. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157383335451.165747.32301068645427993.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	ee3d27138d	ppc/pnv: Link "psi" property to PnvOCC::psi pointer The OCC object has both a pointer and a "psi" property pointing to the PSI object. Confusing bugs could arise if these ever go out of sync. Change the property definition so that it explicitely sets the pointer. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157383334894.165747.7617090757862105199.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	b63f389366	ppc/pnv: Link "psi" property to PnvLpc::psi pointer The LPC object has both a pointer and a "psi" property pointing to the PSI object. Confusing bugs could arise if these ever go out of sync. Change the property definition so that it explicitely sets the pointer. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157383334342.165747.3159314903077305653.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	82ea3a1b29	xive: Link "xive" property to XiveSource::xive pointer The source object has both a pointer and a "xive" property pointing to the notifier object. Confusing bugs could arise if these ever go out of sync. Change the property definition so that it explicitely sets the pointer. The property isn't optional : not being able to set the link is a bug and QEMU should rather abort than exit in this case. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157383333227.165747.12901571295951957951.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Greg Kurz	719ed8461f	ppc/pnv: Drop "chip" link from POWER9 PSI object It has no apparent user. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157383383118.166856.2588933416368211047.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Cédric Le Goater	ccb099b3bf	ppc/pnv: Add a "/qemu" device tree node It helps skiboot identifying that is running on a QEMU platform. The compatible string will define the POWERPC processor version. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191106142129.4908-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Cédric Le Goater	35dde57662	ppc/pnv: Add a PNOR model On a POWERPC PowerNV system, the host firmware is stored in a PNOR flash chip which contents is mapped on the LPC bus. This model adds a simple dummy device to map the contents of a block device in the host address space. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191021131215.3693-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-12-17 10:39:47 +11:00
Cornelia Huck	3eb74d2087	hw: add compat machines for 5.0 Add 5.0 machine types for arm/i440fx/q35/s390x/spapr. For i440fx and q35, unversioned cpu models are still translated to -v1; I'll leave changing this (if desired) to the respective maintainers. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <20191112104811.30323-1-cohuck@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>	2019-12-14 10:25:50 +01:00
Evgeny Yakovlev	5f2585772f	virtio-blk: advertise F_WCE (F_FLUSH) if F_CONFIG_WCE is advertised Virtio spec 1.1 (and earlier), 5.2.5.2 Driver Requirements: Device Initialization: "Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it if they offer VIRTIO_BLK_F_CONFIG_WCE" Currently F_CONFIG_WCE and F_WCE are not connected to each other. Qemu will advertise F_CONFIG_WCE if config-wce argument is set for virtio-blk device. And F_WCE is advertised only if underlying block backend actually has it's caching enabled. Fix this by advertising F_WCE if F_CONFIG_WCE is also advertised. To preserve backwards compatibility with newer machine types make this behaviour governed by "x-enable-wce-if-config-wce" virtio-blk-device property and introduce hw_compat_4_2 with new property being off by default for all machine types <= 4.2 (but don't introduce 4.3 machine type itself yet). Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru> Message-Id: <1572978137-189218-1-git-send-email-wrfsh@yandex-team.ru> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2019-12-13 11:22:06 +00:00
PanNengyuan	59d0533b85	ppc/spapr_events: fix potential NULL pointer dereference in rtas_event_log_dequeue This fixes coverity issues 68911917: 360 CID 68911917: (NULL_RETURNS) 361. dereference: Dereferencing "source", which is known to be "NULL". 361 if (source->mask & event_mask) { 362 break; 363 } Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: PanNengyuan <pannengyuan@huawei.com> Message-Id: <1574685291-38176-1-git-send-email-pannengyuan@huawei.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-11-26 10:12:58 +11:00
David Gibson	b14848f5d7	spapr: Work around spurious warnings from vfio INTx initialization Traditional PCI INTx for vfio devices can only perform well if using an in-kernel irqchip. Therefore, vfio_intx_update() issues a warning if an in kernel irqchip is not available. We usually do have an in-kernel irqchip available for pseries machines on POWER hosts. However, because the platform allows feature negotiation of what interrupt controller model to use, we don't currently initialize it until machine reset. vfio_intx_update() is called (first) from vfio_realize() before that, so it can issue a spurious warning, even if we will have an in kernel irqchip by the time we need it. To workaround this, make a call to spapr_irq_update_active_intc() from spapr_irq_init() which is called at machine realize time, before the vfio realize. This call will be pretty much obsoleted by the later call at reset time, but it serves to suppress the spurious warning from VFIO. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2019-11-26 10:11:30 +11:00
David Gibson	e532e1d93c	spapr: Handle irq backend changes with VFIO PCI devices pseries machine type can have one of two different interrupt controllers in use depending on feature negotiation with the guest. Usually this is invisible to devices, because they route to a common set of qemu_irqs which in turn dispatch to the correct back end. VFIO passthrough devices, however, wire themselves up directly to the KVM irqchip for performance, which means they are affected by this change in interrupt controller. To get them to adjust correctly for the change in irqchip, we need to fire the kvm irqchip change notifier. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Alex Williamson <alex.williamson@redhat.com>	2019-11-26 10:11:30 +11:00
Alexey Kardashevskiy	a49f62b9fd	spapr: Add /chosen to FDT only at reset time to preserve kernel and initramdisk Since "spapr: Render full FDT on ibm,client-architecture-support" we build the entire flatten device tree (FDT) twice - at the reset time and when "ibm,client-architecture-support" (CAS) is called. The full FDT from CAS is then applied on top of the SLOF internal device tree. This is mostly ok, however there is a case when the QEMU is started with -initrd and for some reason the guest decided to move/unpack the init RAM disk image - the guest correctly notifies SLOF about the change but at CAS it is overridden with the QEMU initial location addresses and the guest may fail to boot if the original initrd memory was changed. This fixes the problem by only adding the /chosen node at the reset time to prevent the original QEMU's linux,initrd-start/linux,initrd-end to override the updated addresses. This only treats /chosen differently as we know there is a special case already and it is unlikely anything else will need to change /chosen at CAS we are better off not touching /chosen after we handed it over to SLOF. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20191024041308.5673-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2019-11-18 11:50:33 +01:00
Greg Kurz	0990ce6a2e	ppc: Add intc_destroy() handlers to SpaprInterruptController/PnvChip SpaprInterruptControllerClass and PnvChipClass have an intc_create() method that calls the appropriate routine, ie. icp_create() or xive_tctx_create(), to establish the link between the VCPU and the presenter component of the interrupt controller during realize. There aren't any symmetrical call to be called when the VCPU gets unrealized though. It is assumed that object_unparent() is the only thing to do. This is questionable because the parenting logic around the CPU and presenter objects is really an implementation detail of the interrupt controller. It shouldn't be open-coded in the machine code. Fix this by adding an intc_destroy() method that undoes what was done in intc_create(). Also NULLify the presenter pointers to avoid having stale pointers around. This will allow to reliably check if a vCPU has a valid presenter. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157192724208.3146912.7254684777515287626.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Laurent Vivier <lvivier@redhat.com>	2019-11-18 11:49:11 +01:00
Wei Yang	038adc2f58	core: replace getpagesize() with qemu_real_host_page_size There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-26 15:38:06 +02:00
Philippe Mathieu-Daudé	819ce6b2a5	hw: Move M48T59 device from hw/timer/ to hw/rtc/ subdirectory The M48T59 is a Real Time Clock, not a timer. Move it under the hw/rtc/ subdirectory. Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191003230404.19384-5-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-10-24 20:20:45 +02:00
Philippe Mathieu-Daudé	bcdb90640a	hw: Move MC146818 device from hw/timer/ to hw/rtc/ subdirectory The MC146818 is a Real Time Clock, not a timer. Move it under the hw/rtc/ subdirectory. Use copyright statement from `80cabfad16` for "hw/rtc/mc146818rtc.h". Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20191003230404.19384-4-philmd@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>	2019-10-24 20:13:10 +02:00
Peter Maydell	58560ad254	ppc patch queue 2019-10-24 Last pull request before soft freeze. * Lots of fixes and cleanups for spapr interrupt controllers * More SLOF updates to fix problems with full FDT rendering at CAS time (alas, more yet are to come) * A few other assorted changes This isn't quite as well tested as I usually try to do before a pull request. But I've been sick and running into some other difficulties, and wanted to get this sent out before heading towards KVM forum. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl2xXWcACgkQbDjKyiDZ s5Jy/BAAsSo514vGCjdszXcRH3nFeODKJadlSsUX+32JFP1yJS9ooxkcmIN7o9Wp 3VCkMHQPVV9jjIvvShWOSGfDDO3o8fTEucOIX/Nn9wfq+NiY+EJst0v+8OT48CSX LEXiy9Wghs9pZMLCUZ3rlLPBiU/Lhzf+KTCoUdc40tfoZMMz1lp/Uy8IdIYTYwLl /z++r4X8FOsXsDDsFopWffVdVBLJz6Var6NgBa8ISk2gGnUOAKsrTE3bD9L6n4PR YYbMSkv+SbvXg4gm53jUb9cQgpBqQpWHUYBIbKia/16EzbIkkZjFE2jGQMP5c72h ZOml7ZQtQVWIEEZwKPN67S8bKiVbEfayxHYViejn/uUqv3AwW0wi7FlBVv37YNJ4 TxPvLBu+0DaFbk5y6/XHyL6XomG1/oH6qXOM2JhIWON7HI3rRWoMQbZ6QVJ1Gwk2 uwrvOOL5kVZySotOw5bDkTXYp/Nm1JE4QwOXFPkXzaekcZhRlEqqrkBddhKtF80p 1e5hGp5RgoILIe8uHJQ7decUMk889J7Qdtakv6BWvOci4dbIiZEp/smFlzgTcPnW DQJONP/awnoAOS3v0bItf59DROvkZ5xyv8yQZFP3qThSOfZl4e95WNRbtR3vtjU4 Bl4Pdte15URKy5nM0XnnLg9mzl2xufdwEsu76lQMuYpe6nCI2h0= =FRHF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191024' into staging ppc patch queue 2019-10-24 Last pull request before soft freeze. * Lots of fixes and cleanups for spapr interrupt controllers * More SLOF updates to fix problems with full FDT rendering at CAS time (alas, more yet are to come) * A few other assorted changes This isn't quite as well tested as I usually try to do before a pull request. But I've been sick and running into some other difficulties, and wanted to get this sent out before heading towards KVM forum. # gpg: Signature made Thu 24 Oct 2019 09:14:31 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.2-20191024: (28 commits) spapr/xive: Set the OS CAM line at reset ppc/pnv: Fix naming of routines realizing the CPUs ppc: Reset the interrupt presenter from the CPU reset handler ppc/pnv: Add a PnvChip pointer to PnvCore ppc/pnv: Introduce a PnvCore reset handler spapr_cpu_core: Implement DeviceClass::reset spapr: move CPU reset after presenter creation spapr: Don't request to unplug the same core twice pseries: Update SLOF firmware image spapr: Move SpaprIrq::nr_xirqs to SpaprMachineClass spapr: Remove SpaprIrq::nr_msis spapr, xics, xive: Move SpaprIrq::post_load hook to backends spapr, xics, xive: Move SpaprIrq::reset hook logic into activate/deactivate spapr: Remove SpaprIrq::init_kvm hook spapr, xics, xive: Match signatures for XICS and XIVE KVM connect routines spapr, xics, xive: Move dt_populate from SpaprIrq to SpaprInterruptController spapr, xics, xive: Move print_info from SpaprIrq to SpaprInterruptController spapr, xics, xive: Move set_irq from SpaprIrq to SpaprInterruptController spapr: Formalize notion of active interrupt controller spapr, xics, xive: Move irq claim and free from SpaprIrq to SpaprInterruptController ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-10-24 16:22:58 +01:00
Igor Mammedov	2def24f159	ppc: rs6000_mc: drop usage of memory_region_allocate_system_memory() rs6000mc_realize() violates memory_region_allocate_system_memory() contract by calling it multiple times which could break -mem-path. Replace it with plain memory_region_init_ram() instead. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20191008113318.7012-3-imammedo@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-10-23 23:37:42 -03:00
Cédric Le Goater	00d6f4db60	ppc/pnv: Fix naming of routines realizing the CPUs The 'vcpu' suffix is inherited from the sPAPR machine. Use better names for PowerNV. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20191022163812.330-7-clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 13:34:09 +11:00
Cédric Le Goater	d49e8a9b46	ppc: Reset the interrupt presenter from the CPU reset handler On the sPAPR machine and PowerNV machine, the interrupt presenters are created by a machine handler at the core level and are reset independently. This is not consistent and it raises issues when it comes to handle hot-plugged CPUs. In that case, the presenters are not reset. This is less of an issue in XICS, although a zero MFFR could be a concern, but in XIVE, the OS CAM line is not set and this breaks the presenting algorithm. The current code has workarounds which need a global cleanup. Extend the sPAPR IRQ backend and the PowerNV Chip class with a new cpu_intc_reset() handler called by the CPU reset handler and remove the XiveTCTX reset handler which is now redundant. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191022163812.330-6-clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 13:33:45 +11:00
Cédric Le Goater	aa5ac64b23	ppc/pnv: Add a PnvChip pointer to PnvCore We will use it to reset the interrupt presenter from the CPU reset handler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20191022163812.330-5-clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 13:33:33 +11:00
Cédric Le Goater	fa06541b5d	ppc/pnv: Introduce a PnvCore reset handler in which individual CPUs are reset. It will ease the introduction of future change reseting the interrupt presenter from the CPU reset handler. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Message-Id: <20191022163812.330-4-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 13:32:58 +11:00
Greg Kurz	d1f2b4691a	spapr_cpu_core: Implement DeviceClass::reset Since vCPUs aren't plugged into a bus, we manually register a reset handler for each vCPU. We also call this handler at realize time to ensure hot plugged vCPUs are reset before being exposed to the guest. This results in vCPUs being reset twice at machine reset. It doesn't break anything but it is slightly suboptimal and above all confusing. The hotplug path in device_set_realized() already knows how to reset a hotplugged device if the device reset handler is present. Implement one for sPAPR CPU cores that resets all vCPUs under a core. While here rename spapr_cpu_reset() to spapr_reset_vcpu() for consistency with spapr_realize_vcpu() and spapr_unrealize_vcpu(). Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> [clg: add documentation on the reset helper usage ] Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191022163812.330-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 13:32:33 +11:00
Cédric Le Goater	90f8db52bb	spapr: move CPU reset after presenter creation This change prepares ground for future changes which will reset the interrupt presenter in the reset handler of the sPAPR and PowerNV cores. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191022163812.330-2-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 13:32:33 +11:00
Greg Kurz	47c8c915b1	spapr: Don't request to unplug the same core twice We must not call spapr_drc_detach() on a detached DRC otherwise bad things can happen, ie. QEMU hangs or crashes. This is easily demonstrated with a CPU hotplug/unplug loop using QMP. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157185826035.3073024.1664101000438499392.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 09:37:54 +11:00
David Gibson	54255c1f65	spapr: Move SpaprIrq::nr_xirqs to SpaprMachineClass For the benefit of peripheral device allocation, the number of available irqs really wants to be the same on a given machine type version, regardless of what irq backends we are using. That's the case now, but only because we make sure the different SpaprIrq instances have the same value except for the special legacy one. Since this really only depends on machine type version, move the value to SpaprMachineClass instead of SpaprIrq. This also puts the code to set it to the lower value on old machine types right next to setting legacy_irq_allocation, which needs to go hand in hand. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	8cbe71ecb8	spapr: Remove SpaprIrq::nr_msis The nr_msis value we use here has to line up with whether we're using legacy or modern irq allocation. Therefore it's safer to derive it based on legacy_irq_allocation rather than having SpaprIrq contain a canned value. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	605994e5b7	spapr, xics, xive: Move SpaprIrq::post_load hook to backends The remaining logic in the post_load hook really belongs to the interrupt controller backends, and just needs to be called on the active controller (after the active controller is set to the right thing based on the incoming migration in the generic spapr_irq_post_load() logic). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	567192d486	spapr, xics, xive: Move SpaprIrq::reset hook logic into activate/deactivate It turns out that all the logic in the SpaprIrq::reset hooks (and some in the SpaprIrq::post_load hooks) isn't really related to resetting the irq backend (that's handled by the backends' own reset routines). Rather its about getting the backend ready to be the active interrupt controller or stopping being the active interrupt controller - reset (and post_load) is just the only time that changes at present. To make this flow clearer, move the logic into the explicit backend activate and deactivate hooks. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	0a17e0c39f	spapr: Remove SpaprIrq::init_kvm hook This hook is a bit odd. The only caller is spapr_irq_init_kvm(), but it explicitly takes an SpaprIrq *, so it's never really called through the current SpaprIrq. Essentially this is just a way of passing through a function pointer so that spapr_irq_init_kvm() can handle some configuration and error handling logic without duplicating it between the xics and xive reset paths. So, make it just take that function pointer. Because of earlier reworks to the KVM connect/disconnect code in the xics and xive backends we can also eliminate some wrapper functions and streamline error handling a bit. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	98a39a7927	spapr, xics, xive: Match signatures for XICS and XIVE KVM connect routines Both XICS and XIVE have routines to connect and disconnect KVM with similar but not identical signatures. This adjusts them to match exactly, which will be useful for further cleanups later. While we're there, we add an explicit return value to the connect path to streamline error reporting in the callers. We remove error reporting the disconnect path. In the XICS case this wasn't used at all. In the XIVE case the only error case was if the KVM device was set up, but KVM didn't have the capability to do so which is pretty obviously impossible. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	05289273c0	spapr, xics, xive: Move dt_populate from SpaprIrq to SpaprInterruptController This method depends only on the active irq controller. Now that we've formalized the notion of active controller we can dispatch directly through that, rather than dispatching via SpaprIrq with the dual version having to do a second conditional dispatch. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	328d8eb24d	spapr, xics, xive: Move print_info from SpaprIrq to SpaprInterruptController This method depends only on the active irq controller. Now that we've formalized the notion of active controller we can dispatch directly through that, rather than dispatching via SpaprIrq with the dual version having to do a second conditional dispatch. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	7bcdbcca2f	spapr, xics, xive: Move set_irq from SpaprIrq to SpaprInterruptController This method depends only on the active irq controller. Now that we've formalized the notion of active controller we can dispatch directly through that, rather than dispatching via SpaprIrq with the dual version having to do a second conditional dispatch. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	81106ddd1a	spapr: Formalize notion of active interrupt controller spapr now has the mechanism of constructing both XICS and XIVE instances of the SpaprInterruptController interface. However, only one of the interrupt controllers will actually be active at any given time, depending on feature negotiation with the guest. This is handled in the current code via spapr_irq_current() which checks the OV5 vector from feature negotiation to determine the current backend. Determining the active controller at the point we need it like this can be pretty confusing, because it makes it very non obvious at what points the active controller can change. This can make it difficult to reason about the code and where a change of active controller could appear in sequence with other events. Make this mechanism more explicit by adding an 'active_intc' pointer and an explicit spapr_irq_update_active_intc() function to update it from the CAS state. We also add hooks on the intc backend which will get called when it is activated or deactivated. For now we just introduce the switch and hooks, later patches will actually start using them. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	0b0e52b131	spapr, xics, xive: Move irq claim and free from SpaprIrq to SpaprInterruptController These methods, like cpu_intc_create, really belong to the interrupt controller, but need to be called on all possible intcs. Like cpu_intc_create, therefore, make them methods on the intc and always call it for all existing intcs. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	ebd6be089b	spapr, xics, xive: Move cpu_intc_create from SpaprIrq to SpaprInterruptController This method essentially represents code which belongs to the interrupt controller, but needs to be called on all possible intcs, rather than just the currently active one. The "dual" version therefore calls into the xics and xive versions confusingly. Handle this more directly, by making it instead a method on the intc backend, and always calling it on every backend that exists. While we're there, streamline the error reporting a bit. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
David Gibson	150e25f85b	spapr, xics, xive: Introduce SpaprInterruptController QOM interface The SpaprIrq structure is used to represent ths spapr machine's irq backend. Except that it kind of conflates two concepts: one is the backend proper - a specific interrupt controller that we might or might not be using, the other is the irq configuration which covers the layout of irq space and which interrupt controllers are allowed. This leads to some pretty confusing code paths for the "dual" configuration where its hooks redirect to other SpaprIrq structures depending on the currently active irq controller. To clean this up, we start by introducing a new SpaprInterruptController QOM interface to represent strictly an interrupt controller backend, not counting anything configuration related. We implement this interface in the XICs and XIVE interrupt controllers, and in future we'll move relevant methods from SpaprIrq into it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-24 09:36:55 +11:00
Greg Kurz	29cb418749	spapr: Set VSMT to smp_threads by default Support for setting VSMT is available in KVM since linux-4.13. Most distros that support KVM on POWER already have it. It thus seem reasonable enough to have the default machine to set VSMT to smp_threads. This brings contiguous VCPU ids and thus brings their upper bound down to the machine's max_cpus. This is especially useful for XIVE KVM devices, which may thus allocate only one VP descriptor per VCPU. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <157010411885.246126.12610015369068227139.stgit@bahia.lan> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 09:36:55 +11:00
Cédric Le Goater	06d26eeb47	ppc/pnv: Use address_space_stq_be() when triggering an interrupt from PSI Include the XIVE_TRIGGER_PQ bit in the trigger data which is how hardware signals to the IC that the PQ bits of the interrupt source have been checked. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191007084102.29776-3-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-24 09:36:55 +11:00
Tao Xu	0533ef5f20	numa: Introduce MachineClass::auto_enable_numa for implicit NUMA node Add MachineClass::auto_enable_numa field. When it is true, a NUMA node is expected to be created implicitly. Acked-by: David Gibson <david@gibson.dropbear.id.au> Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Tao Xu <tao3.xu@intel.com> Message-Id: <20190905083238.1799-1-tao3.xu@intel.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>	2019-10-15 18:18:08 -03:00
Peter Maydell	0f0b43868a	ppc patch queue 2019-10-04 Here's the next batch of ppc and spapr patches. Includes: * Fist part of a large cleanup to irq infrastructure * Recreate the full FDT at CAS time, instead of making a difficult to follow set of updates. This will help us move towards eliminating CAS reboots altogether * No longer provide RTAS blob to SLOF - SLOF can include it just as well itself, since guests will generally need to relocate it with a call to instantiate-rtas * A number of DFP fixes and cleanups from Mark Cave-Ayland * Assorted bugfixes * Several new small devices for powernv -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEdfRlhq5hpmzETofcbDjKyiDZs5IFAl2XEn0ACgkQbDjKyiDZ s5I6bA/7B5sjY/QxuE8axm5KupoAnE8zf205hN8mbYASwtDfFwgaeNreVaOSJUpr fgcx/g9G3rAryGZv3O6i02+wcRgNw1DnJ3ynCthIrExZEcfbTYJiS4s9apwPEQy8 HFmBNdPDqrhFI0aFvXEUauiOp1aapPUUklm34eFscs94lJXxphRUEfa3XT5uEhUh xrIZwYq20A+ih4UHwk3Onyx/cvFpl6BRB2nVEllQFqzwF5eTTfz9t8+JGTebxD/7 8qqt8ti0KM3wxSDTQnmyMUmpgy+C1iCvNYvv6nWFg+07QuGs48EHlQUUVVni4r9j kUrDwKS2eC+8e8gP/xdIXEq3R2DsAMq+wFIswXZ3X6x4DoUV0OAJSHc9iMD4l+pr LyWnVpDprc6XhJHWKpuHZ5w9EuBnZFbIXdlZGFno+8UvXtusnbbuwAZzHTrRJRqe /AWVpFwGAoOF4KxIOFlPVBI8m4vFad/soVojC0vzIbRqaogOFZAjiL/yD5GwLmMa tywOEMBUJ/j2lgudTCyKn5uCa/Ew3DS1TSdenJjyqRi/gZM0IaORIhJhyFYW/eO1 U7Uh8BnbC+4J11wwvFR5+W789dgM2+EEtAX9uI08VcE/R2ASabZlN4Zwrl0w4cb/ VRybMT4bgmjzHRpfrqYPxpn8wqPcIw0BCeipSOjY3QU1Q25TEYQ= =PXXe -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.2-20191004' into staging ppc patch queue 2019-10-04 Here's the next batch of ppc and spapr patches. Includes: * Fist part of a large cleanup to irq infrastructure * Recreate the full FDT at CAS time, instead of making a difficult to follow set of updates. This will help us move towards eliminating CAS reboots altogether * No longer provide RTAS blob to SLOF - SLOF can include it just as well itself, since guests will generally need to relocate it with a call to instantiate-rtas * A number of DFP fixes and cleanups from Mark Cave-Ayland * Assorted bugfixes * Several new small devices for powernv # gpg: Signature made Fri 04 Oct 2019 10:35:57 BST # gpg: using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392 # gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full] # gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full] # gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full] # gpg: aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown] # Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392 * remotes/dgibson/tags/ppc-for-4.2-20191004: (53 commits) ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine spapr: Eliminate SpaprIrq::init hook spapr: Add return value to spapr_irq_check() spapr: Use less cryptic representation of which irq backends are supported xive: Improve irq claim/free path spapr, xics, xive: Better use of assert()s on irq claim/free paths spapr: Handle freeing of multiple irqs in frontend only spapr: Remove unhelpful tracepoints from spapr_irq_free_xics() spapr: Eliminate SpaprIrq:get_nodename method spapr: Simplify spapr_qirq() handling spapr: Fix indexing of XICS irqs spapr: Eliminate nr_irqs parameter to SpaprIrq::init spapr: Clarify and fix handling of nr_irqs spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper spapr: Fold spapr_phb_lsi_qirq() into its single caller xics: Create sPAPR specific ICS subtype xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes xics: Eliminate reset hook xics: Rename misleading ics_simple_*() functions xics: Eliminate 'reject', 'resend' and 'eoi' class hooks ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2019-10-07 13:49:02 +01:00
Eric Auger	549d400587	memory: allow memory_region_register_iommu_notifier() to fail Currently, when a notifier is attempted to be registered and its flags are not supported (especially the MAP one) by the IOMMU MR, we generally abruptly exit in the IOMMU code. The failure could be handled more nicely in the caller and especially in the VFIO code. So let's allow memory_region_register_iommu_notifier() to fail as well as notify_flag_changed() callback. All sites implementing the callback are updated. This patch does not yet remove the exit(1) in the amd_iommu code. in SMMUv3 we turn the warning message into an error message saying that the assigned device would not work properly. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2019-10-04 18:49:18 +02:00
Cédric Le Goater	1aba8716c8	ppc/pnv: Remove the XICSFabric Interface from the POWER9 machine The POWER8 PowerNV machine needs to implement a XICSFabric interface as this is the POWER8 interrupt controller model. But the POWER9 machine uselessly inherits of XICSFabric from the common PowerNV machine definition. Open code machine definitions to have a better control on the different interfaces each machine should define. Fixes: `f30c843ced` ("ppc/pnv: Introduce PowerNV machines with fixed CPU models") Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-Id: <20191003143617.21682-1-clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>	2019-10-04 19:08:23 +10:00
David Gibson	f478d9af21	spapr: Eliminate SpaprIrq::init hook This method is used to set up the interrupt backends for the current configuration. However, this means some confusing redirection between the "dual" mode init and the init hooks for xics only and xive only modes. Since we now have simple flags indicating whether XICS and/or XIVE are supported, it's easier to just open code each initialization directly in spapr_irq_init(). This will also make some future cleanups simpler. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
David Gibson	0a3fd3df6f	spapr: Add return value to spapr_irq_check() Explicitly return success or failure, rather than just relying on the Error ** parameter. This makes handling it less verbose in the caller. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
David Gibson	ca62823b79	spapr: Use less cryptic representation of which irq backends are supported SpaprIrq::ov5 stores the value for a particular byte in PAPR option vector 5 which indicates whether XICS, XIVE or both interrupt controllers are available. As usual for PAPR, the encoding is kind of overly complicated and confusing (though to be fair there are some backwards compat things it has to handle). But to make our internal code clearer, have SpaprIrq encode more directly which backends are available as two booleans, and derive the OV5 value from that at the point we need it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
David Gibson	e594c2ad1c	xive: Improve irq claim/free path spapr_xive_irq_claim() returns a bool to indicate if it succeeded. But most of the callers and one callee use int return values and/or an Error * with more information instead. In any case, ints are a more common idiom for success/failure states than bools (one never knows what sense they'll be in). So instead change to an int return value to indicate presence of error + an Error * to describe the details through that call chain. It also didn't actually check if the irq was already claimed, which is one of the primary purposes of the claim path, so do that. spapr_xive_irq_free() also returned a bool... which no callers checked and was always true, so just drop it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
David Gibson	580dde5e4a	spapr, xics, xive: Better use of assert()s on irq claim/free paths The irq claim and free paths for both XICS and XIVE check for some validity conditions. Some of these represent genuine runtime failures, however others - particularly checking that the basic irq number is in a sane range - could only fail in the case of bugs in the callin code. Therefore use assert()s instead of runtime failures for those. In addition the non backend-specific part of the claim/free paths should only be used for PAPR external irqs, that is in the range SPAPR_XIRQ_BASE to the maximum irq number. Put assert()s for that into the top level dispatchers as well. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:23 +10:00
David Gibson	f233cee97b	spapr: Handle freeing of multiple irqs in frontend only spapr_irq_free() can be used to free multiple irqs at once. That's useful for its callers, but there's no need to make the individual backend hooks handle this. We can loop across the irqs in spapr_irq_free() itself and have the hooks just do one at time. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-04 19:08:23 +10:00
David Gibson	85d0425652	spapr: Remove unhelpful tracepoints from spapr_irq_free_xics() These traces contain some useless information (the always-0 source#) and have no equivalents for XIVE mode. For now just remove them, and we can put back something more sensible if and when we need it. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-10-04 19:08:22 +10:00
David Gibson	14789694cd	spapr: Eliminate SpaprIrq:get_nodename method This method is used to determine the name of the irq backend's node in the device tree, so that we can find its phandle (after SLOF may have modified it from the phandle we initially gave it). But, in the two cases the only difference between the node name is the presence of a unit address. Searching for a node name without considering unit address is standard practice for the device tree, and fdt_subnode_offset() will do exactly that, making this method unecessary. While we're there, remove the XICS_NODENAME define. The name "interrupt-controller" is required by PAPR (and IEEE1275), and a bunch of places assume it already. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:22 +10:00
David Gibson	af1861511d	spapr: Simplify spapr_qirq() handling Currently spapr_qirq(), whic is used to find the qemu_irq for an spapr global irq number, redirects through the SpaprIrq::qirq method. But the array of qemu_irqs is allocated in the PAPR layer, not the backends, and so the method implementations all return the same thing, just differing in the preliminary checks they make. So, we can remove the method, and just implement spapr_qirq() directly, including all the relevant checks in one place. We change all those checks into assert()s as well, since a failure here indicates an error in the calling code. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-10-04 19:08:22 +10:00
David Gibson	9f53c0db19	spapr: Fix indexing of XICS irqs spapr global irq numbers are different from the source numbers on the ICS when using XICS - they're offset by XICS_IRQ_BASE (0x1000). But spapr_irq_set_irq_xics() was passing through the global irq number to the ICS code unmodified. We only got away with this because of a counteracting bug - we were incorrectly adjusting the qemu_irq we returned for a requested global irq number. That approach mostly worked but is very confusing, incorrectly relies on the way the qemu_irq array is allocated, and undermines the intention of having the global array of qemu_irqs for spapr have a consistent meaning regardless of irq backend. So, fix both set_irq and qemu_irq indexing. We rename some parameters at the same time to make it clear that they are referring to spapr global irq numbers. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:22 +10:00
David Gibson	fe9b61b246	spapr: Eliminate nr_irqs parameter to SpaprIrq::init The only reason this parameter was needed was to work around the inconsistent meaning of nr_irqs between xics and xive. Now that we've fixed that, we can consistently use the number directly in the SpaprIrq configuration. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:22 +10:00
David Gibson	ad8de98636	spapr: Clarify and fix handling of nr_irqs Both the XICS and XIVE interrupt backends have a "nr-irqs" property, but it means slightly different things. For XICS (or, strictly, the ICS) it indicates the number of "real" external IRQs. Those start at XICS_IRQ_BASE (0x1000) and don't include the special IPI vector. For XIVE, however, it includes the whole IRQ space, including XIVE's many IPI vectors. The spapr code currently doesn't handle this sensibly, with the nr_irqs value in SpaprIrq having different meanings depending on the backend. We fix this by renaming nr_irqs to nr_xirqs and making it always indicate just the number of external irqs, adjusting the value we pass to XIVE accordingly. We also move to using common constants in most of the irq configurations, to make it clearer that the IRQ space looks the same to the guest (and emulated devices), even if the backend is different. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>	2019-10-04 19:08:22 +10:00
David Gibson	7678b74a94	spapr: Replace spapr_vio_qirq() helper with spapr_vio_irq_pulse() helper Every caller of spapr_vio_qirq() immediately calls qemu_irq_pulse() with the result, so we might as well just fold that into the helper. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-10-04 19:08:22 +10:00
David Gibson	258aa5ce1c	spapr: Fold spapr_phb_lsi_qirq() into its single caller No point having a two-line helper that's used exactly once, and not likely to be used anywhere else in future. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>	2019-10-04 19:08:22 +10:00
David Gibson	9db8c551c9	xics: Create sPAPR specific ICS subtype We create a subtype of TYPE_ICS specifically for sPAPR. For now all this does is move the setup of the PAPR specific hcalls and RTAS calls to the realize() function for this, rather than requiring the PAPR code to explicitly call xics_spapr_init(). In future it will have some more function. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:22 +10:00
David Gibson	642e92719e	xics: Merge TYPE_ICS_BASE and TYPE_ICS_SIMPLE classes TYPE_ICS_SIMPLE is the only subtype of TYPE_ICS_BASE that's ever instantiated. The existence of different classes is mostly a hang over from when we (misguidedly) had separate subtypes for the KVM and non-KVM version of the device. There could be some call for an abstract base type for ICS variants that use a different representation of their state (PowerNV PHB3 might want this). The current split isn't really in the right place for that though. If we need this in future, we can re-implement it more in line with what we actually need. So, collapse the two classes together into just TYPE_ICS. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:22 +10:00
David Gibson	28976c99cf	xics: Rename misleading ics_simple_() functions There are a number of ics_simple_() functions that aren't actually specific to TYPE_XICS_SIMPLE at all, and are equally valid on TYPE_XICS_BASE. Rename them to ics_*() accordingly. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org>	2019-10-04 19:08:22 +10:00

... 2 3 4 5 6 ...

2217 Commits