The ACPI CPU hotplug states must be migrated along with other vCPU
hotplug states to the destination VM. Update the GED's VM State
Description (VMSD) table subsection to conditionally include the CPU
Hotplug VM State Description (VMSD).
Excerpt of GED VMSD State Dump at Source:
"acpi-ged (16)": {
"ged_state": {
"sel": "0x00000000"
},
[...]
"acpi-ged/cpuhp": {
"cpuhp_state": {
"selector": "0x00000005",
"command": "0x00",
"devs": [
{
"is_inserting": false,
"is_removing": false,
"ost_event": "0x00000000",
"ost_status": "0x00000000"
},
[...]
{
"is_inserting": false,
"is_removing": false,
"ost_event": "0x00000000",
"ost_status": "0x00000000"
}
]
}
}
},
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Message-Id: <20241103102419.202225-6-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reflect the QOM vCPUs ACPI CPU hotplug states in the `_STA.Present` and
and `_STA.Enabled` bits when the guest kernel evaluates the ACPI
`_STA` method during initialization, as well as when vCPUs are
hot-plugged or hot-unplugged. If the CPU is present then the its
`enabled` status can be fetched using architecture-specific code [1].
Reference:
[1] Example implementation of architecture-specific hook to fetch CPU
`enabled status
Link: c0b416b11e
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Message-Id: <20241103102419.202225-4-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On most architectures, during vCPU hot-plug and hot-unplug actions, the
firmware or VMM/QEMU can update the OS on vCPU status by toggling the
ACPI method `_STA.Present` bit. However, certain CPU architectures
prohibit [1] modifications to a CPU’s `presence` status after the kernel
has booted.
This limitation [2][3] exists because many per-CPU components, such as
interrupt controllers and various per-CPU features tightly integrated
with CPUs, may not support reconfiguration once the kernel is
initialized. Often, these components cannot be powered down, as they may
belong to an `always-on` power domain. As a result, some architectures
require all CPUs to remain `_STA.Present` after system initialization.
Therefore, it is essential to mirror the exact QOM vCPU status through
ACPI for the Guest kernel. For this, we should determine—via
architecture-specific code[4]—whether vCPUs must always remain present
and whether the associated `AcpiCpuStatus::cpu` object should remain
valid, even following a vCPU hot-unplug operation.
References:
[1] Check comment 5 in the bugzilla entry
Link: https://bugzilla.tianocore.org/show_bug.cgi?id=4481#c5
[2] KVMForum 2023 Presentation: Challenges Revisited in Supporting Virt CPU Hotplug on
architectures that don’t Support CPU Hotplug (like ARM64)
a. Kernel Link: https://kvm-forum.qemu.org/2023/KVM-forum-cpu-hotplug_7OJ1YyJ.pdf
b. Qemu Link: https://kvm-forum.qemu.org/2023/Challenges_Revisited_in_Supporting_Virt_CPU_Hotplug_-__ii0iNb3.pdf
[3] KVMForum 2020 Presentation: Challenges in Supporting Virtual CPU Hotplug on
SoC Based Systems (like ARM64)
Link: https://kvmforum2020.sched.com/event/eE4m
[4] Example implementation of architecture-specific CPU persistence hook
Link: c0b416b11e
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Message-Id: <20241103102419.202225-2-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>From review of the Generic Ports support.
These properties had no description set so add one.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916174321.1843228-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>From review of generic port introduction.
The value is handled as a uint32_t so store it in that type.
The value cannot in reality exceed MAX_NODES which is currently
128 but if the types are matched there is no need to rely on that
restriction.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916174237.1843213-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
These are very similar to the recently added Generic Initiators
but instead of representing an initiator of memory traffic they
represent an edge point beyond which may lie either targets or
initiators. Here we add these ports such that they may
be targets of hmat_lb records to describe the latency and
bandwidth from host side initiators to the port. A discoverable
mechanism such as UEFI CDAT read from CXL devices and switches
is used to discover the remainder of the path, and the OS can build
up full latency and bandwidth numbers as need for work and data
placement decisions.
Acked-by: Markus Armbruster <armbru@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916174122.1843197-1-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Whilst ACPI SRAT Generic Initiator Afinity Structures are able to refer to
both PCI and ACPI Device Handles, the QEMU implementation only implements
the PCI Device Handle case. For now move the code into the existing
hw/acpi/pci.c file and header. If support for ACPI Device Handles is
added in the future, perhaps this will be moved again.
Also push the struct AcpiGenericInitiator down into the c file as not
used outside pci.c.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-7-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Using a property allows us to hide the internal details of the PCI device
from the code to build a SRAT Generic Initiator Affinity Structure with
PCI Device Handle.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-6-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Igor noted that this function only builds one instance, so was rather
misleadingly named. Fix that.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-5-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Rather than attempting to create a generic function with mess of the two
different device handle types, use a PCI handle specific variant. If the
ACPI handle form is needed then that can be introduced alongside this
with little duplicated code.
Drop the PCIDeviceHandle in favor of just passing the bus, devfn
and segment directly. devfn kept as a single byte because ARI means
that in this case it is just an 8 bit function number.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Link: https://lore.kernel.org/qemu-devel/20240618142333.102be976@imammedo.users.ipa.redhat.com/
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-4-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Before making additional modification, tidy up this misleading indentation.
Reviewed-by: Ankit Agrawal <ankita@nvidia.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The ordering in ACPI specification [1] has bus number in the lowest byte.
As ACPI tables are little endian this is the reverse of the ordering
used by PCI_BUILD_BDF(). As a minimal fix split the QEMU BDF up
into bus and devfn and write them as single bytes in the correct
order.
[1] ACPI Spec 6.3, Table 5.80
Fixes: 0a5b5acdf2 ("hw/acpi: Implement the SRAT GI affinity structure")
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240916171017.1841767-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Macro definition is added for acpi sleep control register, ged emulation
driver can use the macro , also it can be used in FDT table if ged is
exposed with FDT table.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240918014206.2165821-2-maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
This patch is part of a series that moves towards a consistent use of
g_assert_not_reached() rather than an ad hoc mix of different
assertion mechanisms.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20240919044641.386068-18-pierrick.bouvier@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
This patch is part of a series that moves towards a consistent use of
g_assert_not_reached() rather than an ad hoc mix of different
assertion mechanisms.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Message-ID: <20240919044641.386068-2-pierrick.bouvier@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Use device_class_set_legacy_reset() instead of opencoding an
assignment to DeviceClass::reset. This change was produced
with:
spatch --macro-file scripts/cocci-macro-file.h \
--sp-file scripts/coccinelle/device-reset.cocci \
--keep-comments --smpl-spacing --in-place --dir hw
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
This patch implements the periodic and the swsmi ICH9 chipset timers. They are
especially useful when prototyping UEFI firmware (e.g. with EDK2's OVMF)
using QEMU.
For backwards compatibility, the compat properties "x-smi-swsmi-timer",
and "x-smi-periodic-timer" are introduced.
Additionally, writes to the SMI_STS register are enabled for the
corresponding two bits using a write mask to make future work easier.
Signed-off-by: Dominic Prinz <git@dprinz.de>
Message-Id: <1d90ea69e01ab71a0f2ced116801dc78e04f4448.1725991505.git.git@dprinz.de>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
CPUs Control device(\\_SB.PCI0) register interface for the x86 arch is IO port
based and existing CPUs AML code assumes _CRS objects would evaluate to a system
resource which describes IO Port address. But on ARM arch CPUs control
device(\\_SB.PRES) register interface is memory-mapped hence _CRS object should
evaluate to system resource which describes memory-mapped base address. Update
build CPUs AML function to accept both IO/MEMORY region spaces and accordingly
update the _CRS object.
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-6-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
OSPM evaluates _EVT method to map the event. The CPU hotplug event eventually
results in start of the CPU scan. Scan figures out the CPU and the kind of
event(plug/unplug) and notifies it back to the guest. Update the GED AML _EVT
method with the call to method \\_SB.CPUS.CSCN (via \\_SB.GED.CSCN)
Architecture specific code [1] might initialize its CPUs AML code by calling
common function build_cpus_aml() like below for ARM:
build_cpus_aml(scope, ms, opts, xx_madt_cpu_entry, memmap[VIRT_CPUHP_ACPI].base,
"\\_SB", "\\_SB.GED.CSCN", AML_SYSTEM_MEMORY);
[1] https://lore.kernel.org/qemu-devel/20240613233639.202896-13-salil.mehta@huawei.com/
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-5-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ACPI GED (as described in the ACPI 6.4 spec) uses an interrupt listed in the
_CRS object of GED to intimate OSPM about an event. Later then demultiplexes the
notified event by evaluating ACPI _EVT method to know the type of event. Use
ACPI GED to also notify the guest kernel about any CPU hot(un)plug events.
Note, GED interface is used by many hotplug events like memory hotplug, NVDIMM
hotplug and non-hotplug events like system power down event. Each of these can
be selected using a bit in the 32 bit GED IO interface. A bit has been reserved
for the CPU hotplug event.
ACPI CPU hotplug related initialization should only happen if ACPI_CPU_HOTPLUG
support has been enabled for particular architecture. Add cpu_hotplug_hw_init()
stub to avoid compilation break.
Co-developed-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20240716111502.202344-4-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
CPU ctrl-dev MMIO region length could be used in ACPI GED and various other
architecture specific places. Move ACPI_CPU_HOTPLUG_REG_LEN macro to more
appropriate common header file.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com>
Tested-by: Xianglai Li <lixianglai@loongson.cn>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
Tested-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240716111502.202344-3-salil.mehta@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
acpi_memory_hotplug::is_enabled is set to %true once via
ich9_lpc_initfn() -> ich9_pm_add_properties(). No need to
check it, so remove now dead code.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240617071118.60464-8-philmd@linaro.org>
No external code sets the 'memory-hotplug-support'
property, remove it.
Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20240617071118.60464-7-philmd@linaro.org>
The MEM_UNPLUG_ERROR event is deprecated since commit d43f1670c7
("qapi/qdev.json: add DEVICE_UNPLUG_GUEST_ERROR QAPI event"),
time to remove it.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20240530071548.20074-2-philmd@linaro.org>
With a numa set up such as
-numa nodeid=0,cpus=0 \
-numa nodeid=1,memdev=mem \
-numa nodeid=2,cpus=1
and appropriate hmat_lb entries the initiator list is correctly
computed and writen to HMAT as 0,2 but then the LB data is accessed
using the node id (here 2), landing outside the entry_list array.
Stash the reverse lookup when writing the initiator list and use
it to get the correct array index index.
Fixes: 4586a2cb83 ("hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240307160326.31570-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If qemu is started with a proximity node containing CPUs alone,
it will provide one of these structures to say memory in this
node is directly connected to itself.
This description is arguably pointless even if there is memory
in the node. If there is no memory present, and hence no SRAT
entry it breaks Linux HMAT passing and the table is rejected.
https://elixir.bootlin.com/linux/v6.7/source/drivers/acpi/numa/hmat.c#L444
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20240307160326.31570-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
ACPI spec provides a scheme to associate "Generic Initiators" [1]
(e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with
integrated compute or DMA engines GPUs) with Proximity Domains. This is
achieved using Generic Initiator Affinity Structure in SRAT. During bootup,
Linux kernel parse the ACPI SRAT to determine the PXM ids and create a NUMA
node for each unique PXM ID encountered. Qemu currently do not implement
these structures while building SRAT.
Add GI structures while building VM ACPI SRAT. The association between
device and node are stored using acpi-generic-initiator object. Lookup
presence of all such objects and use them to build these structures.
The structure needs a PCI device handle [2] that consists of the device BDF.
The vfio-pci device corresponding to the acpi-generic-initiator object is
located to determine the BDF.
[1] ACPI Spec 6.3, Section 5.2.16.6
[2] ACPI Spec 6.3, Table 5.80
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Cedric Le Goater <clg@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-3-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows
partitioning of the GPU device resources (including device memory) into
several (upto 8) isolated instances. Each of the partitioned memory needs
a dedicated NUMA node to operate. The partitions are not fixed and they
can be created/deleted at runtime.
Unfortunately Linux OS does not provide a means to dynamically create/destroy
NUMA nodes and such feature implementation is not expected to be trivial. The
nodes that OS discovers at the boot time while parsing SRAT remains fixed. So
we utilize the Generic Initiator (GI) Affinity structures that allows
association between nodes and devices. Multiple GI structures per BDF is
possible, allowing creation of multiple nodes by exposing unique PXM in each
of these structures.
Implement the mechanism to build the GI affinity structures as Qemu currently
does not. Introduce a new acpi-generic-initiator object to allow host admin
link a device with an associated NUMA node. Qemu maintains this association
and use this object to build the requisite GI Affinity Structure.
When multiple NUMA nodes are associated with a device, it is required to
create those many number of acpi-generic-initiator objects, each representing
a unique device:node association.
Following is one of a decoded GI affinity structure in VM ACPI SRAT.
[0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity]
[0C9h 0201 1] Length : 20
[0CAh 0202 1] Reserved1 : 00
[0CBh 0203 1] Device Handle Type : 01
[0CCh 0204 4] Proximity Domain : 00000007
[0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00
00 00 00 00 00
[0E0h 0224 4] Flags (decoded below) : 00000001
Enabled : 1
[0E4h 0228 4] Reserved2 : 00000000
[0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity]
[0E9h 0233 1] Length : 20
An admin can provide a range of acpi-generic-initiator objects, each
associating a device (by providing the id through pci-dev argument)
to the desired NUMA node (using the node argument). Currently, only PCI
device is supported.
For the grace hopper system, create a range of 8 nodes and associate that
with the device using the acpi-generic-initiator object. While a configuration
of less than 8 nodes per device is allowed, such configuration will prevent
utilization of the feature to the fullest. The following sample creates 8
nodes per PCI device for a VM with 2 PCI devices and link them to the
respecitve PCI device using acpi-generic-initiator objects:
-numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \
-numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \
-numa node,nodeid=8 -numa node,nodeid=9 \
-device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \
-object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \
-object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \
-object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \
-object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \
-object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \
-object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \
-object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \
-object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \
-numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \
-numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \
-numa node,nodeid=16 -numa node,nodeid=17 \
-device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \
-object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \
-object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \
-object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \
-object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \
-object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \
-object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \
-object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \
-object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \
Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1]
Cc: Jonathan Cameron <qemu-devel@nongnu.org>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
Message-Id: <20240308145525.10886-2-ankita@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
RISC-V should also generate the SPCR in a manner similar to ARM.
Therefore, instead of replicating the code, relocate this function
to the common AML build.
Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-ID: <20240129021440.17640-2-jeeheng.sia@starfivetech.com>
[ Changes by AF:
- Add missing Language SPCR entry
]
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
X86 CPU hotplug isn't specific to PC machines.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Luc Michel <luc.michel@amd.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
cpu_hotplug.c only needs the PCI *device* definitions.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Luc Michel <luc.michel@amd.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
CONFIG_ALL is tricky to use and was ported over to Meson from the
recursive processing of Makefile variables. Meson sourcesets
however have all_sources() and all_dependencies() methods that
remove the need for it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If a vcpu with an apic-id that is not supported by the legacy
interface (>255) is hot-plugged, the legacy code will dynamically switch
to the modern interface. However, the hotplug event is not forwarded to
the new interface resulting in the vcpu not being fully/properly added
to the machine config. This BUG is evidenced by OVMF when it
it attempts to count the vcpus and reports an inconsistent vcpu count
reported by the fw_cfg interface and the modern hotpug interface.
Fix is to propagate the hotplug event after making the switch from
the legacy interface to the modern interface.
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Aaron Young <aaron.young@oracle.com>
Message-Id: <0e8a9baebbb29f2a6c87fd08e43dc2ac4019759a.1702398644.git.Aaron.Young@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
erst_realizefn() passes @errp to functions without checking for
failure. If it runs into another failure, it trips error_setv()'s
assertion.
Use the ERRP_GUARD() macro and check *errp, as suggested in commit
ae7c80a7bd ("error: New macro ERRP_GUARD()").
Cc: qemu-stable@nongnu.org
Fixes: f7e26ffa59 ("ACPI ERST: support for ACPI ERST feature")
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20231120130017.81286-1-philmd@linaro.org>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
infrastructure for vhost-vdpa shadow work
piix south bridge rework
reconnect for vhost-user-scsi
dummy ACPI QTG DSM for cxl
tests, cleanups, fixes all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmU06PMPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpNIsH/0DlKti86VZLJ6PbNqsnKxoK2gg05TbEhPZU
pQ+RPDaCHpFBsLC5qsoMJwvaEQFe0e49ZFemw7bXRzBxgmbbNnZ9ArCIPqT+rvQd
7UBmyC+kacVyybZatq69aK2BHKFtiIRlT78d9Izgtjmp8V7oyKoz14Esh8wkE+FT
ypHUa70Addi6alNm6BVkm7bxZxi0Wrmf3THqF8ViYvufzHKl7JR5e17fKWEG0BqV
9W7AeHMnzJ7jkTvBGUw7g5EbzFn7hPLTbO4G/VW97k0puS4WRX5aIMkVhUazsRIa
zDOuXCCskUWuRapiCwY0E4g7cCaT8/JR6JjjBaTgkjJgvo5Y8Eg=
=ILek
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: features, cleanups
infrastructure for vhost-vdpa shadow work
piix south bridge rework
reconnect for vhost-user-scsi
dummy ACPI QTG DSM for cxl
tests, cleanups, fixes all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmU06PMPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpNIsH/0DlKti86VZLJ6PbNqsnKxoK2gg05TbEhPZU
# pQ+RPDaCHpFBsLC5qsoMJwvaEQFe0e49ZFemw7bXRzBxgmbbNnZ9ArCIPqT+rvQd
# 7UBmyC+kacVyybZatq69aK2BHKFtiIRlT78d9Izgtjmp8V7oyKoz14Esh8wkE+FT
# ypHUa70Addi6alNm6BVkm7bxZxi0Wrmf3THqF8ViYvufzHKl7JR5e17fKWEG0BqV
# 9W7AeHMnzJ7jkTvBGUw7g5EbzFn7hPLTbO4G/VW97k0puS4WRX5aIMkVhUazsRIa
# zDOuXCCskUWuRapiCwY0E4g7cCaT8/JR6JjjBaTgkjJgvo5Y8Eg=
# =ILek
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 22 Oct 2023 02:18:43 PDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (62 commits)
intel-iommu: Report interrupt remapping faults, fix return value
MAINTAINERS: Add include/hw/intc/i8259.h to the PC chip section
vhost-user: Fix protocol feature bit conflict
tests/acpi: Update DSDT.cxl with QTG DSM
hw/cxl: Add QTG _DSM support for ACPI0017 device
tests/acpi: Allow update of DSDT.cxl
hw/i386/cxl: ensure maxram is greater than ram size for calculating cxl range
vhost-user: fix lost reconnect
vhost-user-scsi: start vhost when guest kicks
vhost-user-scsi: support reconnect to backend
vhost: move and rename the conn retry times
vhost-user-common: send get_inflight_fd once
hw/i386/pc_piix: Make PIIX4 south bridge usable in PC machine
hw/isa/piix: Implement multi-process QEMU support also for PIIX4
hw/isa/piix: Resolve duplicate code regarding PCI interrupt wiring
hw/isa/piix: Reuse PIIX3's PCI interrupt triggering in PIIX4
hw/isa/piix: Rename functions to be shared for PCI interrupt triggering
hw/isa/piix: Reuse PIIX3 base class' realize method in PIIX4
hw/isa/piix: Share PIIX3's base class with PIIX4
hw/isa/piix: Harmonize names of reset control memory regions
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Add a simple _DSM call support for the ACPI0017 device to return fake QTG
ID values of 0 and 1 in all cases. This for _DSM plumbing testing from the OS.
Following edited for readability
Device (CXLM)
{
Name (_HID, "ACPI0017") // _HID: Hardware ID
...
Method (_DSM, 4, Serialized) // _DSM: Device-Specific Method
{
If ((Arg0 == ToUUID ("f365f9a6-a7de-4071-a66a-b40c0b4f8e52")))
{
If ((Arg2 == Zero))
{
Return (Buffer (One) { 0x01 })
}
If ((Arg2 == One))
{
Return (Package (0x02)
{
One,
Package (0x02)
{
Zero,
One
}
})
}
}
}
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20231012125623.21101-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fix:
hw/acpi/pcihp.c:499:36: error: declaration shadows a variable in the global scope [-Werror,-Wshadow]
MemoryRegion *address_space_io,
^
include/exec/address-spaces.h:35:21: note: previous declaration is here
extern AddressSpace address_space_io;
^
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20231010115048.11856-5-philmd@linaro.org>
vdpa:
shadow vq vlan support
net migration with cvq
cxl:
support emulating 4 HDM decoders
serial number extended capability
virtio:
hared dma-buf
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmUd4/YPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpyM8H/02cRbJcQOjYt7j68zPW6GaDXxBI/UmdWDyG
15LZZbGNOPjyjNd3Vz1M7stQ5rhoKcgo/RdI+0E60a78svgW5JvpXoXR3pksc3Dx
v28B/akXwHUErYFSZQ+2VHNc8OhCd0v2ehxZxbwPEAYIOAj3hcCIVoPGXTnKJmAJ
imr5hjH0wZUc0+xdsmn8Vfdv5NTzpwfVObbGiMZejeJsaoh0y6Rt8RANBMY67KQD
S7/HPlVuDYf/y43t4ZEHNYuV9RaCdZZYlLWwV1scdKaYcofgmtJOKbOdCjHRXgj+
004Afb3rggIoCfnCzOFzhGx+MLDtLjvEn2N4oLEWCLi+k/3huaA=
=GAvH
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pci: features, cleanups
vdpa:
shadow vq vlan support
net migration with cvq
cxl:
support emulating 4 HDM decoders
serial number extended capability
virtio:
hared dma-buf
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits)
libvhost-user: handle shared_object msg
vhost-user: add shared_object msg
hw/display: introduce virtio-dmabuf
util/uuid: add a hash function
virtio: remove unused next argument from virtqueue_split_read_next_desc()
virtio: remove unnecessary thread fence while reading next descriptor
virtio: use shadow_avail_idx while checking number of heads
libvhost-user.c: add assertion to vu_message_read_default
pcie_sriov: unregister_vfs(): fix error path
hw/i386/pc: improve physical address space bound check for 32-bit x86 systems
amd_iommu: Fix APIC address check
vdpa net: follow VirtIO initialization properly at cvq isolation probing
vdpa net: stop probing if cannot set features
vdpa net: fix error message setting virtio status
hw/pci-bridge/cxl-upstream: Add serial number extended capability support
hw/cxl: Support 4 HDM decoders at all levels of topology
hw/cxl: Fix and use same calculation for HDM decoder block size everywhere
hw/cxl: Add utility functions decoder interleave ways and target count.
hw/cxl: Push cxl_decoder_count_enc() and cxl_decode_ig() into .c
vdpa net: zero vhost_vdpa iova_tree pointer at cleanup
...
Conflicts:
hw/core/machine.c
Context conflict with commit 314e0a84cd ("hw/core: remove needless
includes") because it removed an adjacent #include.
The bit positions of both registers are related. Tracing the registers
independently results in the same offsets across these registers which
eases debugging.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230908084234.17642-9-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230908084234.17642-8-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The "hw/boards.h" is unused since the previous commit. Since its removal
requires include fixes in various unrelated files to keep the code compiling it
has been split in a dedicated commit.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230908084234.17642-5-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This virtual method was always set to the x86-specific pc_madt_cpu_entry(),
even in piix4 which is also used in MIPS. The previous changes use
pc_madt_cpu_entry() otherwise, so madt_cpu can be dropped.
Since pc_madt_cpu_entry() is now only used in x86-specific code, the stub
in hw/acpi/acpi-x86-stub can be removed as well.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230908084234.17642-4-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
build_cpus_aml() is architecture independent but needs to create architecture-
specific CPU AML. So far this was achieved by using a virtual method from
TYPE_ACPI_DEVICE_IF. However, build_cpus_aml() would resolve this interface from
global (!) state. This makes it quite incomprehensible where this interface
comes from (TYPE_PIIX4_PM?, TYPE_ICH9_LPC_DEVICE?, TYPE_ACPI_GED_X86?) an can
lead to crashes when the generic code is ported to new architectures.
So far, build_cpus_aml() is only called in architecture-specific code -- and
only in x86. We can therefore simply pass pc_madt_cpu_entry() as callback to
build_cpus_aml(). This is the same callback that would be used through
TYPE_ACPI_DEVICE_IF.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230908084234.17642-3-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Code changes in acpi that addresses all compiler complaints coming from enabling
-Wshadow flags. Enabling -Wshadow catches cases of local variables shadowing
other local variables or parameters. These makes the code confusing and/or adds
bugs that are difficult to catch. See also
Subject: Help wanted for enabling -Wshadow=local
Message-Id: <87r0mqlf9x.fsf@pond.sub.org>
https://lore.kernel.org/qemu-devel/87r0mqlf9x.fsf@pond.sub.org
The code is tested to build with and without the flag turned on.
CC: Markus Armbruster <armbru@redhat.com>
CC: Philippe Mathieu-Daude <philmd@linaro.org>
CC: mst@redhat.com
CC: imammedo@redhat.com
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Message-ID: <20230922124203.127110-1-anisinha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
[Commit message tweaked]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Currently, when using a true R/O NVDIMM (ROM memory backend) with a label
area, the VM can easily crash QEMU by trying to write to the label area,
because the ROM memory is mmap'ed without PROT_WRITE.
[root@vm-0 ~]# ndctl disable-region region0
disabled 1 region
[root@vm-0 ~]# ndctl zero-labels nmem0
-> QEMU segfaults
Let's remember whether we have a ROM memory backend and properly
reject the write request:
[root@vm-0 ~]# ndctl disable-region region0
disabled 1 region
[root@vm-0 ~]# ndctl zero-labels nmem0
zeroed 0 nmem
In comparison, on a system with a R/W NVDIMM:
[root@vm-0 ~]# ndctl disable-region region0
disabled 1 region
[root@vm-0 ~]# ndctl zero-labels nmem0
zeroed 1 nmem
For ACPI, just return "unsupported", like if no label exists. For spapr,
return "H_P2", similar to when no label area exists.
Could we rely on the "unarmed" property? Maybe, but it looks cleaner to
only disallow what certainly cannot work.
After all "unarmed=on" primarily means: cannot accept persistent writes. In
theory, there might be setups where devices with "unarmed=on" set could
be used to host non-persistent data (temporary files, system RAM, ...); for
example, in Linux, admins can overwrite the "readonly" setting and still
write to the device -- which will work as long as we're not using ROM.
Allowing writing label data in such configurations can make sense.
Message-ID: <20230906120503.359863-2-david@redhat.com>
Fixes: dbd730e859 ("nvdimm: check -object memory-backend-file, readonly=on option")
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
On pegasos2 which has ACPI as part of VT8231 south bridge the board
firmware writes PM control register by accessing the second byte so
addr will be 1. This wasn't handled correctly and the write went to
addr 0 instead. Remove the acpi_pm1_cnt_write() function which is used
only once and does not take addr into account and handle non-zero
address in acpi_pm_cnt_{read|write}. This fixes ACPI shutdown with
pegasos2 firmware.
The issue below is possibly related to the same memory core bug.
Link: https://gitlab.com/qemu-project/qemu/-/issues/360
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20230607200125.A9988746377@zero.eik.bme.hu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We use the user_ss[] array to hold the user emulation sources,
and the softmmu_ss[] array to hold the system emulation ones.
Hold the latter in the 'system_ss[]' array for parity with user
emulation.
Mechanical change doing:
$ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>