PCIDeviceClass and PCIDevice are defined in pci.h. Many users of the
header don't actually need them. Similar structs live in their own
headers: PCIBusClass and PCIBus in pci_bus.h, PCIBridge in
pci_bridge.h, PCIHostBridgeClass and PCIHostState in pci_host.h,
PCIExpressHost in pcie_host.h, and PCIERootPortClass, PCIEPort, and
PCIESlot in pcie_port.h.
Move PCIDeviceClass and PCIDeviceClass to new pci_device.h, along with
the code that needs them. Adjust include directives.
This also enables the next commit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221222100330.380143-6-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
hw/pci/pci_bridge.h and hw/cxl/cxl.h include each other.
Fortunately, breaking the loop is merely a matter of deleting
unnecessary includes from headers, and adding them back in places
where they are now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221222100330.380143-2-armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now that all ACPI controllers select the ACPI and APM dependencies
themselves, these explicit dependencies became redundant. Remove them.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20221216130355.41667-7-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
TYPE_PIIX4_PM is only used in machines where PIIX chipsets are used
which is currently PC and Malta. There is no point building it for the
other ACPI_X86 machines.
Note that this also removes unneeded ACPI_PIIX4 from PEGASOS2.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20221216130355.41667-4-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
-machine kernel-irqchip=off is broken for many guest OSes; kernel-irqchip=split
is the replacement that works, so remove the deprecated support for the former.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
and use cast to TYPE_PCI_BRIDGE instead.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221129101341.185621-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
The TCO watchdog implementation default behaviour from POV of the
guest OS relies on the initial values for two I/O ports:
* TCO1_CNT == 0x0
Since bit 11 (TCO Timer Halt) is clear, the watchdog state
is considered to be initially running
* GCS == 0x20
Since bit 5 (No Reboot) is set, the watchdog will not trigger
when the timer expires
This is a safe default, because the No Reboot bit will prevent the
watchdog from triggering if the guest OS is unaware of its existance,
or is slow in configuring it. When a Linux guest initializes the TCO
watchdog, it will attempt to clear the "No Reboot" flag, and read the
value back. If the clear was honoured, the driver will treat this as
an indicator that the watchdog is functional and create the guest
watchdog device.
QEMU implements a second "no reboot" flag, however, via pin straps
which overrides the behaviour of the guest controlled "no reboot"
flag:
commit 5add35bec1
Author: Paulo Alcantara <pcacjr@gmail.com>
Date: Sun Jun 28 14:58:58 2015 -0300
ich9: implement strap SPKR pin logic
This second 'noreboot' pin was defaulted to high, which also inhibits
triggering of the requested watchdog actions, unless QEMU is launched
with the magic flag "-global ICH9-LPC.noreboot=false".
This is a bad default as we are exposing a watchdog to every guest OS
using the q35 machine type, but preventing it from actually doing what
it is designed to do. What is worse is that the guest OS and its apps
have no way to know that the watchdog is never going to fire, due to
this second 'noreboot' pin.
If a guest OS had no watchdog device at all, then apps whose operation
and/or data integrity relies on a watchdog can refuse to launch, and
alert the administrator of the problematic deployment. With Q35 machines
unconditionally exposing a watchdog though, apps will think their
deployment is correct but in fact have no protection at all.
This patch flips the default of the second 'no reboot' flag, so that
configured watchdog actions will be honoured out of the box for the
7.2 Q35 machine type onwards, if the guest enables use of the watchdog.
See also related bug reports
https://bugzilla.redhat.com/show_bug.cgi?id=2080207https://bugzilla.redhat.com/show_bug.cgi?id=2136889https://bugzilla.redhat.com/show_bug.cgi?id=2137346
Reviewed-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20221216125749.596075-5-berrange@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We can move setting default_cpu_version into the base machine options,
and we need to unset alias and is_default only once.
Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20221212152145.124317-3-cohuck@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Add 8.0 machine types for arm/i440fx/m68k/q35/s390x/spapr.
Reviewed-by: Cédric Le Goater <clg@kaod.org> [ppc]
Reviewed-by: Thomas Huth <thuth@redhat.com> [s390x]
Reviewed-by: Greg Kurz <groug@kaod.org> [ppc]
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20221212152145.124317-2-cohuck@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
The legacy function qdev_reset_all() performs a recursive reset,
starting from a qdev. However, it does not permit any of the devices
in the tree to use three-phase reset, because device reset goes
through the device_legacy_reset() function that only calls the single
DeviceClass::reset method.
Switch to using the device_cold_reset() function instead. This also
performs a recursive reset, where first the children are reset and
then finally the parent, but it uses the new (...in 2020...)
Resettable mechanism, which supports both the old style single-reset
method and also the new 3-phase reset handling.
This commit changes the five remaining uses of this function.
Commit created with:
sed -i -e 's/qdev_reset_all/device_cold_reset/g' hw/i386/xen/xen_platform.c hw/input/adb.c hw/remote/vfio-user-obj.c hw/s390x/s390-virtio-ccw.c hw/usb/dev-uas.c
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit 012d4c96e2 changed the visitor functions taking Error ** to
return bool instead of void, and the commits following it used the new
return value to simplify error checking. Since then a few more uses
in need of the same treatment crept in. Do that. All pretty
mechanical except for
* balloon_stats_get_all()
This is basically the same transformation commit 012d4c96e2 applied
to the virtual walk example in include/qapi/visitor.h.
* set_max_queue_size()
Additionally replace "goto end of function" by return.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221121085054.683122-10-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
lots of acpi rework
first version of biosbits infrastructure
ASID support in vhost-vdpa
core_count2 support in smbios
PCIe DOE emulation
virtio vq reset
HMAT support
part of infrastructure for viommu support in vhost-vdpa
VTD PASID support
fixes, tests all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
=zTCn
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
pci,pc,virtio: features, tests, fixes, cleanups
lots of acpi rework
first version of biosbits infrastructure
ASID support in vhost-vdpa
core_count2 support in smbios
PCIe DOE emulation
virtio vq reset
HMAT support
part of infrastructure for viommu support in vhost-vdpa
VTD PASID support
fixes, tests all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmNpXDkPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpD0AH/2G8ZPrgrxJC9y3uD5/5J6QRzO+TsDYbg5ut
# uBf4rKSHHzcu6zdyAfsrhbAKKzyD4HrEGNXZrBjnKM1xCiB/SGBcDIWntwrca2+s
# 5Dpbi4xvd4tg6tVD4b47XNDCcn2uUbeI0e2M5QIbtCmzdi/xKbFAfl5G8DQp431X
# Kmz79G4CdKWyjVlM0HoYmdCw/4FxkdjD02tE/Uc5YMrePNaEg5Bw4hjCHbx1b6ur
# 6gjeXAtncm9s4sO0l+sIdyiqlxiTry9FSr35WaQ0qPU+Og5zaf1EiWfdl8TRo4qU
# EAATw5A4hyw11GfOGp7oOVkTGvcNB/H7aIxD7emdWZV8+BMRPKo=
# =zTCn
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 07 Nov 2022 14:27:53 EST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (83 commits)
checkpatch: better pattern for inline comments
hw/virtio: introduce virtio_device_should_start
tests/acpi: update tables for new core count test
bios-tables-test: add test for number of cores > 255
tests/acpi: allow changes for core_count2 test
bios-tables-test: teach test to use smbios 3.0 tables
hw/smbios: add core_count2 to smbios table type 4
vhost-user: Support vhost_dev_start
vhost: Change the sequence of device start
intel-iommu: PASID support
intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a function
intel-iommu: drop VTDBus
intel-iommu: don't warn guest errors when getting rid2pasid entry
vfio: move implement of vfio_get_xlat_addr() to memory.c
tests: virt: Update expected *.acpihmatvirt tables
tests: acpi: aarch64/virt: add a test for hmat nodes with no initiators
hw/arm/virt: Enable HMAT on arm virt machine
tests: Add HMAT AArch64/virt empty table files
tests: acpi: q35: update expected blobs *.hmat-noinitiators expected HMAT:
tests: acpi: q35: add test for hmat nodes without initiators
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch introduce ECAP_PASID via "x-pasid-mode". Based on the
existing support for scalable mode, we need to implement the following
missing parts:
1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation
with PASID
2) tag IOTLB with PASID
3) PASID cache and its flush
4) PASID based IOTLB invalidation
For simplicity PASID cache is not implemented so we can simply
implement the PASID cache flush as a no and leave it to be implemented
in the future. For PASID based IOTLB invalidation, since we haven't
had L1 stage support, the PASID based IOTLB invalidation is not
implemented yet. For PASID based device IOTLB invalidation, it
requires the support for vhost so we forbid enabling device IOTLB when
PASID is enabled now. Those work could be done in the future.
Note that though PASID based IOMMU translation is ready but no device
can issue PASID DMA right now. In this case, PCI_NO_PASID is used as
PASID to identify the address without PASID. vtd_find_add_as() has
been extended to provision address space with PASID which could be
utilized by the future extension of PCI core to allow device model to
use PASID based DMA translation.
This feature would be useful for:
1) prototyping PASID support for devices like virtio
2) future vPASID work
3) future PRS and vSVA work
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221028061436.30093-5-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We used to have a macro for VTD_PE_GET_FPD_ERR() but it has an
internal goto which prevents it from being reused. This patch convert
that macro to a dedicated function and let the caller to decide what
to do (e.g using goto or not). This makes sure it can be re-used for
other function that requires fault reporting.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221028061436.30093-4-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
We introduce VTDBus structure as an intermediate step for searching
the address space. This works well with SID based matching/lookup. But
when we want to support SID plus PASID based address space lookup,
this intermediate steps turns out to be a burden. So the patch simply
drops the VTDBus structure and use the PCIBus and devfn as the key for
the g_hash_table(). This simplifies the codes and the future PASID
extension.
To prevent being slower for past vtd_find_as_from_bus_num() callers, a
vtd_as cache indexed by the bus number is introduced to store the last
recent search result of a vtd_as belongs to a specific bus.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221028061436.30093-3-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
We use to warn on wrong rid2pasid entry. But this error could be
triggered by the guest and could happens during initialization. So
let's don't warn in this case.
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221028061436.30093-2-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
The code currently assumes Q35 iff ICH9 and i440fx iff PIIX. Now that more
AML generation has been moved into the south bridges and since the
machines define themselves primarily through their north bridges, let's
switch to resolving the north bridges for AML generation instead. This
also allows for easier experimentation with different south bridges in
the "pc" machine, e.g. with PIIX4 and VT82xx.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Message-Id: <20221028103419.93398-4-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The is_piix4 attribute is set once in one location and read once in
another. Doing both in one location allows for removing the attribute
altogether.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221026133110.91828-3-shentey@gmail.com>
Message-Id: <20221028103419.93398-3-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Ammends commit b23046abe7 'pc: acpi-build:
simplify PCI bus tree generation'.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221026133110.91828-2-shentey@gmail.com>
Message-Id: <20221028103419.93398-2-shentey@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Early-boot e820 records will be inserted by the bios/efi/early boot
software and be reported to the kernel via insert_resource. Later, when
CXL drivers iterate through the regions again, they will insert another
resource and make the RESERVED memory area a child.
This RESERVED memory area causes the memory region to become unusable,
and as a result attempting to create memory regions with
`cxl create-region ...`
Will fail due to the RESERVED area intersecting with the CXL window.
During boot the following traceback is observed:
0xffffffff81101650 in insert_resource_expand_to_fit ()
0xffffffff83d964c5 in e820__reserve_resources_late ()
0xffffffff83e03210 in pcibios_resource_survey ()
0xffffffff83e04f4a in pcibios_init ()
Which produces a call to reserve the CFMWS area:
(gdb) p *new
$54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved",
flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0,
child = 0x0}
Later the Kernel parses ACPI tables and reserves the exact same area as
the CXL Fixed Memory Window:
0xffffffff811016a4 in insert_resource_conflict ()
insert_resource ()
0xffffffff81a81389 in cxl_parse_cfmws ()
0xffffffff818c4a81 in call_handler ()
acpi_parse_entries_array ()
(gdb) p/x *new
$59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0",
flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0,
child = 0x0}
This produces the following output in /proc/iomem:
590000000-68fffffff : CXL Window 0
590000000-68fffffff : Reserved
This reserved area causes `get_free_mem_region()` to fail due to a check
against `__region_intersects()`. Due to this reserved area, the
intersect check will only ever return REGION_INTERSECTS, which causes
`cxl create-region` to always fail.
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Message-Id: <20221026205912.8579-1-gregory.price@memverge.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Move _GPE block declaration before it gets referenced by other
hotplug handlers. While at it move PCI hotplug (_E01) handler
after PCI tree description to avoid forward reference to
to not yet declared methods/devices.
PS:
Forward 'usage' usualy is fine as long as it's hidden within
method, however 'iasl' may print warnings. So be nice
to iasl/guest OS and do things in proper order.
PS2: Also follow up patches will move some of hotplug code
from PCI tree to _E01 and that also requires PCI Device
nodes build first, before Scope can reuse that from
global context.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20221017102146.2254096-11-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
to make that happen (bridge sits at _ADR: 0x001F0003),
relax PCI enumeration logic to include devices with *function* > 0
if device has something to say about itself (i.e. has build_dev_aml
callback set).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20221017102146.2254096-8-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
PCI-ISA bridges that are built in PIIX/Q35 are building its own AML
using AcpiDevAmlIf interface. Now build_append_pci_bus_devices()
gained AcpiDevAmlIf interface support to get AML of devices atached
to PCI slots.
So drop ad-hoc build_q35_isa_bridge()/build_piix4_isa_bridge()
and let PCI bus enumeration to include PCI-ISA bridge AML
when it's enumerated by build_append_pci_bus_devices().
AML change is mostly contextual, which moves whole ISA hierarchy
directly under PCI host bridge instead of it being described
as separate \SB.PCI0.ISA block.
Note:
If bus/slot that hosts ISA bridge has BSEL set, it will gain new
ASUN and _DMS entries (i.e. acpi-index support, but it should not
cause any functional change and that is fine from PCI Firmware
spec point of view), potentially it's possible to suppress that
by adding a flag to PCIDevice but I don't see a reason to do that
yet, I'd rather treat bridge just as any other PCI device if it's
possible.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20221017102146.2254096-4-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
NB:
We do not expect any functional change in any ACPI tables with this
change. It's only a refactoring.
NB2:
Some targets (or1k) do not support acpi and CONFIG_ACPI is off for them.
However, modules are reused between all architectures so CONFIG_ACPI is
on. For those architectures, dummy stub function definitions help to
resolve symbols. This change uses more of these and so it adds a couple
of dummy stub definitions so that symbols for those can be resolved.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20221017102146.2254096-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
CC: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <20221107152744.868434-1-ani@anisinha.ca>
e820 reserved entries were used before the dynamic entries with fw config files
were intoduced. Please see the following change:
7d67110f2d9a6("pc: add etc/e820 fw_cfg file")
Identical support was introduced into seabios as well with the following commit:
ce39bd4031820 ("Add support for etc/e820 fw_cfg file")
Both the above commits are now quite old. QEMU machines 1.7 and newer no longer
use the reserved entries. Seabios uses fw config files and
dynamic e820 entries by default and only falls back to using reserved entries
when it has to work with old qemu (versions earlier than 1.7). Please see
functions qemu_cfg_e820() and qemu_early_e820(). It is safe to remove legacy
FW_CFG_E820_TABLE and associated code now as QEMU 7.0 has deprecated i440fx
machines 1.7 and older. It would be incredibly rare to run the latest qemu
version with a very old version of seabios that did not support fw config files
for e820.
As far as I could see, edk2/ovfm never supported reserved entries and uses fw
config files from the beginning. So there should be no incompatibilities with
ovfm as well.
CC: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220831045311.33083-1-ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Suggested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221022150508.26830-9-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Just like in the real hardware (and in PIIX4), create the DMA
controllers in the south bridges.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20221022150508.26830-2-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Specify maximum possible APIC ID assigned for current VM session to KVM
prior to the creation of vCPUs. By this setting, KVM can set up VM-scoped
data structure indexed by the APIC ID, e.g. Posted-Interrupt Descriptor
pointer table to support Intel IPI virtualization, with the most optimal
memory footprint.
It can be achieved by calling KVM_ENABLE_CAP for KVM_CAP_MAX_VCPU_ID
capability once KVM has enabled it. Ignoring the return error if KVM
doesn't support this capability yet.
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220825025246.26618-1-guang.zeng@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Snapshot loading is supposed to be deterministic, so we shouldn't
re-randomize the various seeds used.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-4-Jason@zx2c4.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Snapshot loading only expects to call deterministic handlers, not
non-deterministic ones. So introduce a way of registering handlers that
won't be called when reseting for snapshots.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-id: 20221025004327.568476-2-Jason@zx2c4.com
[PMM: updated json doc comment with Markus' text; fixed
checkpatch style nit]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Resetting a guest that has Hyper-V VMBus support enabled triggers a QEMU
assertion failure:
hw/hyperv/hyperv.c:131: synic_reset: Assertion `QLIST_EMPTY(&synic->sint_routes)' failed.
This happens both on normal guest reboot or when using "system_reset" HMP
command.
The failing assertion was introduced by commit 64ddecc88b ("hyperv: SControl is optional to enable SynIc")
to catch dangling SINT routes on SynIC reset.
The root cause of this problem is that the SynIC itself is reset before
devices using SINT routes have chance to clean up these routes.
Since there seems to be no existing mechanism to force reset callbacks (or
methods) to be executed in specific order let's use a similar method that
is already used to reset another interrupt controller (APIC) after devices
have been reset - by invoking the SynIC reset from the machine reset
handler via a new x86_cpu_after_reset() function co-located with
the existing x86_cpu_reset() in target/i386/cpu.c.
Opportunistically move the APIC reset handler there, too.
Fixes: 64ddecc88b ("hyperv: SControl is optional to enable SynIc") # exposed the bug
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Message-Id: <cb57cee2e29b20d06f81dce054cbcea8b5d497e8.1664552976.git.maciej.szmigiero@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
No functional changes nor AML bytecode changes.
Consolidate code that generates empty and populated slot
descriptors. Besides eliminating duplication,
it helps consolidate conditions for generating
parts of Device{} desriptor in one place, which makes
code more compact and easier to read.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220701133515.137890-18-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
no functional change, align order of fields in empty slot
descriptor with a populated slot ordering.
Expected diff:
- Name (_SUN, 0x0X) // _SUN: Slot User Number
Name (_ADR, 0xY) // _ADR: Address
...
+ Name (_SUN, 0xX) // _SUN: Slot User Number
that will eliminate contextual changes (causing test failures)
when follow up patches merge code generating populated and empty
slots descriptors.
Put mandatory _ADR as the 1st field, then ASUN as it can be
present for both pupulated and empty slots and only then _SUN
which is present only when slot is hotpluggable.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220701133515.137890-13-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
add ASUN variable to hotpluggable slots and use it
instead of _SUN which has the same value to reuse
_DMS code on both branches (hot- and non-hotpluggable).
No functional change.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220701133515.137890-10-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Numer of possible arguments to pass to a method is limited
in ACPI. The following patches will need to pass over more
parameters to PDSM method, will hit that limit.
Prepare for this by passing structure (Package) to method,
which let us workaround arguments limitation.
Pass to PDSM all standard arguments of _DSM as is, and
pack custom parameters into Package that is passed as
the last argument to PDSM.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220701133515.137890-7-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
.., it will help with code readability and make easier
to extend method in followup patches
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220701133515.137890-6-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
HPET AML doesn't depend on piix4 nor q35, move code buiding it
to common scope to avoid duplication.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220701133515.137890-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It's true that when vcpus<=255 we don't require the length of 32bit APIC
IDs. However here since we already have EIM=ON it means the hypervisor
will declare the VM as x2apic supported (e.g. VT-d ECAP register will have
EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width
even if vcpus<=255. In short, commit 77250171bd breaks any simple cmdline
that wants to boot a VM with >=9 but <=255 vcpus with:
-device intel-iommu,intremap=on
For anyone who does not want to enable x2apic, we can use eim=off in the
intel-iommu parameters to skip enabling KVM x2apic.
This partly reverts commit 77250171bd, while
keeping the valid bit on checking split irqchip, but revert the other change.
One thing to mention is that this patch may break migration compatibility
of such VM, however that's probably the best thing we can do, because the
old behavior was simply wrong and not working for >8 vcpus. For <=8 vcpus,
there could be a light guest ABI change (by enabling KVM x2apic after this
patch), but logically it shouldn't affect the migration from working.
Also, this is not the 1st commit to change x2apic behavior. Igor provided
a full history of how this evolved for the past few years:
https://lore.kernel.org/qemu-devel/20220922154617.57d1a1fb@redhat.com/
Relevant commits for reference:
fb506e701e ("intel_iommu: reject broken EIM", 2016-10-17)
c1bb5418e3 ("target/i386: Support up to 32768 CPUs without IRQ remapping", 2020-12-10)
77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks", 2022-05-16)
dc89f32d92 ("target/i386: Fix sanity check on max APIC ID / X2APIC enablement", 2022-05-16)
We may want to have this for stable too (mostly for 7.1.0 only). Adding a
fixes tag.
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Claudio Fontana <cfontana@suse.de>
Cc: Igor Mammedov <imammedo@redhat.com>
Fixes: 77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks")
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220926153206.10881-1-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
We don't want it to be possible to re-read the RNG seed after ingesting
it, because this ruins forward secrecy. Currently, however, the setup
data section can just be re-read. Since the kernel is always read after
the setup data, use the selection of the kernel as a trigger to
re-initialize the RNG seed, just like we do on reboot, to preserve
forward secrecy.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220922152847.3670513-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts 3824e25db1 ("x86: disable rng seeding via setup_data"), but
for 7.2 rather than 7.1, now that modifying setup_data is safe to do.
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-4-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since this is read from fw_cfg on each boot, the kernel zeroing it out
alone is insufficient to prevent it from being used twice. And indeed on
reboot we always want a new seed, not the old one. So re-fill it in this
circumstance.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-3-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The preferred style is SetupData as a typedef, not setup_data as a plain
struct.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-2-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If setup_data is being read into a specific memory location, then
generally the setup_data address parameter is read first, so that the
caller knows where to read it into. In that case, we should return
setup_data containing the absolute addresses that are hard coded and
determined a priori. This is the case when kernels are loaded by BIOS,
for example. In contrast, when setup_data is read as a file, then we
shouldn't modify setup_data, since the absolute address will be wrong by
definition. This is the case when OVMF loads the image.
This allows setup_data to be used like normal, without crashing when EFI
tries to use it.
(As a small development note, strangely, fw_cfg_add_file_callback() was
exported but fw_cfg_add_bytes_callback() wasn't, so this makes that
consistent.)
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Laurent Vivier <laurent@vivier.eu>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220921093134.2936487-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use autofree heap allocation instead of variable-length array on
the stack. Replace the snprintf() call by g_strdup_printf().
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220819153931.3147384-9-peter.maydell@linaro.org
It's convenient to call iova_tree_remove from a map returned from
iova_tree_find or iova_tree_find_iova. With the current code this is not
possible, since we will free it, and then we will try to search for it
again.
Fix it making accepting the map by value, forcing a copy of the
argument. Not applying a fixes tag, since there is no use like that at
the moment.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Causes regressions when doing direct kernel boots with OVMF.
At this point in the release cycle the only sensible action
is to just disable this for 7.1 and sort it properly in the
7.2 devel cycle.
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220817083940.3174933-1-kraxel@redhat.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
The added enforcing is only relevant in the case of AMD where the
range right before the 1TB is restricted and cannot be DMA mapped
by the kernel consequently leading to IOMMU INVALID_DEVICE_REQUEST
or possibly other kinds of IOMMU events in the AMD IOMMU.
Although, there's a case where it may make sense to disable the
IOVA relocation/validation when migrating from a
non-amd-1tb-aware qemu to one that supports it.
Relocating RAM regions to after the 1Tb hole has consequences for
guest ABI because we are changing the memory mapping, so make
sure that only new machine enforce but not older ones.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-12-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It is assumed that the whole GPA space is available to be DMA
addressable, within a given address space limit, except for a
tiny region before the 4G. Since Linux v5.4, VFIO validates
whether the selected GPA is indeed valid i.e. not reserved by
IOMMU on behalf of some specific devices or platform-defined
restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with
-EINVAL.
AMD systems with an IOMMU are examples of such platforms and
particularly may only have these ranges as allowed:
0000000000000000 - 00000000fedfffff (0 .. 3.982G)
00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
0000010000000000 - ffffffffffffffff (1Tb .. 16Pb[*])
We already account for the 4G hole, albeit if the guest is big
enough we will fail to allocate a guest with >1010G due to the
~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).
[*] there is another reserved region unrelated to HT that exists
in the 256T boundary in Fam 17h according to Errata #1286,
documeted also in "Open-Source Register Reference for AMD Family
17h Processors (PUB)"
When creating the region above 4G, take into account that on AMD
platforms the HyperTransport range is reserved and hence it
cannot be used either as GPAs. On those cases rather than
establishing the start of ram-above-4g to be 4G, relocate instead
to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical
Topology", for more information on the underlying restriction of
IOVAs.
After accounting for the 1Tb hole on AMD hosts, mtree should
look like:
0000000000000000-000000007fffffff (prio 0, i/o):
alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
0000010000000000-000001ff7fffffff (prio 0, i/o):
alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff
If the relocation is done or the address space covers it, we
also add the the reserved HT e820 range as reserved.
Default phys-bits on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough
to address 1Tb (0xff ffff ffff). On AMD platforms, if a
ram-above-4g relocation is attempted and the CPU wasn't configured
with a big enough phys-bits, an error message will be printed
due to the maxphysaddr vs maxusedaddr check previously added.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-11-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Calculate max *used* GPA against the CPU maximum possible address
and error out if the former surprasses the latter. This ensures
max used GPA is reacheable by configured phys-bits. Default phys-bits
on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough for the CPU to
address 1Tb (0xff ffff ffff) or 1010G (0xfc ffff ffff) in AMD hosts
with IOMMU.
This is preparation for AMD guests with >1010G, where it will want relocate
ram-above-4g to be after 1Tb instead of 4G.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-10-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move obtaining hole64_start from device_memory memory region base/size
into an helper alongside correspondent getters in pc_memory_init() when
the hotplug range is unitialized. While doing that remove the memory
region based logic from this newly added helper.
This is the final step that allows pc_pci_hole64_start() to be callable
at the beginning of pc_memory_init() before any memory regions are
initialized.
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-9-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Remove pc_get_cxl_range_end() dependency on the CXL memory region,
and replace with one that does not require the CXL host_mr to determine
the start of CXL start.
This in preparation to allow pc_pci_hole64_start() to be called early
in pc_memory_init(), handle CXL memory region end when its underlying
memory region isn't yet initialized.
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Message-Id: <20220719170014.27028-8-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Factor out the calculation of the base address of the memory region.
It will be used later on for the cxl range end counterpart calculation
and as well in pc_memory_init() CXL memory region initialization, thus
avoiding duplication.
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-7-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move calculation of CXL memory region end to separate helper.
This is in preparation to a future change that removes CXL range
dependency on the CXL memory region, with the goal of allowing
pc_pci_hole64_start() to be called before any memory region are
initialized.
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-6-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There's a couple of places that seem to duplicate this calculation
of RAM size above the 4G boundary. Move all those to a helper function.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-5-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use the pre-initialized pci-host qdev and fetch the
pci-hole64-size into pc_memory_init() newly added argument.
Use PCI_HOST_PROP_PCI_HOLE64_SIZE pci-host property for
fetching pci-hole64-size.
This is in preparation to determine that host-phys-bits are
enough and for pci-hole64-size to be considered to relocate
ram-above-4g to be at 1T (on AMD platforms).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-4-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
At the start of pc_memory_init() we usually pass a range of
0..UINT64_MAX as pci_memory, when really its 2G (i440fx) or
32G (q35). To get the real user value, we need to get pci-host
passed property for default pci_hole64_size. Thus to get that,
create the qdev prior to memory init to better make estimations
on max used/phys addr.
This is in preparation to determine that host-phys-bits are
enough and also for pci-hole64-size to be considered to relocate
ram-above-4g to be at 1T (on AMD platforms).
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-3-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Rather than hardcoding the 4G boundary everywhere, introduce a
X86MachineState field @above_4g_mem_start and use it
accordingly.
This is in preparation for relocating ram-above-4g to be
dynamically start at 1T on AMD platforms.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-2-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Previously broken_reserved_end was taken into account, but Igor Mammedov
identified that this could lead to a clash between potential RAM being
mapped in the region and CXL usage. Hence always add the size of the
device_memory memory region. This only affects the case where the
broken_reserved_end flag was set.
Fixes: 6e4e3ae936 ("hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220701132300.2264-3-Jonathan.Cameron@huawei.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way. For this
purpose, Linux (≥5.20) supports passing a seed in the setup_data table
with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
specialized bootloaders. The linked commit shows the upstream kernel
implementation.
At Paolo's request, we don't pass these to versioned machine types ≤7.0.
Link: https://git.kernel.org/tip/tip/c/68b8e9713c8
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220721125636.446842-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The pcie host bridge has no io window on microvm,
so io reservations will not work.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220701091516.43489-1-kraxel@redhat.com>
xen_piix_pci_write_config_client() is implemented in the xen sub tree and
uses PIIX constants internally, thus creating a direct dependency on
PIIX. Now that xen_set_pci_link_route() is stubbable, the logic of
xen_piix_pci_write_config_client() can be moved to PIIX which resolves
the dependency.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20220626094656.15673-3-shentey@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The only user of xen_set_pci_link_route() is
xen_piix_pci_write_config_client() which implements PIIX-specific logic in
the xen namespace. This makes xen-hvm depend on PIIX which could be
avoided if xen_piix_pci_write_config_client() was implemented in PIIX. In
order to do this, xen_set_pci_link_route() needs to be stubbable which
this patch addresses.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20220626094656.15673-2-shentey@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
pi440fx_state is an out-parameter which is never read by the caller.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220612192800.40813-1-shentey@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220520180109.8224-5-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
The macro seems to be used only internally, so remove it.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220520180109.8224-4-shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
TYPE_I8042 is exported, so reuse it for consistency.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220520180109.8224-2-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Exposing the io_base offset as a QOM property not only allows it to be
configurable but also to be displayed in HMP:
Before:
(qemu) info qtree
...
dev: mc146818rtc, id ""
gpio-out "" 1
base_year = 0 (0x0)
irq = 8 (0x8)
lost_tick_policy = "discard"
After:
dev: mc146818rtc, id ""
gpio-out "" 1
base_year = 0 (0x0)
iobase = 112 (0x70)
irq = 8 (0x8)
lost_tick_policy = "discard"
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220529184006.10712-4-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Since commit 3b004a1654 'hw/rtc/
mc146818rtc: QOM'ify IRQ number' mc146818rtc's IRQ number is
configurable. Fix microvm-dt to respect its value.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220529184006.10712-3-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
New code will be added where this is best practice. So update existing code
as well.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220529184006.10712-2-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
During the previous changesets piix3_create() became a trivial
wrapper around more generic functions. Modernize the code.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220603185045.143789-12-shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Modernizes the code.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20220603185045.143789-11-shentey@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Now that all external logic has been removed from piix4_pm_initfn() the PIIX4_PM
device can be instantiated directly.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220528091934.15520-11-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Initialize the SMI IRQ in piix4_pm_init().
The smi_irq can now be wired up directly using a qdev gpio instead
of having to set the IRQ externally in piix4_pm_initfn().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220528091934.15520-10-mark.cave-ayland@ilande.co.uk>
[PMD: Partially squash 20220528091934.15520-8-mark.cave-ayland@ilande.co.uk]
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Introduce piix4_pm_init() instance init function and use it to
initialise the separate qdev gpio for the SCI IRQ.
The sci_irq can now be wired up directly using a qdev gpio instead
of having to set the IRQ externally in piix4_pm_initfn().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220528091934.15520-9-mark.cave-ayland@ilande.co.uk>
[PMD: Partially squash 20220528091934.15520-8-mark.cave-ayland@ilande.co.uk]
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
When QOMifying a device it is typical to use _init() as the suffix for an
instance_init function, however this name is already in use by the legacy
piix4_pm_init() wrapper function. Eventually the wrapper function will be
removed, but for now rename it to piix4_pm_initfn() to avoid a naming
collision.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220528091934.15520-7-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This exposes the PIIX4_PM device to the caller to allow any qdev gpios to be
mapped outside of piix4_pm_init().
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220528091934.15520-6-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
This allows the QOM types in hw/acpi/piix4.c to be used elsewhere by simply including
hw/acpi/piix4.h.
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220528091934.15520-5-mark.cave-ayland@ilande.co.uk>
Reviewed-by: Bernhard Beschow <shentey@gmail.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
more CXL patches
VIOT
Igor's huge AML rework
fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmKj4YcPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpkNcIAKTsMfUVueTjelC2RwIdegQkypycKhCweKzc
QxddaEr0w+N2164byT3IUy9h53hV3qAAmMuGE4d8B2r5rykf+SwDfIeNmHNqntnA
oLraXIxSSAf4/1cTsRCVL/BXo2E9P+WHI3huw37HClmPLdyMjQa1AtpTpKnIsbmO
sBZf7t5yHDJ2WGZwBQ1IbAxvsdGo1fa1TCt1jZ9g1dmnQSTteQG8DHkGoRnkwTi7
510jb0e8uQEgKytCdLTHqESHlfgjvoI73OFOAR2dzTKy6KelFmdLYSo2FtsIdtT5
1fZNaDjtl6zQ4b2iLBgPpHtikKch9BzzhDMbCsq7FpvasZ8u2FE=
=LXG0
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
virtio,pc,pci: fixes,cleanups,features
more CXL patches
VIOT
Igor's huge AML rework
fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmKj4YcPHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpkNcIAKTsMfUVueTjelC2RwIdegQkypycKhCweKzc
# QxddaEr0w+N2164byT3IUy9h53hV3qAAmMuGE4d8B2r5rykf+SwDfIeNmHNqntnA
# oLraXIxSSAf4/1cTsRCVL/BXo2E9P+WHI3huw37HClmPLdyMjQa1AtpTpKnIsbmO
# sBZf7t5yHDJ2WGZwBQ1IbAxvsdGo1fa1TCt1jZ9g1dmnQSTteQG8DHkGoRnkwTi7
# 510jb0e8uQEgKytCdLTHqESHlfgjvoI73OFOAR2dzTKy6KelFmdLYSo2FtsIdtT5
# 1fZNaDjtl6zQ4b2iLBgPpHtikKch9BzzhDMbCsq7FpvasZ8u2FE=
# =LXG0
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 10 Jun 2022 05:27:51 PM PDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits)
hw/vhost-user-scsi|blk: set `supports_config` flag correctly
hw/virtio/vhost-user: don't use uninitialized variable
tests/acpi: virt: update golden masters for VIOT
hw/acpi/viot: sort VIOT ACPI table entries by PCI host bridge min_bus
tests/acpi: virt: allow VIOT acpi table changes
hw/acpi/viot: build array of PCI host bridges before generating VIOT ACPI table
hw/acpi/viot: move the individual PCI host bridge entry generation to a new function
hw/acpi/viot: rename build_pci_range_node() to enumerate_pci_host_bridges()
hw/cxl: Fix missing write mask for HDM decoder target list registers
pci: fix overflow in snprintf string formatting
hw/machine: Drop cxl_supported flag as no longer useful
hw/cxl: Move the CXLState from MachineState to machine type specific state.
tests/acpi: Update q35/CEDT.cxl for new memory addresses.
pci/pci_expander_bridge: For CXL HB delay the HB register memory region setup.
tests/acpi: Allow modification of q35 CXL CEDT table.
hw/cxl: Push linking of CXL targets into i386/pc rather than in machine.c
hw/acpi/cxl: Pass in the CXLState directly rather than MachineState
hw/cxl: Make the CXL fixed memory window setup a machine parameter.
x86: acpi-build: do not include hw/isa/isa.h directly
tests: acpi: update expected DSDT.tis.tpm2/DSDT.tis.tpm12 blobs
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
As all the CXL elements have moved to boards that support
CXL, there is no need to maintain a top level flag.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Message-Id: <20220608145440.26106-9-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This removes the last of the CXL code from the MachineState where it
is visible to all Machines to only those that support CXL (currently i386/pc)
As i386/pc always support CXL now, stop allocating the state independently.
Note the pxb register hookup code runs even if cxl=off in order to detect
pxb_cxl host bridges and fail to start if any are present as they won't
have the control registers available.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Message-Id: <20220608145440.26106-8-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As the CXLState will no long be accessible via MachineState
at time of PXB_CXL realization, come back later from the machine specific
code to fill in the missing memory region setup. Only at this stage
is it possible to check if cxl=on, so that check is moved to this
later point.
Note that for multiple host bridges, the allocation order of the
register spaces is changed. This will be reflected in ACPI CEDT.
Stubs are added to handle case of CONFIG_PXB=n for machines that
call these functions.
The bus walking logic is common to all machines so add a utility
function + stub to cxl-host*.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Message-Id: <20220608145440.26106-6-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Whilst here take the oportunity to shorten the function name.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Message-Id: <20220608145440.26106-4-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Refactoring step on path to moving all CXL state out of
MachineState.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Message-Id: <20220608145440.26106-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Paolo Bonzini requested this change to simplify the ongoing
effort to allow machine setup entirely via RPC.
Includes shortening the command line form cxl-fixed-memory-window
to cxl-fmw as the command lines are extremely long even with this
change.
The json change is needed to ensure that there is
a CXLFixedMemoryWindowOptionsList even though the actual
element in the json is never used. Similar to existing
SgxEpcProperties.
Update qemu-options.hx to reflect that this is now a -machine
parameter. The bulk of -M / -machine parameters are documented
under machine, so use that in preference to M.
Update cxl-test and bios-tables-test to reflect new parameters.
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Message-Id: <20220608145440.26106-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
the last remaining dependency on ISA in acpi-build.c
is iapc_boot_arch_8042() which pulls in in isa.h
in its own header hw/input/i8042.h. Clean up
not longer needed direct inclusion of isa.h in
acpi-build.c
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-36-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
.. and clean up not longer needed conditionals in DSTD build code
tpm-tis AML will be fetched and included when ISA bridge will
build its own AML code (including attached devices).
Expected AML change:
the device under separate _SB.PCI0.ISA scope is moved directly
under Device(ISA) node.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-34-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-33-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
tpm-tis 2.0, is not a PCI device but ISA one, move it
under ISA scope to fix incorrect placement.
Fixes: 24cf5413aa (acpi: Make TPM 2.0 with TIS available as MSFT0101)
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-32-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
.. and clean up not longer needed conditionals in DSTD build code
pvpanic-isa AML will be fetched and included when ISA bridge will
build its own AML code (including attached devices).
Expected AML change:
the device under separate _SB.PCI0.ISA scope is moved directly
under Device(ISA) node.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-29-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
.. and clean up not longer needed conditionals in DSTD build
code. applesmc AML will be fetched and included when ISA bridge
will build its own AML code (incl. attached devices).
Expected AML change:
the device under separate _SB.PCI0.ISA scope is moved directly
under Device(ISA) node.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-25-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
replaces adhoc build_isa_devices_aml() with generic AcpiDevAmlIf
way to build bridge AML including all devices that are attached
to its ISA bus.
Later when PCI is converted to AcpiDevAmlIf, build_q35_isa_bridge()
will also be dropped since PCI parts itself will take care of
building device prologue/epilogue AML for each enumerated PCI device.
Expected AML change is contextual, where ISA devices are moved from
separately declared _SB.PCI0.ISA scope, directly under Device(ISA)
node.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-21-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
replaces ad-hoc build_isa_devices_aml() with generic AcpiDevAmlIf
way to build bridge AML including all devices that are attached to
its ISA bus.
Later when PCI is converted to AcpiDevAmlIf, build_piix4_isa_bridge()
will also be dropped since PCI parts itself will take care of
building device prologue/epilogue AML for each enumerated PCI
device.
Expected AML change is contextual, where ISA devices are moved
from separately declared _SB.PCI0.ISA scope , directly under
Device(ISA) node.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-20-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
by default we do not version ACPI AML as it's considered
a part of firmware. Drop do_not_add_smb_acpi that blocked
SMBUS AML description on 3.1 and older machine types without
providing justification.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-18-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
convert ad-hoc way we use to generate AML for ISA/SMB IPMI devices
to a generic approach (i.e. make devices provide its own AML blobs
like it is done with other ISA devices (ex. KBD))
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-17-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
smbus-ipmi AML description needs to specify a path to its parent
node in _CRS. The rest of IPMI inplementations (ISA based)
do not need path at all. Instead of passing through a full path
use relative path to point to smbus-ipmi's parent node, it will
let follow up patches to create IPMI device AML in a generic
way instead of current ad-hoc way. (i.e. AML will be generated
the same way it's done for other ISA device, and smbus will be
converted to generate AML for its slave devices the same way
as ISA)
expected AML change:
Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings
{
I2cSerialBusV2 (0x0000, ControllerInitiated, 0x000186A0,
- AddressingMode7Bit, "\\_SB.PCI0.SMB0",
+ AddressingMode7Bit, "^",
0x00, ResourceProducer, , Exclusive,
)
})
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-14-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Message-Id: <20220608135340.3304695-4-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Various loader functions return an int which limits images to 2GB which
is fine for things like a BIOS/kernel image, but if we want to be able
to load memory images or large ramdisks then any file over 2GB would
silently fail to load.
Cc: Luc Michel <lmichel@kalray.eu>
Signed-off-by: Jamie Iles <jamie@nuviainc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Luc Michel <lmichel@kalray.eu>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20211111141141.3295094-2-jamie@nuviainc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
This function was declared in a generic and public header, implemented
in a device-specific source file but only used in xen_platform. Given its
'aux' parameter, this function is more xen-specific than piix-specific.
Also, the hardcoded magic constants seem to be generic and related to
PCIIDEState and IDEBus rather than piix.
Therefore, move this function to xen_platform, unexport it, and drop the
"piix3" in the function name as well.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220513180957.90514-4-shentey@gmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Coverity issues several UNINIT warnings against amd_iommu.c [1]. This
patch fixes them by clearing evt before encoding. On top of it, this
patch changes the event log size to 16 bytes per IOMMU specification,
and fixes the event log entry format in amdvi_encode_event().
[1] CID 1487116/1487200/1487190/1487232/1487115/1487258
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Wei Huang <wei.huang2@amd.com>
Message-Id: <20220422055146.3312226-1-wei.huang2@amd.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Legacy PIC (8259) cannot be supported for TDX guests since TDX module
doesn't allow directly interrupt injection. Using posted interrupts
for the PIC is not a viable option as the guest BIOS/kernel will not
do EOI for PIC IRQs, i.e. will leave the vIRR bit set.
Make PIC the property of common x86 machine type. Hence all x86
machines, including microvm, can disable it.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20220310122811.807794-3-xiaoyao.li@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Both pc and microvm have pit property individually. Let's just make it
the property of common x86 base machine type.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20220310122811.807794-2-xiaoyao.li@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We need to update iq_dw according to the DMA_IRQ_REG during post
load. Otherwise we may get wrong IOTLB invalidation descriptor after
migration.
Fixes: fb43cf739e ("intel_iommu: scalable mode emulation")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220317080522.14621-2-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We need check whether passthrough is enabled during
vtd_switch_address_space() by checking the context entries. This
requires the root_scalable to be set correctly otherwise we may try to
check legacy rsvd bits instead of scalable ones.
Fixing this by updating root_scalable before switching the address
spaces during post_load.
Fixes: fb43cf739e ("intel_iommu: scalable mode emulation")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220317080522.14621-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
According to vtd spec v3.3 3.14:
"""
Software must not program paging-structure entries to remap any
address to the interrupt address range. Untranslated requests and
translation requests that result in an address in the interrupt range
will be blocked with condition code LGN.4 or SGN.8.
"""
This patch blocks the request that result in interrupt address range.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220210092815.45174-2-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
This fault reason is not used and is duplicated with SPT.2 condition
code. So let's remove it.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220210092815.45174-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
We don't need to check kvm_enable_x2apic(). It's perfectly OK to support
interrupt remapping even if we can't address CPUs above 254. Kind of
pointless, but still functional.
The check on kvm_enable_x2apic() needs to happen *anyway* in order to
allow CPUs above 254 even without an IOMMU, so allow that to happen
elsewhere.
However, we do require the *split* irqchip in order to rewrite I/OAPIC
destinations. So fix that check while we're here.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20220314142544.150555-4-dwmw2@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We should probably check if we were meant to be exposing IR, before
letting the guest turn the IRE bit on.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220314142544.150555-3-dwmw2@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
By setting none of the SAGAW bits we can indicate to a guest that DMA
translation isn't supported. Tested by booting Windows 10, as well as
Linux guests with the fix at https://git.kernel.org/torvalds/c/c40aaaac10
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220314142544.150555-2-dwmw2@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The check on x86ms->apic_id_limit in pc_machine_done() had two problems.
Firstly, we need KVM to support the X2APIC API in order to allow IRQ
delivery to APICs >= 255. So we need to call/check kvm_enable_x2apic(),
which was done elsewhere in *some* cases but not all.
Secondly, microvm needs the same check. So move it from pc_machine_done()
to x86_cpus_init() where it will work for both.
The check in kvm_cpu_instance_init() is now redundant and can be dropped.
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Acked-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20220314142544.150555-1-dwmw2@infradead.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add the CFMWs memory regions to the memorymap and adjust the
PCI window to avoid hitting the same memory.
Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-Id: <20220429144110.25167-36-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The CXL Early Discovery Table is defined in the CXL 2.0 specification as
a way for the OS to get CXL specific information from the system
firmware.
CXL 2.0 specification adds an _HID, ACPI0016, for CXL capable host
bridges, with a _CID of PNP0A08 (PCIe host bridge). CXL aware software
is able to use this initiate the proper _OSC method, and get the _UID
which is referenced by the CEDT. Therefore the existence of an ACPI0016
device allows a CXL aware driver perform the necessary actions. For a
CXL capable OS, this works. For a CXL unaware OS, this works.
CEDT awaremess requires more. The motivation for ACPI0017 is to provide
the possibility of having a Linux CXL module that can work on a legacy
Linux kernel. Linux core PCI/ACPI which won't be built as a module,
will see the _CID of PNP0A08 and bind a driver to it. If we later loaded
a driver for ACPI0016, Linux won't be able to bind it to the hardware
because it has already bound the PNP0A08 driver. The ACPI0017 device is
an opportunity to have an object to bind a driver will be used by a
Linux driver to walk the CXL topology and do everything that we would
have preferred to do with ACPI0016.
There is another motivation for an ACPI0017 device which isn't
implemented here. An operating system needs an attach point for a
non-volatile region provider that understands cross-hostbridge
interleaving. Since QEMU emulation doesn't support interleaving yet,
this is more important on the OS side, for now.
As of CXL 2.0 spec, only 1 sub structure is defined, the CXL Host Bridge
Structure (CHBS) which is primarily useful for telling the OS exactly
where the MMIO for the host bridge is.
Link: https://lore.kernel.org/linux-cxl/20210115034911.nkgpzc756d6qmjpl@intel.com/T/#t
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-26-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
CXL host bridges themselves may have MMIO. Since host bridges don't have
a BAR they are treated as special for MMIO. This patch includes
i386/pc support.
Also hook up the device reset now that we have have the MMIO
space in which the results are visible.
Note that we duplicate the PCI express case for the aml_build but
the implementations will diverge when the CXL specific _OSC is
introduced.
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-24-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There are going to be some potential overheads to CXL enablement,
for example the host bridge region reserved in memory maps.
Add a machine level control so that CXL is disabled by default.
Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-14-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
error_setg_errno() expects a normal errno value, not a negated
one, so we should use ENOTSUP instead of -ENOSUP.
Fixes: Coverity CID 1487174
Fixes: ("intel_iommu: support snoop control")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220401022824.9337-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
As part of converting -boot to a property with a QAPI type, define
the struct and use it throughout QEMU to access boot configuration.
machine_boot_parse takes care of doing the QemuOpts->QAPI conversion by
hand, for now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
igd-passthrough-isa-bridge is only requested in xen_pt but was
implemented in pc_piix.c. This caused xen_pt to dependend on i386/pc
which is hereby resolved.
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20220326165825.30794-2-shentey@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Header guard symbols should match their file name to make guard
collisions less likely.
Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Change to generated file ebpf/rss.bpf.skeleton.h backed out]
Don't register firmware as rom, not needed (see comment).
Add x86_firmware_configure() call for proper sev initialization.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220425135051.551037-4-kraxel@redhat.com>
move sev firmware setup to separate function so it can be used from
other code paths. No functional change.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220425135051.551037-3-kraxel@redhat.com>
Switch to usual goto-end-of-function error handling style.
No functional change.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20220425135051.551037-2-kraxel@redhat.com>
"0x%u" format is very misleading, replace by "0x%x".
Found running:
$ git grep -E '0x%[0-9]*([lL]*|" ?PRI)[dDuU]' hw/
Inspired-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20220323114718.58714-3-philippe.mathieu.daude@gmail.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Be more explicit that the loop must roll at least once. Avoids the
following warning:
FAILED: libqemu-x86_64-softmmu.fa.p/hw_i386_amd_iommu.c.o
In function 'pte_get_page_mask',
inlined from 'amdvi_page_walk' at hw/i386/amd_iommu.c:945:25,
inlined from 'amdvi_do_translate' at hw/i386/amd_iommu.c:989:5,
inlined from 'amdvi_translate' at hw/i386/amd_iommu.c:1038:5:
hw/i386/amd_iommu.c:877:38: error: 'oldlevel' may be used uninitialized [-Werror=maybe-uninitialized]
877 | return ~((1UL << ((oldlevel * 9) + 3)) - 1);
| ~~~~~~~~~~~~~~~~^~~~
hw/i386/amd_iommu.c: In function 'amdvi_translate':
hw/i386/amd_iommu.c:906:41: note: 'oldlevel' was declared here
906 | unsigned level, present, pte_perms, oldlevel;
| ^~~~~~~~
cc1: all warnings being treated as errors
Having:
$ gcc --version
gcc (Debian 12-20220313-1) 12.0.1 20220314 (experimental)
Reported-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer,
for two reasons. One, it catches multiplication overflowing size_t.
Two, it returns T * rather than void *, which lets the compiler catch
more type errors.
This commit only touches allocations with size arguments of the form
sizeof(T).
Patch created mechanically with:
$ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \
--macro-file scripts/cocci-macro-file.h FILES...
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20220315144156.1595462-4-armbru@redhat.com>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
On Solaris, 'sun' is #define'd to 1, which causes errors if a variable
is named 'sun'. Slightly change the name of the var for the Slot User
Number so we can build on Solaris.
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Signed-off-by: Andrew Deason <adeason@sinenomine.net>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20220316035227.3702-3-adeason@sinenomine.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The second bit of IAPC_BOOT_ARCH in FADT table indicates the presence of
keyboard controller implemented as 8042 or equivalent micro controller. This
change enables this flag for microvms if such a device exists (for example,
when added explicitly from the QEMU commandline). Change
654701e292d98b308b0 ("hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table")
enabled this flag for i386 q35 based machines. The reason for doing the same
for micro-vms is to make sure we provide the correct tables to the guest OS
uniformly in all cases when an i8042 device is present. When this bit is not
enabled, guest OSes has to find other indirect methods to detect the device
which we would like to avoid.
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <20220304154032.2071585-5-ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This can allow the guest OS to determine more easily if i8042 controller
is present in the system or not, so it doesn't need to do probing of the
controller, but just initialize it immediately, before enumerating the
ACPI AML namespace.
The 8042 bit in IAPC_BOOT_ARCH was introduced from ACPI spec v2 (FADT
revision 2 and above). Therefore, in this change, we only enable this bit for
x86/q35 machine types since x86/i440fx machines use FADT ACPI table with
revision 1.
Signed-off-by: Liav Albani <liavalb@gmail.com>
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <20220304154032.2071585-3-ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
COMPAT_PROP_PCP is 'on' by default and it's used for turning
off PCP capability on PCIe slots for 2.0 machine types using
compat machinery.
Drop not needed compat glue as Q35 supports migration starting
from 2.4 machine types.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220222102504.3080104-1-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
commit
f862ddbb1a (hw/i386: Remove the deprecated pc-1.x machine types)
removed the last user of broken APIC ID compat knob,
but compat_apic_id_mode itself was forgotten.
Clean it up and simplify x86_cpu_apic_id_from_index()
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220228131634.3389805-1-imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
On some older software like Windows 7 installer, having both a PS/2
mouse and USB mouse results in only one device working property (which
might be a different device each boot). While the workaround to not use
a USB mouse with such software is valid, it creates an inconsistent
experience if the user wishes to always use a USB mouse.
This introduces a new machine property to inhibit the creation of the
i8042 PS/2 controller.
Signed-off-by: Joelle van Dyne <j@getutm.app>
Message-Id: <20220227210655.45592-1-j@getutm.app>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The list of machine types grows larger and larger each release ... and
it is unlikely that many people still use the very old ones for live
migration. QEMU v1.7 has been released more than 8 years ago, so most
people should have updated their machines to a newer version in those
8 years at least once. Thus let's mark the very old 1.x machine types
as deprecated now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220117191639.278497-1-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Replce the literal magic number 48 with length calculation (32 bytes at
the end of the firmware after the table footer + 16 bytes of the OVMF
table footer GUID).
No functional change intended.
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Message-Id: <20220222071906.2632426-3-dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
When pc_system_parse_ovmf_flash() parses the optional GUIDed table in
the end of the OVMF flash memory area, the table length field is checked
for sizes that are too small, but doesn't error on sizes that are too
big (bigger than the flash content itself).
Add a check for maximal size of the OVMF table, and add an error report
in case the size is invalid. In such a case, an error like this will be
displayed during launch:
qemu-system-x86_64: OVMF table has invalid size 4047
and the table parsing is skipped.
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Message-Id: <20220222071906.2632426-2-dovmurik@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
SC is required for some kernel features like vhost-vDPA. So this patch
implements basic SC feature. The idea is pretty simple, for software
emulated DMA it would be always coherent. In this case we can simple
advertise ECAP_SC bit. For VFIO and vhost, thing will be more much
complicated, so this patch simply fail the IOMMU notifier
registration.
In the future, we may want to have a dedicated notifiers flag or
similar mechanism to demonstrate the coherency so VFIO could advertise
that if it has VFIO_DMA_CC_IOMMU, for vhost kernel backend we don't
need that since it's a software backend.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220214060346.72455-1-jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
More than 1k of TypeInfo instances are already marked as const. Mark the
remaining ones, too.
This commit was created with:
git grep -z -l 'static TypeInfo' -- '*.c' | \
xargs -0 sed -i 's/static TypeInfo/static const TypeInfo/'
Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Message-id: 20220117145805.173070-2-shentey@gmail.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This change exposes ACPI ERST support for x86 guests.
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <1643402289-22216-8-git-send-email-eric.devolder@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We already have a CONFIG_ISAPC switch - but we're not using it yet.
Add some "#ifdefs" to make it possible to disable this machine now.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220107160713.235918-1-thuth@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* "meson test" switch for iotests
* deprecation of old SGX QAPI
* unexport InterruptStatsProviderClass-related functions
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmHzwdAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOlMwgApVO6T934TwuTOFBs02Tp8KcjcKQq
5Qf5PrwbsluE8YixJERAFgfxgzbQKL/iHzkP97KoVo3CFFDc1sss7v3mrbbHxbwB
jiYIgbNjRfBbZq7nNmxf2bcXxTpZYvMNEia0QPyNl1GQ0JS/RFOsEYcyVUPvVmAI
djeESRyku1OS5bS5veqE2tL+AWDuodjHeW78qi+yIDRPx3jHLRzuGpQ/3KtgMN6o
d61h5vVrqdxe1HYH/8sh+qU71w7drpNZVYq8PqkZyit/LXgDMyDDOks1XWarrrJE
9xzNBUNR+B6CuHtNw1DGk4rvsCBV/F0ZVki6guVGoqG8njF3l6xyqRHPbw==
=2/gW
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging
* configure and meson fixes
* "meson test" switch for iotests
* deprecation of old SGX QAPI
* unexport InterruptStatsProviderClass-related functions
# gpg: Signature made Fri 28 Jan 2022 10:13:36 GMT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini-gitlab/tags/for-upstream:
configure: fix parameter expansion of --cross-cc-cflags options
qapi: Cleanup SGX related comments and restore @section-size
check-block: replace -makecheck with TAP output
qemu-iotests: require at least an argument to check-block.sh
build: make check-block a meson test
scripts/mtest2make: add support for SPEED=thorough
check-block.sh: passthrough -jN flag of make to -j N flag of check
meson: Use find_program() to resolve the entitlement.sh script
exec/cpu: Make host pages variables / macros 'target agnostic'
meson.build: Use a function from libfdt 1.5.1 for the library check
intc: Unexport InterruptStatsProviderClass-related functions
docker: add msitools to Fedora/mingw cross
build-sys: fix undefined ARCH error
build-sys: fix a meson deprecation warning
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The SGX NUMA patches were merged into Qemu 7.0 release, we need
clarify detailed version history information and also change
some related comments, which make SGX related comments clearer.
The QMP command schema promises backwards compatibility as standard.
We temporarily restore "@section-size", which can avoid incompatible
API breakage. The "@section-size" will be deprecated in 7.2 version.
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20220120223104.437161-1-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In some cases, a particular mapcache entry may be mapped 256 times
causing the lock field to wrap to 0. For example, this may happen when
using emulated NVME and the guest submits a large scatter-gather write.
At this point, the entry map be remapped causing QEMU to write the wrong
data or crash (since remap is not atomic).
Avoid this overflow by increasing the lock field to a uint32_t and also
detect it and abort rather than continuing regardless.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Message-Id: <20220124104450.152481-1-ross.lagerwall@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
commit f37f29d314 "xen: slightly simplify bufioreq handling" hard
coded setting req.count = 1 during initial field setup before the main
loop. This missed a subtlety that an early exit from the loop when
there are no ioreqs to process, would have req.count == 0 for the return
value. handle_buffered_io() would then remove state->buffered_io_timer.
Instead handle_buffered_iopage() is basically always returning true and
handle_buffered_io() always re-setting the timer.
Restore the disabling of the timer by introducing a new handled_ioreq
boolean and use as the return value. The named variable will more
clearly show the intent of the code.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20211210193434.75566-1-jandryuk@gmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Add missing TPM device identification objects _STR and _UID. They will
appear as files 'description' and 'uid' under Linux sysfs.
Following inspection of sysfs entries for hardware TPMs we chose
uid '1'.
Cc: Shannon Zhao <shannon.zhaosl@gmail.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Ani Sinha <ani@anisinha.ca>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/708
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Shannon Zhao <shannon.zhaosl@gmail.com>
Message-id: 20211223022310.575496-3-stefanb@linux.ibm.com
Message-Id: <20220104175806.872996-3-stefanb@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
When running "qemu-system-x86_64 -M pc,help" I noticed that some
properties were still missing their description. Add them now so
that users get at least a slightly better idea what they are all
about.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20211206134255.94784-1-thuth@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
if QEMU is started with used provided SLIC table blob,
-acpitable sig=SLIC,oem_id='CRASH ',oem_table_id="ME",oem_rev=00002210,asl_compiler_id="",asl_compiler_rev=00000000,data=/dev/null
it will assert with:
hw/acpi/aml-build.c:61:build_append_padded_str: assertion failed: (len <= maxlen)
and following backtrace:
...
build_append_padded_str (array=0x555556afe320, str=0x555556afdb2e "CRASH ME", maxlen=0x6, pad=0x20) at hw/acpi/aml-build.c:61
acpi_table_begin (desc=0x7fffffffd1b0, array=0x555556afe320) at hw/acpi/aml-build.c:1727
build_fadt (tbl=0x555556afe320, linker=0x555557ca3830, f=0x7fffffffd318, oem_id=0x555556afdb2e "CRASH ME", oem_table_id=0x555556afdb34 "ME") at hw/acpi/aml-build.c:2064
...
which happens due to acpi_table_begin() expecting NULL terminated
oem_id and oem_table_id strings, which is normally the case, but
in case of user provided SLIC table, oem_id points to table's blob
directly and as result oem_id became longer than expected.
Fix issue by handling oem_id consistently and make acpi_get_slic_oem()
return NULL terminated strings.
PS:
After [1] refactoring, oem_id semantics became inconsistent, where
NULL terminated string was coming from machine and old way pointer
into byte array coming from -acpitable option. That used to work
since build_header() wasn't expecting NULL terminated string and
blindly copied the 1st 6 bytes only.
However commit [2] broke that by replacing build_header() with
acpi_table_begin(), which was expecting NULL terminated string
and was checking oem_id size.
1) 602b45820 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
2)
Fixes: 4b56e1e4eb ("acpi: build_fadt: use acpi_table_begin()/acpi_table_end() instead of build_header()")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/786
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20211227193120.1084176-2-imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Denis Lisov <dennis.lissov@gmail.com>
Tested-by: Alexander Tsoy <alexander@tsoy.me>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When scalable mode is enabled, the passthrough more is not determined
by the context entry but PASID entry, so switch to use the logic of
vtd_dev_pt_enabled() to determine the passthrough mode in
vtd_do_iommu_translate().
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220105041945.13459-2-jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Set the new default to "auto", keeping it set to "off" for compat
machines. This property is only available for x86 targets.
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211217134039.29670-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The i440fx and Q35 machine types are both hardcoded to use the
legacy SMBIOS 2.1 (32-bit) entry point. This is a sensible
conservative choice because SeaBIOS only supports SMBIOS 2.1
EDK2, however, can also support SMBIOS 3.0 (64-bit) entry points,
and QEMU already uses this on the ARM virt machine type.
This adds a property to allow the choice of SMBIOS entry point
versions For example to opt in to 64-bit SMBIOS entry point:
$QEMU -machine q35,smbios-entry-point-type=64
Based on a patch submitted by Daniel Berrangé.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20211026151100.1691925-4-ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Rename the enums to match the naming style used by QAPI, and to
use "32" and "64" instead of "20" and "31". This will allow us
to more easily move the enum to the QAPI schema later.
About the naming choice: "SMBIOS 2.1 entry point"/"SMBIOS 3.0
entry point" and "32-bit entry point"/"64-bit entry point" are
synonymous in the SMBIOS specification. However, the phrases
"32-bit entry point" and "64-bit entry point" are used more often.
The new names also avoid confusion between the entry point format
and the actual SMBIOS version reported in the entry point
structure. For example: currently the 32-bit entry point
actually report SMBIOS 2.8 support, not 2.1.
Based on portions of a patch submitted by Daniel P. Berrangé.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20211026151100.1691925-2-ehabkost@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add 7.0 machine types for arm/i440fx/q35/s390x/spapr.
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20211217143948.289995-1-cohuck@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
If the 'i8042' property is not set, mouse events handled by
vmmouse_mouse_event() end calling i8042_isa_mouse_fake_event()
with a NULL argument, resulting in ps2_mouse_fake_event() being
called with invalid PS2MouseState pointer. Fix by requiring
the 'i8042' property to be always set:
$ qemu-system-x86_64 -device vmmouse
qemu-system-x86_64: -device vmmouse: 'i8042' link is not set
Fixes: 91c9e09147 ("vmmouse: convert to qdev")
Reported-by: Calvin Buckley <calvin@cmpct.info>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/752
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20211201223253.36080-1-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add the SGXEPCSection list into SGXInfo to show the multiple
SGX EPC sections detailed info, not the total size like before.
This patch can enable numa support for 'info sgx' command and
QMP interfaces. The new interfaces show each EPC section info
in one numa node. Libvirt can use QMP interface to get the
detailed host SGX EPC capabilities to decide how to allocate
host EPC sections to guest.
(qemu) info sgx
SGX support: enabled
SGX1 support: enabled
SGX2 support: enabled
FLC support: enabled
NUMA node #0: size=67108864
NUMA node #1: size=29360128
The QMP interface show:
(QEMU) query-sgx
{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
[{"node": 0, "size": 67108864}, {"node": 1, "size": 29360128}], "flc": true}}
(QEMU) query-sgx-capabilities
{"return": {"sgx": true, "sgx2": true, "sgx1": true, "sections": \
[{"node": 0, "size": 17070817280}, {"node": 1, "size": 17079205888}], "flc": true}}
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-4-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The basic SGX did not enable numa for SGX EPC sections, which
result in all EPC sections located in numa node 0. This patch
enable SGX numa function in the guest and the EPC section can
work with RAM as one numa node.
The Guest kernel related log:
[ 0.009981] ACPI: SRAT: Node 0 PXM 0 [mem 0x180000000-0x183ffffff]
[ 0.009982] ACPI: SRAT: Node 1 PXM 1 [mem 0x184000000-0x185bfffff]
The SRAT table can normally show SGX EPC sections menory info in different
numa nodes.
The SGX EPC numa related command:
......
-m 4G,maxmem=20G \
-smp sockets=2,cores=2 \
-cpu host,+sgx-provisionkey \
-object memory-backend-ram,size=2G,host-nodes=0,policy=bind,id=node0 \
-object memory-backend-epc,id=mem0,size=64M,prealloc=on,host-nodes=0,policy=bind \
-numa node,nodeid=0,cpus=0-1,memdev=node0 \
-object memory-backend-ram,size=2G,host-nodes=1,policy=bind,id=node1 \
-object memory-backend-epc,id=mem1,size=28M,prealloc=on,host-nodes=1,policy=bind \
-numa node,nodeid=1,cpus=2-3,memdev=node1 \
-M sgx-epc.0.memdev=mem0,sgx-epc.0.node=0,sgx-epc.1.memdev=mem1,sgx-epc.1.node=1 \
......
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When booting with scalable mode, I hit this error:
qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve non-zero iova=0xfffff002, level=0x1slpte=0x102681803)
qemu-system-x86_64: vtd_iommu_translate: detected translation failure (dev=01:00:00, iova=0xfffff002)
qemu-system-x86_64: New fault is not recorded due to compression of faults
This is because the SNP bit is set for second level page table since
Linux kernel commit 6c00612d0cba1 ("iommu/vt-d: Report right snoop
capability when using FL for IOVA") even if SC is not supported by the
hardware.
To unbreak the guest, ignore the leaf SNP bit for scalable mode
first. In the future we may consider to add SC support.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20211129033618.3857-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
pci power management fixes
acpi hotplug fixes
misc other fixes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmGSh40PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpH2AH/RLY+ONL98GT4D+hi8MGCOhN669jrPbAqJ1L
l9p2GxDoXSD4PCDReU3VCzLtRsxsv/camgMx/DnDxaZtdgKm8SlXGJutMtNHpTY6
PHZLMLoixZSsKi6Tm3xGut/FSzsTZl4Gc3rqwaryHQLqptNO+XQJSBmP+oEAdjAd
nKVepHRveOTVVBBzCmoNpFA+BEXRTItdfG0ZKprPkXUobc2jeV7ymkTX9s2OBLEf
/pE49tZj1K4ab8g4+RY4cFEFoDZbXZ55Aq3ck5LAb47qIr/1cXPVP7PxINmasy4Y
H+oTUpWLBM7rFbdP/GBFANu5HkEQ5pnjeosWYOKsopE4UFCyDxc=
=y9C8
-----END PGP SIGNATURE-----
Merge tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging
pci,pc,virtio: bugfixes
pci power management fixes
acpi hotplug fixes
misc other fixes
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 15 Nov 2021 05:15:09 PM CET
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
* tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
pcie: expire pending delete
pcie: fast unplug when slot power is off
pcie: factor out pcie_cap_slot_unplug()
pcie: add power indicator blink check
pcie: implement slot power control for pcie root ports
pci: implement power state
vdpa: Check for existence of opts.vhostdev
vdpa: Replace qemu_open_old by qemu_open at
virtio: use virtio accessor to access packed event
virtio: use virtio accessor to access packed descriptor flags
tests: bios-tables-test update expected blobs
hw/i386/acpi-build: Deny control on PCIe Native Hot-plug in _OSC
bios-tables-test: Allow changes in DSDT ACPI tables
hw/acpi/ich9: Add compat prop to keep HPC bit set for 6.1 machine type
pcie: rename 'native-hotplug' to 'x-native-hotplug'
hw/mem/pc-dimm: Restrict NUMA-specific code to NUMA machines
vhost: Fix last vq queue index of devices with no cvq
vhost: Rename last_index to vq_index_end
softmmu/qdev-monitor: fix use-after-free in qdev_set_id()
net/vhost-vdpa: fix memory leak in vhost_vdpa_get_max_queue_pairs()
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
There are two ways to enable ACPI PCI Hot-plug:
* Disable the Hot-plug Capable bit on PCIe slots.
This was the first approach which led to regression [1-2], as
I/O space for a port is allocated only when it is hot-pluggable,
which is determined by HPC bit.
* Leave the HPC bit on and disable PCIe Native Hot-plug in _OSC
method.
This removes the (future) ability of hot-plugging switches with PCIe
Native hotplug since ACPI PCI Hot-plug only works with cold-plugged
bridges. If the user wants to explicitely use this feature, they can
disable ACPI PCI Hot-plug with:
--global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off
Change the bit in _OSC method so that the OS selects ACPI PCI Hot-plug
instead of PCIe Native.
[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20211112110857.3116853-5-imammedo@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To solve issues [1-2] the Hot Plug Capable bit in PCIe Slots will be
turned on, while the switch to ACPI Hot-plug will be done in the
DSDT table.
Introducing 'x-keep-native-hpc' property disables the HPC bit only
in 6.1 and as a result keeps the forced 'reserve-io' on
pcie-root-ports in 6.1 too.
[1] https://gitlab.com/qemu-project/qemu/-/issues/641
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2006409
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20211112110857.3116853-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Mark property as experimental/internal adding 'x-' prefix.
Property was introduced in 6.1 and it should have provided
ability to turn on native PCIE hotplug on port even when
ACPI PCI hotplug is in use is user explicitly sets property
on CLI. However that never worked since slot is wired to
ACPI hotplug controller.
Another non-intended usecase: disable native hotplug on slot
when APCI based hotplug is disabled, which works but slot has
'hotplug' property for this taks.
It should be relatively safe to rename it to experimental
as no users should exist for it and given that the property
is broken we don't really want to leave it around for much
longer lest users start using it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <20211112110857.3116853-2-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
For bare-metal SGX on real hardware, the hardware provides guarantees
SGX state at reboot. For instance, all pages start out uninitialized.
The vepc driver provides a similar guarantee today for freshly-opened
vepc instances, but guests such as Windows expect all pages to be in
uninitialized state on startup, including after every guest reboot.
Qemu can invoke the ioctl to bring its vEPC pages back to uninitialized
state. There is a possibility that some pages fail to be removed if they
are SECS pages, and the child and SECS pages could be in separate vEPC
regions. Therefore, the ioctl returns the number of EREMOVE failures,
telling Qemu to try the ioctl again after it's done with all vEPC regions.
The related kernel patches:
Link: https://lkml.kernel.org/r/20211021201155.1523989-3-pbonzini@redhat.com
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20211101162009.62161-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The device is created by the machine based on the sgx-epc property.
It should not be created by users.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* DMA support in the multiboot option ROM
* Rename default-bus-bypass-iommu
* Deprecate -watchdog and cleanup -watchdog-action
* HVF fix for <PAGE_SIZE regions
* Support TSC scaling for AMD nested virtualization
* Fix for ESP fuzzing bug
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmGBUeEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOh+Qf+OMRhRiv6dYjbK/5zXrx81AgxYAY3
dBUSr8v16LyrMl1U3DZWzhD+MzQsC83m/Xsh4lGxlHDWtkK9QQA5xDG95JZdY26i
MGCbbjnFHISbyBQV9Y724gPfPjOOODuoFbzafSx6VLITOcyv1ye0cm7TOjOPB+tt
E4c3JqTZ7g8a5yMe8ItkVhz5pPY+oVw8dxMNRp6Sup5Dbfx0DjacIwLasLsHfPL7
qBADfqB20ovHUzLjXu7oWgEd4KxJ6kiSCaJJu/KD36hg0wB8+WVP1o43j4PkczHT
QjU7eZaeaTrN5Cf34ttPge6QReMi5SFNCaA9O9/HLqrQgdEtt/diZWuqjQ==
=a2mC
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
* Build system fixes and cleanups
* DMA support in the multiboot option ROM
* Rename default-bus-bypass-iommu
* Deprecate -watchdog and cleanup -watchdog-action
* HVF fix for <PAGE_SIZE regions
* Support TSC scaling for AMD nested virtualization
* Fix for ESP fuzzing bug
# gpg: Signature made Tue 02 Nov 2021 10:57:37 AM EDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
* remotes/bonzini/tags/for-upstream: (27 commits)
configure: fix --audio-drv-list help message
configure: Remove the check for the __thread keyword
Move the l2tpv3 test from configure to meson.build
meson: remove unnecessary coreaudio test program
meson: remove pointless warnings
meson.build: Allow to disable OSS again
meson: bump submodule to 0.59.3
qtest/am53c974-test: add test for cancelling in-flight requests
esp: ensure in-flight SCSI requests are always cancelled
KVM: SVM: add migration support for nested TSC scaling
hw/i386: fix vmmouse registration
watchdog: remove select_watchdog_action
vl: deprecate -watchdog
watchdog: add information from -watchdog help to -device help
hw/i386: Rename default_bus_bypass_iommu
hvf: Avoid mapping regions < PAGE_SIZE as ram
configure: do not duplicate CPU_CFLAGS into QEMU_LDFLAGS
configure: remove useless NPTL probe
target/i386: use DMA-enabled multiboot ROM for new-enough QEMU machine types
optionrom: add a DMA-enabled multiboot ROM
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
virtio-iommu support for x86/ACPI.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmGAefYPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpiCUH/2pIs3FmOGIasEqn4BnqXr4dHMReUO5Ghg0v
cXle4ZUrbg7Qpnxh07CwMuUpJV3Qv+xtVK7hzbD13nnxrkTZuKzBRV1AthkA1Hly
zIKOxnEgV497LaXoaSOtqAx48fuznk5XOHju91usgu4mehJ0qe2gcwb4H8uWGkQi
hrsR7a9woP0M4H/jvb3+aQRCJKMscj8ReabM1ulOugNpPdNI/jIKtBvZBtTxAqtQ
CH9/DJLfVmzDRYdeBpnF06A+tXm4uU1Q5BmpmF9qaymk/PzthN54gdnDd6zH405Z
Tmjp9UA2xfEYDmKzuTCBdPmoUe6OI7mU9o0WbB5MGYx5RRRBETw=
=R7DD
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc,pci,virtio: features, fixes
virtio-iommu support for x86/ACPI.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 01 Nov 2021 07:36:22 PM EDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
* remotes/mst/tags/for_upstream:
hw/i386: fix vmmouse registration
pci: Export pci_for_each_device_under_bus*()
pci: Define pci_bus_dev_fn/pci_bus_fn/pci_bus_ret_fn
hw/i386/pc: Allow instantiating a virtio-iommu device
hw/i386/pc: Move IOMMU singleton into PCMachineState
hw/i386/pc: Remove x86_iommu_get_type()
hw/acpi: Add VIOT table
vhost-vdpa: Set discarding of RAM broken when initializing the backend
qtest: fix 'expression is always false' build failure in qtest_has_accel()
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Allows edk2 detect virtio-mmio devices and pcie ecam.
See comment in hw/i386/microvm-dt.c for more details.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Sergio Lopez <slp@redhat.com>
Message-Id: <20211014193617.2475578-1-kraxel@redhat.com>
According to the logic of vmmouse_update_handler function,
vmmouse should be registered as an event handler when
it's status is zero.
vmmouse_read_id resets the status but does not register
the handler.
This patch adds vmmouse registration and activation when
status is reset.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <163524204515.1914131.16465061981774791228.stgit@pasha-ThinkPad-X280>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since commit d8fb7d0969 ("vl: switch -M parsing to keyval"), machine
parameter definitions cannot use underscores, because keyval_dashify()
transforms them to dashes and the parser doesn't find the parameter.
This affects option default_bus_bypass_iommu which was introduced in the
same release:
$ qemu-system-x86_64 -M q35,default_bus_bypass_iommu=on
qemu-system-x86_64: Property 'pc-q35-6.1-machine.default-bus-bypass-iommu' not found
Rename the parameter to "default-bus-bypass-iommu". Passing
"default_bus_bypass_iommu" is still valid since the underscore are
transformed automatically.
Fixes: c9e96b04fc ("hw/i386: Add a default_bus_bypass_iommu pc machine option")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20211025104737.1560274-1-jean-philippe@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As long as fw_cfg supports DMA, the new ROM can be used also on older
machine types because it has the same size as the existing one.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This removes a parameter from x86_load_linux, and will avoid code
duplication between the linux and multiboot cases once multiboot
starts to support DMA.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
As qemu guidelines:
Unless a pointer is used to modify the pointed-to storage, give it the
"const" attribute.
In the particular case of iova_tree_find it allows to enforce what is
requested by its comment, since the compiler would shout in case of
modifying or freeing the const-qualified returned pointer.
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211013182713.888753-2-eperezma@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
According to the logic of vmmouse_update_handler function,
vmmouse should be registered as an event handler when
it's status is zero.
vmmouse_read_id resets the status but does not register
the handler.
This patch adds vmmouse registration and activation when
status is reset.
Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Message-Id: <163524204515.1914131.16465061981774791228.stgit@pasha-ThinkPad-X280>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
They're actually more commonly used than the helper without _under_bus, because
most callers do have the pci bus on hand. After exporting we can switch a lot
of the call sites to use these two helpers.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20211028043129.38871-3-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Allow instantiating a virtio-iommu device by adding an ACPI Virtual I/O
Translation table (VIOT), which describes the relation between the
virtio-iommu and the endpoints it manages.
Add a hotplug handler for virtio-iommu on x86 and set the necessary
reserved region property. On x86, the [0xfee00000, 0xfeefffff] DMA
region is reserved for MSIs. DMA transactions to this range either
trigger IRQ remapping in the IOMMU or bypasses IOMMU translation.
Although virtio-iommu does not support IRQ remapping it must be informed
of the reserved region so that it can forward DMA transactions targeting
this region.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20211026182024.2642038-5-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We're about to support a third vIOMMU for x86, virtio-iommu which
doesn't inherit X86IOMMUState. Move the IOMMU singleton into
PCMachineState, so it can be shared between all three vIOMMUs.
The x86_iommu_get_default() helper is still needed by KVM and IOAPIC to
fetch the default IRQ-remapping IOMMU. Since virtio-iommu doesn't
support IRQ remapping, this interface doesn't need to change for the
moment. We could later replace X86IOMMUState with an "IRQ remapping
IOMMU" interface if necessary.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20211026182024.2642038-4-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
To generate the IOMMU ACPI table, acpi-build.c can use base QEMU types
instead of a special IommuType value.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20211026182024.2642038-3-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
since dirty ring has been introduced, there are two methods
to track dirty pages of vm. it seems that "logging" has
a hint on the method, so rename the global_dirty_log to
global_dirty_tracking would make description more accurate.
dirty rate measurement may start or stop dirty tracking during
calculation. this conflict with migration because stop dirty
tracking make migration leave dirty pages out then that'll be
a problem.
make global_dirty_tracking a bitmask can let both migration and
dirty rate measurement work fine. introduce GLOBAL_DIRTY_MIGRATION
and GLOBAL_DIRTY_DIRTY_RATE to distinguish what current dirty
tracking aims for, migration or dirty rate.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Message-Id: <9c9388657cfa0301bd2c1cfa36e7cf6da4aeca19.1624040308.git.huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Move qmp_query_sgx() and hmp_info_sgx() from target/i386/monitor.c
to hw/i386/sgx.c, removing the sgx_get_info() indirection and the
"hw/i386/sgx.h" header.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007175612.496366-5-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move qmp_query_sgx_capabilities() from target/i386/monitor.c to
hw/i386/sgx.c, removing the sgx_get_capabilities() indirection.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007175612.496366-4-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007175612.496366-3-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
"sysemu/sev.h" is only used from x86-specific files. Let's move it
to include/hw/i386, and merge it with target/i386/sev.h.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211007161716.453984-16-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SEV is a x86 specific feature, and the "sev_i386.h" header
is already in target/i386/. Rename it as "sev.h" to simplify.
Patch created mechanically using:
$ git mv target/i386/sev_i386.h target/i386/sev.h
$ sed -i s/sev_i386.h/sev.h/ $(git grep -l sev_i386.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20211007161716.453984-15-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
A huge acpi refactoring.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmFcxB8PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpldIH/jpPtsQr8oC1RrP8lTmiWh9vC9c3Xa5xfH7k
lXOn01kPprAWsm+qpoydOKJk/6v8TJ2E5nQ5AIU5EQvOEqSp8VnO0swHvTC7bwBa
1mGBvN3Ssjd/XFcgEc1xRJMzTO+Z1uF9tlq9ymDhLDtnuswyu5fl71NoXNVqE4st
r9wQrDX1lgzAUzL2QBr/g8wzx7aDS9HIyTrbl+ehzVEBrNKggo+AQgtSpyS1lxEC
NCIirfoaGgTKKrMASms5dXW44XNL/lXsZtheYrfF5nLiatCcthJNGnZ/BhUelH2B
EjXax7zs3IpWvRoGxRyqxElWgaCA2IXVvtLsT3WBqrTmi0Hoku8=
=nFLU
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc,pci,virtio: features, fixes
A huge acpi refactoring.
Fixes, cleanups all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Tue 05 Oct 2021 02:31:11 PM PDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
* remotes/mst/tags/for_upstream: (47 commits)
hw/i386/amd_iommu: Add description/category to TYPE_AMD_IOMMU_PCI
hw/i386/amd_iommu: Rename SysBus specific functions as amdvi_sysbus_X()
hw/i386/amd_iommu: Rename amdviPCI TypeInfo
nvdimm: release the correct device list
virtio-balloon: Fix page-poison subsection name
bios-tables-test: Update ACPI DSDT table golden blobs for q35
hw/i386/acpi: fix conflicting IO address range for acpi pci hotplug in q35
bios-tables-test: allow changes in DSDT ACPI tables for q35
acpi: AcpiGenericAddress no longer used to map/access fields of MMIO, drop packed attribute
acpi: remove no longer used build_header()
acpi: build_facs: use build_append_int_noprefix() API to compose table
acpi: arm/virt: build_gtdt: use acpi_table_begin()/acpi_table_end() instead of build_header()
acpi: arm/virt: build_spcr: use acpi_table_begin()/acpi_table_end() instead of build_header()
acpi: arm/virt: build_spcr: fix invalid cast
acpi: arm/virt: convert build_iort() to endian agnostic build_append_FOO() API
acpi: arm: virt: build_iort: use acpi_table_begin()/acpi_table_end() instead of build_header()
acpi: arm: virt: build_dsdt: use acpi_table_begin()/acpi_table_end() instead of build_header()
acpi: build_dsdt_microvm: use acpi_table_begin()/acpi_table_end() instead of build_header()
acpi: arm/virt: madt: use build_append_int_noprefix() API to compose MADT table
acpi: x86: madt: use build_append_int_noprefix() API to compose MADT table
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
TYPE_AMD_IOMMU_PCI is user-creatable but not well described.
Implement its class_init() handler to add it to the 'Misc
devices' category, and add a description.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210926175648.1649075-4-f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Various functions are SysBus specific. Rename them using the
consistent amdvi_sysbus_XXX() pattern, to differentiate them
from PCI specific functions (which we'll add in the next
commit).
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210926175648.1649075-3-f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Per 'QEMU Coding Style':
Naming
======
Variables are lower_case_with_underscores; easy to type and read.
Rename amdviPCI variable as amdvi_pci.
amdviPCI_register_types() register more than PCI types:
TYPE_AMD_IOMMU_DEVICE inherits TYPE_X86_IOMMU_DEVICE which
itself inherits TYPE_SYS_BUS_DEVICE.
Rename it more generically as amdvi_register_types().
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210926175648.1649075-2-f4bug@amsat.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop usage of packed structures and explicit endian
conversions when building table and use endian agnostic
build_append_int_noprefix() API to build it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-34-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-27-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop usage of packed structures and explicit endian conversions
when building MADT table for arm/x86 and use endian agnostic
build_append_int_noprefix() API to build it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-25-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Instead of composing disabled _MAT entry and then later on
patching it to enabled for hotpluggbale CPUs in DSDT,
set it to enabled at the time _MAT entry is built.
It will allow to drop usage of packed structures in
following patches when build_madt() is switched to use
build_append_int_noprefix() API.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-24-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-22-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-21-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-20-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
While at it switch to build_append_int_noprefix() to build
table entries tables.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-19-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Drop usage of packed structures and explicit endian conversions
when building SRAT tables for arm/x86 and use endian agnostic
build_append_int_noprefix() API to build it.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-18-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
While at it switch to build_append_int_noprefix() to build
table entries (which also removes some manual offset
calculations)
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-17-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
While at it switch to build_append_int_noprefix() to build
table entries (which also removes some manual offset
calculations).
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-16-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
while at it convert build_hpet() to endian agnostic
build_append_FOO() API
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210924122802.1455362-15-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
it replaces error-prone pointer arithmetic for build_header() API,
with 2 calls to start and finish table creation,
which hides offsets magic from API user.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20210924122802.1455362-14-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
If SEV is enabled and a kernel is passed via -kernel, pass the hashes of
kernel/initrd/cmdline in an encrypted guest page to OVMF for SEV
measured boot.
Co-developed-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: James Bottomley <jejb@linux.ibm.com>
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210930054915.13252-3-dovmurik@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM implements some Hyper-V 2016 functions so providing WS2008R2 version
is somewhat incorrect. While generally guests shouldn't care about it
and always check feature bits, it is known that some tools in Windows
actually check version info.
For compatibility reasons make the change for 6.2 machine types only.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210902093530.345756-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now we have a common structure SMPCompatProps used to store information
about SMP compatibility stuff, so we can also move smp_prefer_sockets
there for cleaner code.
No functional change intended.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-15-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently the only difference between smp_parse and pc_smp_parse
is the support of dies parameter and the related error reporting.
With some arch compat variables like "bool dies_supported", we can
make smp_parse generic enough for all arches and the PC specific
one can be removed.
Making smp_parse() generic enough can reduce code duplication and
ease the code maintenance, and also allows extending the topology
with more arch specific members (e.g., clusters) in the future.
Suggested-by: Andrew Jones <drjones@redhat.com>
Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-13-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In the real SMP hardware topology world, it's much more likely that
we have high cores-per-socket counts and few sockets totally. While
the current preference of sockets over cores in smp parsing results
in a virtual cpu topology with low cores-per-sockets counts and a
large number of sockets, which is just contrary to the real world.
Given that it is better to make the virtual cpu topology be more
reflective of the real world and also for the sake of compatibility,
we start to prefer cores over sockets over threads in smp parsing
since machine type 6.2 for different arches.
In this patch, a boolean "smp_prefer_sockets" is added, and we only
enable the old preference on older machines and enable the new one
since type 6.2 for all arches by using the machine compat mechanism.
Suggested-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-10-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We have two requirements for a valid SMP configuration:
the product of "sockets * cores * threads" must represent all the
possible cpus, i.e., max_cpus, and then must include the initially
present cpus, i.e., smp_cpus.
So we only need to ensure 1) "sockets * cores * threads == maxcpus"
at first and then ensure 2) "maxcpus >= cpus". With a reasonable
order of the sanity check, we can simplify the error reporting code.
When reporting an error message we also report the exact value of
each topology member to make users easily see what's going on.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210929025816.21076-7-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently we directly calculate the omitted cpus based on the given
incomplete collection of parameters. This makes some cmdlines like:
-smp maxcpus=16
-smp sockets=2,maxcpus=16
-smp sockets=2,dies=2,maxcpus=16
-smp sockets=2,cores=4,maxcpus=16
not work. We should probably set the value of cpus to match maxcpus
if it's omitted, which will make above configs start to work.
So the calculation logic of cpus/maxcpus after this patch will be:
When both maxcpus and cpus are omitted, maxcpus will be calculated
from the given parameters and cpus will be set equal to maxcpus.
When only one of maxcpus and cpus is given then the omitted one
will be set to its counterpart's value. Both maxcpus and cpus may
be specified, but maxcpus must be equal to or greater than cpus.
Note: change in this patch won't affect any existing working cmdlines
but allows more incomplete configs to be valid.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-6-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
We are currently using maxcpus to calculate the omitted sockets
but using cpus to calculate the omitted cores/threads. This makes
cmdlines like:
-smp cpus=8,maxcpus=16
-smp cpus=8,cores=4,maxcpus=16
-smp cpus=8,threads=2,maxcpus=16
work fine but the ones like:
-smp cpus=8,sockets=2,maxcpus=16
-smp cpus=8,sockets=2,cores=4,maxcpus=16
-smp cpus=8,sockets=2,threads=2,maxcpus=16
break the sanity check.
Since we require for a valid config that the product of "sockets * cores
* threads" should equal to the maxcpus, we should uniformly use maxcpus
to calculate their omitted values.
Also the if-branch of "cpus == 0 || sockets == 0" was split into two
branches of "cpus == 0" and "sockets == 0" so that we can clearly read
that we are parsing the configuration with a preference on cpus over
sockets over cores over threads.
Note: change in this patch won't affect any existing working cmdlines
but improves consistency and allows more incomplete configs to be valid.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-5-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
To pave the way for the functional improvement in later patches,
make some refactor/cleanup for the smp parsers, including using
local maxcpus instead of ms->smp.max_cpus in the calculation,
defaulting dies to 0 initially like other members, cleanup the
sanity check for dies.
We actually also fix a hidden defect by avoiding directly using
the provided *zero value* in the calculation, which could cause
a segment fault (e.g. using dies=0 in the calculation).
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210929025816.21076-4-wangyanan55@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Provide a name field for all the memory listeners. It can be used to identify
which memory listener is which.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210817013553.30584-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Libvirt can use query-sgx-capabilities to get the host
sgx capabilities to decide how to allocate SGX EPC size to VM.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210910102258.46648-3-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The QMP and HMP interfaces can be used by monitor or QMP tools to retrieve
the SGX information from VM side when SGX is enabled on Intel platform.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210910102258.46648-2-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since there is no fill_device_info() callback support, and when we
execute "info memory-devices" command in the monitor, the segfault
will be found.
This patch will add this callback support and "info memory-devices"
will show sgx epc memory exposed to guest. The result as below:
qemu) info memory-devices
Memory device [sgx-epc]: ""
memaddr: 0x180000000
size: 29360128
memdev: /objects/mem1
Memory device [sgx-epc]: ""
memaddr: 0x181c00000
size: 10485760
memdev: /objects/mem2
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-33-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable SGX EPC virtualization, which is currently only support by KVM.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-22-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable SGX EPC virtualization, which is currently only support by KVM.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-21-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The ACPI Device entry for SGX EPC is essentially a hack whose primary
purpose is to provide software with a way to autoprobe SGX support,
e.g. to allow software to implement SGX support as a driver. Details
on the individual EPC sections are not enumerated through ACPI tables,
i.e. software must enumerate the EPC sections via CPUID. Furthermore,
software expects to see only a single EPC Device in the ACPI tables
regardless of the number of EPC sections in the system.
However, several versions of Windows do rely on the ACPI tables to
enumerate the address and size of the EPC. So, regardless of the number
of EPC sections exposed to the guest, create exactly *one* EPC device
with a _CRS entry that spans the entirety of all EPC sections (which are
guaranteed to be contiguous in Qemu).
Note, NUMA support for EPC memory is intentionally not considered as
enumerating EPC NUMA information is not yet defined for bare metal.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-20-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Note that SGX EPC is currently guaranteed to reside in a single
contiguous chunk of memory regardless of the number of EPC sections.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-19-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add helpers to detect if SGX EPC exists above 4g, and if so, where SGX
EPC above 4g ends. Use the helpers to adjust the device memory range
if SGX EPC exists above 4g.
For multiple virtual EPC sections, we just put them together physically
contiguous for the simplicity because we don't support EPC NUMA affinity
now. Once the SGX EPC NUMA support in the kernel SGX driver, we will
support this in the future.
Note that SGX EPC is currently hardcoded to reside above 4g.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-18-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Request SGX an SGX Launch Control to be enabled in FEATURE_CONTROL
when the features are exposed to the guest. Our design is the SGX
Launch Control bit will be unconditionally set in FEATURE_CONTROL,
which is unlike host bios.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-17-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose SGX to the guest if and only if KVM is enabled and supports
virtualization of SGX. While the majority of ENCLS can be emulated to
some degree, because SGX uses a hardware-based root of trust, the
attestation aspects of SGX cannot be emulated in software, i.e.
ultimately emulation will fail as software cannot generate a valid
quote/report. The complexity of partially emulating SGX in Qemu far
outweighs the value added, e.g. an SGX specific simulator for userspace
applications can emulate SGX for development and testing purposes.
Note, access to the PROVISIONKEY is not yet advertised to the guest as
KVM blocks access to the PROVISIONKEY by default and requires userspace
to provide additional credentials (via ioctl()) to expose PROVISIONKEY.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-13-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because SGX EPC is enumerated through CPUID, EPC "devices" need to be
realized prior to realizing the vCPUs themselves, i.e. long before
generic devices are parsed and realized. From a virtualization
perspective, the CPUID aspect also means that EPC sections cannot be
hotplugged without paravirtualizing the guest kernel (hardware does
not support hotplugging as EPC sections must be locked down during
pre-boot to provide EPC's security properties).
So even though EPC sections could be realized through the generic
-devices command, they need to be created much earlier for them to
actually be usable by the guest. Place all EPC sections in a
contiguous block, somewhat arbitrarily starting after RAM above 4g.
Ensuring EPC is in a contiguous region simplifies calculations, e.g.
device memory base, PCI hole, etc..., allows dynamic calculation of the
total EPC size, e.g. exposing EPC to guests does not require -maxmem,
and last but not least allows all of EPC to be enumerated in a single
ACPI entry, which is expected by some kernels, e.g. Windows 7 and 8.
The new compound properties command for sgx like below:
......
-object memory-backend-epc,id=mem1,size=28M,prealloc=on \
-object memory-backend-epc,id=mem2,size=10M \
-M sgx-epc.0.memdev=mem1,sgx-epc.1.memdev=mem2
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-6-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
SGX EPC is enumerated through CPUID, i.e. EPC "devices" need to be
realized prior to realizing the vCPUs themselves, which occurs long
before generic devices are parsed and realized. Because of this,
do not allow 'sgx-epc' devices to be instantiated after vCPUS have
been created.
The 'sgx-epc' device is essentially a placholder at this time, it will
be fully implemented in a future patch along with a dedicated command
to create 'sgx-epc' devices.
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-5-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add new CONFIG_SGX for sgx support in the Qemu, and the Kconfig
default enable sgx in the i386 platform.
Signed-off-by: Yang Zhong <yang.zhong@intel.com>
Message-Id: <20210719112136.57018-32-yang.zhong@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop abs64() and use uabs64() from host-utils, which avoids
an undefined behavior when taking abs of the most negative value.
Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210910112624.72748-5-luis.pires@eldorado.org.br>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
The PC_ROM_* definitions are only used by the PC machine,
and are irrelevant to the other architectures / machines.
Reduce their scope by moving them to hw/i386/pc.c.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20210917185949.2244956-1-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Since commits aa57020774 ("numa: move numa global variable
nb_numa_nodes into MachineState") and 7e721e7b10 ("numa: move
numa global variable numa_info into MachineState"), we can get
NUMA information completely from MachineState::numa_state.
Remove PCMachineState::numa_nodes and PCMachineState::node_mem,
since they are just copied from MachineState::numa_state.
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jingqi Liu <jingqi.liu@intel.com>
Message-Id: <20210823011254.28506-1-jingqi.liu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
OBJECT_CHECK(PciHostState, ..., TYPE_PCI_HOST_BRIDGE) is exactly
what the PCI_HOST_BRIDGE macro does. We can just use the macro
instead of using OBJECT_CHECK manually.
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210805193431.307761-7-ehabkost@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now that we have "acpi-pci-hotplug-with-bridge-support" PIIX4 PM property being
used for both q35 and i440fx machine types, it is better that we defined this
property string at a single place within a header file like other PIIX4
properties. We can then use this single definition at all the places that needs
it instead of duplicating the string everywhere. While at it, this change also
adds a definition for "acpi-root-pci-hotplug" PIIX4 PM property and uses
this definition at all places that were formally using the string value.
Signed-off-by: Ani Sinha <ani@anisinha.ca>
Message-Id: <20210816083214.105740-1-ani@anisinha.ca>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add 6.2 machine types for arm/i440fx/q35/s390x/spapr.
Signed-off-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
arch_init.h only defines the QEMU_ARCH_* enumeration and the
arch_type global. Don't include it in files that don't use those.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20210730105947.28215-8-peter.maydell@linaro.org
Commit [1] switched PCI hotplug from native to ACPI one by default.
That however breaks hotplug on following CLI that used to work:
-nodefaults -machine q35 \
-device pcie-root-port,id=pcie-root-port-0,multifunction=on,bus=pcie.0,addr=0x1,chassis=1 \
-device pcie-root-port,id=pcie-root-port-1,port=0x1,addr=0x1.0x1,bus=pcie.0,chassis=2
where PCI device is hotplugged to pcie-root-port-1 with error on guest side:
ACPI BIOS Error (bug): Could not resolve symbol [^S0B.PCNT], AE_NOT_FOUND (20201113/psargs-330)
ACPI Error: Aborting method \_SB.PCI0.PCNT due to previous error (AE_NOT_FOUND) (20201113/psparse-531)
ACPI Error: Aborting method \_GPE._E01 due to previous error (AE_NOT_FOUND) (20201113/psparse-531)
ACPI Error: AE_NOT_FOUND, while evaluating GPE method [_E01] (20201113/evgpe-515)
cause is that QEMU's ACPI hotplug never supported functions other then 0
and due to bug it was generating notification entries for not described
functions.
Technically there is no reason not to describe cold-plugged bridges
(root ports) on functions other then 0, as they similarly to bridge
on function 0 are unpluggable.
So since we need to describe multifunction devices iterate over
fuctions as well. But describe only cold-plugged bridges[root ports]
on functions other than 0 as well.
1)
Fixes: 17858a1695 (hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35)
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reported-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20210723090424.2092226-1-imammedo@redhat.com>
Fixes: 17858a1695 (hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35)<br>
Signed-off-by: Igor Mammedov <<a href="mailto:imammedo@redhat.com" target="_blank">imammedo@redhat.com</a>><br>
Reported-by: Laurent Vivier <<a href="mailto:lvivier@redhat.com" target="_blank">lvivier@redhat.com</a>><br>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
When building the 'microvm' machine stand-alone we get:
$ qemu-system-x86_64 -M microvm
**
ERROR:qom/object.c:714:object_new_with_type: assertion failed: (type != NULL)
Bail out! ERROR:qom/object.c:714:object_new_with_type: assertion failed: (type != NULL)
Aborted (core dumped)
Looking at the backtrace:
(gdb) bt
#3 0x00007ff2330492ff in g_assertion_message_expr () at /lib64/libglib-2.0.so.0
#4 0x000055a878c18341 in object_new_with_type (type=<optimized out>) at qom/object.c:714
#5 0x000055a878c18399 in object_new (typename=typename@entry=0x55a878dec36a "isa-pit") at qom/object.c:747
#6 0x000055a878cc8146 in qdev_new (name=name@entry=0x55a878dec36a "isa-pit") at hw/core/qdev.c:153
#7 0x000055a878a8b439 in isa_new (name=name@entry=0x55a878dec36a "isa-pit") at hw/isa/isa-bus.c:160
#8 0x000055a878adb782 in i8254_pit_init (base=64, isa_irq=0, alt_irq=0x0, bus=0x55a87ab38760) at include/hw/timer/i8254.h:54
#9 microvm_devices_init (mms=0x55a87ac36800) at hw/i386/microvm.c:263
#10 microvm_machine_state_init (machine=<optimized out>) at hw/i386/microvm.c:471
#11 0x000055a878a944ab in machine_run_board_init (machine=machine@entry=0x55a87ac36800) at hw/core/machine.c:1239
The "isa-pit" type (TYPE_I8254) is missing. Add it.
Fixes: 0ebf007dda ("hw/i386: Introduce the microvm machine type")
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20210616204328.2611406-24-philmd@redhat.com>
Check bypass_iommu to exclude the devices which will bypass iommu.
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Message-Id: <1625748919-52456-9-git-send-email-wangxingang5@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In DMAR table, the drhd is set to cover all PCI devices when intel_iommu
is on. To support bypass iommu feature, we need to walk the PCI bus with
bypass_iommu disabled and add explicit scope data in DMAR drhd structure.
/mnt/sdb/wxg/qemu-next/qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-machine q35,accel=kvm,default_bus_bypass_iommu=true \
-cpu host \
-m 16G \
-smp 36,sockets=2,cores=18,threads=1 \
-device pxb-pcie,bus_nr=0x10,id=pci.10,bus=pcie.0,addr=0x3 \
-device pxb-pcie,bus_nr=0x20,id=pci.20,bus=pcie.0,addr=0x4,bypass_iommu=true \
-device pcie-root-port,port=0x1,chassis=1,id=pci.11,bus=pci.10,addr=0x0 \
-device pcie-root-port,port=0x2,chassis=2,id=pci.21,bus=pci.20,addr=0x0 \
-device virtio-scsi-pci,id=scsi0,bus=pci.11,addr=0x0 \
-device virtio-scsi-pci,id=scsi1,bus=pci.21,addr=0x0 \
-drive file=/mnt/sdb/wxg/fedora-48g.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none,aio=native \
-device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 \
-device intel-iommu \
-nographic \
And we get the guest configuration:
~ lspci -vt
-+-[0000:20]---00.0-[21]----00.0 Red Hat, Inc. Virtio SCSI
+-[0000:10]---00.0-[11]----00.0 Red Hat, Inc. Virtio SCSI
\-[0000:00]-+-00.0 Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
+-01.0 Device 1234:1111
+-02.0 Intel Corporation 82574L Gigabit Network Connection
+-03.0 Red Hat, Inc. QEMU PCIe Expander bridge
+-04.0 Red Hat, Inc. QEMU PCIe Expander bridge
+-1f.0 Intel Corporation 82801IB (ICH9) LPC Interface Controller
+-1f.2 Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
\-1f.3 Intel Corporation 82801I (ICH9 Family) SMBus Controller
With bypass_iommu enabled on root bus, the attached devices will bypass iommu:
/sys/class/iommu/dmar0
├── devices
│ ├── 0000:10:00.0 -> ../../../../pci0000:10/0000:10:00.0
│ └── 0000:11:00.0 -> ../../../../pci0000:10/0000:10:00.0/0000:11:00.0
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Message-Id: <1625748919-52456-8-git-send-email-wangxingang5@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add a default_bus_bypass_iommu pc machine option to enable/disable
bypass_iommu for default root bus. The option is disabled by default
and can be enabled with:
$QEMU -machine q35,default_bus_bypass_iommu=true
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Message-Id: <1625748919-52456-5-git-send-email-wangxingang5@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Q35 has three different types of PCI devices hot-plug: PCIe Native,
SHPC Native and ACPI hot-plug. This patch changes the default choice
for cold-plugged bridges from PCIe Native to ACPI Hot-plug with
ability to use SHPC and PCIe Native for hot-plugged bridges.
This is a list of the PCIe Native hot-plug issues that led to this
change:
* no racy behavior during boot (see 110c477c2e)
* no delay during deleting - after the actual power off software
must wait at least 1 second before indicating about it. This case
is quite important for users, it even has its own bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1594168
* no timer-based behavior - in addition to the previous example,
the attention button has a 5-second waiting period, during which
the operation can be canceled with a second press. While this
looks fine for manual button control, automation will result in
the need to queue or drop events, and the software receiving
events in all sort of unspecified combinations of attention/power
indicator states, which is racy and uppredictable.
* fixes:
* https://bugzilla.redhat.com/show_bug.cgi?id=1752465
* https://bugzilla.redhat.com/show_bug.cgi?id=1690256
To return to PCIe Native hot-plug:
-global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off
Known issue: older linux guests need the following flag
to allow hotplugged pci express devices to use io:
-device pcie-root-port,io-reserve=4096.
io is unusual for pci express so this seems minor.
We'll fix this by a follow up patch.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210713004205.775386-6-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Instead of changing the hot-plug type in _OSC register, do not
set the 'Hot-Plug Capable' flag. This way guest will choose ACPI
hot-plug if it is preferred and leave the option to use SHPC with
pcie-pci-bridge.
The ability to control hot-plug for each downstream port is retained,
while 'hotplug=off' on the port means all hot-plug types are disabled.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210713004205.775386-4-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add acpi_pcihp to ich9_pm as part of
'acpi-pci-hotplug-with-bridge-support' option. Set default to false.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210713004205.775386-3-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Implement notifications and gpe to support q35 ACPI PCI hot-plug.
Use 0xcc4 - 0xcd7 range for 'acpi-pci-hotplug' io ports.
Signed-off-by: Julia Suvorova <jusual@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Message-Id: <20210713004205.775386-2-jusual@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Introduce the X86_FW_OVMF Kconfig symbol for OVMF-specific code.
Move the OVMF-specific code from pc_sysfw.c to pc_sysfw_ovmf.c,
adding a pair of stubs.
Update MAINTAINERS to reach OVMF maintainers when these new
files are modified.
This fixes when building the microvm machine standalone:
/usr/bin/ld: libqemu-i386-softmmu.fa.p/target_i386_monitor.c.o: in
function `qmp_sev_inject_launch_secret':
target/i386/monitor.c:749: undefined reference to `pc_system_ovmf_table_find'
Fixes: f522cef9b3 ("sev: update sev-inject-launch-secret to make gpa optional")
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20210616204328.2611406-22-philmd@redhat.com>
Add assertion in pc_system_ovmf_table_find that verifies that the flash
was indeed previously parsed (looking for the OVMF table) by
pc_system_parse_ovmf_flash.
Now pc_system_ovmf_table_find distinguishes between "no one called
pc_system_parse_ovmf_flash" (which will abort due to assertion failure)
and "the flash was parsed but no OVMF table was found, or it is invalid"
(which will return false).
Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Dov Murik <dovmurik@linux.ibm.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210701052749.934744-2-dovmurik@linux.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Currently all built-in CPUs report cache information via CPUID leaves 2
and 4, but these have never been defined for AMD. In the case of
SEV-SNP this can cause issues with CPUID enforcement. Address this by
allowing CPU types to suppress these via a new "x-vendor-cpuid-only"
CPU property, which is true by default, but switched off for older
machine types to maintain compatibility.
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: zhenwei pi <pizhenwei@bytedance.com>
Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Michael Roth <michael.roth@amd.com>
Message-Id: <20210708003623.18665-1-michael.roth@amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Commit [1] moved _SUN variable from only hot-pluggable to
all devices. This made linux kernel enumerate extra slots
that weren't present before. If extra slot happens to be
be enumerated first and there is a device in th same slot
but on other bridge, linux kernel will add -N suffix to
slot name of the later, thus changing NIC name compared to
QEMU 5.2. This in some case confuses systemd, if it is
using SLOT NIC naming scheme and interface name becomes
not the same as it was under QEMU-5.2.
Reproducer QEMU CLI:
-M pc-i440fx-5.2 -nodefaults \
-device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x3 \
-device virtio-net-pci,id=nic1,bus=pci.1,addr=0x1 \
-device virtio-net-pci,id=nic2,bus=pci.1,addr=0x2 \
-device virtio-net-pci,id=nic3,bus=pci.1,addr=0x3
with RHEL8 guest produces following results:
v5.2:
kernel: virtio_net virtio0 ens1: renamed from eth0
kernel: virtio_net virtio2 ens3: renamed from eth2
kernel: virtio_net virtio1 enp1s2: renamed from eth1
(slot 2 is assigned to empty bus 0 slot and virtio1
is assigned to 2-2 slot, and renaming falls back,
for some reason, to path based naming scheme)
v6.0:
kernel: virtio_net virtio0 ens1: renamed from eth0
kernel: virtio_net virtio2 ens3: renamed from eth2
systemd-udevd[299]: Error changing net interface name 'eth1' to 'ens3': File exists
systemd-udevd[299]: could not rename interface '3' from 'eth1' to 'ens3': File exists
(with commit [1] kernel assigns virtio2 to 3-2 slot
since bridge advertises _SUN=0x3 and kernel assigns
slot 3 to bridge. Still it manages to rename virtio2
correctly to ens3, however systemd gets confused with virtio1
where slot allocation exactly the same (2-2) as in 5.2 case
and tries to rename it to ens3 which is rightfully taken by
virtio2)
I'm not sure what breaks in systemd interface renaming (it probably
should be investigated), but on QEMU side we can safely revert
_SUN to 5.2 behavior (i.e. avoid cold-plugged bridges and non
hot-pluggable device classes), without breaking acpi-index, which uses
slot numbers but it doesn't have to use _SUN, it could use an arbitrary
variable name that has the same slot value).
It will help existing VMs to keep networking with non trivial
configs in working order since systemd will do its interface
renaming magic as it used to do.
1)
Fixes: b7f23f62e4 (pci: acpi: add _DSM method to PCI devices)
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210624204229.998824-3-imammedo@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: John Sucaet <john.sucaet@ekinops.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As part of converting -smp to a property with a QAPI type, define
the struct and use it to do the actual parsing. machine_smp_parse
takes care of doing the QemuOpts->QAPI conversion by hand, for now.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210617155308.928754-10-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Clean up the smp_parse functions to use Error** instead of exiting.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210617155308.928754-9-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Most of smp_parse and pc_smp_parse is guarded by an "if (opts)"
conditional, and the rest is common to both function. Move the
conditional and the common code to the caller, machine_smp_parse.
Move the replay_add_blocker call after all errors are checked for.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210617155308.928754-8-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to make SMP configuration a Machine property, we need a getter as
well as a setter. To simplify the implementation put everything that the
getter needs in the CpuTopology struct.
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210617155308.928754-7-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some machines use floppy controllers via the SysBus interface,
and don't need to pull in all the ISA code.
Extract the ISA specific code to a new unit: fdc-isa.c, and
add a new Kconfig symbol: "FDC_ISA".
This allows us to remove the FIXME from commit dd0ff8191a
("isa: express SuperIO dependencies with Kconfig").
Reviewed-by: John Snow <jsnow@redhat.com>
Acked-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210614193220.2007159-5-philmd@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>
A bus lock is acquired through either split locked access to writeback
(WB) memory or any locked access to non-WB memory. It is typically >1000
cycles slower than an atomic operation within a cache and can also
disrupts performance on other cores.
Virtual Machines can exploit bus locks to degrade the performance of
system. To address this kind of performance DOS attack coming from the
VMs, bus lock VM exit is introduced in KVM and it can report the bus
locks detected in guest. If enabled in KVM, it would exit to the
userspace to let the user enforce throttling policies once bus locks
acquired in VMs.
The availability of bus lock VM exit can be detected through the
KVM_CAP_X86_BUS_LOCK_EXIT. The returned bitmap contains the potential
policies supported by KVM. The field KVM_BUS_LOCK_DETECTION_EXIT in
bitmap is the only supported strategy at present. It indicates that KVM
will exit to userspace to handle the bus locks.
This patch adds a ratelimit on the bus locks acquired in guest as a
mitigation policy.
Introduce a new field "bus_lock_ratelimit" to record the limited speed
of bus locks in the target VM. The user can specify it through the
"bus-lock-ratelimit" as a machine property. In current implementation,
the default value of the speed is 0 per second, which means no
restrictions on the bus locks.
As for ratelimit on detected bus locks, simply set the ratelimit
interval to 1s and restrict the quota of bus lock occurence to the value
of "bus_lock_ratelimit". A potential alternative is to introduce the
time slice as a property which can help the user achieve more precise
control.
The detail of bus lock VM exit can be found in spec:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20210521043820.29678-1-chenyi.qiang@intel.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210614191335.1968807-2-stefanb@linux.ibm.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Headers should be included from the 'include/' directory,
not from the root directory.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Message-Id: <20210516205034.694788-1-f4bug@amsat.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* Bump minimum versions of some requirements after removing CentOS 7 support
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmC3L1IRHHRodXRoQHJl
ZGhhdC5jb20ACgkQLtnXdP5wLbU8wA//aRAIn0Qa3xoZdsT434P99GHauCZ5ePq3
meY4co69c+TfkQZ/b0xlyvT3+7bd9Ni92CQL7n/LtX5bs3pIhRjHNIsm50e4x6Da
0jb422cKWfRIltXlCfdL/dnFRtJluH83M0seRGMvmhveuWPZ19oSIasEiQjeg//e
KMNK5z4HEnJ1czb3Bf8p38bmmY/O/QEAA5wAqk7iJkJHz6T/GlqImLYYwpFPlHj6
JQttm0aWsHrsqEXxnuV0/DT1yHyXDB6S4iuAvABZWhv/M/nCaXo0ib0gW5NPtRPo
Yf3HO163F9/fewJCc1AUsBe1C/2UwmSWRhEtxpr9uuW2Mv9qEl3hkJwd4k6sEOvh
U4i+GONC4eElPcmECKUfHA9EP+7faDs6xnM6Ev/PIEp+cPJ2QRfklZv4qpMUWYtb
3KkADchOyVZAsdB8cGmnznDEVmno1Dt0adVUq8CF6uW6MwD3pb4838arrfwIfwOp
g4yTI1AQQykkxxOaR7GSNoxWRti7TH4fzLhx/xXDd9TKIOWuOepiyuhB7+Q48WAJ
6EW/JUIzOe7k3GsI0iBsk+y67ED2tpATiRKWWw4QS1BwIhNEMUqgZkucFfKE6Ze1
B1Xw+Di+1CzrLHV6lCjtdLVmqWqcGcO43HqhSK4sodE87BDGGwy9UxE6oBVt2C1x
HPQ6/omDlFk=
=RAkI
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/thuth-gitlab/tags/pull-request-2021-06-02' into staging
* Update the references to some doc files (use *.rst instead of *.txt)
* Bump minimum versions of some requirements after removing CentOS 7 support
# gpg: Signature made Wed 02 Jun 2021 08:12:18 BST
# gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg: issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg: aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5
* remotes/thuth-gitlab/tags/pull-request-2021-06-02:
configure: bump min required CLang to 6.0 / XCode 10.0
configure: bump min required GCC to 7.5.0
configure: bump min required glib version to 2.56
tests/docker: drop CentOS 7 container
tests/vm: convert centos VM recipe to CentOS 8
crypto: drop used conditional check
crypto: bump min gnutls to 3.5.18, dropping RHEL-7 support
crypto: bump min gcrypt to 1.8.0, dropping RHEL-7 support
crypto: drop back compatibility typedefs for nettle
crypto: bump min nettle to 3.4, dropping RHEL-7 support
patchew: move quick build job from CentOS 7 to CentOS 8 container
block/ssh: Bump minimum libssh version to 0.8.7
docs: fix references to docs/devel/s390-dasd-ipl.rst
docs: fix references to docs/specs/tpm.rst
docs: fix references to docs/devel/build-system.rst
docs: fix references to docs/devel/atomics.rst
docs: fix references to docs/devel/tracing.rst
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Commit e50caf4a5c ("tracing: convert documentation to rST")
converted docs/devel/tracing.txt to docs/devel/tracing.rst.
We still have several references to the old file, so let's fix them
with the following command:
sed -i s/tracing.txt/tracing.rst/ $(git grep -l docs/devel/tracing.txt)
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210517151702.109066-2-sgarzare@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
The 'qemu64' CPUID currently reports a family/model/stepping that
approximately corresponds to an AMD K7 vintage architecture.
The K7 series predates the introduction of 64-bit support by AMD
in the K8 series. This has been reported to lead to LLVM complaints
about generating 64-bit code for a 32-bit CPU target
LLVM ERROR: 64-bit code requested on a subtarget that doesn't support it!
It appears LLVM looks at the family/model/stepping, despite qemu64
reporting it is 64-bit capable.
This patch changes 'qemu64' to report a CPUID with the family, model
and stepping taken from a
AMD Athlon(tm) 64 X2 Dual Core Processor 4000+
which is one of the first 64-bit AMD CPUs.
Closes https://gitlab.com/qemu-project/qemu/-/issues/191
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20210507133650.645526-2-berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Per the kconfig.rst:
A device should be listed [...] ``imply`` if (depending on
the QEMU command line) the board may or may not be started
without it.
This is the case with the NVDIMM device, so use the 'imply'
weak reverse dependency to select the symbol.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20210511155354.3069141-2-philmd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes all over the place. Faster boot for virtio. ioeventfd support for
mmio.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmCeiMEPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpqsIH/A49Av5Bv8huL75lf9GzCx3E1a/z2W9Fphik
OcQ1ahR+7CRDARub+vTG40MBmZBVefIWjLAj3BwBWzFGPX0DZq0zeI102VzlEVKY
OeUx8ixuiKOSLcS+QxE7ZXIBL2Pn7l+MFUi4nLMYKti7c/kola7zlB57qsmXh+VD
AOQ7Utj6NWoi6QocWJsMSCyHCh3Fk9QzcStLlr6/MkSJa1zqv8l22+8oWH07Fk2M
wZfhrm9k094on28iSejsFYL5e4ROeXUajbOdfyMIxWvAB7boC9Jxk/e0oAbuSB4y
2f71Gfk3mU6irS7PvrxcKbk6BVD2zxM2WumOchZJgxFAujDO6yg=
=fvkT
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
pc,pci,virtio: bugfixes, improvements
Fixes all over the place. Faster boot for virtio. ioeventfd support for
mmio.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 14 May 2021 15:27:13 BST
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream:
Fix build with 64 bits time_t
vhost-vdpa: Make vhost_vdpa_get_device_id() static
hw/virtio: enable ioeventfd configuring for mmio
hw/smbios: support for type 41 (onboard devices extended information)
checkpatch: Fix use of uninitialized value
virtio-scsi: Configure all host notifiers in a single MR transaction
virtio-scsi: Set host notifiers and callbacks separately
virtio-blk: Configure all host notifiers in a single MR transaction
virtio-blk: Fix rollback path in virtio_blk_data_plane_start()
pc-dimm: remove unnecessary get_vmstate_memory_region() method
amd_iommu: fix wrong MMIO operations
virtio-net: Constify VirtIOFeature feature_sizes[]
virtio-blk: Constify VirtIOFeature feature_sizes[]
hw/virtio: Pass virtio_feature_get_config_size() a const argument
x86: acpi: use offset instead of pointer when using build_header()
amd_iommu: Fix pte_override_page_mask()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# hw/arm/virt.c
Type 41 defines the attributes of devices that are onboard. The
original intent was to imply the BIOS had some level of control over
the enablement of the associated devices.
If network devices are present in this table, by default, udev will
name the corresponding interfaces enoX, X being the instance number.
Without such information, udev will fallback to using the PCI ID and
this usually gives ens3 or ens4. This can be a bit annoying as the
name of the network card may depend on the order of options and may
change if a new PCI device is added earlier on the commande line.
Being able to provide SMBIOS type 41 entry ensure the name of the
interface won't change and helps the user guess the right name without
booting a first time.
This can be invoked with:
$QEMU -netdev user,id=internet
-device virtio-net-pci,mac=50:54:00:00:00:42,netdev=internet,id=internet-dev \
-smbios type=41,designation='Onboard LAN',instance=1,kind=ethernet,pcidev=internet-dev
The PCI segment is assumed to be 0. This should hold true for most
cases.
$ dmidecode -t 41
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
Handle 0x2900, DMI type 41, 11 bytes
Onboard Device
Reference Designation: Onboard LAN
Type: Ethernet
Status: Enabled
Type Instance: 1
Bus Address: 0000:00:09.0
$ ip -brief a
lo UNKNOWN 127.0.0.1/8 ::1/128
eno1 UP 10.0.2.14/24 fec0::5254:ff:fe00:42/64 fe80::5254:ff:fe00:42/64
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Message-Id: <20210401171138.62970-1-vincent@bernat.ch>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Address was swapped with value when writing MMIO registers, so the user
saw garbage in lot of cases. The interrupt status was not correctly set.
Signed-off-by: Roman Kapl <rka@sysgo.com>
Message-Id: <20210427110504.10878-1-rka@sysgo.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Ram block notifiers are currently not aware of resizes. To properly
handle resizes during migration, we want to teach ram block notifiers about
resizeable ram.
Introduce the basic infrastructure but keep using max_size in the
existing notifiers. Supply the max_size when adding and removing ram
blocks. Also, notify on resizes.
Acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: xen-devel@lists.xenproject.org
Cc: haxm-team@intel.com
Cc: Paul Durrant <paul@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Wenchao Wang <wenchao.wang@intel.com>
Cc: Colin Xu <colin.xu@intel.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210429112708.12291-3-david@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* i386 page walk unification
* Fix detection of gdbus-codegen
* Misc refactoring
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCblEEUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroObnQgAj10pRDODY9hIiUiYj2sTcEQTly3p
DC+ZWDaup67z3WV2C/vAS/x31RGIus+7bzji3fgtUcnGOr7sbuOCcs7CPY8mam5Y
GMPNrsUk2sZ5z9SVTq2vjEa61tjxtMpYXx9pnhgJzJAO4NJzNuX74ZdpA+oV5aTC
CvZDk8lC7BLU16MfeLcbw44xE4Oy05wWwaoP2pvhdOg47y85t/S9Il1yBCYi3y8C
pTOBBCYmHGPj/r7i4MhUGrAjIyGQu1w7av8nZXouRegoeVl28RKR8+pl7TfFpt+E
cp95yE8dPuNFJnCiZ3Kv01eBcSUcyp4gVb5H2Oa/nkkYRLpnONbUzYFtIA==
=zR5U
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into staging
* AccelCPUClass and sysemu/user split for i386 (Claudio)
* i386 page walk unification
* Fix detection of gdbus-codegen
* Misc refactoring
# gpg: Signature made Wed 12 May 2021 09:39:29 BST
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83
* remotes/bonzini-gitlab/tags/for-upstream: (32 commits)
coverity-scan: list components, move model to scripts/coverity-scan
configure: fix detection of gdbus-codegen
qemu-option: support accept-any QemuOptsList in qemu_opts_absorb_qdict
main-loop: remove dead code
target/i386: use mmu_translate for NPT walk
target/i386: allow customizing the next phase of the translation
target/i386: extend pg_mode to more CR0 and CR4 bits
target/i386: pass cr3 to mmu_translate
target/i386: extract mmu_translate
target/i386: move paging mode constants from SVM to cpu.h
target/i386: merge SVM_NPTEXIT_* with PF_ERROR_* constants
accel: add init_accel_cpu for adapting accel behavior to CPU type
accel: move call to accel_init_interfaces
i386: make cpu_load_efer sysemu-only
target/i386: gdbstub: only write CR0/CR2/CR3/EFER for sysemu
target/i386: gdbstub: introduce aux functions to read/write CS64 regs
i386: split off sysemu part of cpu.c
i386: split seg_helper into user-only and sysemu parts
i386: split svm_helper into sysemu and stub-only user
i386: separate fpu_helper sysemu-only parts
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
i386 is the first user of AccelCPUClass, allowing to split
cpu.c into:
cpu.c cpuid and common x86 cpu functionality
host-cpu.c host x86 cpu functions and "host" cpu type
kvm/kvm-cpu.c KVM x86 AccelCPUClass
hvf/hvf-cpu.c HVF x86 AccelCPUClass
tcg/tcg-cpu.c TCG x86 AccelCPUClass
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[claudio]:
Rebased on commit b8184135 ("target/i386: allow modifying TCG phys-addr-bits")
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Message-Id: <20210322132800.7470-5-cfontana@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Because Coverity complains about it and this is one leak that Valgrind
reports.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Paul Durrant <paul@xen.org>
Message-Id: <20210430163742.469739-1-anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
When we're replacing the existing mapping there is possibility of a race
on memory map with other threads doing mmap operations - the address being
unmapped/re-mapped could be occupied by another thread in between.
Linux mmap man page recommends keeping the existing mappings in place to
reserve the place and instead utilize the fact that the next mmap operation
with MAP_FIXED flag passed will implicitly destroy the existing mappings
behind the chosen address. This behavior is guaranteed by POSIX / BSD and
therefore is portable.
Note that it wouldn't make the replacement atomic for parallel accesses to
the replaced region - those might still fail with SIGBUS due to
xenforeignmemory_map not being atomic. So we're still not expecting those.
Tested-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <1618889702-13104-1-git-send-email-igor.druzhinin@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Do the same as in commit
(4d027afeb3 Virt: ACPI: fix qemu assert due to re-assigned table data address)
for remaining tables that happen to use saved at
the beginning pointer to build header to avoid assert
when table_data is relocated due to implicit re-size.
In this case user is trying to start Windows 10 and getting assert at
hw/acpi/bios-linker-loader.c:239:
bios_linker_loader_add_checksum: Assertion `start_offset < file->blob->len' failed.
Fixes: https://bugs.launchpad.net/bugs/1923497
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210414084356.3792113-1-imammedo@redhat.com>
Cc: mst@redhat.com, qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
AMD IOMMU PTEs have a special mode allowing to specify an arbitrary page
size. Quoting the AMD IOMMU specification: "When the Next Level bits [of
a pte] are 7h, the size of the page is determined by the first zero bit
in the page address, starting from bit 12."
So if the lowest bits of the page address is 0, the page is 8kB. If the
lowest bits are 011, the page is 32kB. Currently pte_override_page_mask()
doesn't compute the right value for this page size and amdvi_translate()
can return the wrong guest-physical address. With a Linux guest, DMA
from SATA devices accesses the wrong memory and causes probe failure:
qemu-system-x86_64 ... -device amd-iommu -drive id=hd1,file=foo.bin,if=none \
-device ahci,id=ahci -device ide-hd,drive=hd1,bus=ahci.0
[ 6.613093] ata1.00: qc timeout (cmd 0xec)
[ 6.615062] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Fix the page mask.
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20210421084007.1190546-1-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stop including exec/address-spaces.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-5-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including cpu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-4-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including hw/boards.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-3-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Stop including sysemu/sysemu.h in files that don't need it.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20210416171314.2074665-2-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Many files include hw/irq.h without needing it. Remove the superfluous
include statements.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20210327050236.2232347-1-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
The code that sets/gets oem fields is duplicated in both PC and MICROVM
variants. This commit moves it to X86MachineState so that all x86
variants can use it and duplication is removed.
Signed-off-by: Marian Postevca <posteuca@mutex.one>
Message-Id: <20210221001737.24499-2-posteuca@mutex.one>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We want to have safety margins for all tables based on the table type.
Let's move the maximum size logic into acpi_add_rom_blob() and make it
dependent on the table name, so we don't have to replicate for each and
every instance that creates such tables.
Suggested-by: Laszlo Ersek <lersek@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Shannon Zhao <shannon.zhaosl@gmail.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210304105554.121674-4-david@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Let's just reuse ACPI_BUILD_LOADER_FILE.
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Shannon Zhao <shannon.zhaosl@gmail.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210304105554.121674-3-david@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The resizeable memory region / RAMBlock that is created for the cmd blob
has a maximum size of whole host pages (e.g., 4k), because RAMBlocks
work on full host pages. In addition, in i386 ACPI code:
acpi_align_size(tables->linker->cmd_blob, ACPI_BUILD_ALIGN_SIZE);
makes sure to align to multiples of 4k, padding with 0.
For example, if our cmd_blob is created with a size of 2k, the maximum
size is 4k - we cannot grow beyond that. Growing might be required
due to guest action when rebuilding the tables, but also on incoming
migration.
This automatic generation of the maximum size used to be sufficient,
however, there are cases where we cross host pages now when growing at
runtime: we exceed the maximum size of the RAMBlock and can crash QEMU when
trying to resize the resizeable memory region / RAMBlock:
$ build/qemu-system-x86_64 --enable-kvm \
-machine q35,nvdimm=on \
-smp 1 \
-cpu host \
-m size=2G,slots=8,maxmem=4G \
-object memory-backend-file,id=mem0,mem-path=/tmp/nvdimm,size=256M \
-device nvdimm,label-size=131072,memdev=mem0,id=nvdimm0,slot=1 \
-nodefaults \
-device vmgenid \
-device intel-iommu
Results in:
Unexpected error in qemu_ram_resize() at ../softmmu/physmem.c:1850:
qemu-system-x86_64: Size too large: /rom@etc/table-loader:
0x2000 > 0x1000: Invalid argument
In this configuration, we consume exactly 4k (32 entries, 128 bytes each)
when creating the VM. However, once the guest boots up and maps the MCFG,
we also create the MCFG table and end up consuming 2 additional entries
(pointer + checksum) -- which is where we try resizing the memory region
/ RAMBlock, however, the maximum size does not allow for it.
Currently, we get the following maximum sizes for our different
mutable tables based on behavior of resizeable RAMBlock:
hw table max_size
------- ---------------------------------------------------------
virt "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000)
virt "etc/table-loader" HOST_PAGE_ALIGN(initial_size)
virt "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size)
i386 "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000)
i386 "etc/table-loader" HOST_PAGE_ALIGN(initial_size)
i386 "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size)
microvm "etc/acpi/tables" ACPI_BUILD_TABLE_MAX_SIZE (0x200000)
microvm "etc/table-loader" HOST_PAGE_ALIGN(initial_size)
microvm "etc/acpi/rsdp" HOST_PAGE_ALIGN(initial_size)
Let's set the maximum table size for "etc/table-loader" to 64k, so we
can properly grow at runtime, which should be good enough for the future.
Migration is not concerned with the maximum size of a RAMBlock, only
with the used size - so existing setups are not affected. Of course, we
cannot migrate a VM that would have crash when started on older QEMU from
new QEMU to older QEMU without failing early on the destination when
synchronizing the RAM state:
qemu-system-x86_64: Size too large: /rom@etc/table-loader: 0x2000 > 0x1000: Invalid argument
qemu-system-x86_64: error while loading state for instance 0x0 of device 'ram'
qemu-system-x86_64: load of migration failed: Invalid argument
We'll refactor the code next, to make sure we get rid of this implicit
behavior for "etc/acpi/rsdp" as well and to make the code easier to
grasp.
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Cc: Alistair Francis <alistair.francis@xilinx.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Shannon Zhao <shannon.zhaosl@gmail.com>
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210304105554.121674-2-david@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Implement _DSM according to:
PCI Firmware Specification 3.1
4.6.7. DSM for Naming a PCI or PCI Express Device Under
Operating Systems
and wire it up to cold and hot-plugged PCI devices.
Feature depends on ACPI hotplug being enabled (as that provides
PCI devices descriptions in ACPI and MMIO registers that are
reused to fetch acpi-index).
acpi-index should work for
- cold plugged NICs:
$QEMU -device e1000,acpi-index=100
=> 'eno100'
- hot-plugged
(monitor) device_add e1000,acpi-index=200,id=remove_me
=> 'eno200'
- re-plugged
(monitor) device_del remove_me
(monitor) device_add e1000,acpi-index=1
=> 'eno1'
Windows also sees index under "PCI Label Id" field in properties
dialog but otherwise it doesn't seem to have any effect.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-6-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In x86/ACPI world, linux distros are using predictable
network interface naming since systemd v197. Which on
QEMU based VMs results into path based naming scheme,
that names network interfaces based on PCI topology.
With itm on has to plug NIC in exactly the same bus/slot,
which was used when disk image was first provisioned/configured
or one risks to loose network configuration due to NIC being
renamed to actually used topology.
That also restricts freedom to reshape PCI configuration of
VM without need to reconfigure used guest image.
systemd also offers "onboard" naming scheme which is
preferred over PCI slot/topology one, provided that
firmware implements:
"
PCI Firmware Specification 3.1
4.6.7. DSM for Naming a PCI or PCI Express Device Under
Operating Systems
"
that allows to assign user defined index to PCI device,
which systemd will use to name NIC. For example, using
-device e1000,acpi-index=100
guest will rename NIC to 'eno100', where 'eno' is default
prefix for "onboard" naming scheme. This doesn't require
any advance configuration on guest side to com in effect
at 'onboard' scheme takes priority over path based naming.
Hope is that 'acpi-index' it will be easier to consume by
management layer, compared to forcing specific PCI topology
and/or having several disk image templates for different
topologies and will help to simplify process of spawning
VM from the same template without need to reconfigure
guest NIC.
This patch adds, 'acpi-index'* property and wires up
a 32bit register on top of pci hotplug register block
to pass index value to AML code at runtime.
Following patch will add corresponding _DSM code and
wire it up to PCI devices described in ACPI.
*) name comes from linux kernel terminology
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20210315180102.3008391-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The 'scsi-hd' and 'scsi-cd' devices provide suitable alternatives.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
The 'ide-hd' and 'ide-cd' devices provide suitable alternatives.
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Currently get_naturally_aligned_size() is used by the intel iommu
to compute the maximum invalidation range based on @size which is
a power of 2 while being aligned with the @start address and less
than the maximum range defined by @gaw.
This helper is also useful for other iommu devices (virtio-iommu,
SMMUv3) to make sure IOMMU UNMAP notifiers only are called with
power of 2 range sizes.
Let's move this latter into dma-helpers.c and rename it into
dma_aligned_pow2_mask(). Also rewrite the helper so that it
accomodates UINT64_MAX values for the size mask and max mask.
It now returns a mask instead of a size. Change the caller.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-id: 20210309102742.30442-3-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
With -Werror=maybe-uninitialized configuration we get
../hw/i386/intel_iommu.c: In function ‘vtd_context_device_invalidate’:
../hw/i386/intel_iommu.c:1888:10: error: ‘mask’ may be used
uninitialized in this function [-Werror=maybe-uninitialized]
1888 | mask = ~mask;
| ~~~~~^~~~~~~
Add a g_assert_not_reached() to avoid the error.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20210309102742.30442-2-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The 'running' argument from VMChangeStateHandler does not require
other value than 0 / 1. Make it a plain boolean.
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Message-Id: <20210111152020.1422021-3-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
There are 23 files that include the "sysemu/qtest.h",
but they do not use any qtest functions.
Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20210226081414.205946-1-kuhn.chenqun@huawei.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>